Re: [HACKERS] Clock with Adaptive Replacement - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Msg-id: CA+Tgmoaw-3_3T1sVoiDKCL-zop9j2kqeYChV=hfszkoU1+EenA@mail.gmail.com
In response to: Re: [HACKERS] Clock with Adaptive Replacement (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Wed, Apr 25, 2018 at 6:54 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> Before some of the big shared_buffers bottlenecks were alleviated
> several years ago, it was possible to observe shared_buffers evictions
> occurring essentially at random. I have no idea if that's still true,
> but it could be.

I think it is. We haven't done anything to address it. I think if we want to move to direct I/O -- which may be something we need or want to do -- we're going to have to work a lot harder at making good page eviction decisions. Your patch to change the page eviction algorithm didn't help noticeably once we eliminated the contention around buffer eviction, but that's just because the cost of a bad eviction went down, not because we stopped doing bad evictions. I think it would be interesting to insert a usleep() call into mdread() and then test various buffer eviction algorithms with that in place.

I'm personally not very excited about making rules like "index pages are more valuable than heap pages". Such rules will in many cases be true, but it's easy to come up with cases where they don't hold: for example, we might run pgbench for a while and then stop running pgbench and start running big sequential scans for reporting purposes. We don't want to artificially pin the index buffers in shared_buffers just because they're index pages; we want to figure out which pages really matter. Now, I realize that you weren't proposing (and wouldn't propose) a rule that index pages never get evicted, but I think that favoring index pages even in some milder way is basically a hack. Index pages aren't *intrinsically* more valuable; they are more valuable because they will, in many workloads, be accessed more often than heap pages. A good algorithm ought to be able to figure that out based on the access pattern, without being explicitly given a hint, I think.
I believe the root of the problem here is that the usage count we have today does a very poor job distinguishing what's hot from what's not. There have been previous experiments around making usage_count use some kind of a log scale: we make the maximum, say, 32, and the clock hand divides by 2 instead of subtracting 1. I don't think those experiments were enormously successful and I suspect that a big part of the reason is that it's still pretty easy to get to a state where the counters are maxed out for a large number of buffers, and at that point you can't distinguish between those buffers any more: they all look equally hot. We need something better. If a system like this is working properly, things like interior index pages and visibility map pages ought to show up as fiery hot on workloads where the index or visibility map, as the case may be, is heavily used.

A related problem is that user-connected backends end up doing a lot of buffer eviction themselves on many workloads. Maybe the bgreclaimer patch Amit wrote a few years ago could help somehow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company