Re: [HACKERS] Clock with Adaptive Replacement - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: [HACKERS] Clock with Adaptive Replacement
Msg-id: CA+TgmoYHZJkOc_gWEyWfB0efiet2PQVje3i0kyWRpjAjVD2_Fg@mail.gmail.com
In response to: Re: [HACKERS] Clock with Adaptive Replacement (Peter Geoghegan <pg@bowt.ie>)
List: pgsql-hackers
On Sat, Apr 28, 2018 at 7:16 PM, Peter Geoghegan <pg@bowt.ie> wrote:
> There could very easily be an enormous range of usefulness among
> buffers in shared_buffers. And yet, we only recognize the increments
> that usage_count can represent (this is before we even think about
> the actual problems with how usage_count is maintained). We're never
> going to be able to give *radically* more useful leaf pages a
> concomitant advantage during buffer eviction if we have to use some
> rule of thumb.

Right. I think the real problem here is that the usage_count is a
terrible predictor of how useful a page really is...

> In a way, it's not at all surprising that we can end up with
> usage_counts of 5 all around when shared_buffers is large. Blocks
> end up "resting on their laurels"; they're impossible to evict based
> on a large amount of blind luck, such as being involved in a burst
> of activity some time ago (so much for clocksweep approximating an
> LRU; this is actually more like an LFU). ARC/CAR moves a "recent"
> block to the "frequent" list if the block gets referenced a second
> time after an interval that is, in effect, self-correcting (it may
> or may not be from the ghost recent list or the real buffer cache
> recent list -- either way it goes to the head of the frequent list).
> Moving a block from the recent to the frequent list also enlarges
> the recent list target size at the expense of the frequent list
> target size. "This block in particular deserves to be a frequent
> block, but as a consequence all blocks should collectively be
> considered slightly less deserving of being frequent blocks."

...and this is exactly the kind of system that can fix that problem.
In the current system, bumping the usage count makes the buffer whose
count just got bumped look more useful without making any other
buffer look less useful. Obviously, that allows for situations where
the usefulness of every buffer converges to positive infinity or, as
we like to call it, 5. Real life isn't like that, though. Not all
children are above average, and not all buffers can be more useful
than average.

That's not to say that the total of all usage counts must remain
equal to a constant or something dumb like that -- there are
decisions to be made about how you implement things. For example, you
can imagine a system where, every time we would bump the usage count
of a buffer that already has a usage count of 5, we instead advance
the clock hand by one buffer (a rough sketch of this appears below).
In such a system the total of the usage counts can fluctuate, but you
can't peg everything at 5, because there's back-pressure built into
the algorithm.

CAR accomplishes a similar goal in a different way: "frequently" used
pages are never evicted, but as more and more pages get moved into
the frequently-used set, the system becomes more and more willing to
shrink that set's target size. That's a really appealing property. If
there's a "hot" set of buffers that accounts for nearly all accesses,
the algorithm will be willing to regard all the buffers in that set
as frequently-used, regardless of whether the set is a large or a
small percentage of shared_buffers, and they will never get evicted.
But the buffers must really be hot enough to justify the space
they're taking up. When there are only a few hot buffers, it's
sufficient for them to be a little hotter than the typical buffer.
But when there are a lot of hot buffers, they have to be *really*
hot.

Intuitively, that makes sense, because making the set of "hot" -- and
thus unevictable -- buffers large means that other buffers are going
to get evicted very quickly. That's only justifiable if the hot
buffers are very significantly more useful than the non-hot buffers.
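To make that concrete, here's a minimal sketch of CAR's adaptation
rule, going by my reading of the CAR paper. None of these names exist
in PostgreSQL, and the real algorithm has more moving parts (two
clocks plus two ghost lists); this is just the part that prices
"frequent" status:

#include <stdbool.h>

/*
 * Hypothetical sketch of CAR's self-tuning recency target "p".  The
 * cache holds c pages split between a recency clock T1 and a
 * frequency clock T2; B1 and B2 are their ghost lists (metadata for
 * recently evicted pages).  p is the target size of T1, so c - p is
 * the target size of the frequent set.
 */
typedef struct CarState
{
    int     c;          /* total cache size, in pages */
    int     p;          /* target size of the recency clock T1 */
    int     b1_len;     /* length of ghost-recent list B1 */
    int     b2_len;     /* length of ghost-frequent list B2 */
} CarState;

/*
 * Called on a cache miss that hits one of the ghost lists.  A hit in
 * B1 means we evicted a recent page too early, so grow p; a hit in
 * B2 means we evicted a frequent page too early, so shrink p.  The
 * step is scaled by the relative ghost list lengths, which is where
 * the back-pressure comes from: the bigger the frequent set's share
 * has already become, the more a single B1 hit shrinks its target.
 */
static void
car_adapt(CarState *car, bool hit_in_b1)
{
    int     step;

    if (hit_in_b1)
    {
        step = car->b2_len / car->b1_len;   /* b1_len >= 1 on a B1 hit */
        if (step < 1)
            step = 1;
        car->p = (car->p + step > car->c) ? car->c : car->p + step;
    }
    else
    {
        step = car->b1_len / car->b2_len;   /* b2_len >= 1 on a B2 hit */
        if (step < 1)
            step = 1;
        car->p = (car->p - step < 0) ? 0 : car->p - step;
    }
}

Every promotion-driving event moves the shared target, so a block can
only become "frequent" at a price that depends on how big the
frequent set already is.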
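And the advance-the-clock-hand variant from a few paragraphs up might
look roughly like this (again just a sketch: the structures are
simplified stand-ins, and whether "advancing the hand" should also
decrement the buffer it passes is my reading, not a settled design):

#define MAX_USAGE_COUNT 5       /* cf. PostgreSQL's BM_MAX_USAGE_COUNT */
#define NBUFFERS        1024    /* stand-in for shared_buffers */

/* Simplified stand-in for a buffer descriptor. */
typedef struct BufferSketch
{
    int     usage_count;
} BufferSketch;

static BufferSketch buffers[NBUFFERS];
static int      clock_hand = 0;     /* next buffer the sweep visits */

/*
 * Usage-count bump with back-pressure.  When a buffer is already
 * pegged at the maximum, instead of doing nothing we advance the
 * clock hand one position, decrementing whatever buffer it passes.
 * Keeping one buffer hot now costs the rest of the pool a tick, so
 * the usage counts can't all converge on 5.
 */
static void
bump_usage_count(BufferSketch *buf)
{
    if (buf->usage_count < MAX_USAGE_COUNT)
        buf->usage_count++;
    else
    {
        if (buffers[clock_hand].usage_count > 0)
            buffers[clock_hand].usage_count--;
        clock_hand = (clock_hand + 1) % NBUFFERS;
    }
}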
> If you have extreme cache pressure, I don't think that there is
> anything left to worry about other than the scalability of evicting
> buffers. That should be a rare enough occurrence for well maintained
> systems these days (again, this may only actually be true as an
> aspiration). Just look at the ARC paper's statistics; they report
> that they're not much better than the competition (including even a
> classic LRU) when there is either mild cache pressure or extreme
> cache pressure. The middle ground is the interesting part.

+1.

> All of that having been said, maybe we have an independent low-level
> problem: we increase usage_count when we pin a buffer, even if we
> last pinned the buffer 5ms ago. IIRC a change to this went in when
> ARC went in (and came out with ARC, too). This is more of a problem
> with miscounting the number of hits based on accidents around how
> things are structured in the executor -- we need to be more careful
> about recognizing whether or not block hits are logically distinct.
> Particularly with stuff like nested loop joins.

I agree that double-counting correlated accesses is a problem, and I
agree that we probably want to do something about it. I am skeptical
that using wall-clock time is the right way to solve that problem,
because it leads to edge effects -- if, for example, there is a 5ms
timeout for bumping the usage count again, then you can probably
manufacture a workload where the buffer eviction pattern is
dramatically different for a nested loop that hits a given page every
4.5ms than it is for one that hits a given page every 5.5ms. Note
that CART aims to solve this problem in a way that doesn't involve
any fixed amount of wall-clock time.
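For what it's worth, here's a loose sketch of the CART idea as I
understand it; the real algorithm tracks a per-page filter bit plus
two clocks and two ghost lists, so everything below is a simplified
illustration, not actual CART (or PostgreSQL) code:

#include <stdbool.h>

typedef enum FilterBit
{
    FILTER_SHORT,               /* short-term utility (the default) */
    FILTER_LONG                 /* demonstrated long-term utility */
} FilterBit;

typedef struct PageMeta
{
    bool        referenced;     /* clock reference bit */
    FilterBit   filter;         /* promotion gate */
} PageMeta;

/*
 * The trick: a re-reference while the page is still resident may just
 * be a correlated burst (say, repeated pins from a nestloop inner
 * scan), so it only refreshes the reference bit.  Evidence of real
 * reuse is a hit in the ghost-recent list -- a reference separated
 * from the original ones by a full residence in the cache, which a
 * correlated burst is unlikely to produce.  No wall-clock timeout is
 * involved anywhere.
 */
static void
on_reference(PageMeta *page, bool resident, bool in_ghost_recent)
{
    if (resident)
        page->referenced = true;
    else if (in_ghost_recent)
        page->filter = FILTER_LONG; /* eligible for the frequent clock */
}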
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company