Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers
From: Peter Geoghegan
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date:
Msg-id: CAM3SWZT17cM2GK6T6uyVdJRxPXNt1g=7HPwZttYMy9kKi0EXzQ@mail.gmail.com
In response to: Re: Clock sweep not caching enough B-Tree leaf pages? (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: Clock sweep not caching enough B-Tree leaf pages?
List: pgsql-hackers
On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> I think this is unfortunately completely out of question. For one a
> gettimeofday() for every buffer pin will become a significant performance
> problem. Even the computation of the xact/stmt start/stop timestamps
> shows up pretty heavily in profiles today - and they are far less
> frequent than buffer pins. And that's on x86 linux, where gettimeofday()
> is implemented as something more lightweight than a full syscall.

Come on, Andres. Of course exactly what I've done here is completely out of the question as a patch that we can go and commit right now. I've noted numerous caveats about bloating the buffer descriptors, and about it being a proof of concept. I'm pretty sure we can come up with a scheme to significantly cut down on the number of gettimeofday() calls if it comes down to it. In any case, I'm interested in advancing our understanding of the problem right now. Let's leave the minutiae to one side for the time being.

> The other significant problem I see with this is that it's not adaptive
> to the actual throughput of buffers in s_b. In many cases there are
> hundreds of clock cycles through shared buffers in 3 seconds. By only
> increasing the usagecount that often you've destroyed the little
> semblance of a working LRU there is right now.

If a usage_count can get to BM_MAX_USAGE_COUNT from its initial allocation within an instant, that's bad. It's that simple. Consider all the ways in which that can happen almost by accident. You could probably reasonably argue that the trade-off, or lack of adaptation (between an LRU and an LFU), that this particular sketch of mine represents is inappropriate or sub-optimal, but I don't understand why you're criticizing the patch for doing what I expressly set out to do. I wrote: "I think a very real problem may be that approximating an LRU is bad because an actual LRU is bad".

> It also wouldn't work well for situations with a fast changing
> workload >> s_b. If you have frequent queries that take a second or so
> and access some data repeatedly (index nodes or whatnot), only increasing
> the usagecount once will mean they'll continually fall back to disk access.

No, it shouldn't, because there is a notion of buffers getting a fair chance to prove themselves. Now, it might well be the case that there are workloads where what I've done to make that happen in this prototype doesn't work out too well - I've already said so. But should a buffer get a usage count of 5 just because the user inserted 5 tuples within a single DML command, for example? If so, why?

--
Peter Geoghegan
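For context, the usage_count mechanics being argued over work roughly like the sketch below. This is a simplified, illustrative stand-in, not the actual bufmgr.c code: the BufferDescSketch struct and the pin_buffer/sweep_should_evict functions are invented for the example, while usage_count and BM_MAX_USAGE_COUNT (which is 5) mirror the terms used in the message.

    /* Simplified sketch of clock-sweep usage_count bookkeeping -- not the
     * actual PostgreSQL bufmgr code.  The struct and function names are
     * invented for illustration; BM_MAX_USAGE_COUNT is 5 in the real tree. */
    #define BM_MAX_USAGE_COUNT 5

    typedef struct BufferDescSketch
    {
        int usage_count;    /* bumped on every pin, decayed by the sweep */
    } BufferDescSketch;

    /* Every pin bumps the count, capped at the maximum.  Five pins in
     * quick succession -- say, five tuples inserted by one DML command --
     * are enough to saturate it, which is the accidental promotion being
     * questioned above. */
    static void
    pin_buffer(BufferDescSketch *buf)
    {
        if (buf->usage_count < BM_MAX_USAGE_COUNT)
            buf->usage_count++;
    }

    /* The clock sweep decays the count on each visit and only considers a
     * buffer for eviction once it reaches zero, so a saturated buffer
     * survives several complete sweeps whether or not it is ever useful
     * again. */
    static bool
    sweep_should_evict(BufferDescSketch *buf)
    {
        if (buf->usage_count > 0)
        {
            buf->usage_count--;
            return false;       /* keep the buffer for now */
        }
        return true;            /* candidate for eviction */
    }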