Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date:
Msg-id: 20140416091852.GA16358@awork2.anarazel.de
In response to: Re: Clock sweep not caching enough B-Tree leaf pages? (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Clock sweep not caching enough B-Tree leaf pages?
List: pgsql-hackers
On 2014-04-16 01:58:23 -0700, Peter Geoghegan wrote:
> On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I think this is unfortunately completely out of the question. For one, a
> > gettimeofday() for every buffer pin will become a significant performance
> > problem. Even the computation of the xact/stmt start/stop timestamps
> > shows up pretty heavily in profiles today - and they are far less
> > frequent than buffer pins. And that's on x86 Linux, where gettimeofday()
> > is implemented as something more lightweight than a full syscall.
>
> Come on, Andres. Of course exactly what I've done here is completely
> out of the question as a patch that we can go and commit right now.
> I've noted numerous caveats about bloating the buffer descriptors, and
> about it being a proof of concept. I'm pretty sure we can come up with a
> scheme to significantly cut down on the number of gettimeofday() calls
> if it comes down to it. In any case, I'm interested in advancing our
> understanding of the problem right now. Let's leave the minutiae to
> one side for the time being.

*I* don't think any scheme that involves measuring the time around
buffer pins is going to be acceptable. It's better that I say that now
rather than when you've invested significant time into the approach, no?

> > The other significant problem I see with this is that it's not adaptive
> > to the actual throughput of buffers in s_b. In many cases there are
> > hundreds of clock cycles through shared buffers in 3 seconds. By only
> > increasing the usagecount that often, you've destroyed the little
> > semblance of a working LRU there is right now.
>
> If a usage_count can get to BM_MAX_USAGE_COUNT from its initial
> allocation within an instant, that's bad. It's that simple. Consider
> all the ways in which that can happen almost by accident.

Yes, I agree that that's a problem. Its immediately going down to zero
is a problem as well, though. And that's what will happen in many
scenarios, because you have time limits on increasing the usagecount,
but not on decreasing it.

> > It also wouldn't work well for situations with a fast-changing
> > workload >> s_b. If you have frequent queries that take a second or so
> > and access some data repeatedly (index nodes or whatnot), only increasing
> > the usagecount once will mean they'll continually fall back to disk access.
>
> No, it shouldn't, because there is a notion of buffers getting a fair
> chance to prove themselves.

If you have a workload with > (BM_MAX_USAGE_COUNT + 1) clock
cycles/second, how does *any* buffer have a chance to prove itself?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services