Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Clock sweep not caching enough B-Tree leaf pages?
Date:
Msg-id: 20140416091852.GA16358@awork2.anarazel.de
In response to: Re: Clock sweep not caching enough B-Tree leaf pages? (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Clock sweep not caching enough B-Tree leaf pages?
List: pgsql-hackers
On 2014-04-16 01:58:23 -0700, Peter Geoghegan wrote:
> On Wed, Apr 16, 2014 at 12:53 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > I think this is unfortunately completely out of the question. For one, a
> > gettimeofday() for every buffer pin will become a significant performance
> > problem. Even the computation of the xact/stmt start/stop timestamps
> > shows up pretty heavily in profiles today - and they are far less
> > frequent than buffer pins. And that's on x86 Linux, where gettimeofday()
> > is implemented as something more lightweight than a full syscall.
>
> Come on, Andres. Of course exactly what I've done here is completely
> out of the question as a patch that we can go and commit right now.
> I've noted numerous caveats about bloating the buffer descriptors, and
> about it being a proof of concept. I'm pretty sure we can come up with a
> scheme to significantly cut down on the number of gettimeofday() calls
> if it comes down to it. In any case, I'm interested in advancing our
> understanding of the problem right now. Let's leave the minutiae to
> one side for the time being.

*I* don't think any scheme that involves measuring the time around
buffer pins is going to be acceptable. It's better that I say that now
rather than when you've invested significant time into the approach, no?

> > The other significant problem I see with this is that it's not adaptive
> > to the actual throughput of buffers in s_b. In many cases there are
> > hundreds of clock cycles through shared buffers in 3 seconds. By only
> > increasing the usagecount that often, you've destroyed the little
> > semblance of a working LRU there is right now.
>
> If a usage_count can get to BM_MAX_USAGE_COUNT from its initial
> allocation within an instant, that's bad. It's that simple. Consider
> all the ways in which that can happen almost by accident.

Yes, I agree that that's a problem. Its immediately going down to zero
is a problem as well, though. And that's what will happen in many
scenarios, because you have time limits on increasing the usagecount,
but not on decreasing it.

> > It also wouldn't work well for situations with a fast-changing
> > workload >> s_b. If you have frequent queries that take a second or so
> > and access some data repeatedly (index nodes or whatnot), only increasing
> > the usagecount once will mean they'll continually fall back to disk access.
>
> No, it shouldn't, because there is a notion of buffers getting a fair
> chance to prove themselves.

If you have a workload with > (BM_MAX_USAGE_COUNT + 1) clock
cycles/second, how does *any* buffer have a chance to prove itself?

Greetings,

Andres Freund

--
Andres Freund                     http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services