Home > mailing lists

Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers

From	Greg Stark
Subject	Re: Clock sweep not caching enough B-Tree leaf pages?
Date	April 17, 2014 13:41:08
Msg-id	CAM-w4HMnOJ=ACBjW9GuPMgYpjqD1QH_tcLyjCOquXCetdtWR6A@mail.gmail.com Whole thread Raw
In response to	Re: Clock sweep not caching enough B-Tree leaf pages? (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: Clock sweep not caching enough B-Tree leaf pages?
List	pgsql-hackers

Tree view

On Wed, Apr 16, 2014 at 12:44 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> This isn't a fundamental property of the usage-count idea; it's an
> artifact of the fact that usage count decreases are tied to eviction
> pressure rather than access pressure.  For example, suppose we made a
> rule that if the total usage counts of all buffers exceed 3 *
> NBuffers, then every time you bump the usage count of a buffer from N
> to N+1, you're required to advance the clock sweep far enough to
> decrease the reference count of a buffer by one.

This sounds like the right way to reason about it.

From what I remember in school the idea with the clock sweep is to set
the usage flags to the maximum whenever the buffer is used and
decrement (actually iirc typically shift right)  it when the clock
sweep goes by. Ie, simulate a LRU where when the buffer is accessed it
jumps to the head of the list and when the clock comes by it moves
gradually down the list.

What you're pointing out is that the clock might not come by very
often resulting everything being at the head of the list. In that case
I'm not clear it really matters what gets evicted though. And the cpu
effort of running the clock n times sounds bad but doing the work
earlier doesn't really change the amount of work being done, it just
amortizes it over more calls.

But if you want to do that it seems to me the way to do it is every
time a buffer is pinned set to the maximum and then run the clock
max_value - previous_value. So the total usage counts of all buffers
remains constant. If that results in contention one way to reduce it
is to do this probabilistically. Run the clock 1% of the time but run
it 100x as much as you would normally.

But I think you've misidentified the problem and what those other
algorithms are trying to solve. The problem is not that Postgres will
pick a bad buffer to evict. If all the buffers have been since the
last time the clock came around then they're all "hot" anyways and it
doesn't really matter which one we evict. The problem is that we
expend an inordinate amount of work finding the few non-hot buffers.
When you have a really large amount of memory and 99.9% of it is hot
but 0.1% is whatever random non-hot page was needed last then there's
an obvious buffer to evict when you need a new one. But we spend a lot
of work decrementing every hot buffer's usage count 4 times only to
have them immediately incremented again just to find the 1 buffer
where the usage count was 4 or 3. The goal of these algorithms that
divide the buffers into groups is to avoid having to do so much work
to find the colder buffers. Once the hot buffers migrate to the hot
pool we only need to run the clock there when we find we have new hot
pages that we want to promote. All the thrashing in the cold pool can
be more efficient because there's many fewer pages to consider.

-- 
greg

pgsql-hackers by date:

From: Robert Haas
Date: 17 April 2014, 13:23:50
Subject: Re: Minor performance improvement in transition to external sort

From: Pavel Stehule
Date: 17 April 2014, 13:41:41
Subject: Re: four minor proposals for 9.5

Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers

Previous

Next