Re: BufFreelistLock - Mailing list pgsql-hackers
From: Jeff Janes
Subject: Re: BufFreelistLock
Msg-id: AANLkTintDS7jDwRouBHxHQXnoJyaRHpGSzGCWw3Oyr+u@mail.gmail.com
In response to: Re: BufFreelistLock (Jim Nasby <jim@nasby.net>)
Responses: Re: BufFreelistLock
List: pgsql-hackers
On Sun, Dec 12, 2010 at 6:48 PM, Jim Nasby <jim@nasby.net> wrote:
> On Dec 10, 2010, at 10:49 AM, Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Excerpts from Jeff Janes's message of vie dic 10 12:24:34 -0300 2010:
>>>> As far as I can tell, bgwriter never adds things to the freelist.
>>>> That is only done at start up, and when a relation or a database is
>>>> dropped.  The clock sweep does the vast majority of the work.
>>
>>> AFAIU bgwriter runs the clock sweep most of the time (BgBufferSync).
>>
>> I think bgwriter just tries to write out dirty buffers so they'll be
>> clean when the clock sweep reaches them.  It doesn't try to move them
>> to the freelist.
>
> Yeah, it calls SyncOneBuffer which does nothing for the clock sweep.
>
>> There might be some advantage in having it move buffers to a freelist
>> that's just protected by a simple spinlock (or at least, a lock
>> different from the one that protects the clock sweep).  The idea would
>> be that most of the time, backends just need to lock the freelist for
>> long enough to take a buffer off it, and don't run clock sweep at all.
>
> Yeah, the clock sweep code is very intensive compared to pulling a
> buffer from the freelist, yet AFAICT nothing will run the clock sweep
> except backends.  Unless I'm missing something, the free list is
> practically useless because buffers are only put there by
> InvalidateBuffer, which is only called by DropRelFileNodeBuffers and
> DropDatabaseBuffers.

Buffers are also put on the freelist at start up (all of them).  But of
course any busy system with more data than buffers will rapidly deplete
them, and DropRelFileNodeBuffers and DropDatabaseBuffers are generally
not going to happen enough to be meaningful on most setups, I would
think.

I was wondering, if the steady-state condition is to always use the
clock sweep, whether that shouldn't be the only mechanism that exists.
> So we make backends queue up behind the freelist lock with very little
> odds of getting a buffer, then we make them queue up for the clock
> sweep lock and make them actually run the clock sweep.

It is the same lock that governs both.  Given the simplicity of the
check that the freelist is empty, I don't think it adds much overhead.

> BTW, when we moved from 96G to 192G servers I tried increasing shared
> buffers from 8G to 28G and performance went down enough to be
> noticeable (we don't have any good benchmarks, so I can't really
> quantify the degradation).  Going back to 8G brought performance back
> up, so it seems like it was the change in shared buffers that caused
> the issue (the larger servers also have 24 cores vs 16).

What kind of work load do you have (intensity of reading versus
writing)?  How intensely concurrent is the access?

> My immediate thought was that we needed more lock partitions, but I
> haven't had the chance to see if that helps.  ISTM the issue could
> just as well be due to clock sweep suddenly taking over 3x longer than
> before.

It would surprise me if most clock sweeps need to make anything near a
full pass over the buffers for each allocation (but technically the
sweep wouldn't need to do that in order to take 3x longer.  It could be
that the fraction of a pass it needs to make is merely proportional to
shared_buffers.  That too would surprise me, though).

You could compare the number of passes with the number of allocations
to see how much sweeping is done per allocation.  However, I don't
think the number of passes is reported anywhere, unless you compile
with #define BGW_DEBUG and run with debug2.

I wouldn't expect an increase in shared_buffers to make contention on
BufFreelistLock worse.  If the increased buffers are used to hold
heavily-accessed data, then you will find the pages you want in
shared_buffers more often, and so need to run the clock sweep less
often.  That should make up for longer sweeps.
But if the increased buffers are used to hold data that is just read
once and thrown away, then the clock sweep shouldn't need to sweep very
far before finding a candidate.

But of course being able to test would be better than speculation.

Cheers,

Jeff