Re: BufFreelistLock - Mailing list pgsql-hackers
From: Jeff Janes
Subject: Re: BufFreelistLock
Msg-id: AANLkTintDS7jDwRouBHxHQXnoJyaRHpGSzGCWw3Oyr+u@mail.gmail.com
In response to: Re: BufFreelistLock (Jim Nasby <jim@nasby.net>)
Responses: Re: BufFreelistLock
List: pgsql-hackers
On Sun, Dec 12, 2010 at 6:48 PM, Jim Nasby <jim@nasby.net> wrote:
> On Dec 10, 2010, at 10:49 AM, Tom Lane wrote:
>> Alvaro Herrera <alvherre@commandprompt.com> writes:
>>> Excerpts from Jeff Janes's message of vie dic 10 12:24:34 -0300 2010:
>>>> As far as I can tell, bgwriter never adds things to the freelist.
>>>> That is only done at start up, and when a relation or a database is
>>>> dropped.  The clock sweep does the vast majority of the work.
>>
>>> AFAIU bgwriter runs the clock sweep most of the time (BgBufferSync).
>>
>> I think bgwriter just tries to write out dirty buffers so they'll be
>> clean when the clock sweep reaches them.  It doesn't try to move them
>> to the freelist.
>
> Yeah, it calls SyncOneBuffer which does nothing for the clock sweep.
>
>> There might be some advantage in having it move buffers to a freelist
>> that's just protected by a simple spinlock (or at least, a lock
>> different from the one that protects the clock sweep).  The idea would
>> be that most of the time, backends just need to lock the freelist for
>> long enough to take a buffer off it, and don't run clock sweep at all.
>
> Yeah, the clock sweep code is very intensive compared to pulling a
> buffer from the freelist, yet AFAICT nothing will run the clock sweep
> except backends.  Unless I'm missing something, the free list is
> practically useless because buffers are only put there by
> InvalidateBuffer, which is only called by DropRelFileNodeBuffers and
> DropDatabaseBuffers.

Buffers are also put on the freelist at start up (all of them).  But of
course any busy system with more data than buffers will rapidly deplete
them, and DropRelFileNodeBuffers and DropDatabaseBuffers are generally
not going to happen enough to be meaningful on most setups, I would
think.

I was wondering, if the steady-state condition is to always use the
clock sweep, whether that shouldn't be the only mechanism that exists.
> So we make backends queue up behind the freelist lock with very little
> odds of getting a buffer, then we make them queue up for the clock
> sweep lock and make them actually run the clock sweep.

It is the same lock that governs both.  Given the simplicity of the
check that the freelist is empty, I don't think it adds much overhead.

> BTW, when we moved from 96G to 192G servers I tried increasing shared
> buffers from 8G to 28G and performance went down enough to be
> noticeable (we don't have any good benchmarks, so I can't really
> quantify the degradation).  Going back to 8G brought performance back
> up, so it seems like it was the change in shared buffers that caused
> the issue (the larger servers also have 24 cores vs 16).

What kind of work load do you have (intensity of reading versus
writing)?  How intensely concurrent is the access?

> My immediate thought was that we needed more lock partitions, but I
> haven't had the chance to see if that helps.  ISTM the issue could
> just as well be due to clock sweep suddenly taking over 3x longer than
> before.

It would surprise me if most clock sweeps need to make anything near a
full pass over the buffers for each allocation (but technically the
sweep wouldn't need to do that in order to take 3x longer.  It could be
that the fraction of a pass it needs to make is merely proportional to
shared_buffers.  That too would surprise me, though).

You could compare the number of passes with the number of allocations
to see how much sweeping is done per allocation.  However, I don't
think the number of passes is reported anywhere, unless you compile
with #define BGW_DEBUG and run with debug2.

I wouldn't expect an increase in shared_buffers to make contention on
BufFreelistLock worse.  If the increased buffers are used to hold
heavily-accessed data, then you will find the pages you want in
shared_buffers more often, and so need to run the clock sweep less
often.  That should make up for longer sweeps.
But if the increased buffers are used to hold data that is just read
once and thrown away, then the clock sweep shouldn't need to sweep very
far before finding a candidate.

But of course being able to test would be better than speculation.

Cheers,

Jeff