Re: Spread checkpoint sync - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Spread checkpoint sync |
Date | |
Msg-id | AANLkTikj76AzK4yEQ7Sn-guaXpCUEuAons23mnU3NSV1@mail.gmail.com Whole thread Raw |
In response to | Re: Spread checkpoint sync (Simon Riggs <simon@2ndQuadrant.com>) |
Responses |
Re: Spread checkpoint sync
Re: Spread checkpoint sync |
List | pgsql-hackers |
On Sat, Jan 15, 2011 at 8:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On Sat, 2011-01-15 at 05:47 -0500, Greg Smith wrote: >> Robert Haas wrote: >> > On Tue, Nov 30, 2010 at 3:29 PM, Greg Smith <greg@2ndquadrant.com> wrote: >> > >> > > One of the ideas Simon and I had been considering at one point was adding >> > > some better de-duplication logic to the fsync absorb code, which I'm >> > > reminded by the pattern here might be helpful independently of other >> > > improvements. >> > > >> > >> > Hopefully I'm not stepping on any toes here, but I thought this was an >> > awfully good idea and had a chance to take a look at how hard it would >> > be today while en route from point A to point B. The answer turned >> > out to be "not very", so PFA a patch that seems to work. I tested it >> > by attaching gdb to the background writer while running pgbench, and >> > it eliminate the backend fsyncs without even breaking a sweat. >> > >> >> No toe damage, this is great, I hadn't gotten to coding for this angle >> yet at all. Suffering from an overload of ideas and (mostly wasted) >> test data, so thanks for exploring this concept and proving it works. > > No toe damage either, but are we sure we want the de-duplication logic > and in this place? > > I was originally of the opinion that de-duplicating the list would save > time in the bgwriter, but that guess was wrong by about two orders of > magnitude, IIRC. The extra time in the bgwriter wasn't even noticeable. Well, the point of this is not to save time in the bgwriter - I'm not surprised to hear that wasn't noticeable. The point is that when the fsync request queue fills up, backends start performing an fsync *for every block they write*, and that's about as bad for performance as it's possible to be. So it's worth going to a little bit of trouble to try to make sure it doesn't happen. It didn't happen *terribly* frequently before, but it does seem to be common enough to worry about - e.g. on one occasion, I was able to reproduce it just by running pgbench -i -s 25 or something like that on a laptop. With this patch applied, there's no performance impact vs. current code in the very, very common case where space remains in the queue - 999 times out of 1000, writing to the fsync queue will be just as fast as ever. But in the unusual case where the queue has been filled up, compacting the queue is much much faster than performing an fsync, and the best part is that the compaction is generally massive. I was seeing things like "4096 entries compressed to 14". So clearly even if the compaction took as long as the fsync itself it would be worth it, because the next 4000+ guys who come along again go through the fast path. But in fact I think it's much faster than an fsync. In order to get pathological behavior even with this patch applied, you'd need to have NBuffers pending fsync requests and they'd all have to be different. I don't think that's theoretically impossible, but Greg's research seems to indicate that even on busy systems we don't come even a little bit close to the circumstances that would cause it to occur in practice. Every other change we might make in this area will further improve this case, too: for example, doing an absorb after each fsync would presumably help, as would the more drastic step of splitting the bgwriter into two background processes (one to do background page cleaning, and the other to do checkpoints, for example). But even without those sorts of changes, I think this is enough to effectively eliminate the full fsync queue problem in practice, which seems worth doing independently of anything else. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: