Re: Checkpoint spikes - Mailing list pgsql-performance

From Heikki Linnakangas
Subject Re: Checkpoint spikes
Msg-id 4B17A09E.80406@enterprisedb.com
In response to Re: Checkpoint spikes  (Greg Smith <greg@2ndquadrant.com>)
List pgsql-performance
Greg Smith wrote:
> Richard Neill wrote:
>> Here's the typical checkpoint logs:
>> 2009-12-03 06:21:21 GMT LOG:  checkpoint complete: wrote 12400 buffers
>> (2.2%); 0 transaction log file(s) added, 0 removed, 12 recycled;
>> write=149.883 s, sync=5.143 s, total=155.040 s
> See that "sync" number there?  That's your problem; while that sync
> operation is going on, everybody else is grinding to a halt waiting for
> it.  Not a coincidence that the duration is about the same amount of
> time that your queries are getting stuck.  This example shows 12400
> buffers = 97MB of total data written.  Since those writes are pretty
> random I/O, it's easily possible to get stuck for a few seconds waiting
> for that much data to make it out to disk.  You only gave the write
> phase a couple of minutes to spread things out over; meanwhile, Linux
> may not even bother starting to write things out until 30 seconds into
> that, so the effective time between when writes to disk start and when
> the matching sync happens on your system is extremely small.  That's not
> good--you have to give that several minutes of breathing room if you
> want to avoid checkpoint spikes.
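
The arithmetic in the quoted log line can be checked with a short sketch. It assumes PostgreSQL's default 8 kB block size (the post does not state the block size in use), and the helper name summarize_checkpoint is made up for illustration; it is not part of any PostgreSQL tool.

```python
import re

# Assumption: the server uses the default 8 kB block size.
BLOCK_SIZE = 8192

# The log line quoted above, joined onto one line.
LOG_LINE = (
    "checkpoint complete: wrote 12400 buffers (2.2%); "
    "0 transaction log file(s) added, 0 removed, 12 recycled; "
    "write=149.883 s, sync=5.143 s, total=155.040 s"
)

PATTERN = re.compile(
    r"wrote (\d+) buffers.*?"
    r"write=([\d.]+) s, sync=([\d.]+) s, total=([\d.]+) s"
)


def summarize_checkpoint(line, block_size=BLOCK_SIZE):
    """Hypothetical helper: pull buffer count and timings out of a log line."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'checkpoint complete' log line")
    buffers = int(match.group(1))
    write_s, sync_s, total_s = (float(match.group(i)) for i in (2, 3, 4))
    mib_written = buffers * block_size / (1024 * 1024)
    return {
        "buffers": buffers,
        "mib_written": mib_written,
        "write_s": write_s,
        "sync_s": sync_s,
        "total_s": total_s,
        # Average rate over the write phase only, in MiB per second.
        "write_rate_mib_s": mib_written / write_s,
    }


summary = summarize_checkpoint(LOG_LINE)
print(f"{summary['mib_written']:.1f} MiB written over {summary['write_s']:.0f} s, "
      f"then {summary['sync_s']:.1f} s in sync")
```

Under that block-size assumption, 12400 buffers come to just under 97 MiB, matching the figure in the quote; the sync phase is the 5.143 s that lines up with the stalls.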

I wonder how common this issue is? When we implemented spreading of the
write phase, we had long discussions about spreading out the fsyncs too,
but in the end it wasn't done. Perhaps it is time to revisit that now
that 8.3 has been out for some time and people have experience with the
load-distributed checkpoints.

I'm not sure how the spreading of the fsync()s should work; it's hard to
estimate how long each fsync() is going to take, for example. But surely
something would be better than nothing.
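
To make the idea concrete, here is a minimal sketch of one way fsync() calls could be spread over a time budget. It is purely illustrative and is not how PostgreSQL does or would do it: the function name spread_fsyncs, the even-slot pacing, and the budget_s parameter are all assumptions. Because the duration of each fsync() is unknown in advance, the sketch does not try to predict it; it gives each file an evenly spaced target start time and simply shortens the pause after a slow fsync().

```python
import os
import time


def spread_fsyncs(paths, budget_s, sleep=time.sleep, clock=time.monotonic):
    """Hypothetical sketch: fsync each file, pacing the starts across budget_s.

    File i is scheduled to start at i * budget_s / len(paths) after the
    first one. A slow fsync eats into the following pause, so the schedule
    self-corrects without estimating fsync durations up front.
    Returns the measured duration of each fsync, in seconds.
    """
    count = len(paths)
    durations = []
    begin = clock()
    for index, path in enumerate(paths):
        started = clock()
        # Open read-write so fsync is permitted on all common platforms.
        fd = os.open(path, os.O_RDWR)
        try:
            os.fsync(fd)
        finally:
            os.close(fd)
        durations.append(clock() - started)

        if index + 1 < count:
            # Target start time of the next file in the even schedule.
            next_start = begin + (index + 1) * budget_s / count
            pause = max(0.0, next_start - clock())
            if pause > 0.0:
                sleep(pause)
    return durations
```

The sleep and clock parameters exist only so the pacing can be exercised without real waiting. If the fsync() calls together take longer than the budget, the pauses drop to zero and the sketch degrades to the back-to-back behaviour, which is no worse than doing nothing.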

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com
