Re: Load Distributed Checkpoints, take 3 - Mailing list pgsql-patches
From | Heikki Linnakangas |
---|---|
Subject | Re: Load Distributed Checkpoints, take 3 |
Date | |
Msg-id | 467A37B4.7030708@enterprisedb.com Whole thread Raw |
In response to | Re: Load Distributed Checkpoints, take 3 (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Load Distributed Checkpoints, take 3
|
List | pgsql-patches |
Tom Lane wrote: > Heikki Linnakangas <heikki@enterprisedb.com> writes: >> I added a spinlock to protect the signaling fields between bgwriter and >> backends. The current non-locking approach gets really difficult as the >> patch adds two new flags, and both are more important than the existing >> ckpt_time_warn flag. > > That may be, but you could minimize the cost and notational ugliness by > not using the spinlock where you don't have to. Put the sig_atomic_t > fields back the way they were, and many of the uses of the spinlock go > away. All you really need it for is the places where the additional > flags are set or read. I find it easier to understand if it's used whenever any of the fields are accessed. You don't need to read and write ckpt_failed and ckpt_started/ckpt_done in specific order anymore, for example. > Some other comments: > > I tend to agree with whoever said upthread that the combination of GUC > variables proposed here is confusing and ugly. It'd make more sense to > have min and max checkpoint rates in KB/s, with the max checkpoint rate > only honored when we are predicting we'll finish before the next > checkpoint time. Really? I thought everyone is happy with the current combination, and that it was just the old trio of parameters controlling the write, nap and sync phases that was ugly. One particularly nice thing about defining the duration as a fraction of checkpoint interval is that we can come up with a reasonable default value that doesn't depend on your hardware. How would a min and max rate work? Anyone else have an opinion on the parameters? > The flow of control and responsibility between xlog.c, bgwriter.c and > bufmgr.c seems to have reached the breaking point of unintelligibility. > Can you think of a refactoring that would simplify it? We might have > to resign ourselves to this much messiness, but I'd rather not... The problem we're trying to solve is doing a checkpoint while running the normal bgwriter activity at the same time. The normal way to do two things simultaneously is to have two different processes (or threads). I thought about having a separate checkpoint process, but I gave up on that thought because the communication needed between backends, bgwriter and the checkpointer seems like a mess. The checkpointer would need the pendingOpsTable so that it can do the fsyncs, and it would also need to receive the forget-messages to that table. We could move that table entirely to the checkpointer, but since bgwriter is presumably doing most of the writes, there would be a lot of chatter between bgwriter and the checkpointer. The current approach is like co-operative multitasking. BufferSyncs yields control to bgwriter every now and then. The division of labor between xlog.c and other modules is not that bad, IMO. CreateCheckPoint is the main entry point to create a checkpoint, and it calls other modules to do their stuff, including bufmgr.c. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
pgsql-patches by date: