Re: Fix checkpoint skip logic on idle systems by tracking LSN progress - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: Fix checkpoint skip logic on idle systems by tracking LSN progress |
Date | |
Msg-id | 20160927.191612.123888421.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Fix checkpoint skip logic on idle systems by tracking LSN progress (Michael Paquier <michael.paquier@gmail.com>) |
Responses |
Re: Fix checkpoint skip logic on idle systems by tracking
LSN progress
Re: Fix checkpoint skip logic on idle systems by tracking LSN progress |
List | pgsql-hackers |
Hi, I apologize in advance that the comments in this message might one of the ideas discarded in the past thread.. I might not grasp the discussion completely X( The attached patches are rebased to the master and additional one mentioned below. At Wed, 18 May 2016 17:57:49 -0400, Michael Paquier <michael.paquier@gmail.com> wrote in <CAB7nPqQcPqxEM3S735Bd2RzApNqSNJVietAC=6kfkYv_45dKwA@mail.gmail.com> > A couple of months back is has been reported to pgsql-bugs that WAL > segments were always switched with a low value of archive_timeout even > if a system is completely idle: > http://www.postgresql.org/message-id/20151016203031.3019.72930@wrigleys.postgresql.org > In short, a closer look at the problem has showed up that the logic in > charge of checking if a checkpoint should be skipped or not is > currently broken, because it completely ignores standby snapshots in > its calculation of the WAL activity. So a checkpoint always occurs > after checkpoint_timeout on an idle system since hot_standby has been > introduced as wal_level. This did not get better from 9.4, since > standby snapshots are logged every 15s by the background writer > process. In 9.6, since wal_level = 'archive' and 'hot_standby' > actually has the same meaning, the skip logic that worked with > wal_level = 'archive' does not do its job anymore. > > One solution that has been discussed is to track the progress of WAL > activity when doing record insertion by being able to mark some > records as not updating the progress of WAL. Standby snapshot records > enter in this category, making the checkpoint skip logic more robust. > Attached is a patch implementing a solution for it, by adding in If I understand the old thread correctly, this still doesn't solve the main issue, excessive checkpoint and segment switching. The reason is the interaction between XLOG_SWITCH and checkpoint as mentioned there. I think we may treat XLOG_SWITCH as NO_PROGRESS, since we can recover to the lastest state without it. If it is not wrong, the second patch attached (v12-2) inserts XLOG_SWITCH as NO_PROGRESS and skips segment switching when no progress took place for the round. > WALInsertLock a new field that gets updated for each record to track > the LSN progress. This allows to reliably skip the generation of > standby snapshots in the bgwriter or checkpoints on an idle system. WALInsertLock doesn't seem to me to be a good place for progressAt even considering the discussion about adding few instructions (containing a branch) in the hot-section. BackgroundWriterMain blocks all running XLogInsertRecord every 200 ms, not 15 or 30 seconds (only for replica, though). If this is correct, the Amit's suggestion below will have more significance, and I rather agree with it. XLogCtl is more suitable place for progressAt for the case. https://www.postgresql.org/message-id/CAA4eK1LB9HDq+F8Lw8bGRQx6Sw42XaikX1UQ2DZk+AuEGbfjWA@mail.gmail.com Amit> Taking and releasing locks again and again (which is done in patch) Amit> matters much more than adding few instructions under it and I think Amit> if you would have written the code in-a-way as in patch in some of Amit> the critical path, it would have been regressed badly, but because Amit> checkpoint doesn't happen often, reproducing regression is difficult. https://www.postgresql.org/message-id/CAB7nPqSO6HVJ0T6LUT84PCy+_ihitdt64Ze2D+SJrHZy72Y0wg@mail.gmail.com > > Also, I think it is worth to once take the performance data for > > write tests (using pgbench 30 minute run or some other way) with > > minimum checkpoint_timeout (i.e 30s) to see if the additional locking > > has any impact on performance. I think taking locks at intervals > > of 15s or 30s should not matter much, but it is better to be safe. > > I don't think performance will be impacted, but there are no reasons > to not do any measurements either. I'll try to get some graphs > tomorrow with runs on my laptop, mainly looking for any effects of > this patch on TPS when checkpoints show up. I don't think the impact is measurable on a laptop, where 2 to 4 cores available at most. > Per discussion with Andres at PGcon, we decided that this is an > optimization, only for 9.7~ because this has been broken for a long > time. I have also changed XLogIncludeOrigin() to use a more generic > routine to set of status flags for a record being inserted: > XLogSetFlags(). This routine can use two flags: > - INCLUDE_ORIGIN to decide if the origin should be logged or not > - NO_PROGRESS to decide at insertion if a record should update the LSN > progress or not. > Andres mentioned me that we'd want to have something similar to > XLogIncludeOrigin, but while hacking I noticed that grouping both > things under the same umbrella made more sense. This looks reasonable. regards, -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: