Thread: Inconsistency in reporting checkpointer stats
Hi, While working on checkpoint related stuff, I have encountered that there is some inconsistency while reporting checkpointer stats. When a checkpoint gets completed, a checkpoint complete message gets logged. This message has a lot of information including the buffers written (CheckpointStats.ckpt_bufs_written). This variable gets incremented in 2 cases. First is in BufferSync() and the second is in SlruInternalWritePage(). On the other hand the checkpointer stats exposed using pg_stat_bgwriter contains a lot of information including buffers written (PendingCheckpointerStats.buf_written_checkpoints). This variable gets incremented in only one place and that is in BufferSync(). So there is inconsistent behaviour between these two data. Please refer to the sample output below. postgres=# select * from pg_stat_bgwriter; checkpoints_timed | checkpoints_req | checkpoint_write_time | checkpoint_sync_time | buffers_checkpoint | buffers_clean | maxwritten_clean | buffers_backend | buffers_backend_fsync | buffers_alloc | stats_reset -------------------+-----------------+-----------------------+----------------------+--------------------+---------------+------------------+-----------------+-----------------------+---------------+------------------------------- 0 | 1 | 75 | 176 | 4702 | 0 | 0 | 4656 | 0 | 5023 | 2022-12-14 07:01:01.494672+00 2022-12-14 07:03:18.052 UTC [6087] LOG: checkpoint starting: immediate force wait 2022-12-14 07:03:18.370 UTC [6087] LOG: checkpoint complete: wrote 4705 buffers (28.7%); 0 WAL file(s) added, 0 removed, 4 recycled; write=0.075 s, sync=0.176 s, total=0.318 s; sync files=34, longest=0.159 s, average=0.006 s; distance=66180 kB, estimate=66180 kB; lsn=0/5565E38, redo lsn=0/5565E00 In order to fix this, the PendingCheckpointerStats.buf_written_checkpoints should be incremented in SlruInternalWritePage() similar to CheckpointStats.ckpt_bufs_written. I have attached the patch for the same. Please share your thoughts. Thanks & Regards, Nitin Jadhav
Attachment
On Wed, Dec 14, 2022 at 1:02 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > Hi, > > While working on checkpoint related stuff, I have encountered that > there is some inconsistency while reporting checkpointer stats. When a > checkpoint gets completed, a checkpoint complete message gets logged. > This message has a lot of information including the buffers written > (CheckpointStats.ckpt_bufs_written). This variable gets incremented in > 2 cases. First is in BufferSync() and the second is in > SlruInternalWritePage(). On the other hand the checkpointer stats > exposed using pg_stat_bgwriter contains a lot of information including > buffers written (PendingCheckpointerStats.buf_written_checkpoints). > This variable gets incremented in only one place and that is in > BufferSync(). So there is inconsistent behaviour between these two > data. Please refer to the sample output below. > > In order to fix this, the > PendingCheckpointerStats.buf_written_checkpoints should be incremented > in SlruInternalWritePage() similar to > CheckpointStats.ckpt_bufs_written. I have attached the patch for the > same. Please share your thoughts. Indeed PendingCheckpointerStats.buf_written_checkpoints needs to count buffer writes in SlruInternalWritePage(). However, does it need to be done immediately there? The stats will not be visible to the users until the next pgstat_report_checkpointer(). Incrementing buf_written_checkpoints in BufferSync() makes sense as the pgstat_report_checkpointer() gets called in there via CheckpointWriteDelay() and it becomes visible to the user immediately. Have you checked if pgstat_report_checkpointer() gets called while the checkpoint calls SlruInternalWritePage()? If not, then you can just assign ckpt_bufs_written to buf_written_checkpoints in LogCheckpointEnd() like its other friends checkpoint_write_time and checkpoint_sync_time. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Dec 14, 2022 at 04:54:53PM +0530, Bharath Rupireddy wrote: > Indeed PendingCheckpointerStats.buf_written_checkpoints needs to count > buffer writes in SlruInternalWritePage(). However, does it need to be > done immediately there? The stats will not be visible to the users > until the next pgstat_report_checkpointer(). Incrementing > buf_written_checkpoints in BufferSync() makes sense as the > pgstat_report_checkpointer() gets called in there via > CheckpointWriteDelay() and it becomes visible to the user immediately. > Have you checked if pgstat_report_checkpointer() gets called while the > checkpoint calls SlruInternalWritePage()? If not, then you can just > assign ckpt_bufs_written to buf_written_checkpoints in > LogCheckpointEnd() like its other friends > checkpoint_write_time and checkpoint_sync_time. /* If part of a checkpoint, count this as a buffer written. */ if (fdata) CheckpointStats.ckpt_bufs_written++; + PendingCheckpointerStats.buf_written_checkpoints++; Also, the proposed patch would touch PendingCheckpointerStats even when there is no fdata, aka outside the context of a checkpoint or shutdown sequence.. -- Michael
Attachment
> Indeed PendingCheckpointerStats.buf_written_checkpoints needs to count > buffer writes in SlruInternalWritePage(). However, does it need to be > done immediately there? The stats will not be visible to the users > until the next pgstat_report_checkpointer(). Incrementing > buf_written_checkpoints in BufferSync() makes sense as the > pgstat_report_checkpointer() gets called in there via > CheckpointWriteDelay() and it becomes visible to the user immediately. > Have you checked if pgstat_report_checkpointer() gets called while the > checkpoint calls SlruInternalWritePage()? If not, then you can just > assign ckpt_bufs_written to buf_written_checkpoints in > LogCheckpointEnd() like its other friends > checkpoint_write_time and checkpoint_sync_time. In case of an immediate checkpoint, the CheckpointWriteDelay() never gets called until the checkpoint is completed. So no issues in this case. CheckpointWriteDelay() comes into picture in case of non immediate checkpoints (i.e. checkpoint timeout is reached or max wal size is reached). If we remove the increment in BufferSync() and SlruInternalWritePage() and then just assign ckpt_bufs_written to buf_written_checkpoints in LogCheckpointEnd() then the update will be available after the end of each checkpoint which is not better than the existing behaviour (without patch). If we keep the increment in BufferSync() then we have to calculate the remaining buffer incremented in SlruInternalWritePage() and then increment buf_written_checkpoints with this number in LogCheckpointEnd(). This just makes it complicated and again the buffer incremented in SlruInternalWritePage() will get updated at the end of the checkpoint. In the case of checkpoint_write_time and checkpoint_sync_time, it makes sense because this information is based on the entire checkpoint operation and it should be done at the end. So I feel the patch handles it in a better way even though the pgstat_report_checkpointer() does not get called immediately but it will be called during the next increment in BufferSync() which is before the end of the checkpoint. Please share if you have any other ideas. On Wed, Dec 14, 2022 at 4:55 PM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Dec 14, 2022 at 1:02 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > Hi, > > > > While working on checkpoint related stuff, I have encountered that > > there is some inconsistency while reporting checkpointer stats. When a > > checkpoint gets completed, a checkpoint complete message gets logged. > > This message has a lot of information including the buffers written > > (CheckpointStats.ckpt_bufs_written). This variable gets incremented in > > 2 cases. First is in BufferSync() and the second is in > > SlruInternalWritePage(). On the other hand the checkpointer stats > > exposed using pg_stat_bgwriter contains a lot of information including > > buffers written (PendingCheckpointerStats.buf_written_checkpoints). > > This variable gets incremented in only one place and that is in > > BufferSync(). So there is inconsistent behaviour between these two > > data. Please refer to the sample output below. > > > > In order to fix this, the > > PendingCheckpointerStats.buf_written_checkpoints should be incremented > > in SlruInternalWritePage() similar to > > CheckpointStats.ckpt_bufs_written. I have attached the patch for the > > same. Please share your thoughts. > > Indeed PendingCheckpointerStats.buf_written_checkpoints needs to count > buffer writes in SlruInternalWritePage(). However, does it need to be > done immediately there? The stats will not be visible to the users > until the next pgstat_report_checkpointer(). Incrementing > buf_written_checkpoints in BufferSync() makes sense as the > pgstat_report_checkpointer() gets called in there via > CheckpointWriteDelay() and it becomes visible to the user immediately. > Have you checked if pgstat_report_checkpointer() gets called while the > checkpoint calls SlruInternalWritePage()? If not, then you can just > assign ckpt_bufs_written to buf_written_checkpoints in > LogCheckpointEnd() like its other friends > checkpoint_write_time and checkpoint_sync_time. > > -- > Bharath Rupireddy > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
> /* If part of a checkpoint, count this as a buffer written. */ > if (fdata) > CheckpointStats.ckpt_bufs_written++; > + PendingCheckpointerStats.buf_written_checkpoints++; > Also, the proposed patch would touch PendingCheckpointerStats even > when there is no fdata, aka outside the context of a checkpoint or > shutdown sequence. Sorry. I missed adding braces. Fixed in the v2 patch attached. Thanks & Regards, Nitin Jadhav On Thu, Dec 15, 2022 at 3:16 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Dec 14, 2022 at 04:54:53PM +0530, Bharath Rupireddy wrote: > > Indeed PendingCheckpointerStats.buf_written_checkpoints needs to count > > buffer writes in SlruInternalWritePage(). However, does it need to be > > done immediately there? The stats will not be visible to the users > > until the next pgstat_report_checkpointer(). Incrementing > > buf_written_checkpoints in BufferSync() makes sense as the > > pgstat_report_checkpointer() gets called in there via > > CheckpointWriteDelay() and it becomes visible to the user immediately. > > Have you checked if pgstat_report_checkpointer() gets called while the > > checkpoint calls SlruInternalWritePage()? If not, then you can just > > assign ckpt_bufs_written to buf_written_checkpoints in > > LogCheckpointEnd() like its other friends > > checkpoint_write_time and checkpoint_sync_time. > > /* If part of a checkpoint, count this as a buffer written. */ > if (fdata) > CheckpointStats.ckpt_bufs_written++; > + PendingCheckpointerStats.buf_written_checkpoints++; > Also, the proposed patch would touch PendingCheckpointerStats even > when there is no fdata, aka outside the context of a checkpoint or > shutdown sequence.. > -- > Michael
Attachment
At Wed, 14 Dec 2022 16:54:53 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > Indeed PendingCheckpointerStats.buf_written_checkpoints needs to count > buffer writes in SlruInternalWritePage(). However, does it need to be > done immediately there? The stats will not be visible to the users > until the next pgstat_report_checkpointer(). Incrementing > buf_written_checkpoints in BufferSync() makes sense as the > pgstat_report_checkpointer() gets called in there via > CheckpointWriteDelay() and it becomes visible to the user immediately. > Have you checked if pgstat_report_checkpointer() gets called while the > checkpoint calls SlruInternalWritePage()? If not, then you can just > assign ckpt_bufs_written to buf_written_checkpoints in > LogCheckpointEnd() like its other friends > checkpoint_write_time and checkpoint_sync_time. If I'm getting Bharath correctly, it results in double counting of BufferSync. If we want to keep the realtime-reporting nature of BufferSync, BufferSync should give up to increment CheckPointerStats' counter. Such separation seems to be a kind of stupid and quite bug-prone. In the first place I don't like that we count the same things twice. Couldn't we count the number only by any one of them? If we remove CheckPointerStats.ckpt_bufs_written, CreateCheckPoint can get the final number as the difference between the start-end values of *the shared stats*. As long as a checkpoint runs on a single process, trace info in BufferSync will work fine. Assuming single process checkpointing there must be no problem to do that. (Anyway the current shared stats update for checkpointer is assuming single-process). Otherwise, in exchange with giving up the realtime nature, we can count the number only by CheckPointerStats.ckpt_bufs_written. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Fri, Dec 16, 2022 at 2:14 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote: > > In the first place I don't like that we count the same things twice. > Couldn't we count the number only by any one of them? > > If we remove CheckPointerStats.ckpt_bufs_written, CreateCheckPoint can > get the final number as the difference between the start-end values of > *the shared stats*. As long as a checkpoint runs on a single process, > trace info in BufferSync will work fine. Assuming single process > checkpointing there must be no problem to do that. (Anyway the current > shared stats update for checkpointer is assuming single-process). What if someone resets checkpointer shared stats with pg_stat_reset_shared()? In such a case, the checkpoint complete message will not have the stats, no? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Wed, Dec 14, 2022 at 2:32 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > In order to fix this, the > PendingCheckpointerStats.buf_written_checkpoints should be incremented > in SlruInternalWritePage() similar to > CheckpointStats.ckpt_bufs_written. I have attached the patch for the > same. Please share your thoughts. Presumably we could make this consistent either by counting SLRU writes in both places, or by counting them in neither place. This proposal would count them in both places. But why is that the right thing to do? I'm somewhat inclined to think that we should use "buffers" to mean regular data buffers, and if SLRU buffers also need to be counted, we ought to make that a separate counter. Or just leave it out altogether. This is arguable, though, for sure.... -- Robert Haas EDB: http://www.enterprisedb.com
> Presumably we could make this consistent either by counting SLRU > writes in both places, or by counting them in neither place. This > proposal would count them in both places. But why is that the right > thing to do? > > I'm somewhat inclined to think that we should use "buffers" to mean > regular data buffers, and if SLRU buffers also need to be counted, we > ought to make that a separate counter. Or just leave it out > altogether. > > This is arguable, though, for sure.... Thanks Robert for sharing your thoughts. My first thought was to just remove counting SLRU buffers, then after some more analysis, I found that the checkpointer is responsible for including both regular data buffers and SLRU buffers. Please refer to dee663f7843902535a15ae366cede8b4089f1144 commit for more information. The part of the commit message is included here [1] for quick reference. Hence I concluded to keep the information and added an increment to count SLRU buffers. I am not in favour of making this as a separate counter as this can be treated as little low level information and it just adds up in the stats. Please share your thoughts. [1]: Hoist ProcessSyncRequests() up into CheckPointGuts() to make it clearer that it applies to all the SLRU mini-buffer-pools as well as the main buffer pool. Rearrange things so that data collected in CheckpointStats includes SLRU activity. Thanks & Regards, Nitin Jadhav On Tue, Dec 20, 2022 at 2:38 AM Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Dec 14, 2022 at 2:32 AM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > In order to fix this, the > > PendingCheckpointerStats.buf_written_checkpoints should be incremented > > in SlruInternalWritePage() similar to > > CheckpointStats.ckpt_bufs_written. I have attached the patch for the > > same. Please share your thoughts. > > Presumably we could make this consistent either by counting SLRU > writes in both places, or by counting them in neither place. This > proposal would count them in both places. But why is that the right > thing to do? > > I'm somewhat inclined to think that we should use "buffers" to mean > regular data buffers, and if SLRU buffers also need to be counted, we > ought to make that a separate counter. Or just leave it out > altogether. > > This is arguable, though, for sure.... > > -- > Robert Haas > EDB: http://www.enterprisedb.com
On Tue, Dec 20, 2022 at 8:03 AM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > Thanks Robert for sharing your thoughts. > My first thought was to just remove counting SLRU buffers, then after > some more analysis, I found that the checkpointer is responsible for > including both regular data buffers and SLRU buffers. I know that, but what the checkpointer handles and what ought to be included in the stats are two separate questions. I think that the SLRU information is potentially useful, but mixing it with the information about regular buffers just seems confusing. -- Robert Haas EDB: http://www.enterprisedb.com
On 2022-12-20 08:18:36 -0500, Robert Haas wrote: > I think that the SLRU information is potentially useful, but mixing it > with the information about regular buffers just seems confusing. +1 At least for now, it'd be different if/when we manage to move SLRUs to the main buffer pool. - Andres
On Tue, Dec 20, 2022 at 11:08 PM Andres Freund <andres@anarazel.de> wrote: > > On 2022-12-20 08:18:36 -0500, Robert Haas wrote: > > I think that the SLRU information is potentially useful, but mixing it > > with the information about regular buffers just seems confusing. > > +1 > > At least for now, it'd be different if/when we manage to move SLRUs to > the main buffer pool. +1 to not count SLRU writes in ckpt_bufs_written. If needed we can have new fields CheckpointStats.ckpt_slru_bufs_written and PendingCheckpointerStats.slru_buf_written_checkpoint. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Thanks Robert and Andres for sharing your thoughts. I have modified the code accordingly and attached the new version of patches. patch 0001 fixes the inconsistency in checkpointer stats and patch 0002 separates main buffer and SLRU buffer count from checkpoint complete log message. In 0001, I added a new column to pg_stat_bgwriter view and named it as slru_buffers_checkpoint and kept the existing column buffers_checkpoint as-is. Should I rename this to something like main_buffers_checkpoint? Thoughts? Please refer to sample checkpoint complete log message[1]. I am not quite satisfied with the percentage of buffers written information logged there. The percentage is calculated based on NBuffers in both the cases but I am just worried that are we passing wrong information to the user while user may think that the percentage of buffers is based on the total number of buffers available and the percentage of SLRU buffers is based on the total number of SLRU buffers available. Kindly review and share your comments. [1]: 2022-12-21 10:52:25.931 UTC [63530] LOG: checkpoint complete: wrote 4670 buffers (28.5%), wrote 3 slru buffers (0.0%); 0 WAL file(s) added, 0 removed, 4 recycled; write=0.045 s, sync=0.161 s, total=0.244 s; sync files=25, longest=0.146 s, average=0.007 s; distance=66130 kB, estimate=66130 kB; lsn=0/5557C78, redo lsn=0/5557C40 Thanks & Regards, Nitin Jadhav On Tue, Dec 20, 2022 at 11:08 PM Andres Freund <andres@anarazel.de> wrote: > > On 2022-12-20 08:18:36 -0500, Robert Haas wrote: > > I think that the SLRU information is potentially useful, but mixing it > > with the information about regular buffers just seems confusing. > > +1 > > At least for now, it'd be different if/when we manage to move SLRUs to > the main buffer pool. > > - Andres
Attachment
At Mon, 19 Dec 2022 18:05:38 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Fri, Dec 16, 2022 at 2:14 PM Kyotaro Horiguchi > <horikyota.ntt@gmail.com> wrote: > > > > In the first place I don't like that we count the same things twice. > > Couldn't we count the number only by any one of them? > > > > If we remove CheckPointerStats.ckpt_bufs_written, CreateCheckPoint can > > get the final number as the difference between the start-end values of > > *the shared stats*. As long as a checkpoint runs on a single process, > > trace info in BufferSync will work fine. Assuming single process > > checkpointing there must be no problem to do that. (Anyway the current > > shared stats update for checkpointer is assuming single-process). > > What if someone resets checkpointer shared stats with > pg_stat_reset_shared()? In such a case, the checkpoint complete > message will not have the stats, no? I don't know. I don't believe the stats system doesn't follow such a strict resetting policy. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Wed, 21 Dec 2022 17:14:12 +0530, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote in > [1]: > 2022-12-21 10:52:25.931 UTC [63530] LOG: checkpoint complete: wrote > 4670 buffers (28.5%), wrote 3 slru buffers (0.0%); 0 WAL file(s) > added, 0 removed, 4 recycled; write=0.045 s, sync=0.161 s, total=0.244 > s; sync files=25, longest=0.146 s, average=0.007 s; distance=66130 kB, > estimate=66130 kB; lsn=0/5557C78, redo lsn=0/5557C40 > > Thanks & Regards, > Nitin Jadhav > > On Tue, Dec 20, 2022 at 11:08 PM Andres Freund <andres@anarazel.de> wrote: > > > > On 2022-12-20 08:18:36 -0500, Robert Haas wrote: > > > I think that the SLRU information is potentially useful, but mixing it > > > with the information about regular buffers just seems confusing. > > > > +1 > > > > At least for now, it'd be different if/when we manage to move SLRUs to > > the main buffer pool. It sounds reasonable to exclude SRLU write from buffer writes. But I'm not sure its useful to count SLRU writes separately since it is under the noise level of buffer writes in reglular cases and the value doesn't lead to tuning. However I'm not strongly opposed to adding it either. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Wed, Dec 21, 2022 at 5:15 PM Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > I have modified the code accordingly and attached the new version of > patches. patch 0001 fixes the inconsistency in checkpointer stats and > patch 0002 separates main buffer and SLRU buffer count from checkpoint > complete log message. IMO, there's no need for 2 separate patches for these changes. + (errmsg("restartpoint complete: wrote %d buffers (%.1f%%), " + "wrote %d slru buffers (%.1f%%); %d WAL file(s) added, " + "%d removed, %d recycled; write=%ld.%03d s, " + "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, " + "longest=%ld.%03d s, average=%ld.%03d s; distance=%d kB, " + "estimate=%d kB; lsn=%X/%X, redo lsn=%X/%X", Hm, restartpoint /checkpoint complete message is already too long to read and adding slru buffers to it make it further longer. Note that we don't need to add every checkpoint stat to the log message but to pg_stat_bgwriter. Isn't it enough to show SLRU buffers information in pg_stat_bgwriter alone? Can't one look at pg_stat_slru's blks_written (pgstat_count_slru_page_written()) to really track the SLRUs written? Or is it that one may want to track SLRUs during a checkpoint separately? Is there a real use-case/customer reported issue driving this change? After looking at pg_stat_slru.blks_written, I think the best way is to just leave things as-is and let CheckpointStats count slru buffers too unless there's really a reported issue that says pg_stat_slru.blks_written doesn't serve the purpose. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
> IMO, there's no need for 2 separate patches for these changes. I will make it a single patch while sharing the next patch. > + (errmsg("restartpoint complete: wrote %d buffers (%.1f%%), " > + "wrote %d slru buffers (%.1f%%); %d WAL > file(s) added, " > + "%d removed, %d recycled; write=%ld.%03d s, " > + "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, " > + "longest=%ld.%03d s, average=%ld.%03d s; > distance=%d kB, " > + "estimate=%d kB; lsn=%X/%X, redo lsn=%X/%X", > Hm, restartpoint /checkpoint complete message is already too long to > read and adding slru buffers to it make it further longer. Note that > we don't need to add every checkpoint stat to the log message but to > pg_stat_bgwriter. Isn't it enough to show SLRU buffers information in > pg_stat_bgwriter alone? I understand that the log message is too long already but I feel it is ok since it logs only one time per checkpoint and as discussed upthread, SLRU information is potentially useful. > Can't one look at pg_stat_slru's blks_written > (pgstat_count_slru_page_written()) to really track the SLRUs written? > Or is it that one may want to track SLRUs during a checkpoint > separately? Is there a real use-case/customer reported issue driving > this change? > > After looking at pg_stat_slru.blks_written, I think the best way is to > just leave things as-is and let CheckpointStats count slru buffers too > unless there's really a reported issue that says > pg_stat_slru.blks_written doesn't serve the purpose. The v1 patch corresponds to what you are suggesting. But the question is not about tracking slru buffers, it is about separating this information from main buffers count during checkpoint. I think there is enough discussion done upthread to exclude slru buffers from CheckpointStats.ckpt_bufs_written. Please share if you still strongly feel that a separate counter is not required. Thanks & Regards, Nitin Jadhav On Fri, Jan 27, 2023 at 10:45 AM Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote: > > On Wed, Dec 21, 2022 at 5:15 PM Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > I have modified the code accordingly and attached the new version of > > patches. patch 0001 fixes the inconsistency in checkpointer stats and > > patch 0002 separates main buffer and SLRU buffer count from checkpoint > > complete log message. > > IMO, there's no need for 2 separate patches for these changes. > > + (errmsg("restartpoint complete: wrote %d buffers (%.1f%%), " > + "wrote %d slru buffers (%.1f%%); %d WAL > file(s) added, " > + "%d removed, %d recycled; write=%ld.%03d s, " > + "sync=%ld.%03d s, total=%ld.%03d s; sync files=%d, " > + "longest=%ld.%03d s, average=%ld.%03d s; > distance=%d kB, " > + "estimate=%d kB; lsn=%X/%X, redo lsn=%X/%X", > Hm, restartpoint /checkpoint complete message is already too long to > read and adding slru buffers to it make it further longer. Note that > we don't need to add every checkpoint stat to the log message but to > pg_stat_bgwriter. Isn't it enough to show SLRU buffers information in > pg_stat_bgwriter alone? > > Can't one look at pg_stat_slru's blks_written > (pgstat_count_slru_page_written()) to really track the SLRUs written? > Or is it that one may want to track SLRUs during a checkpoint > separately? Is there a real use-case/customer reported issue driving > this change? > > After looking at pg_stat_slru.blks_written, I think the best way is to > just leave things as-is and let CheckpointStats count slru buffers too > unless there's really a reported issue that says > pg_stat_slru.blks_written doesn't serve the purpose. > > -- > Bharath Rupireddy > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com
On 2022-12-21 17:14:12 +0530, Nitin Jadhav wrote: > Kindly review and share your comments. This doesn't pass the tests, because the regression tests weren't adjusted: https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs
> This doesn't pass the tests, because the regression tests weren't adjusted: > https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs Thanks for sharing this. I have fixed this in the patch attached. >> IMO, there's no need for 2 separate patches for these changes. > I will make it a single patch while sharing the next patch. Clubbed both patches into one. Thanks & Regards, Nitin Jadhav On Tue, Feb 14, 2023 at 6:08 AM Andres Freund <andres@anarazel.de> wrote: > > > On 2022-12-21 17:14:12 +0530, Nitin Jadhav wrote: > > Kindly review and share your comments. > > This doesn't pass the tests, because the regression tests weren't adjusted: > > https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs
Attachment
On Sun, 19 Feb 2023 at 04:58, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > This doesn't pass the tests, because the regression tests weren't adjusted: > > https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs > > Thanks for sharing this. I have fixed this in the patch attached. AFAICS this patch was marked Waiting on Author due to this feedback on Feb 14 but the patch was updated Feb 19 and is still passing today. I've marked it Needs Review. I'm not sure if all of the feedback from Bharath Rupireddy and Kyotaro Horiguchi was addressed. If there's any specific questions remaining you need feedback on it would be good to ask explicitly. Otherwise if you think it's ready you could mark it Ready for Commit. -- Gregory Stark As Commitfest Manager
On Sun, 19 Feb 2023 at 15:28, Nitin Jadhav <nitinjadhavpostgres@gmail.com> wrote: > > > This doesn't pass the tests, because the regression tests weren't adjusted: > > https://api.cirrus-ci.com/v1/artifact/task/5937624817336320/testrun/build/testrun/regress/regress/regression.diffs > > Thanks for sharing this. I have fixed this in the patch attached. > > > >> IMO, there's no need for 2 separate patches for these changes. > > I will make it a single patch while sharing the next patch. > > Clubbed both patches into one. The patch does not apply anymore, please post a rebased version of the patch : 1 out of 1 hunk FAILED -- saving rejects to file src/backend/catalog/system_views.sql.rej patching file src/backend/utils/activity/pgstat_checkpointer.c Hunk #1 FAILED at 52. 1 out of 1 hunk FAILED -- saving rejects to file src/backend/utils/activity/pgstat_checkpointer.c.rej patching file src/backend/utils/adt/pgstatfuncs.c Hunk #1 succeeded at 1217 with fuzz 1 (offset 24 lines). patching file src/include/access/xlog.h Hunk #1 succeeded at 165 (offset 3 lines). patching file src/include/catalog/pg_proc.dat Hunk #1 FAILED at 5680. 1 out of 1 hunk FAILED -- saving rejects to file src/include/catalog/pg_proc.dat.rej Regards, Vignesh
On Sat, Jan 20, 2024 at 08:10:03AM +0530, vignesh C wrote: > The patch does not apply anymore, please post a rebased version of the patch : There is more to it. Some of the columns of pg_stat_bgwriter have been moved to a different view, aka pg_stat_checkpointer. I have marked the patch as returned with feedback for now, so feel free to resubmit if you can get a new version of the patch. -- Michael
Attachment
I apologize for not being active on this thread. However, I have now returned to the thread and confirmed that the inconsistency is still present in the latest code. I believe it’s crucial to address this issue, and I am currently submitting the v5 version of the patch. The v4 version had addressed the feedback from Bharath, Kyotaro, Andres, and Robert. The current version has been rebased to incorporate Vignesh’s suggestions. In response to Michael’s comments, I’ve moved the new ‘slru_written’ column from the ‘pg_stat_bgwriter’ view to the ‘pg_stat_checkpointer’ in the attached patch. To summarize our discussions, we’ve reached a consensus to correct the mismatch between the information on buffers written as displayed in the ‘pg_stat_checkpointer’ view and the checkpointer log message. We’ve also agreed to separate the SLRU buffers data from the buffers written and present the SLRU buffers data in a distinct field. I have created the new commitfest entry here https://commitfest.postgresql.org/49/5130/. Kindly share if any comments. Best Regards, Nitin Jadhav Azure Database for PostgreSQL Microsoft On Tue, Jan 30, 2024 at 1:20 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Sat, Jan 20, 2024 at 08:10:03AM +0530, vignesh C wrote: > > The patch does not apply anymore, please post a rebased version of the patch : > > There is more to it. Some of the columns of pg_stat_bgwriter have > been moved to a different view, aka pg_stat_checkpointer. I have > marked the patch as returned with feedback for now, so feel free to > resubmit if you can get a new version of the patch. > -- > Michael
Attachment
On 2024/07/18 16:08, Nitin Jadhav wrote: > I apologize for not being active on this thread. However, I have now > returned to the thread and confirmed that the inconsistency is still > present in the latest code. I believe it’s crucial to address this > issue, and I am currently submitting the v5 version of the patch. The > v4 version had addressed the feedback from Bharath, Kyotaro, Andres, > and Robert. The current version has been rebased to incorporate > Vignesh’s suggestions. In response to Michael’s comments, I’ve moved > the new ‘slru_written’ column from the ‘pg_stat_bgwriter’ view to the > ‘pg_stat_checkpointer’ in the attached patch. > > To summarize our discussions, we’ve reached a consensus to correct the > mismatch between the information on buffers written as displayed in > the ‘pg_stat_checkpointer’ view and the checkpointer log message. > We’ve also agreed to separate the SLRU buffers data from the buffers > written and present the SLRU buffers data in a distinct field. > > I have created the new commitfest entry here > https://commitfest.postgresql.org/49/5130/. > Kindly share if any comments. Thanks for updating the patch! In pgstat_checkpointer.c, it looks like you missed adding CHECKPOINTER_COMP(slru_written) in pgstat_checkpointer_snapshot_cb(). + <row> + <entry role="catalog_table_entry"><para role="column_definition"> + <structfield>slru_written</structfield> <type>bigint</type> + </para> + <para> + Number of SLRU buffers written during checkpoints and restartpoints + </para></entry> + </row> This entry should be moved to the pg_stat_checkpointer documentation. + CheckpointStats.ckpt_slru_written, + (double) CheckpointStats.ckpt_slru_written * 100 / NBuffers, I don't think NBuffers represents the maximum number of SLRU buffers. We might need to calculate this based on specific GUC settings, like transaction_buffers. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2024/09/22 20:44, Nitin Jadhav wrote: >> + CheckpointStats.ckpt_slru_written, >> + (double) CheckpointStats.ckpt_slru_written * 100 / NBuffers, >> >> I don't think NBuffers represents the maximum number of SLRU buffers. >> We might need to calculate this based on specific GUC settings, >> like transaction_buffers. > > Great observation. Since the SLRU buffers written during a checkpoint > can include transaction_buffers, commit_timestamp_buffers, > subtransaction_buffers, multixact_member_buffers, > multixact_offset_buffers, and serializable_buffers, the total count of > SLRU buffers should be the sum of all these types. We might need to > introduce a global variable, such as total_slru_count, in the > globals.c file to hold this sum. The num_slots variable in the > SlruSharedData structure needs to be accessed from all types of SLRU > and stored in total_slru_count. This can then be used during logging > to calculate the percentage of SLRU buffers written. However, I’m > unsure if this effort is justified. While it makes sense for normal > buffers to display the percentage, the additional code required might > not provide significant value to users. Therefore, I have removed this > in the attached v6 patch. +1 Thanks for updating the patch! It looks good to me. Barring any objections, I will commit this patch. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
> Before committing the patch, I revised it with the following changes: > > - I added "shared" to the description of pg_stat_checkpointer.buffers_written > to clarify that it tracks shared buffers. > - In the checkpoint log message, I replaced "slru" with "SLRU" for consistency, > as uppercase is typically used for SLRU. > - I updated the commit message. Thanks for updating. Looks good to me. Best Regards, Nitin Jadhav Azure Database for PostgreSQL Microsoft On Tue, Oct 1, 2024 at 9:28 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > > > > On 2024/10/01 3:33, Fujii Masao wrote: > > > > > > On 2024/09/22 20:44, Nitin Jadhav wrote: > >>> + CheckpointStats.ckpt_slru_written, > >>> + (double) CheckpointStats.ckpt_slru_written * 100 / NBuffers, > >>> > >>> I don't think NBuffers represents the maximum number of SLRU buffers. > >>> We might need to calculate this based on specific GUC settings, > >>> like transaction_buffers. > >> > >> Great observation. Since the SLRU buffers written during a checkpoint > >> can include transaction_buffers, commit_timestamp_buffers, > >> subtransaction_buffers, multixact_member_buffers, > >> multixact_offset_buffers, and serializable_buffers, the total count of > >> SLRU buffers should be the sum of all these types. We might need to > >> introduce a global variable, such as total_slru_count, in the > >> globals.c file to hold this sum. The num_slots variable in the > >> SlruSharedData structure needs to be accessed from all types of SLRU > >> and stored in total_slru_count. This can then be used during logging > >> to calculate the percentage of SLRU buffers written. However, I’m > >> unsure if this effort is justified. While it makes sense for normal > >> buffers to display the percentage, the additional code required might > >> not provide significant value to users. Therefore, I have removed this > >> in the attached v6 patch. > > > > +1 > > > > Thanks for updating the patch! It looks good to me. > > Barring any objections, I will commit this patch. > > Before committing the patch, I revised it with the following changes: > > - I added "shared" to the description of pg_stat_checkpointer.buffers_written > to clarify that it tracks shared buffers. > - In the checkpoint log message, I replaced "slru" with "SLRU" for consistency, > as uppercase is typically used for SLRU. > - I updated the commit message. > > Regards, > > -- > Fujii Masao > Advanced Computing Technology Center > Research and Development Headquarters > NTT DATA CORPORATION
On 2024/10/02 1:50, Nitin Jadhav wrote: >> Before committing the patch, I revised it with the following changes: >> >> - I added "shared" to the description of pg_stat_checkpointer.buffers_written >> to clarify that it tracks shared buffers. >> - In the checkpoint log message, I replaced "slru" with "SLRU" for consistency, >> as uppercase is typically used for SLRU. >> - I updated the commit message. > > Thanks for updating. Looks good to me. Thanks for the review! Pushed. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION