Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) - Mailing list pgsql-hackers
From | Nitin Jadhav |
---|---|
Subject | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
Date | |
Msg-id | CAMm1aWYsfgO91mXxy7kzrDcML8nuY3Ab4w9RnSjZ_mr6=Y-hwQ@mail.gmail.com Whole thread Raw |
In response to | Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) (Matthias van de Meent <boekewurm+postgres@gmail.com>) |
Responses |
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)
Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs) |
List | pgsql-hackers |
> > The progress reporting mechanism of postgres uses the > > 'st_progress_param' array of 'PgBackendStatus' structure to hold the > > information related to the progress. There is a function > > 'pgstat_progress_update_param()' which takes 'index' and 'val' as > > arguments and updates the 'val' to corresponding 'index' in the > > 'st_progress_param' array. This mechanism works fine when all the > > progress information is of type integer as the data type of > > 'st_progress_param' is of type integer. If the progress data is of > > different type than integer, then there is no easy way to do so. > > Progress parameters are int64, so all of the new 'checkpoint start > location' (lsn = uint64), 'triggering backend PID' (int), 'elapsed > time' (store as start time in stat_progress, timestamp fits in 64 > bits) and 'checkpoint or restartpoint?' (boolean) would each fit in a > current stat_progress parameter. Some processing would be required at > the view, but that's not impossible to overcome. Thank you for sharing the information. 'triggering backend PID' (int) - can be stored without any problem. 'checkpoint or restartpoint?' (boolean) - can be stored as a integer value like PROGRESS_CHECKPOINT_TYPE_CHECKPOINT(0) and PROGRESS_CHECKPOINT_TYPE_RESTARTPOINT(1). 'elapsed time' (store as start time in stat_progress, timestamp fits in 64 bits) - As Timestamptz is of type int64 internally, so we can store the timestamp value in the progres parameter and then expose a function like 'pg_stat_get_progress_checkpoint_elapsed' which takes int64 (not Timestamptz) as argument and then returns string representing the elapsed time. This function can be called in the view. Is it safe/advisable to use int64 type here rather than Timestamptz for this purpose? 'checkpoint start location' (lsn = uint64) - I feel we cannot use progress parameters for this case. As assigning uint64 to int64 type would be an issue for larger values and can lead to hidden bugs. Thoughts? Thanks & Regards, Nitin Jadhav On Thu, Feb 17, 2022 at 1:33 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote: > > On Thu, 10 Feb 2022 at 07:53, Nitin Jadhav > <nitinjadhavpostgres@gmail.com> wrote: > > > > > > We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint start, instead of justemitting the stats at the end. > > > > > > > > Bharat, it would be good to show the buffers synced counter and the total buffers to sync, checkpointer pid, substepit is running, whether it is on target for completion, checkpoint_Reason > > > > (manual/times/forced). BufferSync has several variables tracking the sync progress locally, and we may need somerefactoring here. > > > > > > I agree to provide above mentioned information as part of showing the > > > progress of current checkpoint operation. I am currently looking into > > > the code to know if any other information can be added. > > > > Here is the initial patch to show the progress of checkpoint through > > pg_stat_progress_checkpoint view. Please find the attachment. > > > > The information added to this view are pid - process ID of a > > CHECKPOINTER process, kind - kind of checkpoint indicates the reason > > for checkpoint (values can be wal, time or force), phase - indicates > > the current phase of checkpoint operation, total_buffer_writes - total > > number of buffers to be written, buffers_processed - number of buffers > > processed, buffers_written - number of buffers written, > > total_file_syncs - total number of files to be synced, files_synced - > > number of files synced. > > > > There are many operations happen as part of checkpoint. For each of > > the operation I am updating the phase field of > > pg_stat_progress_checkpoint view. The values supported for this field > > are initializing, checkpointing replication slots, checkpointing > > snapshots, checkpointing logical rewrite mappings, checkpointing CLOG > > pages, checkpointing CommitTs pages, checkpointing SUBTRANS pages, > > checkpointing MULTIXACT pages, checkpointing SLRU pages, checkpointing > > buffers, performing sync requests, performing two phase checkpoint, > > recycling old XLOG files and Finalizing. In case of checkpointing > > buffers phase, the fields total_buffer_writes, buffers_processed and > > buffers_written shows the detailed progress of writing buffers. In > > case of performing sync requests phase, the fields total_file_syncs > > and files_synced shows the detailed progress of syncing files. In > > other phases, only the phase field is getting updated and it is > > difficult to show the progress because we do not get the total number > > of files count without traversing the directory. It is not worth to > > calculate that as it affects the performance of the checkpoint. I also > > gave a thought to just mention the number of files processed, but this > > wont give a meaningful progress information (It can be treated as > > statistics). Hence just updating the phase field in those scenarios. > > > > Apart from above fields, I am planning to add few more fields to the > > view in the next patch. That is, process ID of the backend process > > which triggered a CHECKPOINT command, checkpoint start location, filed > > to indicate whether it is a checkpoint or restartpoint and elapsed > > time of the checkpoint operation. Please share your thoughts. I would > > be happy to add any other information that contributes to showing the > > progress of checkpoint. > > > > As per the discussion in this thread, there should be some mechanism > > to show the progress of checkpoint during shutdown and end-of-recovery > > cases as we cannot access pg_stat_progress_checkpoint in those cases. > > I am working on this to use log_startup_progress_interval mechanism to > > log the progress in the server logs. > > > > Kindly review the patch and share your thoughts. > > Interesting idea, and overall a nice addition to the > pg_stat_progress_* reporting infrastructure. > > Could you add your patch to the current commitfest at > https://commitfest.postgresql.org/37/? > > See below for some comments on the patch: > > > xlog.c @ checkpoint_progress_start, checkpoint_progress_update_param, checkpoint_progress_end > > + /* In bootstrap mode, we don't actually record anything. */ > > + if (IsBootstrapProcessingMode()) > > + return; > > Why do you check against the state of the system? > pgstat_progress_update_* already provides protections against updating > the progress tables if the progress infrastructure is not loaded; and > otherwise (in the happy path) the cost of updating the progress fields > will be quite a bit higher than normal. Updating stat_progress isn't > very expensive (quite cheap, really), so I don't quite get why you > guard against reporting stats when you expect no other client to be > listening. > > I think you can simplify this a lot by directly using > pgstat_progress_update_param() instead. > > > xlog.c @ checkpoint_progress_start > > + pgstat_progress_start_command(PROGRESS_COMMAND_CHECKPOINT, InvalidOid); > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_PHASE, > > + PROGRESS_CHECKPOINT_PHASE_INIT); > > + if (flags & CHECKPOINT_CAUSE_XLOG) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_WAL); > > + else if (flags & CHECKPOINT_CAUSE_TIME) > > + checkpoint_progress_update_param(flags, PROGRESS_CHECKPOINT_KIND, > > + PROGRESS_CHECKPOINT_KIND_TIME); > > + [...] > > Could you assign the kind of checkpoint to a local variable, and then > update the "phase" and "kind" parameters at the same time through > pgstat_progress_update_multi_param(2, ...)? See > BuildRelationExtStatistics in extended_stats.c for an example usage. > Note that regardless of whether checkpoint_progress_update* will > remain, the checks done in that function already have been checked in > this function as well, so you can use the pgstat_* functions directly. > > > monitoring.sgml > > + <structname>pg_stat_progress_checkpoint</structname> view will contain a > > + single row indicating the progress of checkpoint operation. > > ... add "if a checkpoint is currently active". > > > + <structfield>total_buffer_writes</structfield> <type>bigint</type> > > + <structfield>total_file_syncs</structfield> <type>bigint</type> > > The other progress tables use [type]_total as column names for counter > targets (e.g. backup_total for backup_streamed, heap_blks_total for > heap_blks_scanned, etc.). I think that `buffers_total` and > `files_total` would be better column names. > > > + The checkpoint operation is requested due to XLOG filling. > > + The checkpoint was started because >max_wal_size< of WAL was written. > > > + The checkpoint operation is requested due to timeout. > > + The checkpoint was started due to the expiration of a > >checkpoint_timeout< interval > > > + The checkpoint operation is forced even if no XLOG activity has occurred > > + since the last one. > > + Some operation forced a checkpoint. > > > + <entry><literal>checkpointing CommitTs pages</literal></entry> > > CommitTs -> Commit time stamp > > Thanks for working on this. > > Kind regards, > > Matthias van de Meent
pgsql-hackers by date: