Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Re: Publish checkpoint timing and sync files summary data to pg_stat_bgwriter |
Date | |
Msg-id | CAEYLb_XiYvpzrAUKG=NxJKR7KCzKHyhf0ubWMWZ+pfqO_eW7qA@mail.gmail.com Whole thread Raw |
In response to | Publish checkpoint timing and sync files summary data to pg_stat_bgwriter (Greg Smith <greg@2ndQuadrant.com>) |
Responses |
Re: Publish checkpoint timing and sync files summary data
to pg_stat_bgwriter
|
List | pgsql-hackers |
On 16 January 2012 06:28, Greg Smith <greg@2ndquadrant.com> wrote: > One of the most useful bits of feedback on how well checkpoint I/O is going > is the amount of time taken to sync files to disk. Right now the only way > to get that is to parse the logs. The attached patch publishes the most > useful three bits of data you could only get from log_checkpoints before out > to pg_stat_bgwriter. Easiest to just show an example: This is a preliminary review of the patch. Attached is a revision of the patch that clears up some minor bit-rot and style issues, just as a convenience to everyone else who might like to try it out against master who doesn't want to repeat my work. There were oid collisions against master, and an upstream change in the pg_proc format with the introduction of the "proleakproof" column. Regression tests pass (as they must have in the prior revision, as the original authors modified "expected/rules.out"). Note that I have deliberately avoided changing the patch further than that, though I believe that further changes will be necessary. One beef that I have with the variable name m_write_ms is that "ms" could equally well refer to microseconds or milliseconds, and these mistakes are very common. Therefore, you might think about making the fields of type "instr_time", which is consistent with some other such fields used within the statistics collector - this takes care of issues with overflows, portability and confusion over units, to as great an extent as is possible. I'm not even confident that this is the right thing though, because PgStat_StatFuncEntry, for example, *does* use PgStat_Countuer as the type for time values, in its case apparently *microseconds* rather than milliseconds, even though the pg_stat_user_functions view displays values in millisecond (it's incorrectly commented). Even the use of instr_time within some pgstat.h structs sees the value converted to microseconds for transmission to the collector. It might be that since we don't really accumulate time values here, it's better to just directly represent that value as PgStat_Counter, but in that case I'd still clearly mark the value as being milliseconds. In any case, I would have the columns in milliseconds, but not ending with *_ms within pg_stat_bgwriter, for consistency with pg_stat_user_functions et al. There is no reason to cast to uint64 when accumulating milliseconds, as the lvalue is currently a PgStat_Counter. Details of existing comment bug: The docs show that pg_stat_*_functions accumulates time in units of milliseconds. However, a PgStat_FunctionEntry payload is commented as sending the time/self time to the stats collector in microseconds. So this comment, in the existing code, is actually wrong: PgStat_Counter f_time; /* times in microseconds */ PgStat_Counter f_time_self; } PgStat_StatFuncEntry; Can someone commit a fix for that please? It could easily result in a user-visible bug. Mixing units, and representing them as simple integers and totally eschewing type safety is sort of a pet peeve of mine, but consistency with the existing code is more important. My general impression of the patch's code quality is that it is more or less idomatic, and, as Greg said, simple; it generalises from the style of existing code in LogCheckpointEnd(), as well as existing fields within the PgStat_MsgBgWriter and PgStat_GlobalStats structs . As for the substance of the patch, I am in general agreement that this is a good idea. Storing the statistics in pg_stat_bgwriter is a more flexible approach than is immediately apparent. Users can usefully calculate delta values, as part of the same process through which checkpoint tuning is performed by comparing output from "select now(), * from pg_stat_bgwriter" at different intervals. This also puts this information within easy reach of monitoring tools. So, I'd ask Greg and/or Jaime to produce a revision of the patch with those concerns in mind, as well as fixing the md.c usage of log_checkpoints. That variable is currently used in these places only: src/backend/access/transam/xlog.c 80:bool log_checkpoints = false; 7694: * Note: because it is possible for log_checkpoints to change while a 7696: * log_checkpoints is currently off. 7835: if (log_checkpoints) 8035: if (log_checkpoints) 8229: * Note: because it is possible for log_checkpoints to change while a 8231: * log_checkpoints is currently off. 8236: if (log_checkpoints) 8301: if (log_checkpoints) 8305: ereport((log_checkpoints ? LOG : DEBUG2), src/backend/storage/smgr/md.c 1097: if (log_checkpoints) 1105: if (log_checkpoints && (!INSTR_TIME_IS_ZERO(sync_start))) I also agree with separating backend write times due to lagging background writes, and the backend writes "by design", as described by Jeff. I'm not inclined to worry too much about the usability issues surrounding the relative values of things like the total checkpoint time and total write and sync phase times matching up. What do you intend to do about testing the log_checkpoints = 'off' case? Can I help in some way? It seems pretty obvious to me that the changes made are correct, apart from the fact that you omitted to modify md.c . In particular, there are no pre or post conditions involved with calling TimeCheckpointEnd() (formerly LogCheckpointEnd() ). This is evident from the following comment at transam/xlog.c:7709: * Note: because it is possible for log_checkpoints to change while a * checkpoint proceeds, we always accumulate stats, even if * log_checkpoints is currently off. It seems fairly silly to me to not expose this information if it's invariably being collected anyway. Cheers -- Peter Geoghegan http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training and Services
Attachment
pgsql-hackers by date: