Re: Show WAL write and fsync stats in pg_stat_io - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: Show WAL write and fsync stats in pg_stat_io |
Date | |
Msg-id | CAAKRu_ZzY9k5XgvsSwU2sOvb==POv=jaKTom9uRVfjAgWhTvgw@mail.gmail.com Whole thread Raw |
In response to | Show WAL write and fsync stats in pg_stat_io (Nazir Bilal Yavuz <byavuz81@gmail.com>) |
Responses |
Re: Show WAL write and fsync stats in pg_stat_io
|
List | pgsql-hackers |
On Wed, Jun 28, 2023 at 6:09 AM Nazir Bilal Yavuz <byavuz81@gmail.com> wrote: > This is a WIP patch to add WAL write and fsync stats to pg_stat_io > view. Thanks for working on this! I have some feedback on the content of the patch as well as some items that I feel are missing. I think it would be good to count WAL reads even though they are not currently represented in pg_stat_wal. Here is a thread discussing this [1]. Eventually, the docs will need an update as well. You can wait until a later version of the patch to do this, but I would include it in a list of the remaining TODOs in your next version. I think we will also want to add an IOContext for WAL initialization. Then we can track how long is spent doing WAL init (including filling the WAL file with zeroes). XLogFileInitInternal() is likely where we would want to add it. And op_bytes for this would likely be wal_segment_size. I thought I heard about someone proposing adding WAL init to pg_stat_wal, but I can't find the thread. I think there is also an argument for counting WAL files recycled as IOOP_REUSES. We should start thinking about how to interpret the different IOOps within the two IOContexts and discussing what would be useful to count. For example, should removing a logfile count as an IOOP_EVICT? Maybe it is not directly related to "IO" enough or even an interesting statistic, but we should think about what kinds of IO-related WAL statistics we want to track. Any that we decide not to count for now should be "banned" in pgstat_tracks_io_op() for clarity. For example, if we create a separate IOContext for WAL file init, I'm not sure what would count as an IOOP_EXTEND in IOCONTEXT_NORMAL for IOOBJECT_WAL. Also, I think there are some backend types which will not generate WAL and we should determine which those are and skip those rows in pgstat_tracks_io_object(). diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index 8b0710abe6..2ee6c21398 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -2207,6 +2207,10 @@ XLogWrite(XLogwrtRqst WriteRqst, TimeLineID tli, bool flexible) I think we should likely follow the pattern of using pgstat_prepare_io_time() and pgstat_count_io_op_time() as it is done elsewhere. You could pass the IOObject as a parameter to pgstat_prepare_io_time() in order to determine if we should check track_io_timing or track_wal_io_timing. And we'll want to check track_wal_io_timing if IOObject is IOOBJECT_WAL in pgstat_count_io_op_time(). INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_write_time, duration, start); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_WRITE, start, 1); + } else + { Other users of pgstat_count_io_op_time()/io_op_n() which write multiple pages at a time pass the number of pages in as the cnt parameter. (see ExtendBufferedRelLocal() as an example). I think we want to do that for WAL also. In this case, it would be the local variable "npages" and we can do it outside of this loop. It is true that the existing WAL stats count wal_writes here. However, this is essentially counting write system calls, which is probably not what we want for pg_stat_io. See [2] for a discussion about whether to count blocks written back or writeback system calls for a previous pg_stat_io feature. All of the other block-based IO statistics in pg_stat_io count the number of blocks. This being said, we probably want to just leave PendingWalStats.wal_write++ here. We would normally move it into pg_stat_io like we have with pgBufferUsage and the db IO stats that are updated in pgstat_count_io_op_time(). This consolidation makes it easier to eventually reduce the duplication. However, in this case, it seems wal_write counts something we don't count in pg_stat_io, so it can probably be left here. I would still move the PendingWalStats.wal_write_time into pgstat_count_io_op_time(), since that seems like it is the same as what will be in pg_stat_io. Also, op_bytes for IOOBJECT_WAL/IOCONTEXT_NORMAL should be XLOG_BLCKSZ (see comment in pg_stat_get_io() in pgstatfuncs.c). Those default to the same value but can be made to be different. + pgstat_count_io_op_n(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_WRITE, 1); } PendingWalStats.wal_write++; @@ -8233,6 +8237,10 @@ issue_xlog_fsync(int fd, XLogSegNo segno, TimeLineID tli) INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_ACCUM_DIFF(PendingWalStats.wal_sync_time, duration, start); + pgstat_count_io_op_time(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, start, 1); I would wrap this line and check other lines to make sure they are not too long. + } else + { + pgstat_count_io_op_n(IOOBJECT_WAL, IOCONTEXT_NORMAL, IOOP_FSYNC, 1); } PendingWalStats.wal_sync++; Same feedback as above about using the prepare/count pattern used for pg_stat_io elsewhere. In this case, you should be able to move PendingWalStats.wal_sync into there as well. diff --git a/src/backend/utils/activity/pgstat_io.c b/src/backend/utils/activity/pgstat_io.c @@ -350,6 +352,11 @@ pgstat_tracks_io_object(BackendType bktype, IOObject io_object, if (!pgstat_tracks_io_bktype(bktype)) return false; + + if (io_context != IOCONTEXT_NORMAL && + io_object == IOOBJECT_WAL) + return false; We should add more restrictions. See the top of my email for details. > There is a track_io_timing variable to control pg_stat_io > timings and a track_wal_io_timing variable to control WAL timings. I > couldn't decide on which logic to enable WAL timings on pg_stat_io. > For now, both pg_stat_io and track_wal_io_timing are needed to be > enabled to track WAL timings in pg_stat_io. Hmm. I could see a case where someone doesn't want to incur the overhead of track_io_timing for regular IO but does want to do so for WAL because they are interested in a specific issue. I'm not sure though. I could be convinced otherwise (based on relative overhead, etc). > Also, if you compare WAL stats in pg_stat_wal and pg_stat_io; you can > come across differences. These differences are caused by the > background writer's WAL stats not being flushed. Because of that, > background writer's WAL stats are not seen in pg_stat_wal but in > pg_stat_io. I already sent a patch [1] to fix that. Cool! Thanks for doing that. - Melanie [1] https://www.postgresql.org/message-id/flat/20230216191138.jotc73lqb7xhfqbi%40awork3.anarazel.de#eb4a641427fa1eb013e9ecdd8648e640 [2] https://www.postgresql.org/message-id/20230504165738.4e2hfoddoels542c%40awork3.anarazel.de
pgsql-hackers by date: