Re: About to add WAL write/fsync statistics to pg_stat_wal view - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: About to add WAL write/fsync statistics to pg_stat_wal view |
Date | |
Msg-id | d18188f83ab56a32b48e29baf920d245@oss.nttdata.com Whole thread Raw |
In response to | Re: About to add WAL write/fsync statistics to pg_stat_wal view (Masahiro Ikeda <ikedamsh@oss.nttdata.com>) |
Responses |
Re: About to add WAL write/fsync statistics to pg_stat_wal view
|
List | pgsql-hackers |
On 2021-01-25 13:15, Masahiro Ikeda wrote: > On 2021-01-25 10:36, Masahiko Sawada wrote: >> On Fri, Jan 22, 2021 at 10:05 PM Masahiro Ikeda >> <ikedamsh@oss.nttdata.com> wrote: >>> >>> On 2021-01-22 14:50, Masahiko Sawada wrote: >>> > On Fri, Dec 25, 2020 at 6:46 PM Masahiro Ikeda >>> > <ikedamsh@oss.nttdata.com> wrote: >>> >> >>> >> Hi, >>> >> >>> >> I rebased the patch to the master branch. >>> > >>> > Thank you for working on this. I've read the latest patch. Here are >>> > comments: >>> > >>> > --- >>> > + if (track_wal_io_timing) >>> > + { >>> > + INSTR_TIME_SET_CURRENT(duration); >>> > + INSTR_TIME_SUBTRACT(duration, start); >>> > + WalStats.m_wal_write_time += >>> > INSTR_TIME_GET_MILLISEC(duration); >>> > + } >>> > >>> > * I think it should add the time in micro sec. >>> > After running pgbench with track_wal_io_timing = on for 30 sec, >>> > pg_stat_wal showed the following on my environment: >>> > >>> > postgres(1:61569)=# select * from pg_stat_wal; >>> > -[ RECORD 1 ]----+----------------------------- >>> > wal_records | 285947 >>> > wal_fpi | 53285 >>> > wal_bytes | 442008213 >>> > wal_buffers_full | 0 >>> > wal_write | 25516 >>> > wal_write_time | 0 >>> > wal_sync | 25437 >>> > wal_sync_time | 14490 >>> > stats_reset | 2021-01-22 10:56:13.29464+09 >>> > >>> > Since writes can complete less than a millisecond, wal_write_time >>> > didn't increase. I think sync_time could also have the same problem. >>> >>> Thanks for your comments. I didn't notice that. >>> I changed the unit from milliseconds to microseconds. >>> >>> > --- >>> > + /* >>> > + * Measure i/o timing to fsync WAL data. >>> > + * >>> > + * The wal receiver skip to collect it to avoid performance >>> > degradation of standy servers. >>> > + * If sync_method doesn't have its fsync method, to skip too. >>> > + */ >>> > + if (!AmWalReceiverProcess() && track_wal_io_timing && >>> > fsyncMethodCalled()) >>> > + INSTR_TIME_SET_CURRENT(start); >>> > >>> > * Why does only the wal receiver skip it even if track_wal_io_timinig >>> > is true? I think the performance degradation is also true for backend >>> > processes. If there is another reason for that, I think it's better to >>> > mention in both the doc and comment. >>> > * How about checking track_wal_io_timing first? >>> > * s/standy/standby/ >>> >>> I fixed it. >>> As kuroda-san mentioned too, the skip is no need to be considered. >> >> I think you also removed the code to have the wal receiver report the >> stats. So with the latest patch, the wal receiver tracks those >> statistics but doesn't report. >> And maybe XLogWalRcvWrite() also needs to track I/O? > > Thanks, I forgot to add them. > I'll fix it. > > >>> >>> > --- >>> > + /* increment the i/o timing and the number of times to fsync WAL >>> > data */ >>> > + if (fsyncMethodCalled()) >>> > + { >>> > + if (!AmWalReceiverProcess() && track_wal_io_timing) >>> > + { >>> > + INSTR_TIME_SET_CURRENT(duration); >>> > + INSTR_TIME_SUBTRACT(duration, start); >>> > + WalStats.m_wal_sync_time += >>> > INSTR_TIME_GET_MILLISEC(duration); >>> > + } >>> > + >>> > + WalStats.m_wal_sync++; >>> > + } >>> > >>> > * I'd avoid always calling fsyncMethodCalled() in this path. How about >>> > incrementing m_wal_sync after each sync operation? >>> >>> I think if syncing the disk does not occur, m_wal_sync should not be >>> incremented. >>> It depends enableFsync and sync_method. >>> >>> enableFsync is checked in each fsync method like >>> pg_fsync_no_writethrough(), >>> so if incrementing m_wal_sync after each sync operation, it should be >>> implemented >>> in each fsync method. It leads to many duplicated codes. >> >> Right. I missed that each fsync function checks enableFsync. >> >>> So, why don't you change the function to a flag whether to >>> sync data to the disk will be occurred or not in issue_xlog_fsync()? >> >> Looks better. Since we don't necessarily need to increment m_wal_sync >> after doing fsync we can write the code without an additional variable >> as follows: >> >> if (enableFsync) >> { >> switch (sync_method) >> { >> case SYNC_METHOD_FSYNC: >> #ifdef HAVE_FSYNC_WRITETHROUGH >> case SYNC_METHOD_FSYNC_WRITETHROUGH: >> #endif >> #ifdef HAVE_FDATASYNC >> case SYNC_METHOD_FDATASYNC: >> #endif >> WalStats.m_wal_sync++; >> if (track_wal_io_timing) >> INSTR_TIME_SET_CURRENT(start); >> break; >> default: >> break; >> } >> } >> >> (do fsync and error handling here) >> >> /* increment the i/o timing and the number of times to fsync WAL >> data */ >> if (track_wal_io_timing) >> { >> INSTR_TIME_SET_CURRENT(duration); >> INSTR_TIME_SUBTRACT(duration, start); >> WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); >> } > > IIUC, I think we can't handle the following case. > > When "sync_method" is SYNC_METHOD_OPEN or SYNC_METHOD_OPEN_DSYNC and > "track_wal_io_timing" is enabled, "start" doesn't be initialized. > > My understanding is something wrong, isn't it? I thought the following is better. ``` /* Measure i/o timing to sync WAL data.*/ if (track_wal_io_timing) INSTR_TIME_SET_CURRENT(start); (do fsync and error handling here) /* check whether to sync WAL data to the disk right now. */ if (enableFsync) { if ((sync_method == SYNC_METHOD_FSYNC) || (sync_method == SYNC_METHOD_FSYNC_WRITETHROUGH) || (sync_method == SYNC_METHOD_FDATASYNC)) { /* increment the i/o timing and the number of times to fsync WAL data */ if (track_wal_io_timing) { instr_time duration; INSTR_TIME_SET_CURRENT(duration); INSTR_TIME_SUBTRACT(duration, start); WalStats.m_wal_sync_time = INSTR_TIME_GET_MICROSEC(duration); } WalStats.m_wal_sync++; } } ``` Although INSTR_TIME_SET_CURRENT(start) is called everytime regardless of the "sync_method" and "enableFsync", we don't make additional variables. But it's ok because "track_wal_io_timing" leads already performance degradation. What do you think? Regards, -- Masahiro Ikeda NTT DATA CORPORATION
pgsql-hackers by date: