Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers
| From | Fujii Masao |
|---|---|
| Subject | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
| Date | |
| Msg-id | 58ccc451-8429-ce44-f0b7-773c8dfedf45@oss.nttdata.com |
| In response to | Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Masahiro Ikeda <ikedamsh@oss.nttdata.com>) |
| Responses | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
| List | pgsql-hackers |
On 2020/11/19 16:31, Masahiro Ikeda wrote:
> On 2020-11-17 11:46, Fujii Masao wrote:
>> On 2020/11/16 16:35, Masahiro Ikeda wrote:
>>> On 2020-11-12 14:58, Fujii Masao wrote:
>>>> On 2020/11/06 10:25, Masahiro Ikeda wrote:
>>>>> On 2020-10-30 11:50, Fujii Masao wrote:
>>>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Thanks for your comments and advice. I updated the patch.
>>>>>>>
>>>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote:
>>>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda
>>>>>>>> <ikedamsh@oss.nttdata.com> wrote in
>>>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote:
>>>>>>>>> > I see that we also need to add extra code to capture these stats (some
>>>>>>>>> > of which is in performance-critical path especially in
>>>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might
>>>>>>>>> > be that it is all fine as it is very important to collect these stats
>>>>>>>>> > at cluster-level in spite that the same information can be gathered at
>>>>>>>>> > statement-level to help customers but I don't see a very strong case
>>>>>>>>> > for that in your proposal.
>>>>>>>>
>>>>>>>> We should avoid that duplication as possible even if the both number
>>>>>>>> are important.
>>>>>>>>
>>>>>>>>> Also about performance, I thought there are few impacts because it
>>>>>>>>> increments stats in memory. If I can implement to reuse pgWalUsage's
>>>>>>>>> value which already collects these stats, there is no impact in
>>>>>>>>> XLogInsertRecord.
>>>>>>>>> For example, how about pg_stat_wal() calculates the accumulated
>>>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's
>>>>>>>>> value?
>>>>>>>>
>>>>>>>> I don't think that works, but it would work that pgstat_send_wal()
>>>>>>>> takes the difference of that values between two successive calls.
>>>>>>>>
>>>>>>>> WalUsage prevWalUsage;
>>>>>>>> ...
>>>>>>>> pgstat_send_wal()
>>>>>>>> {
>>>>>>>>   ..
>>>>>>>>   /* fill in some values using pgWalUsage */
>>>>>>>>   WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>>>>   WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>>>>   WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>>>   ...
>>>>>>>>   pgstat_send(&WalStats, sizeof(WalStats));
>>>>>>>>
>>>>>>>>   /* remember the current numbers */
>>>>>>>>   prevWalUsage = pgWalUsage;
>>>>>>>
>>>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage
>>>>>>> which is already defined and eliminates the extra overhead.
>>>>>>
>>>>>> +    /* fill in some values using pgWalUsage */
>>>>>> +    WalStats.m_wal_bytes = pgWalUsage.wal_bytes - prevWalUsage.wal_bytes;
>>>>>> +    WalStats.m_wal_records = pgWalUsage.wal_records - prevWalUsage.wal_records;
>>>>>> +    WalStats.m_wal_fpi = pgWalUsage.wal_fpi - prevWalUsage.wal_fpi;
>>>>>>
>>>>>> It's better to use WalUsageAccumDiff() here?
>>>>>
>>>>> Yes, thanks. I fixed it.
>>>>>
>>>>>> prevWalUsage needs to be initialized with pgWalUsage?
>>>>>>
>>>>>> +    if (AmWalWriterProcess()){
>>>>>> +        WalStats.m_wal_write_walwriter++;
>>>>>> +    }
>>>>>> +    else
>>>>>> +    {
>>>>>> +        WalStats.m_wal_write_backend++;
>>>>>> +    }
>>>>>>
>>>>>> I think that it's better not to separate m_wal_write_xxx into two for
>>>>>> walwriter and other processes. Instead, we can use one m_wal_write_xxx
>>>>>> counter and make pgstat_send_wal() send also the process type to
>>>>>> the stats collector. Then the stats collector can accumulate the counters
>>>>>> per process type if necessary.
>>>>>> If we adopt this approach, we can easily
>>>>>> extend pg_stat_wal so that any fields can be reported per process type.
>>>>>
>>>>> I'll remove the above source code because these counters are not useful.
>>>>>
>>>>>
>>>>> On 2020-10-30 12:00, Fujii Masao wrote:
>>>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I think we need to add some statistics to pg_stat_wal view.
>>>>>>>
>>>>>>> Although there are some parameters related to WAL,
>>>>>>> there are few statistics for tuning them.
>>>>>>>
>>>>>>> I think it's better to provide the following statistics.
>>>>>>> Please let me know your comments.
>>>>>>>
>>>>>>> ```
>>>>>>> postgres=# SELECT * from pg_stat_wal;
>>>>>>> -[ RECORD 1 ]-------+------------------------------
>>>>>>> wal_records         | 2000224
>>>>>>> wal_fpi             | 47
>>>>>>> wal_bytes           | 248216337
>>>>>>> wal_buffers_full    | 20954
>>>>>>> wal_init_file       | 8
>>>>>>> wal_write_backend   | 20960
>>>>>>> wal_write_walwriter | 46
>>>>>>> wal_write_time      | 51
>>>>>>> wal_sync_backend    | 7
>>>>>>> wal_sync_walwriter  | 8
>>>>>>> wal_sync_time       | 0
>>>>>>> stats_reset         | 2020-10-20 11:04:51.307771+09
>>>>>>> ```
>>>>>>>
>>>>>>> 1. Basic statistics of WAL activity
>>>>>>>
>>>>>>> - wal_records: Total number of WAL records generated
>>>>>>> - wal_fpi: Total number of WAL full page images generated
>>>>>>> - wal_bytes: Total amount of WAL bytes generated
>>>>>>>
>>>>>>> To understand DB's performance, first, we will check the performance
>>>>>>> trends for the entire database instance.
>>>>>>> For example, if the number of wal_fpi becomes higher, users may tune
>>>>>>> "wal_compression", "checkpoint_timeout" and so on.
>>>>>>>
>>>>>>> Although users can check the above statistics via EXPLAIN, auto_explain,
>>>>>>> autovacuum and pg_stat_statements now,
>>>>>>> if users want to see the performance trends for the entire database,
>>>>>>> they must recalculate the statistics.
>>>>>>>
>>>>>>> I think it is useful to add the sum of the basic statistics.
>>>>>>>
>>>>>>>
>>>>>>> 2. WAL segment file creation
>>>>>>>
>>>>>>> - wal_init_file: Total number of WAL segment files created.
>>>>>>>
>>>>>>> Creating a new WAL file may have an impact on the performance of
>>>>>>> a write-heavy workload generating lots of WAL. If this number is reported high,
>>>>>>> to reduce the number of this initialization, we can tune WAL-related parameters
>>>>>>> so that more "recycled" WAL files can be held.
>>>>>>>
>>>>>>>
>>>>>>> 3. Number of times WAL is flushed
>>>>>>>
>>>>>>> - wal_write_backend : Total number of WAL data written to the disk by backends
>>>>>>> - wal_write_walwriter : Total number of WAL data written to the disk by walwriter
>>>>>>> - wal_sync_backend : Total number of WAL data synced to the disk by backends
>>>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk by walwriter
>>>>>>>
>>>>>>> I think it's useful for tuning "synchronous_commit" and "commit_delay" for query executions.
>>>>>>> If the number of times WAL is flushed is high, users can know "synchronous_commit" is useful for the workload.
>>>>>>
>>>>>> I just wonder how useful these counters are. Even without these counters,
>>>>>> we already know synchronous_commit=off is likely to cause better
>>>>>> performance (but has the risk of data loss). So ISTM that these counters are
>>>>>> not so useful when tuning synchronous_commit.
>>>>>
>>>>> Thanks, my understanding was wrong.
>>>>> I agree with your comments.
>>>>>
>>>>> I merged the statistics of *_backend and *_walwriter.
>>>>> I think the sum of them is useful to calculate the average per write/sync time.
>>>>> For example, per write time equals wal_write_time / wal_write.
>>>>
>>>> Understood.
>>>>
>>>> Thanks for updating the patch!
>>>
>>> Thanks for your comments.
>>>
>>>> patching file src/include/catalog/pg_proc.dat
>>>> Hunk #1 FAILED at 5491.
>>>> 1 out of 1 hunk FAILED -- saving rejects to file
>>>> src/include/catalog/pg_proc.dat.rej
>>>>
>>>> I got this failure when applying the patch. Could you update the patch?
>>>
>>> Thanks, I updated the patch.
>>>
>>>> -        Number of times WAL data was written to the disk because WAL
>>>> buffers got full
>>>> +        Total number of times WAL data written to the disk because WAL
>>>> buffers got full
>>>>
>>>> Isn't "was" necessary between "data" and "written"?
>>>
>>> Yes, I fixed it.
>>>
>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>> +       <structfield>wal_bytes</structfield> <type>bigint</type>
>>>>
>>>> Shouldn't the type of wal_bytes be numeric because the total number of
>>>> WAL bytes can exceed the range of bigint? I think that the type of
>>>> pg_stat_statements.wal_bytes is also numeric for the same reason.
>>>
>>> Thanks, I fixed it.
>>>
>>> Since I cast the type of wal_bytes from PgStat_Counter to uint64,
>>> I changed the type of PgStat_MsgWal and PgStat_WalStats too.
>>>
>>>> +      <entry role="catalog_table_entry"><para role="column_definition">
>>>> +       <structfield>wal_write_time</structfield> <type>bigint</type>
>>>>
>>>> Shouldn't the type of wal_xxx_time be double precision,
>>>> like pg_stat_database.blk_write_time?
>>>
>>> Thanks, I changed it.
>>>
>>>> Even when fsync is set to off or wal_sync_method is set to open_sync,
>>>> wal_sync is incremented. Isn't this behavior confusing?
>>
>> What do you think about this comment?
>
> Sorry, I'll change to increment wal_sync and wal_sync_time only
> if a specific fsync method is called.
>
>> I found that we discussed the track-WAL-IO-timing feature in the past discussion
>> about the similar feature [1]. But the feature was dropped from the proposal
>> patch because there was a performance concern. So probably we need to
>> revisit the past discussion and benchmark the performance. Thought?
>>
>> If the track-WAL-IO-timing feature may cause performance regression,
>> it might be an idea to extract the wal_records, wal_fpi and wal_bytes parts
>> from the patch and commit them first.
>>
>> [1]
>> https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com
>
> Thanks, I'll check the thread.
> I agree to add basic statistics at first and I attached the patch.

Thanks!

+	/* Send WAL statistics */
+	pgstat_send_wal();

This is not necessary because walwriter generates no WAL data?

>
>>>>
>>>>
>>>> +        Total amount of time that has been spent in the portion of
>>>> +        WAL data was written to disk by backend and walwriter, in milliseconds
>>>> +        (if <xref linkend="guc-track-io-timing"/> is enabled, otherwise zero)
>>>>
>>>> With the patch, track_io_timing controls both IO for data files and
>>>> WAL files. But we may want to track only either of them. So it's better
>>>> to extend track_io_timing so that we can specify the tracking target
>>>> in the parameter? For example, we can make track_io_timing accept
>>>> data, wal and all. Or we should introduce a new GUC for WAL, e.g.,
>>>> track_wal_io_timing? Thought?
>>>
>>> OK, I introduced the new GUC "track_wal_io_timing".
>>>
>>>> I'm afraid that the "by backend and walwriter" part can make us think
>>>> incorrectly that WAL writes by other processes like autovacuum
>>>> are not tracked.
>>>
>>> Sorry, I removed "by backend and walwriter".
>>
>> Thanks for updating the patch!
>>
>> +WalUsage prevWalUsage;
>>
>> ISTM that we can declare this as a static variable because
>> it's used only in pgstat.c.
>
> Thanks, I fixed it.
>
>> +	memset(&walusage, 0, sizeof(WalUsage));
>> +	WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);
>>
>> This memset seems unnecessary.
>
> I couldn't understand why this memset is unnecessary.
> Since WalUsageAccumDiff not only calculates the difference but also adds the value,
> I thought walusage needs to be initialized.

Yes, you're right! Sorry for the noise...

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
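For reference, here is a minimal, self-contained sketch of the diff-based reporting idea discussed in the thread: instead of bumping new counters inside XLogInsertRecord(), the sender reports only the change in the existing pgWalUsage counters since its previous call. The WalUsage struct, WalUsageAccumDiff() and send_wal_stats() below are simplified stand-ins for the PostgreSQL internals quoted above (instrument.h / pgstat.c), not the actual patch code; the zero initialization matters because the diff helper accumulates into its destination, as noted in the memset discussion.

```c
/* Standalone sketch; stand-ins for PostgreSQL's WalUsage machinery. */
#include <stdint.h>
#include <stdio.h>

typedef struct WalUsage
{
	int64_t		wal_records;	/* # of WAL records produced */
	int64_t		wal_fpi;		/* # of WAL full page images produced */
	uint64_t	wal_bytes;		/* size of WAL produced in bytes */
} WalUsage;

static WalUsage pgWalUsage;		/* stand-in for the backend-wide counters */
static WalUsage prevWalUsage;	/* snapshot taken at the previous send */

/* Accumulate (add - sub) into dst, mirroring WalUsageAccumDiff(). */
static void
WalUsageAccumDiff(WalUsage *dst, const WalUsage *add, const WalUsage *sub)
{
	dst->wal_records += add->wal_records - sub->wal_records;
	dst->wal_fpi += add->wal_fpi - sub->wal_fpi;
	dst->wal_bytes += add->wal_bytes - sub->wal_bytes;
}

/* Report only what changed since the last call, then remember the snapshot. */
static void
send_wal_stats(void)
{
	WalUsage	walusage = {0};	/* must start at zero: AccumDiff() adds */

	WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage);

	printf("delta: records=%lld fpi=%lld bytes=%llu\n",
		   (long long) walusage.wal_records,
		   (long long) walusage.wal_fpi,
		   (unsigned long long) walusage.wal_bytes);

	prevWalUsage = pgWalUsage;
}

int
main(void)
{
	pgWalUsage.wal_records = 10;
	pgWalUsage.wal_bytes = 1000;
	send_wal_stats();			/* delta: 10 records, 1000 bytes */

	pgWalUsage.wal_records = 25;
	pgWalUsage.wal_fpi = 2;
	pgWalUsage.wal_bytes = 4096;
	send_wal_stats();			/* delta: 15 records, 2 fpi, 3096 bytes */
	return 0;
}
```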
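Likewise, a minimal sketch of the "time the I/O only when the setting is on" pattern behind wal_write_time and the proposed track_wal_io_timing GUC. The flag, counters and do_wal_write() helper below are illustrative stand-ins, not PostgreSQL's actual GUC plumbing or XLogWrite() code; the point is simply that the clock calls, which are the performance concern raised in the thread, are skipped entirely when timing is disabled while the plain counter is always maintained.

```c
/* Standalone sketch; POSIX clock_gettime()/nanosleep() used for illustration. */
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

static bool track_wal_io_timing = true;	/* stand-in for the proposed GUC */

static uint64_t wal_write;			/* # of WAL write calls */
static double	wal_write_time;		/* total time spent writing, in ms */

static double
elapsed_ms(const struct timespec *start, const struct timespec *end)
{
	return (end->tv_sec - start->tv_sec) * 1000.0 +
		   (end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Pretend to write WAL; the real code would issue pg_pwrite()/fsync here. */
static void
do_wal_write(void)
{
	struct timespec ts = {0, 1000000};	/* simulate ~1ms of I/O */

	nanosleep(&ts, NULL);
}

static void
wal_write_with_stats(void)
{
	struct timespec start,
					end;

	/* Only pay for the clock calls when timing is requested. */
	if (track_wal_io_timing)
		clock_gettime(CLOCK_MONOTONIC, &start);

	do_wal_write();

	if (track_wal_io_timing)
	{
		clock_gettime(CLOCK_MONOTONIC, &end);
		wal_write_time += elapsed_ms(&start, &end);
	}
	wal_write++;				/* the plain counter is cheap, so always count */
}

int
main(void)
{
	for (int i = 0; i < 3; i++)
		wal_write_with_stats();
	printf("wal_write=%llu wal_write_time=%.3f ms\n",
		   (unsigned long long) wal_write, wal_write_time);
	return 0;
}
```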