Re: Add statistics to pg_stat_wal view for wal related parameter tuning - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: Add statistics to pg_stat_wal view for wal related parameter tuning |
Date | |
Msg-id | 89de5056627ebdd144967959bd789bd6@oss.nttdata.com Whole thread Raw |
In response to | Re: Add statistics to pg_stat_wal view for wal related parameter tuning (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: Add statistics to pg_stat_wal view for wal related parameter tuning
|
List | pgsql-hackers |
On 2020-11-17 11:46, Fujii Masao wrote: > On 2020/11/16 16:35, Masahiro Ikeda wrote: >> On 2020-11-12 14:58, Fujii Masao wrote: >>> On 2020/11/06 10:25, Masahiro Ikeda wrote: >>>> On 2020-10-30 11:50, Fujii Masao wrote: >>>>> On 2020/10/29 17:03, Masahiro Ikeda wrote: >>>>>> Hi, >>>>>> >>>>>> Thanks for your comments and advice. I updated the patch. >>>>>> >>>>>> On 2020-10-21 18:03, Kyotaro Horiguchi wrote: >>>>>>> At Tue, 20 Oct 2020 16:11:29 +0900, Masahiro Ikeda >>>>>>> <ikedamsh@oss.nttdata.com> wrote in >>>>>>>> On 2020-10-20 12:46, Amit Kapila wrote: >>>>>>>> > I see that we also need to add extra code to capture these stats (some >>>>>>>> > of which is in performance-critical path especially in >>>>>>>> > XLogInsertRecord) which again makes me a bit uncomfortable. It might >>>>>>>> > be that it is all fine as it is very important to collect these stats >>>>>>>> > at cluster-level in spite that the same information can be gathered at >>>>>>>> > statement-level to help customers but I don't see a very strong case >>>>>>>> > for that in your proposal. >>>>>>> >>>>>>> We should avoid that duplication as possible even if the both >>>>>>> number >>>>>>> are important. >>>>>>> >>>>>>>> Also about performance, I thought there are few impacts because >>>>>>>> it >>>>>>>> increments stats in memory. If I can implement to reuse >>>>>>>> pgWalUsage's >>>>>>>> value which already collects these stats, there is no impact in >>>>>>>> XLogInsertRecord. >>>>>>>> For example, how about pg_stat_wal() calculates the accumulated >>>>>>>> value of wal_records, wal_fpi, and wal_bytes to use pgWalUsage's >>>>>>>> value? >>>>>>> >>>>>>> I don't think that works, but it would work that >>>>>>> pgstat_send_wal() >>>>>>> takes the difference of that values between two successive calls. >>>>>>> >>>>>>> WalUsage prevWalUsage; >>>>>>> ... >>>>>>> pgstat_send_wal() >>>>>>> { >>>>>>> .. >>>>>>> /* fill in some values using pgWalUsage */ >>>>>>> WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>>>>>> prevWalUsage.wal_bytes; >>>>>>> WalStats.m_wal_records = pgWalUsage.wal_records - >>>>>>> prevWalUsage.wal_records; >>>>>>> WalStats.m_wal_wal_fpi = pgWalUsage.wal_fpi - >>>>>>> prevWalUsage.wal_fpi; >>>>>>> ... >>>>>>> pgstat_send(&WalStats, sizeof(WalStats)); >>>>>>> >>>>>>> /* remember the current numbers */ >>>>>>> prevWalUsage = pgWalUsage; >>>>>> >>>>>> Thanks for Horiguchi-san's advice, I changed to reuse pgWalUsage >>>>>> which is already defined and eliminates the extra overhead. >>>>> >>>>> + /* fill in some values using pgWalUsage */ >>>>> + WalStats.m_wal_bytes = pgWalUsage.wal_bytes - >>>>> prevWalUsage.wal_bytes; >>>>> + WalStats.m_wal_records = pgWalUsage.wal_records - >>>>> prevWalUsage.wal_records; >>>>> + WalStats.m_wal_fpi = pgWalUsage.wal_fpi - >>>>> prevWalUsage.wal_fpi; >>>>> >>>>> It's better to use WalUsageAccumDiff() here? >>>> >>>> Yes, thanks. I fixed it. >>>> >>>>> prevWalUsage needs to be initialized with pgWalUsage? >>>>> >>>>> + if (AmWalWriterProcess()){ >>>>> + WalStats.m_wal_write_walwriter++; >>>>> + } >>>>> + else >>>>> + { >>>>> + WalStats.m_wal_write_backend++; >>>>> + } >>>>> >>>>> I think that it's better not to separate m_wal_write_xxx into two >>>>> for >>>>> walwriter and other processes. Instead, we can use one >>>>> m_wal_write_xxx >>>>> counter and make pgstat_send_wal() send also the process type to >>>>> the stats collector. Then the stats collector can accumulate the >>>>> counters >>>>> per process type if necessary. If we adopt this approach, we can >>>>> easily >>>>> extend pg_stat_wal so that any fields can be reported per process >>>>> type. >>>> >>>> I'll remove the above source code because these counters are not >>>> useful. >>>> >>>> >>>> On 2020-10-30 12:00, Fujii Masao wrote: >>>>> On 2020/10/20 11:31, Masahiro Ikeda wrote: >>>>>> Hi, >>>>>> >>>>>> I think we need to add some statistics to pg_stat_wal view. >>>>>> >>>>>> Although there are some parameter related WAL, >>>>>> there are few statistics for tuning them. >>>>>> >>>>>> I think it's better to provide the following statistics. >>>>>> Please let me know your comments. >>>>>> >>>>>> ``` >>>>>> postgres=# SELECT * from pg_stat_wal; >>>>>> -[ RECORD 1 ]-------+------------------------------ >>>>>> wal_records | 2000224 >>>>>> wal_fpi | 47 >>>>>> wal_bytes | 248216337 >>>>>> wal_buffers_full | 20954 >>>>>> wal_init_file | 8 >>>>>> wal_write_backend | 20960 >>>>>> wal_write_walwriter | 46 >>>>>> wal_write_time | 51 >>>>>> wal_sync_backend | 7 >>>>>> wal_sync_walwriter | 8 >>>>>> wal_sync_time | 0 >>>>>> stats_reset | 2020-10-20 11:04:51.307771+09 >>>>>> ``` >>>>>> >>>>>> 1. Basic statistics of WAL activity >>>>>> >>>>>> - wal_records: Total number of WAL records generated >>>>>> - wal_fpi: Total number of WAL full page images generated >>>>>> - wal_bytes: Total amount of WAL bytes generated >>>>>> >>>>>> To understand DB's performance, first, we will check the >>>>>> performance >>>>>> trends for the entire database instance. >>>>>> For example, if the number of wal_fpi becomes higher, users may >>>>>> tune >>>>>> "wal_compression", "checkpoint_timeout" and so on. >>>>>> >>>>>> Although users can check the above statistics via EXPLAIN, >>>>>> auto_explain, >>>>>> autovacuum and pg_stat_statements now, >>>>>> if users want to see the performance trends for the entire >>>>>> database, >>>>>> they must recalculate the statistics. >>>>>> >>>>>> I think it is useful to add the sum of the basic statistics. >>>>>> >>>>>> >>>>>> 2. WAL segment file creation >>>>>> >>>>>> - wal_init_file: Total number of WAL segment files created. >>>>>> >>>>>> To create a new WAL file may have an impact on the performance of >>>>>> a write-heavy workload generating lots of WAL. If this number is >>>>>> reported high, >>>>>> to reduce the number of this initialization, we can tune >>>>>> WAL-related parameters >>>>>> so that more "recycled" WAL files can be held. >>>>>> >>>>>> >>>>>> >>>>>> 3. Number of when WAL is flushed >>>>>> >>>>>> - wal_write_backend : Total number of WAL data written to the disk >>>>>> by backends >>>>>> - wal_write_walwriter : Total number of WAL data written to the >>>>>> disk by walwriter >>>>>> - wal_sync_backend : Total number of WAL data synced to the disk >>>>>> by backends >>>>>> - wal_sync_walwriter : Total number of WAL data synced to the disk >>>>>> by walwrite >>>>>> >>>>>> I think it's useful for tuning "synchronous_commit" and >>>>>> "commit_delay" for query executions. >>>>>> If the number of WAL is flushed is high, users can know >>>>>> "synchronous_commit" is useful for the workload. >>>>> >>>>> I just wonder how useful these counters are. Even without these >>>>> counters, >>>>> we already know synchronous_commit=off is likely to cause the >>>>> better >>>>> performance (but has the risk of data loss). So ISTM that these >>>>> counters are >>>>> not so useful when tuning synchronous_commit. >>>> >>>> Thanks, my understanding was wrong. >>>> I agreed that your comments. >>>> >>>> I merged the statistics of *_backend and *_walwriter. >>>> I think the sum of them is useful to calculate the average per >>>> write/sync time. >>>> For example, per write time is equals wal_write_time / wal_write. >>> >>> Understood. >>> >>> Thanks for updating the patch! >> >> Thanks for your comments. >> >>> patching file src/include/catalog/pg_proc.dat >>> Hunk #1 FAILED at 5491. >>> 1 out of 1 hunk FAILED -- saving rejects to file >>> src/include/catalog/pg_proc.dat.rej >>> >>> I got this failure when applying the patch. Could you update the >>> patch? >> >> Thanks, I updated the patch. >> >>> - Number of times WAL data was written to the disk because WAL >>> buffers got full >>> + Total number of times WAL data written to the disk because >>> WAL >>> buffers got full >>> >>> Isn't "was" necessary between "data" and "written"? >> >> Yes, I fixed it. >> >>> + <entry role="catalog_table_entry"><para >>> role="column_definition"> >>> + <structfield>wal_bytes</structfield> <type>bigint</type> >>> >>> Shouldn't the type of wal_bytes be numeric because the total number >>> of >>> WAL bytes can exceed the range of bigint? I think that the type of >>> pg_stat_statements.wal_bytes is also numeric for the same reason. >> >> Thanks, I fixed it. >> >> Since I cast the type of wal_bytes from PgStat_Counter to uint64, >> I changed the type of PgStat_MsgWal and PgStat_WalStats too. >> >>> + <entry role="catalog_table_entry"><para >>> role="column_definition"> >>> + <structfield>wal_write_time</structfield> <type>bigint</type> >>> >>> Shouldn't the type of wal_xxx_time be double precision, >>> like pg_stat_database.blk_write_time? >> >> Thanks, I changed it. >> >>> Even when fsync is set to off or wal_sync_method is set to open_sync, >>> wal_sync is incremented. Isn't this behavior confusing? > > What do you think about this comment? Sorry, I'll change to increment wal_sync and wal_sync_time only if a specific fsync method is called. > I found that we discussed track-WAL-IO-timing feature at the past > discussion > about the similar feature [1]. But the feature was droppped from the > proposal > patch because there was the performance concern. So probably we need to > revisit the past discussion and benchmark the performance. Thought? > > If track-WAL-IO-timing feature may cause performance regression, > it might be an idea to extract wal_records, wal_fpi and wal_bytes parts > from the patch and commit it at first. > > [1] > https://postgr.es/m/CAJrrPGc6APFUGYNcPe4qcNxpL8gXKYv1KST+vwJcFtCSCEySnA@mail.gmail.com Thanks, I'll check the thread. I agree to add basic statistics at first and I attached the patch. >>> >>> >>> + Total amount of time that has been spent in the portion of >>> + WAL data was written to disk by backend and walwriter, in >>> milliseconds >>> + (if <xref linkend="guc-track-io-timing"/> is enabled, >>> otherwise zero) >>> >>> With the patch, track_io_timing controls both IO for data files and >>> WAL files. But we may want to track only either of them. So it's >>> better >>> to extend track_io_timing so that we can specify the tracking target >>> in the parameter? For example, we can make track_io_timing accept >>> data, wal and all. Or we should introduce new GUC for WAL, e.g., >>> track_wal_io_timing? Thought? >> >> OK, I introduced the new GUC "track_wal_io_timing". >> >>> I'm afraid that "by backend and walwriter" part can make us thinkg >>> incorrectly that WAL writes by other processes like autovacuum >>> are not tracked. >> >> Sorry, I removed "by backend and walwriter". > > Thanks for updating the patch! > > +WalUsage prevWalUsage; > > ISTM that we can declare this as static variable because > it's used only in pgstat.c. Thanks, I fixed it. > + memset(&walusage, 0, sizeof(WalUsage)); > + WalUsageAccumDiff(&walusage, &pgWalUsage, &prevWalUsage); > > This memset seems unnecessary. I couldn't understand why this memset is unnecessary. Since WalUsageAccumDiff not only calculates the difference but also adds the value, I thought walusage needs to be initialized. > /* We assume this initializes to zeroes */ > static const PgStat_MsgWal all_zeroes; > > This declaration of the variable should be placed around > the top of pgstat_send_wal(). Sorry, I fixed it. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
pgsql-hackers by date: