Re: New statistics for tuning WAL buffer size - Mailing list pgsql-hackers
From | Masahiro Ikeda |
---|---|
Subject | Re: New statistics for tuning WAL buffer size |
Date | |
Msg-id | 25c55c57ca9af60707bff6667307a3c7@oss.nttdata.com Whole thread Raw |
In response to | Re: New statistics for tuning WAL buffer size (Fujii Masao <masao.fujii@oss.nttdata.com>) |
Responses |
Re: New statistics for tuning WAL buffer size
|
List | pgsql-hackers |
On 2020-09-15 17:10, Fujii Masao wrote: > On 2020/09/15 15:52, Masahiro Ikeda wrote: >> On 2020-09-11 01:40, Fujii Masao wrote: >>> On 2020/09/09 13:57, Masahiro Ikeda wrote: >>>> On 2020-09-07 16:19, Fujii Masao wrote: >>>>> On 2020/09/07 9:58, Masahiro Ikeda wrote: >>>>>> Thanks for the review and advice! >>>>>> >>>>>> On 2020-09-03 16:05, Fujii Masao wrote: >>>>>>> On 2020/09/02 18:56, Masahiro Ikeda wrote: >>>>>>>>> +/* ---------- >>>>>>>>> + * Backend types >>>>>>>>> + * ---------- >>>>>>>>> >>>>>>>>> You seem to forget to add "*/" into the above comment. >>>>>>>>> This issue could cause the following compiler warning. >>>>>>>>> ../../src/include/pgstat.h:761:1: warning: '/*' within block >>>>>>>>> comment [-Wcomment] >>>>>>>> >>>>>>>> Thanks for the comment. I fixed. >>>>>>> >>>>>>> Thanks for the fix! But why are those comments necessary? >>>>>> >>>>>> Sorry about that. This comment is not necessary. >>>>>> I removed it. >>>>>> >>>>>>>> The pg_stat_walwriter is not security restricted now, so >>>>>>>> ordinary users can access it. >>>>>>>> It has the same security level as pg_stat_archiver. If you have >>>>>>>> any comments, please let me know. >>>>>>> >>>>>>> + <structfield>dirty_writes</structfield> >>>>>>> <type>bigint</type> >>>>>>> >>>>>>> I guess that the column name "dirty_writes" derived from >>>>>>> the DTrace probe name. Isn't this name confusing? We should >>>>>>> rename it to "wal_buffers_full" or something? >>>>>> >>>>>> I agree and rename it to "wal_buffers_full". >>>>>> >>>>>>> +/* ---------- >>>>>>> + * PgStat_MsgWalWriter Sent by the walwriter to >>>>>>> update statistics. >>>>>>> >>>>>>> This comment seems not accurate because backends also send it. >>>>>>> >>>>>>> +/* >>>>>>> + * WAL writes statistics counter is updated in XLogWrite >>>>>>> function >>>>>>> + */ >>>>>>> +extern PgStat_MsgWalWriter WalWriterStats; >>>>>>> >>>>>>> This comment seems not right because the counter is not updated >>>>>>> in XLogWrite(). >>>>>> >>>>>> Right. I fixed it to "Sent by each backend and background workers >>>>>> to update WAL statistics." >>>>>> In the future, other statistics will be included so I remove the >>>>>> function's name. >>>>>> >>>>>> >>>>>>> +-- There will surely and maximum one record >>>>>>> +select count(*) = 1 as ok from pg_stat_walwriter; >>>>>>> >>>>>>> What about changing this comment to "There must be only one >>>>>>> record"? >>>>>> >>>>>> Thanks, I fixed. >>>>>> >>>>>>> + WalWriterStats.m_xlog_dirty_writes++; >>>>>>> LWLockRelease(WALWriteLock); >>>>>>> >>>>>>> Since WalWriterStats.m_xlog_dirty_writes doesn't need to be >>>>>>> protected >>>>>>> with WALWriteLock, isn't it better to increment that after >>>>>>> releasing the lock? >>>>>> >>>>>> Thanks, I fixed. >>>>>> >>>>>>> +CREATE VIEW pg_stat_walwriter AS >>>>>>> + SELECT >>>>>>> + pg_stat_get_xlog_dirty_writes() AS dirty_writes, >>>>>>> + pg_stat_get_walwriter_stat_reset_time() AS stats_reset; >>>>>>> + >>>>>>> CREATE VIEW pg_stat_progress_vacuum AS >>>>>>> >>>>>>> In system_views.sql, the definition of pg_stat_walwriter should >>>>>>> be >>>>>>> placed just after that of pg_stat_bgwriter not >>>>>>> pg_stat_progress_analyze. >>>>>> >>>>>> OK, I fixed it. >>>>>> >>>>>>> } >>>>>>> - >>>>>>> /* >>>>>>> * We found an existing collector stats file. Read it and >>>>>>> put all the >>>>>>> >>>>>>> You seem to accidentally have removed the empty line here. >>>>>> >>>>>> Sorry about that. I fixed it. >>>>>> >>>>>>> - errhint("Target must be \"archiver\" or >>>>>>> \"bgwriter\"."))); >>>>>>> + errhint("Target must be \"archiver\" or >>>>>>> \"bgwriter\" or >>>>>>> \"walwriter\"."))); >>>>>>> >>>>>>> There are two "or" in the message, but the former should be >>>>>>> replaced with ","? >>>>>> >>>>>> Thanks, I fixed. >>>>>> >>>>>> >>>>>> On 2020-09-05 18:40, Magnus Hagander wrote: >>>>>>> On Fri, Sep 4, 2020 at 5:42 AM Fujii Masao >>>>>>> <masao.fujii@oss.nttdata.com> wrote: >>>>>>> >>>>>>>> On 2020/09/04 11:50, tsunakawa.takay@fujitsu.com wrote: >>>>>>>>> From: Fujii Masao <masao.fujii@oss.nttdata.com> >>>>>>>>>>> I changed the view name from pg_stat_walwrites to >>>>>>>> pg_stat_walwriter. >>>>>>>>>>> I think it is better to match naming scheme with other views >>>>>>>> like >>>>>>>>>> pg_stat_bgwriter, >>>>>>>>>>> which is for bgwriter statistics but it has the statistics >>>>>>>> related to backend. >>>>>>>>>> >>>>>>>>>> I prefer the view name pg_stat_walwriter for the consistency >>>>>>>>>> with >>>>>>>>>> other view names. But we also have pg_stat_wal_receiver. Which >>>>>>>>>> makes me think that maybe pg_stat_wal_writer is better for >>>>>>>>>> the consistency. Thought? IMO either of them works for me. >>>>>>>>>> I'd like to hear more opinons about this. >>>>>>>>> >>>>>>>>> I think pg_stat_bgwriter is now a misnomer, because it contains >>>>>>>> the backends' activity. Likewise, pg_stat_walwriter leads to >>>>>>>> misunderstanding because its information is not limited to WAL >>>>>>>> writer. >>>>>>>>> >>>>>>>>> How about simply pg_stat_wal? In the future, we may want to >>>>>>>> include WAL reads in this view, e.g. reading undo logs in zheap. >>>>>>>> >>>>>>>> Sounds reasonable. >>>>>>> >>>>>>> +1. >>>>>>> >>>>>>> pg_stat_bgwriter has had the "wrong name" for quite some time now >>>>>>> -- >>>>>>> it became even more apparent when the checkpointer was split out >>>>>>> to >>>>>>> it's own process, and that's not exactly a recent change. And it >>>>>>> had >>>>>>> allocs in it from day one... >>>>>>> >>>>>>> I think naming it for what the data in it is ("wal") rather than >>>>>>> which >>>>>>> process deals with it ("walwriter") is correct, unless the >>>>>>> statistics >>>>>>> can be known to only *ever* affect one type of process. (And then >>>>>>> different processes can affect different columns in the view). As >>>>>>> a >>>>>>> general rule -- and that's from what I can tell exactly what's >>>>>>> being >>>>>>> proposed. >>>>>> >>>>>> Thanks for your comments. I agree with your opinions. >>>>>> I changed the view name to "pg_stat_wal". >>>>>> >>>>>> >>>>>> I fixed the code to send the WAL statistics from not only backend >>>>>> and walwriter >>>>>> but also checkpointer, walsender and autovacuum worker. >>>>> >>>>> Good point! Thanks for updating the patch! >>>>> >>>>> >>>>> @@ -604,6 +604,7 @@ heap_vacuum_rel(Relation onerel, VacuumParams >>>>> *params, >>>>> onerel->rd_rel->relisshared, >>>>> Max(new_live_tuples, 0), >>>>> vacrelstats->new_dead_tuples); >>>>> + pgstat_send_wal(); >>>>> >>>>> I guess that you changed heap_vacuum_rel() as above so that >>>>> autovacuum >>>>> workers can send WAL stats. But heap_vacuum_rel() can be called by >>>>> the processes (e.g., backends) other than autovacuum workers? Also >>>>> what happens if autovacuum workers just do ANALYZE only? In that >>>>> case, >>>>> heap_vacuum_rel() may not be called. >>>>> >>>>> Currently autovacuum worker reports the stats at the exit via >>>>> pgstat_beshutdown_hook(). Unlike other processes, autovacuum worker >>>>> is not the process that basically keeps running during the service. >>>>> It exits >>>>> after it does vacuum or analyze. So ISTM that it's not bad to >>>>> report the stats >>>>> only at the exit, in autovacuum worker case. There is no need to >>>>> add extra >>>>> code for WAL stats report by autovacuum worker. Thought? >>>> >>>> Thanks, I understood. I removed this code. >>>> >>>>> >>>>> @@ -1430,6 +1430,9 @@ WalSndWaitForWal(XLogRecPtr loc) >>>>> else >>>>> RecentFlushPtr = GetXLogReplayRecPtr(NULL); >>>>> + /* Send wal statistics */ >>>>> + pgstat_send_wal(); >>>>> >>>>> AFAIR logical walsender uses three loops in WalSndLoop(), >>>>> WalSndWriteData() >>>>> and WalSndWaitForWal(). But could you tell me why added >>>>> pgstat_send_wal() >>>>> into WalSndWaitForWal()? I'd like to know why WalSndWaitForWal() is >>>>> the best >>>>> for that purpose. >>>> >>>> I checked what function calls XLogBackgroundFlush() which calls >>>> AdvanceXLInsertBuffer() to increment m_wal_buffers_full. >>>> >>>> I found that WalSndWaitForWal() calls it, so I added it. >>> >>> Ok. But XLogBackgroundFlush() calls AdvanceXLInsertBuffer() wit the >>> second argument opportunistic=true, so in this case WAL write by >>> wal_buffers full seems to never happen. Right? If this understanding >>> is right, WalSndWaitForWal() doesn't need to call pgstat_send_wal(). >>> Probably also walwriter doesn't need to do that. > > Thanks for updating the patch! This patch adds pgstat_send_wal() in > walwriter main loop. But isn't this unnecessary because of the above > reason? > That is, since walwriter calls AdvanceXLInsertBuffer() with > the second argument "opportunistic" = true via XLogBackgroundFlush(), > the event of full wal_buffers will never happen. No? Right, I fixed it. >>> >>> The logical rep walsender can generate WAL and call >>> AdvanceXLInsertBuffer() when it executes the replication commands >>> like >>> CREATE_REPLICATION_SLOT. But this case is already covered by >>> pgstat_report_activity()->pgstat_send_wal() called in PostgresMain(), >>> with your patch. So no more calls to pgstat_send_wal() seems >>> necessary >>> for logical rep walsender. >> >> Thanks for your reviews. I didn't notice that. >> I updated the patches. > > Sorry, the above my analysis might be incorrect. During logical > replication, > walsender may access to the system table. Which may cause HOT pruning > or killing of dead index tuple. Also which can cause WAL and > full wal_buffers event. Thought? Thanks. I confirmed that it causes HOT pruning or killing of dead index tuple if DecodeCommit() is called. As you said, DecodeCommit() may access the system table. WalSndLoop() -> XLogSendLogical() -> LogicalDecodingProcessRecord() -> DecodeXactOp() -> DecodeCommit() -> ReorderBufferCommit() -> ReorderBufferProcessTXN() -> RelidByRelfilenode() -> systable_getnext() The wals are generated only when logical replication is performed. So, I added pgstat_send_wal() in XLogSendLogical(). But, I concerned that it causes poor performance since pgstat_send_wal() is called per wal record, Is it necessary to introduce a mechanism to send in bulk? But I worried about how to implement is best. Is it good to send wal statistics per X recoreds? I think there are other background processes that access the system tables, so I organized which process must send wal metrics and added pgstat_send_wal() to the main loop of some background processes for example, autovacuum launcher, logical replication launcher, and logical replication worker's one. (*) [x]: it needs to send it [ ]: it don't need to send it * [ ] postmaster * [ ] background writer * [x] checkpointer: it generates wal for checkpoint. * [ ] walwriter * [x] autovacuum launcher: it accesses to the system tables to get the database list. * [x] autovacuum worker: it generates wal for vacuum. * [ ] stats collector * [x] backend: it generate wal for query execution. * [ ] startup * [ ] archiver * [x] walsender: it accesses to the system tables if logical replication is performed. * [ ] walreceiver * [x] logical replication launcher: it accesses to the system tables to get the subscription list. * [x] logical replication worker: it accesses to the system tables to get oid from relname. * [x] parallel worker: it generates wal for query execution. If my understanding is wrong, please let me know. Regards, -- Masahiro Ikeda NTT DATA CORPORATION
Attachment
pgsql-hackers by date: