Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
Date | |
Msg-id | CAAKRu_bwAmyQ=3SfxG3Jb8rRUBjoMwJtT99D6gG5fqqLmVpCQA@mail.gmail.com Whole thread Raw |
In response to | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) (Andres Freund <andres@anarazel.de>) |
Responses |
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
List | pgsql-hackers |
v18 attached. On Thu, Dec 9, 2021 at 2:17 PM Andres Freund <andres@anarazel.de> wrote: > > Hi, > > On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote: > > From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplageman@gmail.com> > > Date: Mon, 11 Oct 2021 16:15:06 -0400 > > Subject: [PATCH v17 1/7] Read-only atomic backend write function > > > > For counters in shared memory which can be read by any backend but only > > written to by one backend, an atomic is still needed to protect against > > torn values, however, pg_atomic_fetch_add_u64() is overkill for > > incrementing the counter. pg_atomic_inc_counter() is a helper function > > which can be used to increment these values safely but without > > unnecessary overhead. > > > > Author: Thomas Munro > > --- > > src/include/port/atomics.h | 11 +++++++++++ > > 1 file changed, 11 insertions(+) > > > > diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h > > index 856338f161..39ffff24dd 100644 > > --- a/src/include/port/atomics.h > > +++ b/src/include/port/atomics.h > > @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_) > > return pg_atomic_sub_fetch_u64_impl(ptr, sub_); > > } > > > > +/* > > + * On modern systems this is really just *counter++. On some older systems > > + * there might be more to it, due to inability to read and write 64 bit values > > + * atomically. > > + */ > > +static inline void > > +pg_atomic_inc_counter(pg_atomic_uint64 *counter) > > +{ > > + pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1); > > +} > > I wonder if it's worth putting something in the name indicating that this is > not actual atomic RMW operation. Perhaps adding _unlocked? > Done. > > > From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplageman@gmail.com> > > Date: Wed, 24 Nov 2021 10:32:56 -0500 > > Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus > > > > Add an array of counters in PgBackendStatus which count the buffers > > allocated, extended, fsynced, and written by a given backend. Each "IO > > Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct, > > local, shared, or strategy). "local" and "shared" IO Path counters count > > operations on local and shared buffers. The "strategy" IO Path counts > > buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy. > > The "direct" IO Path counts blocks of IO which are read, written, or > > fsync'd using smgrwrite/extend/immedsync directly (as opposed to through > > [Local]BufferAlloc()). > > > > With this commit, all backends increment a counter in their > > PgBackendStatus when performing an IO operation. This is in preparation > > for future commits which will persist these stats upon backend exit and > > use the counters to provide observability of database IO operations. > > > > Note that this commit does not add code to increment the "direct" path. > > A separate proposed patch [1] which would add wrappers for smgrwrite(), > > smgrextend(), and smgrimmedsync() would provide a good location to call > > pgstat_inc_ioop() for unbuffered IO and avoid regressions for future > > users of these functions. > > > > [1] https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com > > On longer thread it's nice for committers to already have Reviewed-By: in the > commit message. Done. > > diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c > > index 7229598822..413cc605f8 100644 > > --- a/src/backend/utils/activity/backend_status.c > > +++ b/src/backend/utils/activity/backend_status.c > > @@ -399,6 +399,15 @@ pgstat_bestart(void) > > lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID; > > lbeentry.st_progress_command_target = InvalidOid; > > lbeentry.st_query_id = UINT64CONST(0); > > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++) > > + { > > + IOOps *io_ops = &lbeentry.io_path_stats[io_path]; > > + > > + pg_atomic_init_u64(&io_ops->allocs, 0); > > + pg_atomic_init_u64(&io_ops->extends, 0); > > + pg_atomic_init_u64(&io_ops->fsyncs, 0); > > + pg_atomic_init_u64(&io_ops->writes, 0); > > + } > > > > /* > > * we don't zero st_progress_param here to save cycles; nobody should > > nit: I think we nearly always have a blank line before loops Done. > > diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c > > index 646126edee..93f1b4bcfc 100644 > > --- a/src/backend/utils/init/postinit.c > > +++ b/src/backend/utils/init/postinit.c > > @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, > > RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler); > > } > > > > + pgstat_beinit(); > > /* > > * Initialize local process's access to XLOG. > > */ > > nit: same with multi-line comments. Done. > > @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, > > */ > > CreateAuxProcessResourceOwner(); > > > > + pgstat_bestart(); > > StartupXLOG(); > > /* Release (and warn about) any buffer pins leaked in StartupXLOG */ > > ReleaseAuxProcessResources(true); > > @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username, > > EnablePortalManager(); > > > > /* Initialize status reporting */ > > - pgstat_beinit(); > > I'd like to see changes like moving this kind of thing around broken around > and committed separately. It's much easier to pinpoint breakage if the CF > breaks after moving just pgstat_beinit() around, rather than when committing > this considerably larger patch. And reordering subsystem initialization has > the habit of causing problems... Done > > +/* ---------- > > + * IO Stats reporting utility types > > + * ---------- > > + */ > > + > > +typedef enum IOOp > > +{ > > + IOOP_ALLOC, > > + IOOP_EXTEND, > > + IOOP_FSYNC, > > + IOOP_WRITE, > > +} IOOp; > > [...] > > +/* > > + * Structure for counting all types of IOOps for a live backend. > > + */ > > +typedef struct IOOps > > +{ > > + pg_atomic_uint64 allocs; > > + pg_atomic_uint64 extends; > > + pg_atomic_uint64 fsyncs; > > + pg_atomic_uint64 writes; > > +} IOOps; > > To me IOop and IOOps sound to much alike - even though they're really kind of > separate things. s/IOOps/IOOpCounters/ maybe? Done. > > @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg) > > { > > Assert(!pgstat_is_shutdown); > > > > + /* > > + * Only need to send stats on IO Ops for IO Paths when a process exits. > > + * Users requiring IO Ops for both live and exited backends can read from > > + * live backends' PgBackendStatus and sum this with totals from exited > > + * backends persisted by the stats collector. > > + */ > > + pgstat_send_buffers(); > > Perhaps something like this comment belongs somewhere at the top of the file, > or in the header, or ...? It's a fairly central design piece, and it's not > obvious one would need to look in the shutdown hook for it? > now in pgstat.h above the declaration of pgstat_send_buffers() > > +/* > > + * Before exiting, a backend sends its IO op statistics to the collector so > > + * that they may be persisted. > > + */ > > +void > > +pgstat_send_buffers(void) > > +{ > > + PgStat_MsgIOPathOps msg; > > + > > + PgBackendStatus *beentry = MyBEEntry; > > + > > + /* > > + * Though some backends with type B_INVALID (such as the single-user mode > > + * process) do initialize and increment IO operations stats, there is no > > + * spot in the array of IO operations for backends of type B_INVALID. As > > + * such, do not send these to the stats collector. > > + */ > > + if (!beentry || beentry->st_backendType == B_INVALID) > > + return; > > Why does single user mode use B_INVALID? That doesn't seem quite right. I think PgBackendStatus->st_backendType is set from MyBackendType which isn't set for the single user mode process. What BackendType would you expect to see? > > + memset(&msg, 0, sizeof(msg)); > > + msg.backend_type = beentry->st_backendType; > > + > > + pgstat_sum_io_path_ops(msg.iop.io_path_ops, > > + (IOOps *) &beentry->io_path_stats); > > + > > + pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS); > > + pgstat_send(&msg, sizeof(msg)); > > +} > > It seems worth having a path skipping sending the message if there was no IO? Makes sense. I've updated pgstat_send_buffers() to do a loop after calling pgstat_sum_io_path_ops() and check if it should skip sending. I also thought about having pgstat_sum_io_path_ops() return a value to indicate if everything was 0 -- which could be useful to future callers potentially? I didn't do this because I am not sure what the return value would be. It could be a bool and be true if any IO was done and false if none was done -- but that doesn't really make sense given the function's name it would be called like if (!pgstat_sum_io_path_ops()) return which I'm not sure is very clear > > +/* > > + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared, > > + * local) to those in the equivalent stats structure for exited backends. Note > > + * that this adds and doesn't set, so the destination stats structure should be > > + * zeroed out by the caller initially. This would commonly be used to transfer > > + * all IO Op stats for all IO Paths for a particular backend type to the > > + * pgstats structure. > > + */ > > +void > > +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src) > > +{ > > + for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++) > > + { > > Sacriligeous, but I find io_path a harder to understand variable name for the > counter than i (or io_path_off or ...) ;) I've updated almost all my non-standard loop index variable names. > > +static void > > +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len) > > +{ > > + PgStatIOOps *src_io_path_ops; > > + PgStatIOOps *dest_io_path_ops; > > + > > + /* > > + * Subtract 1 from message's BackendType to get a valid index into the > > + * array of IO Ops which does not include an entry for B_INVALID > > + * BackendType. > > + */ > > + Assert(msg->backend_type > B_INVALID); > > Probably worth also asserting the upper boundary? Done. > > From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplageman@gmail.com> > > Date: Wed, 24 Nov 2021 11:39:48 -0500 > > Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters > > > > Backends count IO operations for various IO paths in their PgBackendStatus. > > Upon exit, they send these counts to the stats collector. Prior to this commit, > > these IO Ops stats would have been reset when the target was "bgwriter". > > > > With this commit, target "bgwriter" no longer will cause the IO operations > > stats to be reset, and the IO operations stats can be reset with new target, > > "buffers". > > --- > > doc/src/sgml/monitoring.sgml | 2 +- > > src/backend/postmaster/pgstat.c | 83 +++++++++++++++++++-- > > src/backend/utils/activity/backend_status.c | 29 +++++++ > > src/include/pgstat.h | 8 +- > > src/include/utils/backend_status.h | 2 + > > 5 files changed, 117 insertions(+), 7 deletions(-) > > > > diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml > > index 62f2a3332b..bda3eef309 100644 > > --- a/doc/src/sgml/monitoring.sgml > > +++ b/doc/src/sgml/monitoring.sgml > > @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i > > <structfield>stats_reset</structfield> <type>timestamp with time zone</type> > > </para> > > <para> > > - Time at which these statistics were last reset > > + Time at which these statistics were last reset. > > </para></entry> > > </row> > > </tbody> > > Hm? > > Shouldn't this new reset target be documented? It is in the commit adding the view. I didn't include it in this commit because the pg_stat_buffers view doesn't exist yet, as of this commit, and I thought it would be odd to mention it in the docs (in this commit). As an aside, I shouldn't have left this correction in this commit. I moved it now to the other one. > > +/* > > + * Helper function to collect and send live backends' current IO operations > > + * stats counters when a stats reset is initiated so that they may be deducted > > + * from future totals. > > + */ > > +static void > > +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg) > > +{ > > + PgStatIOPathOps ops[BACKEND_NUM_TYPES]; > > + > > + memset(ops, 0, sizeof(ops)); > > + pgstat_report_live_backend_io_path_ops(ops); > > + > > + /* > > + * Iterate through the array of IO Ops for all IO Paths for each > > + * BackendType. Because the array does not include a spot for BackendType > > + * B_INVALID, add 1 to the index when setting backend_type so that there is > > + * no confusion as to the BackendType with which this reset message > > + * corresponds. > > + */ > > + for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; backend_type_idx++) > > + { > > + msg->m_backend_resets.backend_type = backend_type_idx + 1; > > + memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx], > > + sizeof(msg->m_backend_resets.iop)); > > + pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter)); > > + } > > +} > > Probably worth explaining why multiple messages are sent? Done. > > @@ -5583,10 +5621,45 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len) > > { > > if (msg->m_resettarget == RESET_BGWRITER) > > { > > - /* Reset the global, bgwriter and checkpointer statistics for the cluster. */ > > - memset(&globalStats, 0, sizeof(globalStats)); > > + /* > > + * Reset the global bgwriter and checkpointer statistics for the > > + * cluster. > > + */ > > + memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer)); > > + memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter)); > > globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp(); > > } > > Oh, is this a live bug? I don't think it is a bug. globalStats only contained bgwriter and checkpointer stats and those were all only displayed in pg_stat_bgwriter(), so memsetting the whole thing seems fine. > > + /* > > + * Subtract 1 from the BackendType to arrive at a valid index in the > > + * array, as it does not contain a spot for B_INVALID BackendType. > > + */ > > Instead of repeating a comment about +- 1 in a bunch of places, would it look > better to have two helper inline functions for this purpose? Done. > > +/* > > +* When adding a new column to the pg_stat_buffers view, add a new enum > > +* value here above COLUMN_LENGTH. > > +*/ > > +enum > > +{ > > + COLUMN_BACKEND_TYPE, > > + COLUMN_IO_PATH, > > + COLUMN_ALLOCS, > > + COLUMN_EXTENDS, > > + COLUMN_FSYNCS, > > + COLUMN_WRITES, > > + COLUMN_RESET_TIME, > > + COLUMN_LENGTH, > > +}; > > COLUMN_LENGTH seems like a fairly generic name... Changed. > > From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001 > > From: Melanie Plageman <melanieplageman@gmail.com> > > Date: Wed, 24 Nov 2021 12:20:10 -0500 > > Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats > > > > Remove stats from pg_stat_bgwriter which are now more clearly expressed > > in pg_stat_buffers. > > > > TODO: > > - make pg_stat_checkpointer view and move relevant stats into it > > - add additional stats to pg_stat_bgwriter > > When do you think it makes sense to tackle these wrt committing some of the > patches? Well, the new stats are a superset of the old stats (no stats have been removed that are not represented in the new or old views). So, I don't see that as a blocker for committing these patches. Since it is weird that pg_stat_bgwriter had mostly checkpointer stats, I've edited this commit to rename that view to pg_stat_checkpointer. I have not made a separate view just for maxwritten_clean (presumably called pg_stat_bgwriter), but I would not be opposed to doing this if you thought having a view with a single column isn't a problem (in the event that we don't get around to adding more bgwriter stats right away). I noticed after changing the docs on the "bgwriter" target for pg_stat_reset_shared to say "checkpointer", that it still said "bgwriter" in src/backend/po/ko.po src/backend/po/it.po ... I presume these are automatically updated with some incantation, but I wasn't sure what it was nor could I find documentation on this. > > diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c > > index 6926fc5742..67447f997a 100644 > > --- a/src/backend/storage/buffer/bufmgr.c > > +++ b/src/backend/storage/buffer/bufmgr.c > > @@ -2164,7 +2164,6 @@ BufferSync(int flags) > > if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN) > > { > > TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id); > > - PendingCheckpointerStats.m_buf_written_checkpoints++; > > num_written++; > > } > > } > > @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context) > > */ > > strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc); > > > > - /* Report buffer alloc counts to pgstat */ > > - PendingBgWriterStats.m_buf_alloc += recent_alloc; > > - > > /* > > * If we're not running the LRU scan, just stop after doing the stats > > * stuff. We mark the saved state invalid so that we can recover sanely > > @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context) > > reusable_buffers++; > > } > > > > - PendingBgWriterStats.m_buf_written_clean += num_written; > > - > > Isn't num_written unused now, unless tracepoints are enabled? I'd expect some > compilers to warn... Perhaps we should just remove information from the > tracepoint? The local variable num_written is used in BgBufferSync() to determine whether or not to increment maxwritten_clean which is still represented in the view pg_stat_checkpointer (formerly pg_stat_bgwriter). A local variable num_written is used in BufferSync() to increment CheckpointStats.ckpt_bufs_written which is logged in LogCheckpointEnd(), so I'm not sure that can be removed. - Melanie
Attachment
- v18-0008-small-comment-correction.patch
- v18-0005-Add-buffers-to-pgstat_reset_shared_counters.patch
- v18-0007-Remove-superfluous-bgwriter-stats.patch
- v18-0004-Send-IO-operations-to-stats-collector.patch
- v18-0006-Add-system-view-tracking-IO-ops-per-backend-type.patch
- v18-0003-Add-IO-operation-counters-to-PgBackendStatus.patch
- v18-0001-Read-only-atomic-backend-write-function.patch
- v18-0002-Move-backend-pgstat-initialization-earlier.patch
pgsql-hackers by date: