Home > mailing lists
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From	Melanie Plageman
Subject	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date	January 13, 2023 02:19:36
Msg-id	CAAKRu_bTjvHSd5tWWQkW8SWY-5frVp_vOJpFD+-V7jJG1Rf_Dw@mail.gmail.com Whole thread Raw
In response to	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) (Justin Pryzby <pryzby@telsasoft.com>)
Responses	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
List	pgsql-hackers
Tree view
Attached is v46.

On Wed, Dec 28, 2022 at 6:56 PM Andres Freund <andres@anarazel.de> wrote:
> On 2022-10-06 13:42:09 -0400, Melanie Plageman wrote:
> > > Additionally, some minor notes:
> > >
> > > - Since the stats are counting blocks, it would make sense to prefix the view columns with "blks_", and word them
inthe past tense (to match current style), i.e. "blks_written", "blks_read", "blks_extended", "blks_fsynced"
(realisticallyone would combine this new view with other data e.g. from pg_stat_database or pg_stat_statements, which
alluse the "blks_" prefix, and stop using pg_stat_bgwriter for this which does not use such a prefix) 
> >
> > I have changed the column names to be in the past tense.
>
> For a while I was convinced by the consistency argument (after Melanie
> pointing it out to me). But the more I look, the less convinced I am. The
> existing IO related stats in pg_stat_database, pg_stat_bgwriter aren't past
> tense, just the ones in pg_stat_statements. pg_stat_database uses past tense
> for tup_*, but not xact_*, deadlocks, checksum_failures etc.
>
> And even pg_stat_statements isn't consistent about it - otherwise it'd be
> 'planned' instead of 'plans', 'called' instead of 'calls' etc.
>
> I started to look at the naming "tense" issue again, after I got "confused"
> about "extended", because that somehow makes me think about more detailed
> stats or such, rather than files getting extended.
>
> ISTM that 'evictions', 'extends', 'fsyncs', 'reads', 'reuses', 'writes' are
> clearer than the past tense versions, and about as consistent with existing
> columns.

I have updated the column names to the above recommendation.

On Wed, Jan 11, 2023 at 11:32 AM vignesh C <vignesh21@gmail.com> wrote:
>
> For some reason cfbot is not able to apply this patch as in [1],
> please have a look and post an updated patch if required:
> === Applying patches on top of PostgreSQL commit ID
> 3c6fc58209f24b959ee18f5d19ef96403d08f15c ===
> === applying patch
> ./v45-0001-pgindent-and-some-manual-cleanup-in-pgstat-relat.patch
> patching file src/backend/storage/buffer/bufmgr.c
> patching file src/backend/storage/buffer/localbuf.c
> patching file src/backend/utils/activity/pgstat.c
> patching file src/backend/utils/activity/pgstat_relation.c
> patching file src/backend/utils/adt/pgstatfuncs.c
> patching file src/include/pgstat.h
> patching file src/include/utils/pgstat_internal.h
> === applying patch ./v45-0002-pgstat-Infrastructure-to-track-IO-operations.patch
> gpatch: **** Only garbage was found in the patch input.
>
> [1] - http://cfbot.cputube.org/patch_41_3272.log
>

This was an issue with cfbot that Thomas has now fixed as he describes
in [1].

On Wed, Jan 11, 2023 at 4:58 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> > Subject: [PATCH v45 4/5] Add system view tracking IO ops per backend type
>
> The patch can/will fail with:
>
> CREATE TABLESPACE test_io_shared_stats_tblspc LOCATION '';
> +WARNING:  tablespaces created by regression test cases should have names starting with "regress_"
>
> CREATE TABLESPACE test_stats LOCATION '';
> +WARNING:  tablespaces created by regression test cases should have names starting with "regress_"
>
> (I already sent patches to address the omission in cirrus.yml)

Thanks. I've fixed this
I make a tablespace in amcheck -- are there recommendations for naming
tablespaces in contrib also?

>
> 1760             :                  errhint("Target must be \"archiver\", \"io\", \"bgwriter\",
\"recovery_prefetch\",or \"wal\"."))); 
> => Do you want to put these in order?

Thanks. Fixed.

> pgstat_get_io_op_name() isn't currently being hit by tests; actually,
> it's completely unused.

Deleted it.

> FlushRelationBuffers() isn't being hit for local buffers.

I added a test.

> > +      <entry><structname>pg_stat_io</structname><indexterm><primary>pg_stat_io</primary></indexterm></entry>
> > +      <entry>
> > +       One row per backend type, context, target object combination showing
> > +       cluster-wide I/O statistics.
>
> I suggest: "One row for each combination of of .."

I have made this change.

> > +   The <structname>pg_stat_io</structname> and
> > +   <structname>pg_statio_</structname> set of views are especially useful for
> > +   determining the effectiveness of the buffer cache.  When the number of actual
> > +   disk reads is much smaller than the number of buffer hits, then the cache is
> > +   satisfying most read requests without invoking a kernel call.
>
> I would change this say "Postgres' own buffer cache is satisfying ..."

So, this is existing copy to which I added the pg_stat_io view name and
re-flowed the indentation.
However, I think your suggestions are a good idea, so I've taken them
and just rewritten this paragraph altogether.

>
> > However, these
> > +   statistics do not give the entire story: due to the way in which
> > +   <productname>PostgreSQL</productname> handles disk I/O, data that is not in
> > +   the <productname>PostgreSQL</productname> buffer cache might still reside in
> > +   the kernel's I/O cache, and might therefore still be fetched without
>
> I suggest to refer to "the kernel's page cache"

same applies here.

>
> > +   The <structname>pg_stat_io</structname> view will contain one row for each
> > +   backend type, I/O context, and target I/O object combination showing
> > +   cluster-wide I/O statistics. Combinations which do not make sense are
> > +   omitted.
>
> "..for each combination of .."

I have changed this.

>
> > +          <varname>io_context</varname> for a type of I/O operation. For
>
> "for I/O operations"

So I actually mean for a type of I/O operation -- that is, relation data
is normally written to a shared buffer but sometimes we bypass shared
buffers and just call write and sometimes we use a buffer access
strategy and write it to a special ring buffer (made up of buffers
stolen from shared buffers, but still). So I don't want to say "for I/O
operations" because I think that would imply that writes of relation
data will always be in the same IO Context.

>
> > +          <literal>vacuum</literal>: I/O operations done outside of shared
> > +          buffers incurred while vacuuming and analyzing permanent relations.
>
> s/incurred/performed/

I changed this.

>
> > +          <literal>bulkread</literal>: Qualifying large read I/O operations
> > +          done outside of shared buffers, for example, a sequential scan of a
> > +          large table.
>
> I don't think it's correct to say that it's "outside of" shared-buffers.

I suppose "outside of" gives the wrong idea. But I need to make clear
that this I/O is to and from buffers which are not a part of shared
buffers right now -- they may still be accessible from the same data
structures which access shared buffers but they are currently being used
in a different way.

> s/Qualifying/Certain/

I feel like qualifying is more specific than certain, but I would be open
to changing it if there was a specific reason you don't like it.

>
> > +          <literal>bulkwrite</literal>: Qualifying large write I/O operations
> > +          done outside of shared buffers, such as <command>COPY</command>.
>
> Same
>
> > +        Target object of an I/O operation. Possible values are:
> > +       <itemizedlist>
> > +        <listitem>
> > +         <para>
> > +          <literal>relation</literal>: This includes permanent relations.
>
> It says "includes permanent" but what seems to mean is that it
> "exclusive of temporary relations".

I've changed this.

>
> > +     <row>
> > +      <entry role="catalog_table_entry">
> > +       <para role="column_definition">
> > +        <structfield>read</structfield> <type>bigint</type>
> > +       </para>
> > +       <para>
> > +        Number of read operations in units of <varname>op_bytes</varname>.
>
> This looks too much like it means "bytes".
> Should say: "in number of blocks of size >op_bytes<"
>
> But wait - is it the number of read operations "in units of op_bytes"
> (which would means this already multiplied by op_bytes, and is in units
> of bytes).
>
> Or the "number of read operations" *of* op_bytes chunks ?  Which would
> mean this is a "pure" number, and could be multipled by op_bytes to
> obtain a size in bytes.

It is the number of read operations of op_bytes size -- thanks so much
for pointing this out. The wording was really unclear.
The idea is that you can do something like:
SELECT pg_size_pretty(reads * op_bytes) FROM pg_stat_io;
and get it in bytes.

The view will contain other types of IO that are not in BLCKSZ chunks,
which is where this column will be handy.

>
> > +        Number of write operations in units of <varname>op_bytes</varname>.
>
> > +        Number of relation extend operations in units of
> > +        <varname>op_bytes</varname>.
>
> same
>
> > +        In <varname>io_context</varname> <literal>normal</literal>, this counts
> > +        the number of times a block was evicted from a buffer and replaced with
> > +        another block. In <varname>io_context</varname>s
> > +        <literal>bulkwrite</literal>, <literal>bulkread</literal>, and
> > +        <literal>vacuum</literal>, this counts the number of times a block was
> > +        evicted from shared buffers in order to add the shared buffer to a
> > +        separate size-limited ring buffer.
>
> This never defines what "evicted" means.  Does it mea that a dirty
> buffer was written out ?

Thanks. I've updated this.

>
> > +        The number of times an existing buffer in a size-limited ring buffer
> > +        outside of shared buffers was reused as part of an I/O operation in the
> > +        <literal>bulkread</literal>, <literal>bulkwrite</literal>, or
> > +        <literal>vacuum</literal> <varname>io_context</varname>s.
>
> Maybe say "as part of a bulk I/O operation (bulkread, bulkwrite, or
> vacuum)."

I've changed this.

>
> > +  <para>
> > +   <structname>pg_stat_io</structname> can be used to inform database tuning.
>
> > +   For example:
> > +   <itemizedlist>
> > +    <listitem>
> > +     <para>
> > +      A high <varname>evicted</varname> count can indicate that shared buffers
> > +      should be increased.
> > +     </para>
> > +    </listitem>
> > +    <listitem>
> > +     <para>
> > +      Client backends rely on the checkpointer to ensure data is persisted to
> > +      permanent storage. Large numbers of <varname>files_synced</varname> by
> > +      <literal>client backend</literal>s could indicate a misconfiguration of
> > +      shared buffers or of checkpointer. More information on checkpointer
>
> of *the* checkpointer
>
> > +      Normally, client backends should be able to rely on auxiliary processes
> > +      like the checkpointer and background writer to write out dirty data as
>
> *the* bg writer
>
> > +      much as possible. Large numbers of writes by client backends could
> > +      indicate a misconfiguration of shared buffers or of checkpointer. More
>
> *the* ckpointer

I've made most of these changes.

> Should this link to various docs for checkpointer/bgwriter ?

I couldn't find docs related to tuning checkpointer outside of the WAL
configuration docs. There is the docs page for the CHECKPOINT command --
but I don't think that is very relevant here.

> Maybe the docs for ALTER/COPY/VACUUM/CREATE/etc should be updated to
> refer to some central description of ring buffers.  Maybe something
> should be included to the appendix.

I agree it would be nice to explain Buffer Access Strategies in the docs.

- Melanie

[1] https://www.postgresql.org/message-id/CA%2BhUKGLiY1e%2B1%3DpB7hXJOyGj1dJOfgde%2BHmiSnv3gDKayUFJMA%40mail.gmail.com
Attachment

pgsql-hackers by date:
From: David Rowley
Date: 13 January 2023, 02:18:20
Subject: Re: Todo: Teach planner to evaluate multiple windows in the optimal order
From: Peter Smith
Date: 13 January 2023, 02:25:43
Subject: Re: Perform streaming logical transactions by background workers and parallel apply
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

Attachment

Previous

Next