Home > mailing lists
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

From	Melanie Plageman
Subject	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Date	October 19, 2022 19:26:51
Msg-id	CAAKRu_agmNMyMbfffGegbgFpUUTL=k3U64n_v0gLQGW5DeDkhg@mail.gmail.com Whole thread Raw
In response to	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) (Maciek Sakrejda <m.sakrejda@gmail.com>)
Responses	Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
List	pgsql-hackers
Tree view
v34 is attached.
I think the column names need discussion. Also, the docs need more work
(I added a lot of new content there). I could use feedback on the column
names and definitions and review/rephrasing ideas for the docs
additions.

On Mon, Oct 17, 2022 at 1:28 AM Maciek Sakrejda <m.sakrejda@gmail.com> wrote:
>
> On Thu, Oct 13, 2022 at 10:29 AM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
> > I think that it makes sense to count both the initial buffers added to
> > the ring and subsequent shared buffers added to the ring (either when
> > the current strategy buffer is pinned or in use or when a bulkread
> > rejects dirty strategy buffers in favor of new shared buffers) as
> > strategy clocksweeps because of how the statistic would be used.
> >
> > Clocksweeps give you an idea of how much of your working set is cached
> > (setting aside initially reading data into shared buffers when you are
> > warming up the db). You may use clocksweeps to determine if you need to
> > make shared buffers larger.
> >
> > Distinguishing strategy buffer clocksweeps from shared buffer
> > clocksweeps allows us to avoid enlarging shared buffers if most of the
> > clocksweeps are to bring in blocks for the strategy operation.
> >
> > However, I could see an argument that discounting strategy clocksweeps
> > done because the current strategy buffer is pinned makes the number of
> > shared buffer clocksweeps artificially low since those other queries
> > using the buffer would have suffered a cache miss were it not for the
> > strategy. And, in this case, you would take strategy clocksweeps
> > together with shared clocksweeps to make your decision. And if we
> > include buffers initially added to the strategy ring in the strategy
> > clocksweep statistic, this number may be off because those blocks may
> > not be needed in the main shared working set. But you won't know that
> > until you try to reuse the buffer and it is pinned. So, I think we don't
> > have a better option than counting initial buffers added to the ring as
> > strategy clocksweeps (as opposed to as reuses).
> >
> > So, in answer to your question, no, I cannot think of a scenario like
> > that.
>
> That analysis makes sense to me; thanks.

I have made some major changes in this area to make the columns more
useful. I have renamed and split "clocksweeps". It is now "evicted" and
"freelist acquired". This makes it clear when a block must be evicted
from a shared buffer must be and may help to identify misconfiguration
of shared buffers.

There is some nuance here that I tried to make clear in the docs.
"freelist acquired" in a shared context is straightforward.
"freelist acquired" in a strategy context is counted when a shared
buffer is added to the strategy ring (not when it is reused).

"freelist acquired" in the local buffer context is actually the initial
allocation of a local buffer (in contrast with reuse).

"evicted" in the shared IOContext is a block being evicted from a shared
buffer in order to reuse that buffer when not using a strategy.

"evicted" in a strategy IOContext is a block being evicted from
a shared buffer in order to add that shared buffer to the strategy ring.

This is in contrast with "reused" in a strategy IOContext which is when
an existing buffer in the strategy ring has a block evicted in order to
reuse that buffer in a strategy context.

"evicted" in a local IOContext is when an existing local buffer has a
block evicted in order to reuse that local buffer.

"freelist_acquired" is confusing for local buffers but I wanted to
distinguish between reuse/eviction of local buffers and initial
allocation. "freelist_acquired" seemed more fitting because there is a
clocksweep to find a local buffer and if it hasn't been allocated yet it
is allocated in a place similar to where shared buffers acquire a buffer
from the freelist. If I didn't count it here, I would need to make a new
column only for local buffers called "allocated" or something like that.

I chose not to call "evicted" "sb_evicted"
because then we would need a separate "local_evicted". I could instead
make "local_evicted", "sb_evicted", and rename "reused" to
"strat_evicted". If I did that we would end up with separate columns for
every IO Context describing behavior when a buffer is initially acquired
vs when it is reused.

It would look something like this:

shared buffers:
    initial: freelist_acquired
    reused: sb_evicted

local buffers:
    initial: allocated
    reused: local_evicted

strategy buffers:
    initial: sb_evicted | freelist_acquired
    reused: strat_evicted
    replaced: sb_evicted | freelist_acquired

This seems not too bad at first, but if you consider that later we will
add other kinds of IO -- eg WAL IO or temporary file IO, we won't be
able to use these existing columns and will need to add even more
columns describing the exact behavior in those cases.

I wanted to devise a paradigm which allowed for reuse of columns across
IOContexts even if with slightly different meanings.

I have also added the columns "repossessed" and "rejected". "rejected"
is when a bulkread rejects a strategy buffer because it is dirty and
requires flush. Seeing a lot of rejections could indicate you need to
vacuum. "repossessed" is the number of times a strategy buffer was
pinned or in use by another backend and had to be removed from the
strategy ring and replaced with a new shared buffer. This gives you some
indication that there is contention on blocks recently used by a
strategy.

I've also added some descriptions to the docs of how these columns might
be used or what a large value in one of them may mean.

I haven't added tests for repossessed or rejected yet. I can add tests
for repossessed if we decide to keep it. Rejected is hard to write a
test for because we can't guarantee checkpointer won't clean up the
buffer before we can reject it

>
> > It also made me remember that I am incorrectly counting rejected buffers
> > as reused. I'm not sure if it is a good idea to subtract from reuses
> > when a buffer is rejected. Waiting until after it is rejected to count
> > the reuse will take some other code changes. Perhaps we could also count
> > rejections in the stats?
>
> I'm not sure what makes sense here.

I have fixed the counting of rejected and have made a new column
dedicated to rejected.

>
> > > From the io_context column description:
> > >
> > > +       The autovacuum daemon, explicit <command>VACUUM</command>,
> > > explicit
> > > +       <command>ANALYZE</command>, many bulk reads, and many bulk
> > > writes use a
> > > +       fixed amount of memory, acquiring the equivalent number of
> > > shared
> > > +       buffers and reusing them circularly to avoid occupying an
> > > undue portion
> > > +       of the main shared buffer pool.
> > > +      </para></entry>
> > >
> > > I don't understand how this is relevant to the io_context column.
> > > Could you expand on that, or am I just missing something obvious?
> > >
> >
> > I'm trying to explain why those other IO Contexts exist (bulkread,
> > bulkwrite, vacuum) and why they are separate from shared buffers.
> > Should I cut it altogether or preface it with something like: these are
> > counted separate from shared buffers because...?
>
> Oh I see. That makes sense; it just wasn't obvious to me this was
> talking about the last three values of io_context. I think a brief
> preface like that would be helpful (maybe explicitly with "these last
> three values", and I think "counted separately").

I've done this. Thanks for the suggested wording.

>
> > > +     <row>
> > > +      <entry role="catalog_table_entry"><para
> > > role="column_definition">
> > > +       <structfield>extended</structfield> <type>bigint</type>
> > > +      </para>
> > > +      <para>
> > > +       Extends of relations done by this
> > > <varname>backend_type</varname> in
> > > +       order to write data in this <varname>io_context</varname>.
> > > +      </para></entry>
> > > +     </row>
> > >
> > > I understand what this is, but not why this is something I might want
> > > to know about.
> >
> > Unlike writes, backends largely have to do their own extends, so
> > separating this from writes lets us determine whether or not we need to
> > change checkpointer/bgwriter to be more aggressive using the writes
> > without the distraction of the extends. Should I mention this in the
> > docs? The other stats views don't seems to editorialize at all, and I
> > wasn't sure if this was an objective enough point to include in docs.
>
> Thanks for the clarification. Just to make sure I understand, you mean
> that if I see a high extended count, that may be interesting in terms
> of write activity, but I can't fix that by tuning--it's just the
> nature of my workload?

That is correct.

>
> > > That seems broadly reasonable, but pg_settings also has a 'unit'
> > > field, and in that view, unit is '8kB' on my system--i.e., it
> > > (presumably) reflects the block size. Is that something we should try
> > > to be consistent with (not sure if that's a good idea, but thought it
> > > was worth asking)?
> > >
> >
> > I think this idea is a good option. I am wondering if it would be clear
> > when mixed with non-block-oriented IO. Block-oriented IO would say 8kB
> > (or whatever the build-time value of a block was) and non-block-oriented
> > IO would say B or kB. The math would work out.
>
> Right, yeah. Although maybe that's a little confusing? When you
> originally added "unit", you had said:
>
> >The most correct thing to do to accommodate block-oriented and
> >non-block-oriented IO would be to specify all the values in bytes.
> >However, I would like this view to be usable visually (as opposed to
> >just in scripts and by tools). The only current value of unit is
> >"block_size" which could potentially be combined with the value of the
> >GUC to get bytes.
>
> Is this still usable visually if you have to compare values across
> units? I don't really have any great ideas here (and maybe this is
> still the best option), just pointing it out.
>
> > Looking at pg_settings now though, I am confused about
> > how the units for wal_buffers is 8kB but then the value of wal_buffers
> > when I show it in psql is "16MB"...
>
> You mean the difference between
>
> maciek=# select setting, unit from pg_settings where name = 'wal_buffers';
>  setting | unit
> ---------+------
>  512     | 8kB
> (1 row)
>
> and
>
> maciek=# show wal_buffers;
>  wal_buffers
> -------------
>  4MB
> (1 row)
>
> ?
>
> Poking around, I think it looks like that's due to
> convert_int_from_base_unit (indirectly called from SHOW /
> current_setting):
>
> /*
>  * Convert an integer value in some base unit to a human-friendly
> unit.
>  *
>  * The output unit is chosen so that it's the greatest unit that can
> represent
>  * the value without loss.  For example, if the base unit is
> GUC_UNIT_KB, 1024
>  * is converted to 1 MB, but 1025 is represented as 1025 kB.
>  */

I've implemented a change using the same function pg_settings uses to
turn the build-time parameter BLCKSZ into 8kB (get_config_unit_name())
using the flag GUC_UNIT_BLOCKS. I am unsure if this is better or worse
than "block_size". I am feeling very conflicted about this column.

>
> > Though the units for the pg_stat_io view for block-oriented IO would be
> > the build-time values for block size, so it wouldn't line up exactly
> > with pg_settings.
>
> I don't follow--what would be the discrepancy?

I got confused.
You are right -- pg_settings does seem to use the build-time value of
BLCKSZ to derive this. I was confused because the description of
pg_settings says:

"The view pg_settings provides access to run-time parameters of the server."

- Melanie
Attachment

pgsql-hackers by date:
From: Andres Freund
Date: 19 October 2022, 19:21:30
Subject: Re: problems with making relfilenodes 56-bits
From: Peter Geoghegan
Date: 19 October 2022, 19:37:30
Subject: Re: shared memory stats ideas
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers

Attachment

Previous

Next