Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) - Mailing list pgsql-hackers
From | Melanie Plageman |
---|---|
Subject | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
Date | |
Msg-id | CAAKRu_agmNMyMbfffGegbgFpUUTL=k3U64n_v0gLQGW5DeDkhg@mail.gmail.com Whole thread Raw |
In response to | Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) (Maciek Sakrejda <m.sakrejda@gmail.com>) |
Responses |
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?) |
List | pgsql-hackers |
v34 is attached. I think the column names need discussion. Also, the docs need more work (I added a lot of new content there). I could use feedback on the column names and definitions and review/rephrasing ideas for the docs additions. On Mon, Oct 17, 2022 at 1:28 AM Maciek Sakrejda <m.sakrejda@gmail.com> wrote: > > On Thu, Oct 13, 2022 at 10:29 AM Melanie Plageman > <melanieplageman@gmail.com> wrote: > > I think that it makes sense to count both the initial buffers added to > > the ring and subsequent shared buffers added to the ring (either when > > the current strategy buffer is pinned or in use or when a bulkread > > rejects dirty strategy buffers in favor of new shared buffers) as > > strategy clocksweeps because of how the statistic would be used. > > > > Clocksweeps give you an idea of how much of your working set is cached > > (setting aside initially reading data into shared buffers when you are > > warming up the db). You may use clocksweeps to determine if you need to > > make shared buffers larger. > > > > Distinguishing strategy buffer clocksweeps from shared buffer > > clocksweeps allows us to avoid enlarging shared buffers if most of the > > clocksweeps are to bring in blocks for the strategy operation. > > > > However, I could see an argument that discounting strategy clocksweeps > > done because the current strategy buffer is pinned makes the number of > > shared buffer clocksweeps artificially low since those other queries > > using the buffer would have suffered a cache miss were it not for the > > strategy. And, in this case, you would take strategy clocksweeps > > together with shared clocksweeps to make your decision. And if we > > include buffers initially added to the strategy ring in the strategy > > clocksweep statistic, this number may be off because those blocks may > > not be needed in the main shared working set. But you won't know that > > until you try to reuse the buffer and it is pinned. So, I think we don't > > have a better option than counting initial buffers added to the ring as > > strategy clocksweeps (as opposed to as reuses). > > > > So, in answer to your question, no, I cannot think of a scenario like > > that. > > That analysis makes sense to me; thanks. I have made some major changes in this area to make the columns more useful. I have renamed and split "clocksweeps". It is now "evicted" and "freelist acquired". This makes it clear when a block must be evicted from a shared buffer must be and may help to identify misconfiguration of shared buffers. There is some nuance here that I tried to make clear in the docs. "freelist acquired" in a shared context is straightforward. "freelist acquired" in a strategy context is counted when a shared buffer is added to the strategy ring (not when it is reused). "freelist acquired" in the local buffer context is actually the initial allocation of a local buffer (in contrast with reuse). "evicted" in the shared IOContext is a block being evicted from a shared buffer in order to reuse that buffer when not using a strategy. "evicted" in a strategy IOContext is a block being evicted from a shared buffer in order to add that shared buffer to the strategy ring. This is in contrast with "reused" in a strategy IOContext which is when an existing buffer in the strategy ring has a block evicted in order to reuse that buffer in a strategy context. "evicted" in a local IOContext is when an existing local buffer has a block evicted in order to reuse that local buffer. "freelist_acquired" is confusing for local buffers but I wanted to distinguish between reuse/eviction of local buffers and initial allocation. "freelist_acquired" seemed more fitting because there is a clocksweep to find a local buffer and if it hasn't been allocated yet it is allocated in a place similar to where shared buffers acquire a buffer from the freelist. If I didn't count it here, I would need to make a new column only for local buffers called "allocated" or something like that. I chose not to call "evicted" "sb_evicted" because then we would need a separate "local_evicted". I could instead make "local_evicted", "sb_evicted", and rename "reused" to "strat_evicted". If I did that we would end up with separate columns for every IO Context describing behavior when a buffer is initially acquired vs when it is reused. It would look something like this: shared buffers: initial: freelist_acquired reused: sb_evicted local buffers: initial: allocated reused: local_evicted strategy buffers: initial: sb_evicted | freelist_acquired reused: strat_evicted replaced: sb_evicted | freelist_acquired This seems not too bad at first, but if you consider that later we will add other kinds of IO -- eg WAL IO or temporary file IO, we won't be able to use these existing columns and will need to add even more columns describing the exact behavior in those cases. I wanted to devise a paradigm which allowed for reuse of columns across IOContexts even if with slightly different meanings. I have also added the columns "repossessed" and "rejected". "rejected" is when a bulkread rejects a strategy buffer because it is dirty and requires flush. Seeing a lot of rejections could indicate you need to vacuum. "repossessed" is the number of times a strategy buffer was pinned or in use by another backend and had to be removed from the strategy ring and replaced with a new shared buffer. This gives you some indication that there is contention on blocks recently used by a strategy. I've also added some descriptions to the docs of how these columns might be used or what a large value in one of them may mean. I haven't added tests for repossessed or rejected yet. I can add tests for repossessed if we decide to keep it. Rejected is hard to write a test for because we can't guarantee checkpointer won't clean up the buffer before we can reject it > > > It also made me remember that I am incorrectly counting rejected buffers > > as reused. I'm not sure if it is a good idea to subtract from reuses > > when a buffer is rejected. Waiting until after it is rejected to count > > the reuse will take some other code changes. Perhaps we could also count > > rejections in the stats? > > I'm not sure what makes sense here. I have fixed the counting of rejected and have made a new column dedicated to rejected. > > > > From the io_context column description: > > > > > > + The autovacuum daemon, explicit <command>VACUUM</command>, > > > explicit > > > + <command>ANALYZE</command>, many bulk reads, and many bulk > > > writes use a > > > + fixed amount of memory, acquiring the equivalent number of > > > shared > > > + buffers and reusing them circularly to avoid occupying an > > > undue portion > > > + of the main shared buffer pool. > > > + </para></entry> > > > > > > I don't understand how this is relevant to the io_context column. > > > Could you expand on that, or am I just missing something obvious? > > > > > > > I'm trying to explain why those other IO Contexts exist (bulkread, > > bulkwrite, vacuum) and why they are separate from shared buffers. > > Should I cut it altogether or preface it with something like: these are > > counted separate from shared buffers because...? > > Oh I see. That makes sense; it just wasn't obvious to me this was > talking about the last three values of io_context. I think a brief > preface like that would be helpful (maybe explicitly with "these last > three values", and I think "counted separately"). I've done this. Thanks for the suggested wording. > > > > + <row> > > > + <entry role="catalog_table_entry"><para > > > role="column_definition"> > > > + <structfield>extended</structfield> <type>bigint</type> > > > + </para> > > > + <para> > > > + Extends of relations done by this > > > <varname>backend_type</varname> in > > > + order to write data in this <varname>io_context</varname>. > > > + </para></entry> > > > + </row> > > > > > > I understand what this is, but not why this is something I might want > > > to know about. > > > > Unlike writes, backends largely have to do their own extends, so > > separating this from writes lets us determine whether or not we need to > > change checkpointer/bgwriter to be more aggressive using the writes > > without the distraction of the extends. Should I mention this in the > > docs? The other stats views don't seems to editorialize at all, and I > > wasn't sure if this was an objective enough point to include in docs. > > Thanks for the clarification. Just to make sure I understand, you mean > that if I see a high extended count, that may be interesting in terms > of write activity, but I can't fix that by tuning--it's just the > nature of my workload? That is correct. > > > > That seems broadly reasonable, but pg_settings also has a 'unit' > > > field, and in that view, unit is '8kB' on my system--i.e., it > > > (presumably) reflects the block size. Is that something we should try > > > to be consistent with (not sure if that's a good idea, but thought it > > > was worth asking)? > > > > > > > I think this idea is a good option. I am wondering if it would be clear > > when mixed with non-block-oriented IO. Block-oriented IO would say 8kB > > (or whatever the build-time value of a block was) and non-block-oriented > > IO would say B or kB. The math would work out. > > Right, yeah. Although maybe that's a little confusing? When you > originally added "unit", you had said: > > >The most correct thing to do to accommodate block-oriented and > >non-block-oriented IO would be to specify all the values in bytes. > >However, I would like this view to be usable visually (as opposed to > >just in scripts and by tools). The only current value of unit is > >"block_size" which could potentially be combined with the value of the > >GUC to get bytes. > > Is this still usable visually if you have to compare values across > units? I don't really have any great ideas here (and maybe this is > still the best option), just pointing it out. > > > Looking at pg_settings now though, I am confused about > > how the units for wal_buffers is 8kB but then the value of wal_buffers > > when I show it in psql is "16MB"... > > You mean the difference between > > maciek=# select setting, unit from pg_settings where name = 'wal_buffers'; > setting | unit > ---------+------ > 512 | 8kB > (1 row) > > and > > maciek=# show wal_buffers; > wal_buffers > ------------- > 4MB > (1 row) > > ? > > Poking around, I think it looks like that's due to > convert_int_from_base_unit (indirectly called from SHOW / > current_setting): > > /* > * Convert an integer value in some base unit to a human-friendly > unit. > * > * The output unit is chosen so that it's the greatest unit that can > represent > * the value without loss. For example, if the base unit is > GUC_UNIT_KB, 1024 > * is converted to 1 MB, but 1025 is represented as 1025 kB. > */ I've implemented a change using the same function pg_settings uses to turn the build-time parameter BLCKSZ into 8kB (get_config_unit_name()) using the flag GUC_UNIT_BLOCKS. I am unsure if this is better or worse than "block_size". I am feeling very conflicted about this column. > > > Though the units for the pg_stat_io view for block-oriented IO would be > > the build-time values for block size, so it wouldn't line up exactly > > with pg_settings. > > I don't follow--what would be the discrepancy? I got confused. You are right -- pg_settings does seem to use the build-time value of BLCKSZ to derive this. I was confused because the description of pg_settings says: "The view pg_settings provides access to run-time parameters of the server." - Melanie
Attachment
pgsql-hackers by date: