Re: Report bytes and transactions actually sent downtream - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Report bytes and transactions actually sent downtream |
Date | |
Msg-id | CAJpy0uAxw_BKOTAtbnPdBcBCMS6jqLiaCraqLjwhRvuytb008A@mail.gmail.com Whole thread Raw |
In response to | Re: Report bytes and transactions actually sent downtream (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Report bytes and transactions actually sent downtream
|
List | pgsql-hackers |
On Wed, Aug 27, 2025 at 7:14 PM Bertrand Drouvot <bertranddrouvot.pg@gmail.com> wrote: > > Hi, > > On Thu, Jul 24, 2025 at 12:24:26PM +0530, Ashutosh Bapat wrote: > > Here's the next patch which considers all the discussion so far. It > > adds four fields to pg_stat_replication_slots. > > - plugin - name of the output plugin > > Is this one needed? (we could get it with a join on pg_replication_slots) > In my opinion, when there are other plugin_* fields present, including the plugin name directly here seems like a better approach. So, +1 for the plugin field. > > - plugin_filtered_bytes - reports the amount of changes filtered > > out by the output plugin > > - plugin_sent_txns - the amount of transactions sent downstream by > > the output plugin > > - plugin_sent_bytes - the amount of data sent downstream by the > > outputplugin. > > > > There are some points up for a discussion: > > 1. pg_stat_reset_replication_slot() zeroes out the statistics entry by > > calling pgstat_reset() or pgstat_reset_of_kind() which don't know > > about the contents of the entry. So > > PgStat_StatReplSlotEntry::plugin_has_stats is set to false and plugin > > stats are reported as NULL, instead of zero, immediately after reset. > > This is the same case when the stats is queried immediately after the > > statistics is initialized and before any stats are reported. We could > > instead make it report > > zero, if we save the plugin_has_stats and restore it after reset. But > > doing that in pgstat_reset_of_kind() seems like an extra overhead + we > > will need to write a function to find all replication slot entries. I tried to think of an approach where we can differentiate between the cases 'not initialized' and 'reset' ones with the values. Say instead of plugin_has_stats, if we have plugin_stats_status, then we can maintain status like -1(not initialized), 0(reset). But this too will complicate the code further. Personally, I’m okay with NULL values appearing even after a reset, especially since the documentation explains this clearly. > > > 2. There's also a bit of asymmetry in the way sent_bytes is handled. > > The code which actually sends the logical changes to the downstream is > > part of the core code > > but the format of the change and hence the number of bytes sent is > > decided by the plugin. It's a stat related to plugin but maintained by > > the core code. The patch implements it as a plugin stat (so the > > corresponding column has "plugin" prefix > > The way it is done makes sense to me. > > > 3. The names of new columns have the prefix "plugin_" but the internal > > variables tracking those don't for the sake of brevity. If you prefer > > to have the same prefix for the internal variables, I can change that. > I am okay either way. Few comments: 1) postgres=# select slot_name, total_bytes,plugin_filtered_bytes,plugin_sent_bytes from pg_stat_replication_slots order by slot_name; slot_name | total_bytes | plugin_filtered_bytes | plugin_sent_bytes -----------+-------------+-----------------------+------------------- slot1 | 800636 | 793188 | 211 sub1 | 401496 | 132712 | 84041 sub2 | 401496 | 396184 | 674 sub3 | 401496 | 145912 | 79959 (4 rows) Currently it looks quite confusing. 'total_bytes' gives a sense that it has to be a sum of filtered and sent. But they are no way like that. In the thread earlier there was a proposal to change the name to reordered_txns, reordered_bytes. That looks better to me. It will give clarity without even someone digging into docs. 2) Tried to verify all filtered data tests, seems to work well. Also I tried tracking the usage of OutputPluginWrite() to see if there is any other place where data needs to be considered as filtered-data. Encountered this: send_relation_and_attrs has: if (!logicalrep_should_publish_column(att, columns, include_gencols_type)) continue; if (att->atttypid < FirstGenbkiObjectId) continue; But I don't think it needs to be considered as filtered data. This is mostly schema related info. But I wanted to confirm once. Thoughts? 3) +-- total_txns may vary based on the background activity but sent_txns should +-- always be 1 since the background transactions are always skipped. Filtered +-- bytes would be set only when there's a change that was passed to the plugin +-- but was filtered out. Depending upon the background transactions, filtered +-- bytes may or may not be zero. +SELECT slot_name, spill_txns = 0 AS spill_txns, spill_count = 0 AS spill_count, total_txns > 0 AS total_txns, total_bytes > 0 AS total_bytes, plugin_sent_txns, plugin_sent_bytes > 0 AS sent_bytes, plugin_filtered_bytes >= 0 AS filtered_bytes FROM pg_stat_replication_slots ORDER BY slot_name; In comment either we can say plugin_sent_txns instead of sent_txns or in the query we can fetch plugin_sent_txns AS sent_txns, so that we can relate comment and query. 4) + <literal>sentTxns</literal> is the number of transactions sent downstream + by the output plugin. <literal>sentBytes</literal> is the amount of data + sent downstream by the output plugin. + <function>OutputPluginWrite</function> is expected to update this counter + if <literal>ctx->stats</literal> is initialized by the output plugin. + <literal>filteredBytes</literal> is the size of changes in bytes that are + filtered out by the output plugin. Function + <literal>ReorderBufferChangeSize</literal> may be used to find the size of + filtered <literal>ReorderBufferChange</literal>. + </para> Either we can mention units as 'bytes' for both filteredBytes and sentBytes or for none. Currently filteredBytes says 'in bytes' while sentBytes does not. thanks Shveta
pgsql-hackers by date: