Re: Add memory_limit_hits to pg_stat_replication_slots - Mailing list pgsql-hackers

From Masahiko Sawada
Subject Re: Add memory_limit_hits to pg_stat_replication_slots
Date
Msg-id CAD21AoDYPbexAQB+_e34KyQycCCTkhBgRS8Y+To_=kD-z_SG1w@mail.gmail.com
In response to Add memory_limit_hits to pg_stat_replication_slots  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Wed, Aug 27, 2025 at 12:26 AM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi hackers,
>
> I think that it's currently not always possible to determine how many times
> logical_decoding_work_mem has been reached.
>
> For example, say a transaction is made of 40 subtransactions, and I get:
>
>   slot_name   | spill_txns | spill_count | total_txns
> --------------+------------+-------------+------------
>  logical_slot |         40 |          41 |          1
> (1 row)
>
> Then I know that logical_decoding_work_mem has been reached one time (total_txns).
>
> But as soon as another transaction is decoded (that does not involve spilling):
>
>   slot_name   | spill_txns | spill_count | total_txns
> --------------+------------+-------------+------------
>  logical_slot |         40 |          41 |          2
> (1 row)
>
> Then we don't know if logical_decoding_work_mem has been reached one or two
> times.
>
> Please find attached a patch to $SUBJECT, to report the number of times the
> logical_decoding_work_mem has been reached.
>
> With such a counter one could get a ratio like total_txns/memory_limit_hits.
>
> That could help to see whether logical_decoding_work_mem is reached rarely or
> frequently. If frequently, then it may be worth increasing
> logical_decoding_work_mem.
>
> Based on my simple example above, one could say that it might be possible to get
> the same with:
>
> (spill_count - spill_txns) + (stream_count - stream_txns)
>
> but that doesn't appear to be the case with a more complicated example (277 vs 247):
>
>   slot_name   | spill_txns | spill_count | total_txns | stream_txns | stream_count | memory_limit_hits | (spc-spct)+(strc-strt)
> --------------+------------+-------------+------------+-------------+--------------+-------------------+------------------------
>  logical_slot |        405 |         552 |         19 |           5 |          105 |               277 |                    247
> (1 row)
>
> Not sure I like memory_limit_hits that much, maybe work_mem_exceeded is better?
>
> Looking forward to your feedback,
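
For reference, the proposed ratio could be read with a query along these
lines (a sketch only; the memory_limit_hits column assumes the proposed
patch is applied):

SELECT slot_name,
       memory_limit_hits,
       total_txns,
       round(memory_limit_hits::numeric / NULLIF(total_txns, 0), 2) AS hit_ratio
FROM pg_stat_replication_slots;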

Yes, these are quite different situations: spilling 100 transactions in
a single ReorderBufferCheckMemoryLimit() call versus spilling 1
transaction in each of 100 ReorderBufferCheckMemoryLimit() calls, even
though spill_txns is 100 in both cases. We currently don't have any
statistics to distinguish between these cases, so I agree with adding
the statistic.
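
To illustrate why the existing counters can't tell the two scenarios
apart, here is a toy model (not PostgreSQL code; the counter names
merely mirror the ones under discussion). Each element of limit_hits
stands for one ReorderBufferCheckMemoryLimit() call that reached the
limit, with the number of transactions it spilled:

```python
def decode(limit_hits):
    """Toy model: each element is the number of transactions spilled
    by one memory-limit hit."""
    stats = {"spill_txns": 0, "memory_limit_hits": 0}
    for spilled in limit_hits:
        stats["memory_limit_hits"] += 1   # limit reached once per call
        stats["spill_txns"] += spilled    # one or more txns spilled
    return stats

# Case A: one call spills 100 transactions at once.
a = decode([100])
# Case B: 100 separate calls each spill a single transaction.
b = decode([1] * 100)

print(a)  # {'spill_txns': 100, 'memory_limit_hits': 1}
print(b)  # {'spill_txns': 100, 'memory_limit_hits': 100}
```

In both cases spill_txns ends up at 100, but the proposed counter would
report 1 versus 100, which is exactly the distinction we want.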

One minor comment is:

@@ -1977,6 +1978,7 @@ UpdateDecodingStats(LogicalDecodingContext *ctx)
  repSlotStat.stream_bytes = rb->streamBytes;
  repSlotStat.total_txns = rb->totalTxns;
  repSlotStat.total_bytes = rb->totalBytes;
+ repSlotStat.memory_limit_hits = rb->memory_limit_hits;

Since the other statistics counter names in ReorderBuffer are camel
case, I think it's better to follow that convention for the new counter.
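
Concretely, that would presumably look like the following (the
camel-case field name is my assumption, mirroring rb->streamBytes and
rb->totalTxns in the quoted hunk):

- repSlotStat.memory_limit_hits = rb->memory_limit_hits;
+ repSlotStat.memory_limit_hits = rb->memoryLimitHits;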

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com


