Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
Date
Msg-id CAExHW5s5hnxKK_VE=q94sN8uU5BJCRpAescGuamheuM-Ua559A@mail.gmail.com
Whole thread Raw
In response to Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring  (Naga Appani <nagnrik@gmail.com>)
List pgsql-hackers
On Mon, Aug 4, 2025 at 11:46 AM Naga Appani <nagnrik@gmail.com> wrote:
> > In PG14+, the transaction wraparound is triggered if the size of the
> > directory exceeds 10GB. This function does not help monitoring that
> > condition. So a user will need to use du or pg_ls_multixactdir()
> > anyway, which defeats the purpose of this function being more
> > efficient than those methods. Am I correct? Can we also report the
> > size of the directory in this function?
>
> Correct, that is the intent of the function. The members count
> returned by this function already provides the necessary information
> to determine the directory size, since each member entry has a fixed
> size. The constants and formulas in [0] and discussed in [1] show that
> each group stores four bytes of flags plus four TransactionIds (20
> bytes total), yielding 409 groups per 8 KB page and a fixed
> members‑to‑bytes ratio. This means ~2 billion members corresponds to
> ~10 GB (aggressive autovacuum threshold) and ~4 billion members
> corresponds to ~20 GB (wraparound).

Would it be better to do that math in the function and output the
result? Users may not be able to read and understand the PostgreSQL
code or pgsql-hackers threads Or the constants may change across
versions. It will be more convenient for users if they get the output
from the function itself.

On Fri, Aug 8, 2025 at 6:05 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> ReadMultiXactCounts() is also incorrectly named with your proposal to
> expose oldestMultiXactId in the information returned to the caller,
> where the key point is to make sure that the information retrieved is
> consistent across a single LWLock acquisition.  So perhaps this should
> be named GetMultiXactInformation() or something similar?

+1

>
> The top of ReadMultiXactCounts() (or whatever its new name) should
> also document the information returned across a single call.  It looks
> inconsistent to return oldestMultiXactId if the oldestOffsetKnown is
> false.  What about oldestOffset itself?  Should it be returned for
> consistency with the counts and oldestMultiXactId?

+1

Some more comments on the patch
+ <literal>multixacts</literal> is the number of multixact IDs assigned,
+ <literal>members</literal> is the number of multixact member entries created,
+ and <literal>oldest_multixact</literal> is the oldest MultiXact ID
still in use.

Now that the name of the function is changed, we need the names to
indicate that they are counts e.g. num_mxids, num_members.

+ These values can be used to monitor multixact consumption and anticipate
+ autovacuum behavior. See <xref linkend="vacuum-for-multixact-wraparound"/>
+ for further details on multixact wraparound.
+ </para>
+
+ <para>
+ <literal>SELECT * FROM pg_get_multixact_stats();</literal>
+<programlisting>
+ multixacts | members | oldest_multixact
+------------+-------------+------------------
+ 182371396 | 2826221174 | 754321
+</programlisting>

This file doesn't provide usage examples of other functions. This
function doesn't seem to be an exception.

I think we should mention that the statistics may get stale as soon as
it's fetched, even with REPEATABLE READ isolation level.

+ linkend="guc-autovacuum-multixact-freeze-max-age"/>. Also, if either
+ the storage occupied by multixact members exceeds about 10GB or the number
+ of members created exceeds approximately 2 billion entries, aggressive vacuum

In case each member starts consuming more or less space than it does
today what would be the basis of triggering workaround? Space or
number of members? I think we should mention only that.

scans will occur more often for all tables, starting with those that
- have the oldest multixact-age. Both of these kinds of aggressive
+ have the oldest multixact-age. Both of these kinds of aggressive
scans will occur even if autovacuum is nominally disabled. The members storage
- area can grow up to about 20GB before reaching wraparound.
+ area can grow up to about 20GB or approximately 4 billion entries before
+ reaching wraparound.

Similar to above.

+ </para>
+
+ <para>
+ The <function>pg_get_multixact_stats()</function> function provides a way
+ to monitor multixact allocation and usage patterns in real time. By exposing

This is the right place to elaborate the usage of this function with an example.

+ counts of multixacts, member entries, and the oldest multixact ID, it helps:
+ <orderedlist>
+ <listitem>
+ <simpara>
+ Identify unusual multixact activity from concurrent row-level locks
+ or foreign key operations
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Monitor progress toward wraparound thresholds that trigger aggressive
+ autovacuum (approximately 2 billion members or 10GB storage)
+ </simpara>
+ </listitem>
+ <listitem>
+ <simpara>
+ Verify whether autovacuum is effectively managing multixact cleanup
+ before reaching critical thresholds
+ </simpara>
+ </listitem>
+ </orderedlist>
+ See <xref linkend="functions-info-snapshot"/> for details.

I think the second point here repeats what's already mentioned
earlier. It will be good to elaborate each point with an example
instead of just narration.

+/*
+ * pg_get_multixact_stats
+ *
+ * SQL-callable function to retrieve MultiXact statistics.
+ *
+ * Returns a composite row containing:
+ * - total number of MultiXact IDs created since startup,
+ * - total number of MultiXact members created,

... since startup or the number of existing members?

+ * - the oldest existing MultiXact ID.
+ *
+ * This is primarily useful for monitoring MultiXact usage and ensuring
+ * appropriate wraparound protection.

The last two lines are not required, I think. One of its usage is
monitoring but users may find other usages.

+
+step commit1: COMMIT;
+step commit2: COMMIT;
+step check:
+ SELECT
+ multixacts,
+ members,
+ oldest_multixact
+ FROM pg_get_multixact_stats();
+
+multixacts|members|oldest_multixact
+----------+-------+----------------
+ 1| 3| 1
+(1 row)

Vacuum may clean the multixact between commit2 and check, in which
case the result won't be stable.

--
Best Wishes,
Ashutosh Bapat



pgsql-hackers by date:

Previous
From: Rahila Syed
Date:
Subject: Re: Enhancing Memory Context Statistics Reporting
Next
From: Álvaro Herrera
Date:
Subject: Re: New commitfest app release on August 19th