Thread: Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring
From:
Andrew Johnson
Date:
Hi Naga,

Thank you for the thoughtful feedback and for drawing attention to this issue. I appreciate you taking the time to review my patch.

You raise some good points about the trade-offs between a lightweight function and the pgstat infrastructure. I actually think both approaches have merit for different use cases, and they could potentially coexist to serve the community better.

> I shared a patch [0] that adds a SQL-callable function exposing the same counters via ReadMultiXactCounts() without complexity... introducing new statistics infrastructure may be more than what's needed unless there's an additional use case I'm overlooking... A lightweight function seems better aligned with the nature of these metrics and the operational use cases they serve, particularly for historical/ongoing diagnostics and periodic monitoring.

I reviewed your patch in depth, and I believe the pgstat approach I took offers some advantages for continuous monitoring scenarios:

1. Performance under monitoring load: Many production environments, including Metronome's, will poll these statistics frequently for alerting. A direct call through pg_get_multixact_count() -> ReadMultiXactCounts() acquires LWLocks, which could create significant contention when multiple monitoring systems poll frequently. In high-throughput environments, this could become a bottleneck. The pgstat view reads from shared memory snapshots without additional lock acquisition, making it essentially free, since we only update the pgstat structure while we already hold the lock.

2. Consistency with existing patterns: PostgreSQL already uses the pgstat infrastructure for similar global, cluster-wide metrics like pg_stat_wal, pg_stat_wal_receiver, pg_stat_archiver, pg_stat_bgwriter, and pg_stat_checkpointer. The multixact member count fits this same pattern of cluster-wide resource monitoring.

3. Automatic updates: The stats are updated during natural multixact operations (allocation, freeze threshold checks), providing current data without requiring explicit polling of the underlying counters.

Your function approach has clear benefits for ad-hoc diagnostics and simpler operational queries where call frequency is low. I also note that your patch tracks both multixacts and members, which provides valuable additional context.

I've also included isolation tests that verify the view accurately reflects multixact member allocation, which helps ensure the correctness of the monitoring data.

Given our production experience with multixact membership exhaustion at Metronome, both approaches would solve the core observability problem.

I'm happy to keep discussing what the best approach for the community is. It's great that more light is being shed on this particular issue.

[0] https://www.postgresql.org/message-id/CA%2BQeY%2BDTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg%40mail.gmail.com

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc.
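P.S. For concreteness, this is the kind of alerting query the view is meant to enable. A sketch only: it assumes the view exposes a bigint "members" column as in the patch, and 4294967296 = 2^32 is the size of the multixact member address space.

```
-- Sketch of an alerting query against the proposed view (hypothetical
-- column name "members" per the patch; 4294967296 = 2^32 member slots).
SELECT members,
       round(100.0 * members / 4294967296, 2) AS member_space_used_pct
FROM pg_stat_multixact;
```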
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring
From:
Naga Appani
Date:
On Tue, Jun 10, 2025 at 11:40 AM Andrew Johnson <andrewj@metronome.com> wrote:
> [...]
> A direct call through pg_get_multixact_count() -> ReadMultiXactCounts()
> acquires LWLocks, which could create significant contention when multiple
> monitoring systems poll frequently. In high-throughput environments, this
> could become a bottleneck.
> [...]

Hi Andrew,

Thanks again for the thoughtful discussion and for reviewing my patch. I would like to directly address the concern around LWLock contention and the potential performance overhead of pg_get_multixact_count(). To evaluate this, I ran repeated invocations of both pg_get_multixact_count() and pg_stat_multixact under sustained system load.
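For anyone who wants to reproduce a similar contention profile, a pgbench script along these lines works: many clients taking FOR SHARE row locks on the same rows both allocate multixact members and generate LWLock pressure. This is an illustration, not necessarily the exact workload behind the numbers below.

```
-- contention.sql (hypothetical): concurrent FOR SHARE locks on the same
-- rows allocate multixacts and contend on LWLocks.
BEGIN;
SELECT * FROM pgbench_accounts WHERE aid BETWEEN 1 AND 100 FOR SHARE;
COMMIT;
```

Run with, for example, pgbench -n -c 200 -j 8 -T 600 -f contention.sql postgres (after a standard pgbench -i).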
At the time of testing, the system showed a significantly high load average with notable LWLock contention:

----------------------------------------------------------------------------------
$ uptime
 00:23:30 up 2:08, 4 users, load average: 103.18, 117.93, 116.77
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
postgres=# SELECT a.wait_event_type, COUNT(*) AS count
           FROM pg_stat_activity a
           WHERE a.wait_event_type IS NOT NULL
           GROUP BY a.wait_event_type
           ORDER BY count DESC;

 wait_event_type | count
-----------------+-------
 LWLock          |  1978
 Client          |  1014
 Timeout         |    21
 Activity        |     6
(4 rows)
----------------------------------------------------------------------------------

To simulate realistic monitoring intervals (typically in the 100ms to 1s range), I used pgbench with -R 10 (10 queries per second) over a 60-second duration for both the function and the view. Here are the full results:

----------------------------------------------------------------------------------
$ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM pg_get_multixact_count();") -h 127.0.0.1 -p 5593 -U postgres postgres
pgbench (18beta1)
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 579
number of failed transactions: 0 (0.000%)
latency average = 0.892 ms
latency stddev = 1.667 ms
rate limit schedule lag: avg 0.160 (max 8.789) ms
initial connection time = 7.106 ms
tps = 9.655158 (without initial connection time)
----------------------------------------------------------------------------------

----------------------------------------------------------------------------------
$ pgbench -n -T 60 -R 10 -f <(echo "SELECT * FROM pg_stat_multixact;") -h 127.0.0.1 -p 5593 -U postgres postgres
pgbench (18beta1)
transaction type: /dev/fd/63
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
maximum number of tries: 1
duration: 60 s
number of transactions actually processed: 589
number of failed transactions: 0 (0.000%)
latency average = 0.378 ms
latency stddev = 0.273 ms
rate limit schedule lag: avg 0.137 (max 4.347) ms
initial connection time = 4.516 ms
tps = 9.846161 (without initial connection time)
----------------------------------------------------------------------------------

Even under heavy LWLock contention, both approaches maintained stable performance. pg_get_multixact_count() executed comfortably under 1 ms on average, and schedule lag remained low, demonstrating that it can be safely used in periodic monitoring setups.
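Beyond point-in-time checks, periodic sampling of the function also enables trend estimates, such as members consumed per second. A sketch (the sampling table and its name are hypothetical, for illustration only):

```
-- Hypothetical sampling table; populate it from a scheduled job.
CREATE TABLE IF NOT EXISTS multixact_samples (
    sampled_at timestamptz DEFAULT now(),
    members    bigint
);

INSERT INTO multixact_samples (members)
SELECT members FROM pg_get_multixact_count();

-- Members consumed per second between the two most recent samples.
SELECT (members - lag(members) OVER w)
       / NULLIF(EXTRACT(EPOCH FROM sampled_at - lag(sampled_at) OVER w), 0)
       AS members_per_second
FROM multixact_samples
WINDOW w AS (ORDER BY sampled_at)
ORDER BY sampled_at DESC
LIMIT 1;
```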
I also compared the live counter from the function to the stats snapshot from the view:

----------------------------------------------------------------------------------
postgres=# SELECT 'from_pg_get_multixact_count' AS source, members
           FROM pg_get_multixact_count()
           UNION ALL
           SELECT 'from_pg_stat_multixact' AS source, members
           FROM pg_stat_multixact;

            source            |  members
------------------------------+-----------
 from_pg_get_multixact_count  | 839438187
 from_pg_stat_multixact       | 839438011
(2 rows)
----------------------------------------------------------------------------------

While the values are quite close, it is worth highlighting that pg_get_multixact_count() returns real-time state by calling ReadMultiXactCounts() at query time, whereas pg_stat_multixact reports the values most recently recorded by the cumulative statistics system. Although the stats machinery also calls ReadMultiXactCounts() internally, its updates are asynchronous and may lag, either because of the stats_fetch_consistency setting [1], or in environments with high load or memory pressure where stats updates may be delayed.

This distinction was part of the motivation behind my earlier proposal [0], which introduced a lightweight SQL-callable function using ReadMultiXactCounts(). Since these counters are global, not aggregatable per backend, and don't reset meaningfully, it seemed reasonable to expose them without adding new statistics infrastructure.

I understand the appeal of following the pg_stat_* view pattern for consistency and passive observability, and I think both approaches could certainly coexist. The view-based design offers a familiar interface for long-term monitoring, while the function can complement it in scenarios where timely visibility is helpful, such as monitoring multixact activity more closely during periods of elevated usage or investigating potential pressure before it escalates.

While the function and view can coexist, I believe the function already addresses the majority of practical needs efficiently, without introducing noticeable contention, even on heavily loaded systems, and without adding the complexity of new statistics infrastructure. I would be happy to continue the discussion or help refine the direction.

[0] https://www.postgresql.org/message-id/CA%2BQeY%2BDTggHskCXOa39nag2sFds9BD-7k__zPbvL-_VVyJw7Sg%40mail.gmail.com
[1] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS

Best regards,
Naga
Re: [PATCH v1] Add pg_stat_multixact view for multixact membership usage monitoring
From:
Andrew Johnson
Date:
Hi Naga,

Thank you for your thoughtful response and for providing those benchmarks. I'd like to address your concerns about the potential for statistical lag with clarifications and empirical data:

> its updates are asynchronous and may lag, either because of the stats_fetch_consistency setting

When operators configure `stats_fetch_consistency` to keep statistical data static during transactions, this is by design: it gives them explicit control over the consistency/freshness tradeoff. As the documentation notes, `pg_stat_clear_snapshot()` provides up-to-date visibility when needed [0].

> [stats] may lag... in environments with high load or memory pressure

This is a valid concern. To investigate the extent of potential statistical lag, I conducted a test to quantify the lag between `pg_stat_multixact` and `pg_get_multixact_count()`.

=Testing Methodology=

==Build Configuration==
```
configure --enable-cassert=no --enable-debug=no
```

==Test Schema (Multixact-Intensive Workload)==
```
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;

CREATE TABLE customers (
    customer_id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE products (
    product_id SERIAL PRIMARY KEY,
    name TEXT
);

CREATE TABLE orders (
    order_id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers(customer_id),
    product_id INTEGER REFERENCES products(product_id),
    order_date TIMESTAMP DEFAULT NOW(),
    amount DECIMAL(10,2)
);

INSERT INTO customers (name) VALUES ('Customer 1'), ('Customer 2');
INSERT INTO products (name) VALUES ('Product 1'), ('Product 2');

INSERT INTO orders (customer_id, product_id, amount)
SELECT (RANDOM() + 1)::INTEGER,
       (RANDOM() + 1)::INTEGER,
       RANDOM() * 100
FROM generate_series(1, 100000);

CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE INDEX idx_orders_product ON orders(product_id);
ANALYZE;
```

==Load Generation (Terminal 1)==
```
pgbench -n -c 100 -j 100 -T 300 -f =(cat << 'EOF'
BEGIN;
SELECT o.*, c.name, p.name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN products p ON o.product_id = p.product_id
WHERE o.customer_id = 1
FOR SHARE;
COMMIT;
EOF
) postgres
```

==Lag Measurement (Terminal 2)==
```
psql postgres << 'EOF'
\timing on
DO $$
DECLARE
    func_result BIGINT;
    view_result BIGINT;
    i INTEGER;
    diff BIGINT;
    timestamp_ms BIGINT;
BEGIN
    RAISE NOTICE 'timestamp_ms,sample,function_members,view_members,diff,view_status';
    FOR i IN 1..3000 LOOP
        timestamp_ms := EXTRACT(EPOCH FROM clock_timestamp()) * 1000;
        SELECT members INTO func_result FROM pg_get_multixact_count();
        PERFORM pg_stat_clear_snapshot();
        SELECT members INTO view_result FROM pg_stat_multixact;
        diff := view_result - func_result;
        RAISE NOTICE '%,%,%,%,%,%',
            timestamp_ms, i, func_result, view_result, diff,
            CASE WHEN diff > 0 THEN 'AHEAD'
                 WHEN diff < 0 THEN 'BEHIND'
                 ELSE 'EQUAL' END;
        PERFORM pg_sleep(0.1);
    END LOOP;
END $$;
EOF
```

==Results==
From 2,744 samples collected during 279 seconds of sustained 100-client load:

Lag Statistics:
- P95 Percentage Difference: 0.95%
- P50 Percentage Difference: 0.12%
- Mean Percentage Difference: 0.25%
- Max Percentage Difference: 1.30%

I've attached the results of my test to this email as a CSV file named `lag_test_results.csv`. From the data, the percentage differences between `pg_stat_multixact` and `pg_get_multixact_count()` generally stay below 1% under heavy load, maxing out at ~1.3%. The two statistics converge very quickly at the end of the test once the heavy load is removed.
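For anyone who wants to re-derive those summary numbers from the attached CSV, something along these lines works (a sketch; the staging table and its name are hypothetical, and it assumes the NOTICE output was cleaned into a plain CSV with a header row):

```
-- Hypothetical staging table matching the CSV columns.
CREATE TEMP TABLE lag_samples (
    timestamp_ms     BIGINT,
    sample           INTEGER,
    function_members BIGINT,
    view_members     BIGINT,
    diff             BIGINT,
    view_status      TEXT
);

\copy lag_samples FROM 'lag_test_results.csv' WITH (FORMAT csv, HEADER true)

-- Percentage difference of the view relative to the live counter.
SELECT percentile_cont(0.95) WITHIN GROUP (ORDER BY pct) AS p95_pct,
       percentile_cont(0.50) WITHIN GROUP (ORDER BY pct) AS p50_pct,
       avg(pct) AS mean_pct,
       max(pct) AS max_pct
FROM (SELECT 100.0 * abs(diff) / function_members AS pct
      FROM lag_samples) s;
```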
The data suggests that even under extreme load designed to maximize multixact allocation, the operational lag remains marginal.

To address your other concerns:

> Since these counters are global, not aggregatable per backend

I understand this perspective, though I'd point out that PostgreSQL already uses the pgstats pattern for similar global metrics, such as:

- pg_stat_wal
- pg_stat_bgwriter
- pg_stat_checkpointer
- pg_stat_archiver

These all share the relevant characteristics of multixact member counts: global scope and use in cluster-wide resource monitoring.

> it seemed reasonable to expose them without adding new statistics infrastructure.
> and without adding the complexity of new statistics infrastructure.

The pgstats approach follows a well-established design principle: monitoring queries should never impact the performance of the monitored subsystem, should provide reasonably up-to-date statistics, and should offer a predictable interface for future developers to extend when adding related metrics. Given the marginal statistical lag, I believe it's beneficial to avoid even the theoretical possibility of lock contention.

> and don't reset meaningfully,

You're right that these metrics don't reset in the traditional sense. However, that characteristic doesn't disqualify the approach, for the reasons given above.

I appreciate your feedback and believe both proposals address vital observability gaps. I'm open to collaborating on a path forward that meaningfully addresses the problem and serves the broader needs of the Postgres community.

[0] https://www.postgresql.org/docs/current/monitoring-stats.html#MONITORING-STATS-VIEWS

--
Respectfully,

Andrew Johnson
Software Engineer
Metronome, Inc.