Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Álvaro Herrera
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date
Msg-id 202508061044.ptcyt7aqsaaa@alvherre.pgsql
In response to LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  (Alexandra Wang <alexandra.wang.oss@gmail.com>)
List pgsql-hackers
On 2025-Aug-05, Alexandra Wang wrote:

> I'm bringing up a bug that was reported multiple times [1][2][3] in
> the bugs list here, for a broader audience.
> 
> The issue is that an ERROR like the one below occurs when trying to
> register any listener in the database.
> 
> test=# listen c21;
> ERROR:  58P01: could not access status of transaction 14279685
> DETAIL:  Could not open file "pg_xact/000D": No such file or directory.
> LOCATION:  SlruReportIOError, slru.c:1087

Oh, interesting problem.  Many thanks for the excellent write-up.

> My questions:
> 
> 1. Is it acceptable to drop notifications from the async queue if
> there are no active listeners? There might still be notifications that
> haven’t been read by any previous listener.

I'm somewhat wary of this idea -- could these inactive listeners become
active later and expect to be able to read their notifies?

> 2. If the answer to 1 is no, how can we teach VACUUM to respect the
> minimum xid stored in all AsyncQueueEntries?

Maybe we can have AsyncQueueAdvanceTail return the oldest XID of
listeners, and back off the pg_clog truncation based on that.  This
could be done by adding a new boolean argument that says to look up the
XID from the PGPROC using BackendPidGetProc(QUEUE_BACKEND_PID) (only
vac_update_datfrozenxid() would pass true, to avoid overhead for other
callers), then collect the oldest of those XIDs and return it.

This does create the problem that an inactive listener could cause the
XID counter to stay far in the past.  Maybe we could try to avoid this
by adding more signalling (e.g., AsyncQueueAdvanceTail() itself could
send PROCSIG_NOTIFY_INTERRUPT signal?), and terminating backends that
are way overdue on reading notifies.  I'm not sure if this is really
needed or useful; consider a backend stuck on SIGSTOP (debugger or
whatever): it will just sit there forever.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/


