Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date
Msg-id a8892a0e-7660-42d1-98bb-b677f6c15069@iki.fi
Whole thread Raw
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  ("Matheus Alcantara" <matheusssilv97@gmail.com>)
List pgsql-hackers
On 05/11/2025 21:02, Matheus Alcantara wrote:
> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>>> In case of an error on TransactionIdDidCommit() I think that we will
>>> have the same behavior as we advance the "current" position before of
>>> DidCommit call the PG_FINALLY block will set the backend position past
>>> the failing notification entry.
>>
>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
>> notifications that we have buffered in the local buffer but haven't
>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
>> notification, the one where TransactionIdDidCommit() failed.
>>
> Yeah, that's true.
> 
>>> How bad would be to store the notification entries within a list and
>>> store the next position with the notification entry and then wrap the
>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>>> saved "next position" on PG_CATCH? Something like this:
>>
>> [ ...]
>>
>> That addresses the failure on NotifyMyFrontEnd, but not on
>> TransactionIdDidCommit.
>>
>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
>> should not really fail, after fixing the bug we're discussing.
>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
>> an otherwise idle connection. Or it could fail if the client connection
>> is lost, but then the backend is about to die anyway. And arguably
>> closing the connection is better than losing even a single notification,
>> anyway.
>>
> My only concern with making these errors FATAL is that if a notification
> entry causes a different, recoverable error, all subsequent messages
> will be lost. This is because if backend die and the user open a new
> connection and execute LISTEN on the channel it will not see these
> notifications past the one that caused the error. I'm not sure if we are
> completely safe from this case of a recoverable error, what do you
> think?

I did some more testing of the current behavior, using the encoding 
conversion to cause an error:

In backend A:

SET client_encoding='latin1';
LISTEN foo;

Backend b:

NOTIFY foo, 'ハ';

Observations:

- If the connection is idle when the notification is received, the ERROR 
is turned into FATAL anyway:

postgres=# SET client_encoding='latin1';
SET
postgres=# LISTEN foo;
LISTEN
postgres=# select 1; -- do the NOTIFY in another connection before this
ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8" 
has no equivalent in encoding "LATIN1"
FATAL:  terminating connection because protocol synchronization was lost
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

- If there are multiple notifications pending, we stop processing the 
subsequent notifications on the first error. The subsequent 
notifications will only be processed when another notify interrupt is 
received, i.e. when a backend sends yet another notification.

I'm getting more and more convinced that escalating all ERRORs to FATALs 
during notify processing is the right way to go. Attached is a new patch 
set that does that.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: John Naylor
Date:
Subject: Re: GiST README typos
Next
From: Thomas Munro
Date:
Subject: Re: [Patch] Windows relation extension failure at 2GB and 4GB