Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
Date
Msg-id 76c4407c-7a2a-4626-b999-d066ba2d8d2e@iki.fi
Whole thread Raw
In response to Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue  (Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>)
List pgsql-hackers
On 06/11/2025 13:20, Arseniy Mukhin wrote:
> On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 05/11/2025 21:02, Matheus Alcantara wrote:
>>> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>>>>> In case of an error on TransactionIdDidCommit() I think that we will
>>>>> have the same behavior as we advance the "current" position before of
>>>>> DidCommit call the PG_FINALLY block will set the backend position past
>>>>> the failing notification entry.
>>>>
>>>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
>>>> notifications that we have buffered in the local buffer but haven't
>>>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
>>>> notification, the one where TransactionIdDidCommit() failed.
>>>>
>>> Yeah, that's true.
>>>
>>>>> How bad would be to store the notification entries within a list and
>>>>> store the next position with the notification entry and then wrap the
>>>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>>>>> saved "next position" on PG_CATCH? Something like this:
>>>>
>>>> [ ...]
>>>>
>>>> That addresses the failure on NotifyMyFrontEnd, but not on
>>>> TransactionIdDidCommit.
>>>>
>>>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
>>>> should not really fail, after fixing the bug we're discussing.
>>>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
>>>> an otherwise idle connection. Or it could fail if the client connection
>>>> is lost, but then the backend is about to die anyway. And arguably
>>>> closing the connection is better than losing even a single notification,
>>>> anyway.
>>>>
>>> My only concern with making these errors FATAL is that if a notification
>>> entry causes a different, recoverable error, all subsequent messages
>>> will be lost. This is because if backend die and the user open a new
>>> connection and execute LISTEN on the channel it will not see these
>>> notifications past the one that caused the error. I'm not sure if we are
>>> completely safe from this case of a recoverable error, what do you
>>> think?
>>
>> I did some more testing of the current behavior, using the encoding
>> conversion to cause an error:
>>
>> In backend A:
>>
>> SET client_encoding='latin1';
>> LISTEN foo;
>>
>> Backend b:
>>
>> NOTIFY foo, 'ハ';
>>
>> Observations:
>>
>> - If the connection is idle when the notification is received, the ERROR
>> is turned into FATAL anyway:
>>
>> postgres=# SET client_encoding='latin1';
>> SET
>> postgres=# LISTEN foo;
>> LISTEN
>> postgres=# select 1; -- do the NOTIFY in another connection before this
>> ERROR:  character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
>> has no equivalent in encoding "LATIN1"
>> FATAL:  terminating connection because protocol synchronization was lost
>> server closed the connection unexpectedly
>>          This probably means the server terminated abnormally
>>          before or while processing the request.
>>
>> - If there are multiple notifications pending, we stop processing the
>> subsequent notifications on the first error. The subsequent
>> notifications will only be processed when another notify interrupt is
>> received, i.e. when a backend sends yet another notification.
>>
>> I'm getting more and more convinced that escalating all ERRORs to FATALs
>> during notify processing is the right way to go. Attached is a new patch
>> set that does that.
>
> One point about removing try/catch: when we execute LISTEN for the
> very first time, we read the queue from the min(listener's pos) up to
> the head. listenChannels is empty at the moment, so we never call
> NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have
> some problematic entry in the queue that we need to process during
> initial LISTEN, it could block creation of new listeners. Is it
> something we need to worry about?

Hmm, I don't see how it could block the creation of new listeners.
Exec_ListenPreCommit() starts from "max(listener's pos)" over existing
listeners, not min(). Am I missing something?

That said, Exec_ListenPreCommit() could start from 'min' among all
listeners, if there are no listeners for the same backend. Per attached
patch. I don't think it makes much difference though.

Another small change we could make is to check for listenChannels == NIL
before calling TransactionIdDidCommit.

- Heikki
Attachment

pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Re: Refactor StringInfo usage in subscriptioncmds.c
Next
From: Amit Kapila
Date:
Subject: Re: Refactor StringInfo usage in subscriptioncmds.c