On 05/11/2025 21:02, Matheus Alcantara wrote:
> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote:
>>> In case of an error on TransactionIdDidCommit() I think that we will
>>> have the same behavior as we advance the "current" position before of
>>> DidCommit call the PG_FINALLY block will set the backend position past
>>> the failing notification entry.
>>
>> With my patch, if TransactionIdDidCommit() fails, we will lose all the
>> notifications that we have buffered in the local buffer but haven't
>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single
>> notification, the one where TransactionIdDidCommit() failed.
>>
> Yeah, that's true.
>
>>> How bad would be to store the notification entries within a list and
>>> store the next position with the notification entry and then wrap the
>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the
>>> saved "next position" on PG_CATCH? Something like this:
>>
>> [ ...]
>>
>> That addresses the failure on NotifyMyFrontEnd, but not on
>> TransactionIdDidCommit.
>>
>> IMHO we should just make these errors FATAL. TransactionIdDidCommit()
>> should not really fail, after fixing the bug we're discussing.
>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on
>> an otherwise idle connection. Or it could fail if the client connection
>> is lost, but then the backend is about to die anyway. And arguably
>> closing the connection is better than losing even a single notification,
>> anyway.
>>
> My only concern with making these errors FATAL is that if a notification
> entry causes a different, recoverable error, all subsequent messages
> will be lost. This is because if backend die and the user open a new
> connection and execute LISTEN on the channel it will not see these
> notifications past the one that caused the error. I'm not sure if we are
> completely safe from this case of a recoverable error, what do you
> think?
I did some more testing of the current behavior, using the encoding
conversion to cause an error:
In backend A:
SET client_encoding='latin1';
LISTEN foo;
Backend b:
NOTIFY foo, 'ハ';
Observations:
- If the connection is idle when the notification is received, the ERROR
is turned into FATAL anyway:
postgres=# SET client_encoding='latin1';
SET
postgres=# LISTEN foo;
LISTEN
postgres=# select 1; -- do the NOTIFY in another connection before this
ERROR: character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8"
has no equivalent in encoding "LATIN1"
FATAL: terminating connection because protocol synchronization was lost
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
- If there are multiple notifications pending, we stop processing the
subsequent notifications on the first error. The subsequent
notifications will only be processed when another notify interrupt is
received, i.e. when a backend sends yet another notification.
I'm getting more and more convinced that escalating all ERRORs to FATALs
during notify processing is the right way to go. Attached is a new patch
set that does that.
- Heikki