Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers
| From | Arseniy Mukhin |
|---|---|
| Subject | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
| Date | |
| Msg-id | CAE7r3MKpxCmeArYtjZSq-Ui=oqAv9LYMKXh_HQWf4LRj4=H2vg@mail.gmail.com Whole thread Raw |
| In response to | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue (Álvaro Herrera <alvherre@kurilemu.de>) |
| Responses |
Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue
|
| List | pgsql-hackers |
On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > > On 05/11/2025 21:02, Matheus Alcantara wrote: > > On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote: > >>> In case of an error on TransactionIdDidCommit() I think that we will > >>> have the same behavior as we advance the "current" position before of > >>> DidCommit call the PG_FINALLY block will set the backend position past > >>> the failing notification entry. > >> > >> With my patch, if TransactionIdDidCommit() fails, we will lose all the > >> notifications that we have buffered in the local buffer but haven't > >> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single > >> notification, the one where TransactionIdDidCommit() failed. > >> > > Yeah, that's true. > > > >>> How bad would be to store the notification entries within a list and > >>> store the next position with the notification entry and then wrap the > >>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the > >>> saved "next position" on PG_CATCH? Something like this: > >> > >> [ ...] > >> > >> That addresses the failure on NotifyMyFrontEnd, but not on > >> TransactionIdDidCommit. > >> > >> IMHO we should just make these errors FATAL. TransactionIdDidCommit() > >> should not really fail, after fixing the bug we're discussing. > >> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on > >> an otherwise idle connection. Or it could fail if the client connection > >> is lost, but then the backend is about to die anyway. And arguably > >> closing the connection is better than losing even a single notification, > >> anyway. > >> > > My only concern with making these errors FATAL is that if a notification > > entry causes a different, recoverable error, all subsequent messages > > will be lost. This is because if backend die and the user open a new > > connection and execute LISTEN on the channel it will not see these > > notifications past the one that caused the error. I'm not sure if we are > > completely safe from this case of a recoverable error, what do you > > think? > > I did some more testing of the current behavior, using the encoding > conversion to cause an error: > > In backend A: > > SET client_encoding='latin1'; > LISTEN foo; > > Backend b: > > NOTIFY foo, 'ハ'; > > Observations: > > - If the connection is idle when the notification is received, the ERROR > is turned into FATAL anyway: > > postgres=# SET client_encoding='latin1'; > SET > postgres=# LISTEN foo; > LISTEN > postgres=# select 1; -- do the NOTIFY in another connection before this > ERROR: character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8" > has no equivalent in encoding "LATIN1" > FATAL: terminating connection because protocol synchronization was lost > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > > - If there are multiple notifications pending, we stop processing the > subsequent notifications on the first error. The subsequent > notifications will only be processed when another notify interrupt is > received, i.e. when a backend sends yet another notification. > > I'm getting more and more convinced that escalating all ERRORs to FATALs > during notify processing is the right way to go. Attached is a new patch > set that does that. > One point about removing try/catch: when we execute LISTEN for the very first time, we read the queue from the min(listener's pos) up to the head. listenChannels is empty at the moment, so we never call NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have some problematic entry in the queue that we need to process during initial LISTEN, it could block creation of new listeners. Is it something we need to worry about? Best regards, Arseniy Mukhin
pgsql-hackers by date: