Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers
| From | Heikki Linnakangas |
|---|---|
| Subject | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
| Date | |
| Msg-id | 76c4407c-7a2a-4626-b999-d066ba2d8d2e@iki.fi Whole thread Raw |
| In response to | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue (Arseniy Mukhin <arseniy.mukhin.dev@gmail.com>) |
| List | pgsql-hackers |
On 06/11/2025 13:20, Arseniy Mukhin wrote: > On Thu, Nov 6, 2025 at 1:05 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: >> >> On 05/11/2025 21:02, Matheus Alcantara wrote: >>> On Wed Nov 5, 2025 at 9:59 AM -03, Heikki Linnakangas wrote: >>>>> In case of an error on TransactionIdDidCommit() I think that we will >>>>> have the same behavior as we advance the "current" position before of >>>>> DidCommit call the PG_FINALLY block will set the backend position past >>>>> the failing notification entry. >>>> >>>> With my patch, if TransactionIdDidCommit() fails, we will lose all the >>>> notifications that we have buffered in the local buffer but haven't >>>> passed to NotifyMyFrontEnd() yet. On 'master', you only lose a single >>>> notification, the one where TransactionIdDidCommit() failed. >>>> >>> Yeah, that's true. >>> >>>>> How bad would be to store the notification entries within a list and >>>>> store the next position with the notification entry and then wrap the >>>>> NotifyMyFrontEnd() call within a PG_TRY and update the "current" to the >>>>> saved "next position" on PG_CATCH? Something like this: >>>> >>>> [ ...] >>>> >>>> That addresses the failure on NotifyMyFrontEnd, but not on >>>> TransactionIdDidCommit. >>>> >>>> IMHO we should just make these errors FATAL. TransactionIdDidCommit() >>>> should not really fail, after fixing the bug we're discussing. >>>> NotifyMyFrontEnd() could fail on OOM, but that seems pretty unlikely on >>>> an otherwise idle connection. Or it could fail if the client connection >>>> is lost, but then the backend is about to die anyway. And arguably >>>> closing the connection is better than losing even a single notification, >>>> anyway. >>>> >>> My only concern with making these errors FATAL is that if a notification >>> entry causes a different, recoverable error, all subsequent messages >>> will be lost. This is because if backend die and the user open a new >>> connection and execute LISTEN on the channel it will not see these >>> notifications past the one that caused the error. I'm not sure if we are >>> completely safe from this case of a recoverable error, what do you >>> think? >> >> I did some more testing of the current behavior, using the encoding >> conversion to cause an error: >> >> In backend A: >> >> SET client_encoding='latin1'; >> LISTEN foo; >> >> Backend b: >> >> NOTIFY foo, 'ハ'; >> >> Observations: >> >> - If the connection is idle when the notification is received, the ERROR >> is turned into FATAL anyway: >> >> postgres=# SET client_encoding='latin1'; >> SET >> postgres=# LISTEN foo; >> LISTEN >> postgres=# select 1; -- do the NOTIFY in another connection before this >> ERROR: character with byte sequence 0xe3 0x83 0x8f in encoding "UTF8" >> has no equivalent in encoding "LATIN1" >> FATAL: terminating connection because protocol synchronization was lost >> server closed the connection unexpectedly >> This probably means the server terminated abnormally >> before or while processing the request. >> >> - If there are multiple notifications pending, we stop processing the >> subsequent notifications on the first error. The subsequent >> notifications will only be processed when another notify interrupt is >> received, i.e. when a backend sends yet another notification. >> >> I'm getting more and more convinced that escalating all ERRORs to FATALs >> during notify processing is the right way to go. Attached is a new patch >> set that does that. > > One point about removing try/catch: when we execute LISTEN for the > very first time, we read the queue from the min(listener's pos) up to > the head. listenChannels is empty at the moment, so we never call > NotifyMyFrontEnd. But we do call TransactionIdDidCommit. So if we have > some problematic entry in the queue that we need to process during > initial LISTEN, it could block creation of new listeners. Is it > something we need to worry about? Hmm, I don't see how it could block the creation of new listeners. Exec_ListenPreCommit() starts from "max(listener's pos)" over existing listeners, not min(). Am I missing something? That said, Exec_ListenPreCommit() could start from 'min' among all listeners, if there are no listeners for the same backend. Per attached patch. I don't think it makes much difference though. Another small change we could make is to check for listenChannels == NIL before calling TransactionIdDidCommit. - Heikki
Attachment
pgsql-hackers by date: