Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue |
Date | |
Msg-id | CAD21AoBAcr6nAx5C4soYO3iR9dda_k8hFLok1Psd0R070qs9hg@mail.gmail.com Whole thread Raw |
In response to | Re: LISTEN/NOTIFY bug: VACUUM sets frozenxid past a xid in async queue ("Matheus Alcantara" <matheusssilv97@gmail.com>) |
List | pgsql-hackers |
On Sat, Sep 6, 2025 at 7:15 AM Matheus Alcantara <matheusssilv97@gmail.com> wrote: > > On Fri Sep 5, 2025 at 9:56 AM -03, Arseniy Mukhin wrote: > >> Ok, I agree that that this may happen but I don't see this as a common > >> case to fix the issue based on this behaviour. I think that we check the > >> transaction status also to skip notifications that were added on the > >> queue by transactions that are not fully committed yet, and I see this > >> scenario as a more common case but I could be wrong. > >> > > > > IIUC we don't need clog to check if the notification transaction is > > still in progress, we use a snapshot for that (XidInMVCCSnapshot()). > > If we see that the notification transaction is still in progress, we > > postpone processing of that notification. If we see that notification > > transaction is not in progress, then we check its status with > > TransactionIdDidCommit() (using clog) and only then fail if clog > > doesn't have notification transcation data (as a quick check, we can > > see in the stack trace that the failure occurs during the call to > > TransactionIdDidCommit, not XidInMVCCSnapshot). > > > > So we need to check notification transaction status only to understand > > if a transaction was committed and we can send notification or it was > > aborted (somewhere in between PreCommit_Notify and AtCommit_Notify) > > and we should skip notification. And the solution that Yura Sokolov > > suggested is to introduce a 'commited' flag, so we can check it > > instead of calling TransactionIdDidCommit. Another option is to make > > aborted transactions to clean up after themselves in AtAbort_Notify(), > > so if listener see notification in the queue and transaction that > > wrote it is not in progress, then such notification was always created > > by committed transaction and we also don't need to call > > TransactionIdDidCommit. One way for aborted transactions to clean up > > is to set its notifications dbOid to 0, so listeners skip them. > > Another option I think is to set queue HEAD back to the position where > > it was when the aborted transaction started to write to the queue. It > > also makes such aborted transaction notifications 'invisible' for > > listeners. We have a global lock and there is always only one writer, > > so it sounds possible. And there is another option which is the patch > > that Rishu Bagga is working on. In the patch all transactions write to > > the listen/notify queue only after they wrote a commit record to the > > WAL, so all notifications in the queue are always by committed > > transactions and we don't need to call TransactionIdDidCommit. > > > You are right, I missed the XidInMVCCSnapshot() call, sorry about that > and thanks very much for all the explanation! > > I'll keep on hold the development of new patch versions for this thread > and focus on review and test the patch from Rishu at [1] to see if we > can make progress using the WAL approach. While the WAL-based approach discussed on another thread is promising, I think it would not be acceptable for back branches as it requires quite a lot of refactoring. Given that this is a long-standing bug in listen/notify, I think we can continue discussing how to fix the issue on backbranches on this thread. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
pgsql-hackers by date: