RE: Conflict detection for update_deleted in logical replication - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: Conflict detection for update_deleted in logical replication |
Date | |
Msg-id | TY4PR01MB16907EEBB6D0CC953EC191E599403A@TY4PR01MB16907.jpnprd01.prod.outlook.com Whole thread Raw |
In response to | Re: Conflict detection for update_deleted in logical replication (shveta malik <shveta.malik@gmail.com>) |
Responses |
Re: Conflict detection for update_deleted in logical replication
Re: Conflict detection for update_deleted in logical replication |
List | pgsql-hackers |
On Friday, September 5, 2025 2:01 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Thu, Sep 4, 2025 at 3:30 PM Zhijie Hou (Fujitsu) > <houzj.fnst@fujitsu.com> wrote: > > > > Hi, > > > > As reported by Robert[1], it is worth adding a test for the race condition in > > the RecordTransactionCommitPrepared() function to reduce the risk of > future code > > changes: > > > > /* > > * Note it is important to set committs value after marking ourselves > as > > * in the commit critical section (DELAY_CHKPT_IN_COMMIT). > This is because > > * we want to ensure all transactions that have acquired commit > timestamp > > * are finished before we allow the logical replication client to > advance > > * its xid which is used to hold back dead rows for conflict > detection. > > * See comments atop worker.c. > > */ > > committs = GetCurrentTimestamp(); > > > > While writing the test, I noticed a bug that conflict-relevant data could be > > prematurely removed before applying prepared transactions on the publisher > that > > are in the commit phase. This occurred because > GetOldestActiveTransactionId() > > was called on the publisher, which failed to account for the backend > executing > > COMMIT PREPARED, as this backend does not have an xid stored in > PGPROC. > > > > Since this issue overlaps with the race condition related to timestamp > > acquisition, I've prepared a bug fix along with a test for the race condition. > > The 0001 patch fixes this issue by introducing a new function that iterates > over > > global transactions and identifies prepared transactions during the commit > > phase. 0002 added injection points and tap-test to test the bug and > timestamp > > acquisition. > > > > Thank You for the patches.Verified 001, works well. Just one minor comment: > > 1) > + PGPROC *proc = GetPGProcByNumber(gxact->pgprocno); > + PGPROC *commitproc; > We get first proc and then commitproc. proc is used only to get > databaseId, why don't we get databaseId from commitproc itself? I think it's OK to directly refer to commitproc, so changed. > > ~~ > > Few trivial comments for 002: Thanks for the comments. > 1) > +# Test that publisher's transactions marked with > DELAY_CHKPT_IN_COMMIT prevent > +# concurrently deleted tuples on the subscriber from being removed. > > Here shall we also mention something like: > This test also acts as a safeguard to prevent developers from moving > the timestamp acquisition before marking DELAY_CHKPT_IN_COMMIT in > RecordTransactionCommitPrepared. Added. > > 2) > # This test depends on an injection point to block the transaction commit after > # marking DELAY_CHKPT_IN_COMMIT flag. > > Shall we say: > ..to block the prepared transaction commit.. Changed. > > 3) > +# Create a publisher node. Disable autovacuum to stablized the tests related > to > +# manual VACUUM and transaction ID. > > to stablized --> to stabilize Fixed. > > 4) > +# Confirm that the dead tuple can be removed now > +my ($cmdret, $stdout, $stderr) = > + $node_subscriber->psql('postgres', qq(VACUUM (verbose) public.tab;)); > + > +ok($stderr =~ qr/1 are dead but not yet removable/, > + 'the deleted column is non-removable'); > > can be removed now --> cannot be removed now Fixed. Here are v2 patches which addressed above comments. Best Regards, Hou zj
Attachment
pgsql-hackers by date: