RE: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: Synchronizing slots from primary to standby
Date
Msg-id OS0PR01MB571697AE205D17054349A1DF943E2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Whole thread Raw
In response to RE: Synchronizing slots from primary to standby  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: Synchronizing slots from primary to standby
RE: Synchronizing slots from primary to standby
List pgsql-hackers
On Tuesday, April 2, 2024 8:35 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
> 
> On Monday, April 1, 2024 7:30 PM Amit Kapila <amit.kapila16@gmail.com>
> wrote:
> >
> > On Mon, Apr 1, 2024 at 6:26 AM Zhijie Hou (Fujitsu)
> > <houzj.fnst@fujitsu.com>
> > wrote:
> > >
> > > On Friday, March 29, 2024 2:50 PM Amit Kapila
> > > <amit.kapila16@gmail.com>
> > wrote:
> > > >
> > >
> > > >
> > > >
> > > > 2.
> > > > +extern XLogRecPtr pg_logical_replication_slot_advance(XLogRecPtr
> > moveto,
> > > > +   bool *found_consistent_point);
> > > > +
> > > >
> > > > This API looks a bit awkward as the functionality doesn't match
> > > > the name. How about having a function with name
> > > > LogicalSlotAdvanceAndCheckReadynessForDecoding(moveto,
> > > > ready_for_decoding) with the same functionality as your patch has
> > > > for
> > > > pg_logical_replication_slot_advance() and then invoke it both from
> > > > pg_logical_replication_slot_advance and slotsync.c. The function
> > > > name is too big, we can think of a shorter name. Any ideas?
> > >
> > > How about LogicalSlotAdvanceAndCheckDecodingState() Or just
> > > LogicalSlotAdvanceAndCheckDecoding()?
> > >
> >
> > It is about snapbuild state, so how about naming the function as
> > LogicalSlotAdvanceAndCheckSnapState()?
> 
> It looks better to me, so changed.
> 
> >
> > I have made quite a few cosmetic changes in comments and code. See
> > attached. This is atop your latest patch. Can you please review and
> > include these changes in the next version?
> 
> Thanks, I have reviewed and merged them.
> Attach the V5 patch set which addressed above comments and ran pgindent.

I added one test in 040_standby_failover_slots_sync.pl in 0002 patch, which can
reproduce the data loss issue consistently on my machine. It may not reproduce
in some rare cases if concurrent xl_running_xacts are written by bgwriter, but
I think it's still valuable if it can verify the fix in most cases. The test
will fail if directly applied on HEAD, and will pass after applying atop of
0001.

Best Regards,
Hou zj

Attachment

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Introduce XID age and inactive timeout based replication slot invalidation
Next
From: Nazir Bilal Yavuz
Date:
Subject: Re: Use streaming read API in ANALYZE