On Fri, Jul 4, 2025 at 7:42 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Fri, Jul 4, 2025 at 9:23 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> > > >
> > >
> > > How about this:
> > > We change the following sentence in the third paragraph
> > > To confirm that the standby server is indeed ready for failover <new
> > > addition> so that a given PostgreSQL subscriber can continue logical
> > > replication </new addition>, follow ... . <new addition> A
> > > non-PostgreSQL downstream may need to devise a different way to find
> > > the slots corresponding to its subscriptions or use the next section.
> > >
> > > Then add a separate paragraph at the end or a separate section like below.
> > >
> > > In order to check whether a standby server is ready for failover so
> > > that all the subscribers, PostgreSQL as well as non-PostgreSQL, can
> > > continue logical replication, follow these steps to make sure that all
> > > the replication slots on the primary server that have the property
> > > failover = true are synchronized to the standby server.
> > > 1. On the primary server, run the following query:
> > > select slot_name from pg_replication_slots where failover and NOT temporary
> > >
> > > 2. Check that the logical replication slots identified above exist on
> > > the standby server and are ready for failover.
> > > SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS
> > > failover_ready
> > > FROM pg_replication_slots
> > > WHERE slot_name IN
> > >
> > > Does that look good?
> > >
> >
> > Yes, something on these lines sounds like an improvement. Would you
> > like to propose a patch or want Shveta or me to do the same?
>
> How about something like the attached?
Thanks for the patch.
> I couldn't figure out whether the query on the primary that fetches all
> the slots to be synchronized should filter based on invalidation_reason
> and conflicting or not. According to synchronize_slots(), it seems
> that we retain invalidated slots on the standby when failover = true and
> they would remain with synced = true on the standby. Is that right?
>
Yes, that’s correct. We sync the invalidation status of replication
slots from the primary to the standby and then stop synchronizing any
slots that have been marked as invalidated, retaining the synced flag as
true. IMO, there's no need to filter out conflicting slots on the
primary: rather than excluding them there and showing everything as
failover-ready on the standby, the correct approach is to reflect the
actual state on the standby. This means conflicting slots will appear
as not failover-ready on the standby. That’s why Step 3 also considers
the conflicting flag in its evaluation.
1)
+/* primary # */ SELECT array_agg(quote_literal(r.slot_name)) AS slots
+ FROM pg_replication_slots r
+ WHERE r.failover AND NOT r.temporary;
On the primary, there is no need to check the temporary status. We do
not allow setting failover to true for temporary slots.
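A minimal Python sketch of the point above, assuming (as stated) that
a temporary slot can never have failover = true. PrimarySlot and
failover_slot_names are hypothetical names; under that invariant,
filtering on failover alone returns the same slot names as filtering on
failover AND NOT temporary.

```python
from typing import List, NamedTuple


class PrimarySlot(NamedTuple):
    """Hypothetical model of the pg_replication_slots columns read on the primary."""
    slot_name: str
    failover: bool
    temporary: bool


def failover_slot_names(rows: List[PrimarySlot]) -> List[str]:
    """Mirror of: SELECT slot_name FROM pg_replication_slots WHERE failover."""
    for r in rows:
        # Assumed invariant: temporary slots never have failover = true,
        # so an extra "NOT temporary" condition would filter nothing out.
        assert not (r.failover and r.temporary), r.slot_name
    return [r.slot_name for r in rows if r.failover]


rows = [
    PrimarySlot("sub1", failover=True, temporary=False),
    PrimarySlot("sub2", failover=True, temporary=False),
    PrimarySlot("scratch", failover=False, temporary=True),
    PrimarySlot("local_only", failover=False, temporary=False),
]
names = failover_slot_names(rows)
with_temp_filter = [r.slot_name for r in rows if r.failover and not r.temporary]
print(names, names == with_temp_filter)
# ['sub1', 'sub2'] True
```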
2)
Although not directly related to the concerns addressed in the given
patch, I think it would be helpful to add a note in the original doc
stating that Steps 1 and 2 should be executed on each subscriber node
that will be served by the standby after failover.
I have attached a top-up patch with the above changes and a few more
trivial changes. Please include it if you find it okay.
thanks
Shveta