Re: failover logical replication slots - Mailing list pgsql-hackers

From Fabrice Chapuis
Subject Re: failover logical replication slots
Date
Msg-id CAA5-nLC2__W71QmQtZ37Cm0-6jf5ZJUkjbb2QqrR1HYTNB3M=g@mail.gmail.com
In response to Re: failover logical replication slots  (Fabrice Chapuis <fabrice636861@gmail.com>)
List pgsql-hackers
Hi Amit,
Here is a proposed solution to the problem of creating the logical replication slot on the standby after a switchover.
Thank you for your comments and help on this issue.

Regards

Fabrice

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 656e66e..296840a 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -627,6 +627,7 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
        ReplicationSlot *slot;
        XLogRecPtr      latestFlushPtr;
        bool            slot_updated = false;
+       bool            overwriting_failover_slot = true; /* could be a GUC */

        /*
         * Make sure that concerned WAL is received and flushed before syncing
@@ -654,19 +655,37 @@ synchronize_one_slot(RemoteSlot *remote_slot, Oid remote_dbid)
        if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
        {
                bool            synced;
+               bool            failover_status = remote_slot->failover;

                SpinLockAcquire(&slot->mutex);
                synced = slot->data.synced;
                SpinLockRelease(&slot->mutex);

-               /* User-created slot with the same name exists, raise ERROR. */
-               if (!synced)
-                       ereport(ERROR,
+               if (!synced){
+                       /*
+                        * A non-synced slot with the same name exists. Overwrite it
+                        * only if overwriting_failover_slot is enabled, the remote slot
+                        * has its failover flag set, and sync_replication_slots is on.
+                        * Other checks could be added here. */
+                       if (overwriting_failover_slot && failover_status && sync_replication_slots){
+
+                               /* Get rid of a replication slot that is no longer wanted */
+                               ReplicationSlotDrop(remote_slot->name, true);
+                               ereport(WARNING,
+                                       errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                                       errmsg("slot \"%s\" already exists"
+                                               " on the standby but it will be dropped because overwriting_failover_slot is set to true",
+                                               remote_slot->name));
+                               return false; /* Going back to the main loop after dropping the failover slot */
+                       }
+                       /* User-created slot with the same name exists, raise ERROR. */
+                       else
+                               ereport(ERROR,
                                        errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
                                        errmsg("exiting from slot synchronization because same"
                                                   " name slot \"%s\" already exists on the standby",
                                                   remote_slot->name));
-
+               }
                /*
                 * The slot has been synchronized before.
                 *
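
For readers who want to check the branch logic without building PostgreSQL, the decision the patch makes when a same-named slot already exists on the standby can be modeled as a small pure function. This is only an illustrative sketch: the enum SlotConflictAction and the function decide_slot_conflict are hypothetical names that exist neither in PostgreSQL nor in the patch, and the four booleans stand in for slot->data.synced, remote_slot->failover, sync_replication_slots and the proposed overwriting_failover_slot flag.

```c
#include <stdbool.h>

/* Hypothetical outcome of the name-collision check; not a PostgreSQL type. */
typedef enum SlotConflictAction
{
    SLOT_SYNC_CONTINUE,   /* local slot was synced before: keep updating it */
    SLOT_DROP_AND_RETURN, /* drop the local slot, warn, return to the main loop */
    SLOT_RAISE_ERROR      /* user-created slot with the same name: raise ERROR */
} SlotConflictAction;

/*
 * Model of the branch added to synchronize_one_slot() by the patch.
 * All parameter names are assumptions made for this sketch.
 */
SlotConflictAction
decide_slot_conflict(bool local_synced,
                     bool remote_failover,
                     bool sync_replication_slots,
                     bool overwriting_failover_slot)
{
    /* A slot that is already marked as synced is handled by the existing code. */
    if (local_synced)
        return SLOT_SYNC_CONTINUE;

    /* New behavior: all three conditions must hold before the slot is dropped. */
    if (overwriting_failover_slot && remote_failover && sync_replication_slots)
        return SLOT_DROP_AND_RETURN;

    /* Previous behavior, kept as the fallback. */
    return SLOT_RAISE_ERROR;
}
```

With the flag off, or with a remote slot that does not have failover set, the model falls through to the ERROR case, which matches the behavior before the patch.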


On Thu, Jun 12, 2025 at 4:27 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Yes, of course, maybe for PG 19.

Regards,
Fabrice

On Thu, Jun 12, 2025 at 12:31 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Thu, Jun 12, 2025 at 3:53 PM Fabrice Chapuis <fabrice636861@gmail.com> wrote:
>
> However, the problem still persists: it is currently not possible to perform an automatic switchover after creating a new subscription.
>
> Would it be reasonable to consider adding a GUC to address this issue?
> I can propose a patch in that sense if it seems appropriate.
>

Yeah, we can consider that, though I don't know at this stage if GUC
is the only way, but I hope you understand that it will be for PG19.

--
With Regards,
Amit Kapila.
