Re: Issue with logical replication slot during switchover - Mailing list pgsql-hackers

From Fabrice Chapuis
Subject Re: Issue with logical replication slot during switchover
Date
Msg-id CAA5-nLC=3QMehrktEB1MOrAk_ynG_u=pD_iUOup4MmkN2Xv0sA@mail.gmail.com
Whole thread Raw
In response to Re: Issue with logical replication slot during switchover  (Alexander Kukushkin <cyberdemn@gmail.com>)
Responses Re: Issue with logical replication slot during switchover
List pgsql-hackers
Hi Alexander,
I understand your use case but I think it won't work like this because of how is implemented get_local_synced_slots()
this function will return slots which synced flag to true only. 

in point 4  Start node1: synced flag is false and failover flag is true then it won't enter the loop and the slot will not be dropped

static void
drop_local_obsolete_slots(List *remote_slot_list)
{
List    *local_slots = get_local_synced_slots();

foreach_ptr(ReplicationSlot, local_slot, local_slots)
{
/* Drop the local slot if it is not required to be retained. */
if (!local_sync_slot_required(local_slot, remote_slot_list)) => will check of remote slot exist and if local slot is not invalidated
{
bool synced_slot;
...

We could change the function get_local_synced_slots  to make it work

get_local_synced_slots(void)
{
List    *local_slots = NIL;

LWLockAcquire(ReplicationSlotControlLock, LW_SHARED);

for (int i = 0; i < max_replication_slots; i++)
{
ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];

/* Check if it is a synchronized slot */
if (s->in_use && (s->data.failover || s->data.synced))
{
Assert(SlotIsLogical(s));
local_slots = lappend(local_slots, s);

Thanks for your feedback

Regards,

Fabrice
....
On Mon, Nov 3, 2025 at 8:51 AM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
Hi Fabrice,

On Fri, 31 Oct 2025 at 17:45, Fabrice Chapuis <fabrice636861@gmail.com> wrote:
Thanks for your large analyze and explanation Alexander. If we go in the direction you propose and leave the option to use a suplementary flag allow_overwrite, may I propose you some modifications in the patch v0 you have attached:
 
 Why modifying this function?
 
 drop_local_obsolete_slots(List *remote_slot_list)

Think about the following case:
1. We have node1 (primary) and node2 (standby), with slot foo(failover=true)
2. We stop node1 for maintenance and promote node2
3. DROP replication slot on node2
4. Start node1

In this scenario the logical slot "foo" will be left unattended and it needs to be dropped, because it doesn't exist on the primary.

As I said earlier, IMO it is only the "failover" flag that effectively indicates the purpose of the slot. The sync flag is purely informative.

Regards,
--
Alexander Kukushkin

pgsql-hackers by date:

Previous
From: Ashutosh Bapat
Date:
Subject: Re: Report bytes and transactions actually sent downtream
Next
From: Michael Paquier
Date:
Subject: Re: [PATCH] Add error message for out-of-memory in passwordFromFile()