Re: failover logical replication slots - Mailing list pgsql-hackers
From | Fabrice Chapuis |
---|---|
Subject | Re: failover logical replication slots |
Date | |
Msg-id | CAA5-nLDDHcDiYawPoQ9W6w4qUQ-EysFtOWFoWnFgZDunHtQH6A@mail.gmail.com |
In response to | RE: failover logical replication slots ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>) |
List | pgsql-hackers |
Thanks for your reply.
The problem I see is that after creating a new subscription, we have:
1) If a failover occurs, the failover and synced flags are both set to true on the new primary node, so there's no problem.
2) When the old node returns to the cluster as a standby, its failover flag is set to true and its synced flag is set to false, and the following error is raised:
ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby
Why not change the value of the synced flag when the standby joins the cluster? If the slot on the primary node has the same name as the slot on the standby and the failover flag is set to true, the standby's slot could be marked as synced, for example:
    if ((slot = SearchNamedReplicationSlot(remote_slot->name, true)))
    {
        slot->data.synced = true;
        ...
    }
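A minimal, self-contained sketch of that idea follows. It is only an illustration and does not use the real PostgreSQL structures: `MockSlot`, `find_local_slot` and `adopt_existing_slot` are hypothetical names, and the sketch assumes the failover flag must be true on both the remote and the local slot. A real change would also have to deal with locking and with persisting the flag.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a replication slot; NOT the PostgreSQL struct. */
typedef struct MockSlot
{
    const char *name;
    bool        failover;
    bool        synced;
} MockSlot;

/* Look up a local slot by name; returns NULL when no slot matches. */
MockSlot *
find_local_slot(MockSlot *slots, size_t nslots, const char *name)
{
    for (size_t i = 0; i < nslots; i++)
    {
        if (strcmp(slots[i].name, name) == 0)
            return &slots[i];
    }
    return NULL;
}

/*
 * Proposed behavior (assumption): when a local slot has the same name as
 * the remote slot and both have failover enabled, mark the local slot as
 * synced instead of raising an error. Returns true when the slot was
 * adopted, false when nothing was changed.
 */
bool
adopt_existing_slot(MockSlot *slots, size_t nslots, const MockSlot *remote)
{
    MockSlot   *slot = find_local_slot(slots, nslots, remote->name);

    if (slot == NULL || !remote->failover || !slot->failover)
        return false;

    slot->synced = true;
    return true;
}
```

With a local slot "sub_test" that has failover = true and synced = false, calling `adopt_existing_slot` with a remote "sub_test" that has failover = true flips the local synced flag. A local slot without failover is left untouched.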
Thanks for your feedback
On Wed, Jun 11, 2025 at 6:48 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
On Tue, Jun 10, 2025 at 11:46 PM Fabrice Chapuis wrote:
> I'm working with logical replication in a PostgreSQL 17 setup, and I'm
> exploring the new option to make replication slots failover safe in a highly
> available environment using physical standby nodes managed by Patroni.
>
> After a switchover, I encounter an error message in the PostgreSQL logs and observe unexpected behavior.
> Here are the different steps I followed:
>
> 1) Setting up a new subscription
>
> Logical replication is established between two databases on the same PostgreSQL instance.
>
> A logical replication slot is created on the source database:
>
> SELECT pg_create_logical_replication_slot('sub_test', 'pgoutput', false, false, true);
>
> A subscription is then configured on the target database:
>
> CREATE SUBSCRIPTION sub_test CONNECTION 'dbname=test host=localhost port=5432 user=user_test'
> PUBLICATION pub_test WITH (create_slot=false, copy_data=false, failover=true);
>
> The logical replication slot is active and in failover mode.
>
> 2) Starting the physical standby
>
> A logical replication slot is successfully created on the standby.
>
> 3) Cluster switchover
>
> The switchover is initiated using the Patroni command:
>
> patronictl switchover
>
> The operation completes successfully, and roles are reversed in the cluster.
> ...
> 4) Issue encountered
> After the switchover, an error appears in the PostgreSQL logs:
>
> 2025-06-10 16:40:58.996 CEST [739829]: [1-1] user=,db=,client=,application= LOG: slot sync worker started
> 2025-06-10 16:40:59.011 CEST [739829]: [2-1] user=,db=,client=,application= ERROR: exiting from slot synchronization because same name slot "sub_test" already exists on the standby
> ...
> 5) Dropping the slot
>
> If the slot on the standby is deleted, it is then recreated with synced = true, and at that point, it successfully resynchronizes with the primary slot. Everything works correctly.
>
> Question:
> Why does the synced flag fail to change to true, even though sync_replication_slots is enabled (on)?
Thank you for reporting this. This behavior is expected because overwriting
existing slots on standbys is not permitted for now. Doing so poses a risk of
rendering slots created by users for other purposes unusable.
However, if needed, we could permit overwriting when the existing slot has
failover=true, given that enabling failover for slots on standbys is currently
disallowed. That assumption might change in the future if we support enabling
failover to allow slot syncing to cascading standbys. Alternatively, we could
introduce an option, such as a GUC, to control whether to overwrite existing
slots, though I'm not sure it's worth it.
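The alternatives discussed here can be compared in a small, self-contained sketch. It is a simplified model, not PostgreSQL code: the `OverwritePolicy` and `SyncAction` enums and the `decide_sync_action` function are hypothetical, `OVERWRITE_ALWAYS` merely stands in for a GUC-like option that has been switched on, and the sketch assumes that a local slot already marked as synced is simply updated.

```c
#include <stdbool.h>

/* Hypothetical policies for a same-name slot that already exists locally. */
typedef enum OverwritePolicy
{
    OVERWRITE_NEVER,        /* current behavior: report an error */
    OVERWRITE_IF_FAILOVER,  /* overwrite only if the local slot has failover=true */
    OVERWRITE_ALWAYS        /* stands in for a GUC-like option set to on */
} OverwritePolicy;

/* Hypothetical outcomes of one slot sync decision. */
typedef enum SyncAction
{
    SYNC_CREATE,     /* no local slot: create a synced one */
    SYNC_UPDATE,     /* local slot is already synced: keep syncing it */
    SYNC_OVERWRITE,  /* reuse the existing local slot as a synced slot */
    SYNC_ERROR       /* refuse: same name slot already exists */
} SyncAction;

SyncAction
decide_sync_action(bool local_exists, bool local_synced,
                   bool local_failover, OverwritePolicy policy)
{
    if (!local_exists)
        return SYNC_CREATE;
    if (local_synced)
        return SYNC_UPDATE;

    /* A same-name slot exists that was not created by slot sync. */
    if (policy == OVERWRITE_ALWAYS)
        return SYNC_OVERWRITE;
    if (policy == OVERWRITE_IF_FAILOVER && local_failover)
        return SYNC_OVERWRITE;

    return SYNC_ERROR;
}
```

For the reported case (local slot exists, synced = false, failover = true), `OVERWRITE_NEVER` yields `SYNC_ERROR`, while `OVERWRITE_IF_FAILOVER` yields `SYNC_OVERWRITE`. A slot created by a user for another purpose, with failover = false, is only overwritten under `OVERWRITE_ALWAYS`.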
From a database user's perspective, it's necessary to clean up any leftover
slots on a new standby following a switchover, regardless of whether the
failover slot feature is supported, because those leftover slots could lead to
excessive WAL accumulation.
Best Regards,
Hou zj