Re: Improve pg_sync_replication_slots() to wait for primary to advance - Mailing list pgsql-hackers

From Japin Li
Subject Re: Improve pg_sync_replication_slots() to wait for primary to advance
Date
Msg-id SYAPR01MB30383D32734FB370337D5FE6B6DCA@SYAPR01MB3038.ausprd01.prod.outlook.com
Whole thread Raw
List pgsql-hackers
On Fri, 28 Nov 2025 at 15:46, Ajin Cherian <itsajin@gmail.com> wrote:
> On Wed, Nov 26, 2025 at 7:42 PM shveta malik <shveta.malik@gmail.com>
> wrote:
>>
>>  A few comments:
>>
>> 1)
>> +/*
>> + * Structure holding parameters that need to be freed on error in
>> + * pg_sync_replication_slots()
>> + */
>> +typedef struct SlotSyncApiFailureParams
>> +{
>> + WalReceiverConn *wrconn;
>> + List *slot_names;
>> +} SlotSyncApiFailureParams;
>> +
>>
>> We can get rid of it now as we do not use it.
>>
>
> Removed.
>
>> 2)
>> ProcessSlotSyncInterrupts():
>>
>> + if (!AmLogicalSlotSyncWorkerProcess())
>> + ereport(ERROR,
>> + errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
>> + errmsg("cannot continue replication slots synchronization"
>> +    " as standby promotion is triggered"));
>> + else
>> + {
>>
>> Can we please reverse the if-else i.e. first worker and then API.
>> Negated if-condition can be avoided in this case.
>>
>
> Changed.
>
>> 3)
>>
>> slotsync_reread_config():
>> + /* Worker-specific check for sync_replication_slots change */
>>
>> Now since we check for both API and worker, this comment is not
>> needed.
>>
>
> Removed.
>
>> 4)
>> - ereport(LOG,
>> - errmsg("replication slot synchronization worker will restart
>> because
>> of a parameter change"));
>>
>> - /*
>> - * Reset the last-start time for this worker so that the postmaster
>> - * can restart it without waiting for
>> SLOTSYNC_RESTART_INTERVAL_SEC.
>> - */
>> - SlotSyncCtx->last_start_time = 0;
>> + ereport(AmLogicalSlotSyncWorkerProcess() ? LOG : ERROR,
>> + errmsg("replication slot synchronization will stop because of a
>> parameter change"));
>>
>> Here, we should retain the same old message for worker i.e. 'worker
>> will restart..'. instead of 'synchronization will stop'. I find the
>> old message better in this case.
>>
>> 5)
>> slotsync_reread_config() is slightly difficult to follow.
>> I think in the case of API, we can display a common error message
>> instead of 2 different messages for 'sync_replication_slot change'
>> and
>> the rest of the parameters. We can mark if any of the parameters
>> changed in both 'if' blocks and if the current process has not
>> exited,
>> then at the end based on 'parameter-changed', we can deal with API
>> by
>> giving a common message. Something like:
>>
>> /*
>>  * If we have reached here with a parameter change, we must be
>> running
>>  * in SQL function, emit error in such a case.
>>  */
>> if (parameter_changed (new variable))
>> {
>> Assert (!AmLogicalSlotSyncWorkerProcess);
>> ereport(ERROR,
>> errmsg("replication slot synchronization will stop because of a
>> parameter change"));
>> }
>>
>
> Fixed as above.
>
> I've addressed the above comments as well as rebased the patch based
> on changes in commit 76b7872 in patch v26
>

1.
Initialize slot_persistence_pending to false (to avoid uninitialized values, or
initialize to true by mistaken) in update_and_persist_local_synced_slot(). This
aligns with the handling of found_consistent_snapshot and remote_slot_precedes
in update_local_synced_slot().

diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
index 20eada3393..c55ba11f17 100644
--- a/src/backend/replication/logical/slotsync.c
+++ b/src/backend/replication/logical/slotsync.c
@@ -617,6 +617,9 @@ update_and_persist_local_synced_slot(RemoteSlot *remote_slot, Oid remote_dbid,
        bool            found_consistent_snapshot = false;
        bool            remote_slot_precedes = false;

+       if (slot_persistence_pending)
+               *slot_persistence_pending = false;
+
        /* Slotsync skip stats are handled in function update_local_synced_slot() */
        (void) update_local_synced_slot(remote_slot, remote_dbid,
                                                                        &found_consistent_snapshot,

2.
This change seems unnecessary。

 static void
-slotsync_reread_config(void)
+slotsync_reread_config()

 static void
-ProcessSlotSyncInterrupts(void)
+ProcessSlotSyncInterrupts()

3.
Since we are already caching the result of AmLogicalSlotSyncWorkerProcess() in
a local worker variable, how about applying this replacement:
s/if (AmLogicalSlotSyncWorkerProcess())/if (worker)/g?

+    bool        worker = AmLogicalSlotSyncWorkerProcess();
+    bool        parameter_changed = false;

-    Assert(sync_replication_slots);
+    if (AmLogicalSlotSyncWorkerProcess())
+        Assert(sync_replication_slots);

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.



pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
Next
From: vignesh C
Date:
Subject: Re: Add support for specifying tables in pg_createsubscriber.