Hi Ajin,
On Thu, Sep 18, 2025 at 4:16 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> On Fri, Sep 12, 2025 at 1:56 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > The approach seems valid and should work, but introducing a new file
> > like promote.inprogress for this purpose might be excessive. We can
> > first try analyzing existing information to determine whether we can
> > distinguish between the two scenarios -- a primary in crash recovery
> > immediately after a promotion attempt versus a regular primary. If we
> > are unable to find any way, we can revisit the idea.
> >
>
> I needed a way to reset slots not only during promotion, but also
> after a crash that occurs while slots are being reset, so there would
> be a fallback mechanism to clear them again on startup. As Shveta
> pointed out, it wasn’t trivial to tell apart a standby restarting
> after crashing during promotion from a primary restarting after a
> crash. So I decided to just reset slots every time primary (or a
> standby after promotion) restarts.
>
> Because this fallback logic will run on every primary restart, it was
> important to minimize overhead added by the patch. After some
> discussion, I placed the reset logic in RestoreSlotFromDisk(), which
> is invoked by StartupReplicationSlots() whenever the server starts.
> Because RestoreSlotFromDisk() already loops through all slots, this
> adds minimum extra work; but also ensures the synced flag is cleared
> when running on a primary.
>
> The next challenge was finding a reliable flag to distinguish
> primaries from standbys, since we really don’t want to reset the flag
> on a standby. I tested StandbyMode, RecoveryInProgress(), and
> InRecovery. But during restarts, both RecoveryInProgress() and
> InRecovery are always true on both primary and standby. In all my
> testing, StandbyMode was the only variable that consistently
> differentiated between the two, which is what I used.
>
+/*
+ * ResetSyncedSlots()
+ *
+ * Reset all replication slots that have synced=true to synced=false.
+ */
I feel this is not correct, we are force resetting sync flag status
for all logical slots, not just the one that is set to true.
--
@@ -2664,6 +2715,10 @@ RestoreSlotFromDisk(const char *name)
memcpy(&slot->data, &cp.slotdata,
sizeof(ReplicationSlotPersistentData));
+ /* reset synced flag if this is a primary server */
+ if (!StandbyMode)
+ slot->data.synced = false;
+
I think you also need to ensure that you are only doing this for the
logical slots, it's currently just checking if the slot is in-use or
not.
--
With Regards,
Ashutosh Sharma.