Re: Clear logical slot's 'synced' flag on promotion of standby - Mailing list pgsql-hackers

From Ashutosh Sharma
Subject Re: Clear logical slot's 'synced' flag on promotion of standby
Msg-id CAE9k0P=ODwH5aB-skBgffvDS010Jo1h=wGpLpE0aCqnqfx2+xg@mail.gmail.com
In response to Re: Clear logical slot's 'synced' flag on promotion of standby  (shveta malik <shveta.malik@gmail.com>)
Responses Re: Clear logical slot's 'synced' flag on promotion of standby
List pgsql-hackers
On Thu, Sep 11, 2025 at 9:17 AM shveta malik <shveta.malik@gmail.com> wrote:
>
> On Tue, Sep 9, 2025 at 2:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
> >
> > Hi,
> >
> >
> > + * required resources. Clear any leftover 'synced' flags on replication
> > + * slots when in crash recovery on the primary. The DB_IN_CRASH_RECOVERY
> > + * state check ensures that this code is only reached when a standby
> > + * server crashes during promotion.
> >   */
> >   StartupReplicationSlots();
> > + if (ControlFile->state == DB_IN_CRASH_RECOVERY)
> >
> > I believe the primary server can also enter the DB_IN_CRASH_RECOVERY
> > state. For example, if the primary is already in crash recovery and
> > crashes again while in crash recovery, it will restart in the
> > DB_IN_CRASH_RECOVERY state, no?
> >
>
> Yes, good point. I think we can differentiate the two cases based on
> the timeline change. A regular primary won't have a timeline change,
> whereas a promoted standby that failed during promotion will show a
> timeline change immediately upon restart. Thoughts?
>

Would there be any issue with clearing the slots' 'synced' flags
immediately after the standby.signal file is removed on the standby?

We could maybe introduce a temporary "promote.inprogress" marker file
on disk before removing standby.signal. The sequence would be:

1) Create promote.inprogress.
2) Unlink standby.signal.
3) Clear the 'synced' flag on the slots.
4) Remove promote.inprogress.

This way, if the server crashes after standby.signal is removed but
before the sync status is cleared, the presence of promote.inprogress
would indicate that the standby was in the middle of promotion and
crashed before slot cleanup. On restart, we could use that marker to
detect the incomplete promotion and finish clearing the sync flags.

If the crash happens at a later stage, the server will no longer start
as a standby anyway, and by then the sync flags would already have
been reset.
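
To make the ordering concrete, here is a rough standalone sketch of the
sequence and of the restart check. It is not PostgreSQL code: the
promote.inprogress name is only the marker proposed above, slot_synced
is an in-memory stand-in for the persisted per-slot flag, and promote(),
finish_interrupted_promotion() and the crash_after_step parameter are
hypothetical names used purely for illustration. How to handle a crash
between steps 1 and 2 is my assumption, as noted in the comments.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical marker from the proposal; not an existing PostgreSQL file. */
#define PROMOTE_MARKER "promote.inprogress"
#define STANDBY_SIGNAL "standby.signal"

/* Stand-in for the persisted per-slot 'synced' flag (illustration only). */
static bool slot_synced = true;

static void
clear_synced_flags(void)
{
    slot_synced = false;
}

static bool
file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Create an empty file and fsync it so that it survives a crash. */
static int
create_file(const char *path)
{
    int fd = open(path, O_CREAT | O_WRONLY, 0600);

    if (fd < 0)
        return -1;
    if (fsync(fd) != 0)
    {
        close(fd);
        return -1;
    }
    return close(fd);
}

/*
 * Run the four promotion steps in order. crash_after_step simulates a
 * crash by returning 1 right after that step (0 means no crash).
 * Returns 0 on a completed promotion and -1 on a file-system error.
 */
static int
promote(int crash_after_step)
{
    if (create_file(PROMOTE_MARKER) != 0)   /* step 1 */
        return -1;
    if (crash_after_step == 1)
        return 1;

    if (unlink(STANDBY_SIGNAL) != 0)        /* step 2 */
        return -1;
    if (crash_after_step == 2)
        return 1;

    clear_synced_flags();                   /* step 3 */
    if (crash_after_step == 3)
        return 1;

    if (unlink(PROMOTE_MARKER) != 0)        /* step 4 */
        return -1;
    return 0;
}

/* Restart-time check: finish the slot cleanup if the marker is present. */
static void
finish_interrupted_promotion(void)
{
    if (!file_exists(PROMOTE_MARKER))
        return;                 /* no promotion was in flight */

    /* Marker without standby.signal: crashed after step 2, so redo step 3. */
    if (!file_exists(STANDBY_SIGNAL))
        clear_synced_flags();

    /*
     * Assumption: if standby.signal still exists, the crash happened
     * before step 2, the server restarts as a standby, and the stale
     * marker is simply dropped.
     */
    (void) unlink(PROMOTE_MARKER);
}
```

With this ordering, calling promote(2) leaves the marker on disk with
the flag still set, and a later finish_interrupted_promotion() clears
the flag and removes the marker. Clearing the flag is idempotent, so
redoing step 3 after a crash at step 3 is harmless. A real
implementation would presumably also need to fsync the containing
directory after creating and removing the marker.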

This is just a thought and it may sound a bit naive. Let me know if I
am overlooking something.

--
With Regards,
Ashutosh Sharma.


