On Tue, Sep 9, 2025 at 2:19 PM Ashutosh Sharma <ashu.coek88@gmail.com> wrote:
>
> Hi,
>
>
> + * required resources. Clear any leftover 'synced' flags on replication
> + * slots when in crash recovery on the primary. The DB_IN_CRASH_RECOVERY
> + * state check ensures that this code is only reached when a standby
> + * server crashes during promotion.
> */
> StartupReplicationSlots();
> + if (ControlFile->state == DB_IN_CRASH_RECOVERY)
>
> I believe the primary server can also enter the DB_IN_CRASH_RECOVERY
> state. For example, if the primary crashes again while it is already in
> crash recovery, it will restart in the DB_IN_CRASH_RECOVERY state, no?
>
Yes, good point. I think we can differentiate the two cases based on
the timeline change. A regular primary won't have a timeline change,
whereas a promoted standby that failed during promotion will show a
timeline change immediately upon restart. Thoughts?
In the worst-case scenario, even if we end up running the Reset
function during a regular primary's crash recovery, it shouldn't cause
any harm. (That said, I'm not suggesting we shouldn't fix it.) What
concerns me more is the possibility of running it on a regular
standby, as it could disrupt slot synchronization. I attempted to
simulate a scenario where a regular standby ends up in
DB_IN_CRASH_RECOVERY after a crash, but I couldn't reproduce it. Do
you know of any situation where this could happen? The absence of
comments for these states makes it challenging to follow the flow.
> --
>
> With this change, are we saying that on the primary the synced flag must
> always be false? The Postgres doc on pg_replication_slots says:
>
> "The value of this column has no meaning on the primary server; the
> column value on the primary is default false for all slots but may (if
> leftover from a promoted standby) also be true."
>
The doc needs to be changed accordingly.
thanks
Shveta