On Wed, Oct 29, 2025 at 02:36:10PM +0800, Xuneng Zhou wrote:
> Should we wrap this assertion within #ifdef USE_ASSERT_CHECKING?
>
> #ifdef USE_ASSERT_CHECKING
> {
> WalRcvData *walrcv = WalRcv;
> WalRcvState state;
> SpinLockAcquire(&walrcv->mutex);
> state = walrcv->walRcvState;
> SpinLockRelease(&walrcv->mutex);
> Assert(state != WALRCV_STOPPING);
> }
> #endif
Yep. Or provide a wrapper routine that's able to retrieve the state
of a WAL receiver, then add it in xlogrecovery.c within an Assert().
It may be better to only do that on HEAD, the back-branches stress me
a bit when we enforce such new assumptions.. Perhaps Noah has a
different opinion.
> However, I think Michael’s approach is better — it’s simpler, more
> straightforward, and captures the buggy behavior well. I plan to
> incorporate it into the next version of the patch.
Thanks!
One weakness of this approach is that this test would not be able to
work if the error message is reworded for some reason. On stable
branches, I doubt that this error message will ever change, but it
could be updated to something else on HEAD.
Hence, I would suggest one extra improvement for HEAD: let's also add
a check for a valid case where a WAL receiver is stopped. We have one
in 040_standby_failover_slots_sync.pl, where primary_conninfo is
updated in standby1 and its configuration reloaded, so I'd be tempted
to add a valid check in the server logs of $standby1. One difference
is that we should grab the offset of the logs before the reload, then
make sure that the log entry gets created with a wait_for_log(), as
there may be a delay between the reload and the moment the log entry
is generated.
--
Michael