Hi,
On Thu, Oct 30, 2025 at 7:36 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 29, 2025 at 02:36:10PM +0800, Xuneng Zhou wrote:
> > Should we wrap this assertion within #ifdef USE_ASSERT_CHECKING?
> >
> > #ifdef USE_ASSERT_CHECKING
> > {
> > WalRcvData *walrcv = WalRcv;
> > WalRcvState state;
> > SpinLockAcquire(&walrcv->mutex);
> > state = walrcv->walRcvState;
> > SpinLockRelease(&walrcv->mutex);
> > Assert(state != WALRCV_STOPPING);
> > }
> > #endif
>
> Yep. Or provide a wrapper routine that's able to retrieve the state
> of a WAL receiver, then add it in xlogrecovery.c within an Assert().
> It may be better to only do that on HEAD, the back-branches stress me
> a bit when we enforce such new assumptions.. Perhaps Noah has a
> different opinion.
Using a wrapper function seems better here.
> > However, I think Michael’s approach is better — it’s simpler, more
> > straightforward, and captures the buggy behavior well. I plan to
> > incorporate it into the next version of the patch.
>
> Thanks!
>
> One weakness of this approach is that this test would not be able to
> work if the error message is reworded for some reason. On stable
> branches, I doubt that this error message will ever change, but it
> could be updated to something else on HEAD.
>
> Hence, I would suggest one extra improvement for HEAD: let's also add
> a check for a valid case where a WAL receiver is stopped. We have one
> in 040_standby_failover_slots_sync.pl, where primary_conninfo is
> updated in standby1 and its configuration reloaded, so I'd be tempted
> to add a valid check in the server logs of $standby1. One difference
> is that we should grab the offset of the logs before the reload, then
> make sure that the log entry gets created with a wait_for_log(), as
> there may be a delay between the reload and the moment the log entry
> is generated.
Thanks for your suggestion!
I’ve incorporated both tests and the assertion into the patch applied to master.
For earlier stable branches, the test in
040_standby_failover_slots_sync.pl and the assertion for
WAL receiver state checking are not included (if Noah agrees too). Please check.
Best,
Xuneng