Re: BUG #19093: Behavioral change in walreceiver termination between PostgreSQL 14.17 and 14.18 - Mailing list pgsql-bugs

From Xuneng Zhou
Subject Re: BUG #19093: Behavioral change in walreceiver termination between PostgreSQL 14.17 and 14.18
Date
Msg-id CABPTF7XE--Dc97LXj5Xum1U4v5=CQnrVYFbVjWTWu4J7gCNXAg@mail.gmail.com
Whole thread Raw
In response to Re: BUG #19093: Behavioral change in walreceiver termination between PostgreSQL 14.17 and 14.18  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #19093: Behavioral change in walreceiver termination between PostgreSQL 14.17 and 14.18
List pgsql-bugs
Hi,

On Thu, Oct 30, 2025 at 7:36 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Oct 29, 2025 at 02:36:10PM +0800, Xuneng Zhou wrote:
> > Should we wrap this assertion within #ifdef USE_ASSERT_CHECKING?
> >
> > #ifdef USE_ASSERT_CHECKING
> > {
> > WalRcvData *walrcv = WalRcv;
> > WalRcvState state;
> > SpinLockAcquire(&walrcv->mutex);
> > state = walrcv->walRcvState;
> > SpinLockRelease(&walrcv->mutex);
> > Assert(state != WALRCV_STOPPING);
> > }
> > #endif
>
> Yep.  Or provide a wrapper routine that's able to retrieve the state
> of a WAL receiver, then add it in xlogrecovery.c within an Assert().
> It may be better to only do that on HEAD, the back-branches stress me
> a bit when we enforce such new assumptions..  Perhaps Noah has a
> different opinion.

Using a wrapper function seems better here.

> > However, I think Michael’s approach is better — it’s simpler, more
> > straightforward, and captures the buggy behavior well. I plan to
> > incorporate it into the next version of the patch.
>
> Thanks!
>
> One weakness of this approach is that this test would not be able to
> work if the error message is reworded for some reason.  On stable
> branches, I doubt that this error message will ever change, but it
> could be updated to something else on HEAD.
>
> Hence, I would suggest one extra improvement for HEAD: let's also add
> a check for a valid case where a WAL receiver is stopped.  We have one
> in 040_standby_failover_slots_sync.pl, where primary_conninfo is
> updated in standby1 and its configuration reloaded, so I'd be tempted
> to add a valid check in the server logs of $standby1.  One difference
> is that we should grab the offset of the logs before the reload, then
> make sure that the log entry gets created with a wait_for_log(), as
> there may be a delay between the reload and the moment the log entry
> is generated.

Thanks for your suggestion!

I’ve incorporated both tests and the assertion into the patch applied to master.
For earlier stable branches, the test in
040_standby_failover_slots_sync.pl and the assertion for
WAL receiver state checking are not included (if Noah agrees too). Please check.

Best,
Xuneng

Attachment

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #19099: Conditional DELETE from partitioned table with non-updatable partition raises internal error
Next
From: Paul A Jungwirth
Date:
Subject: Re: BUG #19098: Can't create unique gist index, where pg_indexes says that WITHOUT OVERLAPS does exacly that