Re: [PATCH] Fix fragile walreceiver test. - Mailing list pgsql-hackers

From	Michael Paquier
Subject	Re: [PATCH] Fix fragile walreceiver test.
Date	November 5, 2025 10:55:46
Msg-id	aQsDAkt6yblQxGgM@paquier.xyz
In response to	Re: [PATCH] Fix fragile walreceiver test. (Xuneng Zhou <xunengzhou@gmail.com>)
Responses	Re: [PATCH] Fix fragile walreceiver test. Re: [PATCH] Fix fragile walreceiver test.
List	pgsql-hackers

On Wed, Nov 05, 2025 at 03:30:30PM +0800, Xuneng Zhou wrote:
> On Wed, Nov 5, 2025 at 2:50 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Timing issue then, the buildfarm has not been complaining on this one
>> AFAIK, there have been no recoveryCheck failures reported:
>> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl

drongo has just reported one failure, so I stand corrected:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-11-05%2003%3A50%3A50

And one log rotation should be enough before the restart.
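To make the reasoning behind the rotation concrete, here is a toy model in Python. It is not the TAP test itself (that one is Perl and uses PostgreSQL::Test::Cluster). `ToyNode`, `rotate_log()`, `log_has()` and the `TARGET` string are all hypothetical names for illustration. The point it shows: if the awaited message was already logged by an earlier step, a log check that scans the current file matches immediately, before the restart has done anything. Rotating the log once before the restart leaves only fresh output to match against.

```python
class ToyNode:
    """Hypothetical stand-in for a test node with a rotatable server log."""

    def __init__(self) -> None:
        # Each inner list is one log file; the last one is the current file.
        self.log_files: list[list[str]] = [[]]

    def emit(self, line: str) -> None:
        # The server always writes to the current log file.
        self.log_files[-1].append(line)

    def rotate_log(self) -> None:
        # Switch to a fresh, empty log file for everything written from now on.
        self.log_files.append([])

    def log_has(self, pattern: str) -> bool:
        # The log check only looks at the current log file.
        return any(pattern in line for line in self.log_files[-1])


TARGET = "target message"  # hypothetical placeholder for the awaited log line


def log_check_after_restart(rotate_before_restart: bool) -> bool:
    node = ToyNode()
    # An earlier step of the test already produced the awaited message.
    node.emit(TARGET)
    if rotate_before_restart:
        node.rotate_log()
    # The restart happens; the new occurrence has not been logged yet.
    node.emit("restart complete")
    return node.log_has(TARGET)


print(log_check_after_restart(False))  # True: stale match from before the restart
print(log_check_after_restart(True))   # False: nothing matches until the new message is logged
```

A single rotation is enough in this model because all that matters is that the file being scanned starts after the point where the stale message was written.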

>> Hmm.  The reason why I didn't use a PID matching check (mentioned at
>> [1]) is that this is not entirely bullet-proof.  On a very slow
>> machine, one could assume that standby_1 generates some records and
>> that these are replayed by standby_2 *before* the PID of the WAL
>> receiver is retrieved.  This could lead to false positives in some
>> cases, and a bunch of buildfarm members are very slow.  You have a
>> point that these would unlikely happen in normal runs, so a PID
>> matching check would be relevant most of the time anyway, even if the
>> original PID has been fetched after the TLI jump has been processed in
>> standby_2.  I'd rather keep the log check, TBH, bypassing it with an
>> extra rotate_logfile() before the restart of standby_2.
>
> I’ve also prepared a patch for this method.

That's exactly what I did a couple of minutes ago.  I noticed your
message before applying the fix, so I've listed you as a co-author on
this one.

I have also kept the PID check after pondering a bit about it.  A TLI
jump could be replayed before we grab the initial PID, but in most
cases the check should still do its job correctly.
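The race on the PID check can be sketched with another toy model. `ToyProcess`, `replace_process()` and `pid_snapshots()` are hypothetical, and the sketch assumes, purely for illustration, that the event being watched for replaces the WAL receiver process. It only shows the ordering problem: a PID snapshot taken after the event cannot reveal that the event happened.

```python
from dataclasses import dataclass


@dataclass
class ToyProcess:
    """Hypothetical stand-in for the WAL receiver process on standby_2."""
    pid: int


def replace_process(proc: ToyProcess, new_pid: int) -> None:
    # Assumption for illustration: the watched event replaces the process.
    proc.pid = new_pid


def pid_snapshots(snapshot_first: bool) -> tuple[int, int]:
    """Return (initial_pid, final_pid) for the two possible orderings."""
    proc = ToyProcess(pid=100)
    if snapshot_first:
        initial = proc.pid          # PID grabbed before the event
        replace_process(proc, 200)
    else:
        replace_process(proc, 200)  # slow machine: event already processed
        initial = proc.pid          # PID grabbed too late
    return initial, proc.pid


print(pid_snapshots(True))   # (100, 200): the replacement is visible
print(pid_snapshots(False))  # (200, 200): the replacement is invisible
```

In the late-snapshot ordering, a check expecting the two PIDs to be equal passes even though the process was replaced, which is the kind of false positive described in the quoted text. Since the normal ordering is the first one, the check remains useful most of the time.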
--
Michael