
[PATCH] Fix fragile walreceiver test. - Mailing list pgsql-hackers

From	Bryan Green
Subject	[PATCH] Fix fragile walreceiver test.
Date	November 5 09:03:29
Msg-id	9d00b597-d64a-4f1e-802e-90f9dc394c70@gmail.com
Responses	Re: [PATCH] Fix fragile walreceiver test.
List	pgsql-hackers


The recovery/004_timeline_switch test has been failing for me on
Windows. The bug is in the test itself, not in the walreceiver.

The test does this:

    $node_standby_2->restart;
    # ... timeline switch happens ...
    ok( !$node_standby_2->log_contains(
            "FATAL: .* terminating walreceiver process due to administrator command"
        ),
        'WAL receiver should not be stopped across timeline jumps');

Problem: restart() kills the walreceiver (as it should), which writes
that exact FATAL message to the log. The test then searches the log and
finds it.

The test has a comment claiming "a new log file is used on node
restart". That is not the case: TAP tests use pg_ctl with a fixed log
filename that gets reused across restarts, and there is no log rotation.
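
To illustrate why a reused log file trips the check, here is a minimal, self-contained sketch in plain Perl. It does not use the TAP framework; log_matches_from is a hypothetical helper written for this example, and the sample log is assembled from lines quoted further down in this message. The offset stands in for "the size of the log file just after the restart", which is an assumption about how a test could skip older entries, not a description of the attached patch.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PostgreSQL's test modules): report
# whether $pattern matches anywhere in $log at or after byte $offset.
sub log_matches_from
{
    my ($log, $pattern, $offset) = @_;
    $offset //= 0;
    return 0 if $offset > length($log);
    return substr($log, $offset) =~ $pattern ? 1 : 0;
}

# Written while the restart shuts down the old walreceiver.
my $before_restart =
    "2025-11-04 23:05:27.261 CST walreceiver[52440] FATAL:  terminating walreceiver process due to administrator command\n";

# Written by the new walreceiver across the timeline switch.
my $after_restart =
    "2025-11-04 23:05:28.539 CST walreceiver[83824] LOG:  started streaming WAL from primary at 0/03000000 on timeline 1\n"
  . "2025-11-04 23:05:28.544 CST walreceiver[83824] LOG:  restarted WAL streaming at 0/03000000 on timeline 2\n";

# Same file across the restart, so both parts end up in one log.
my $log = $before_restart . $after_restart;
my $offset = length($before_restart);

my $pattern =
    qr/FATAL: .* terminating walreceiver process due to administrator command/;

# Searching the whole file finds the shutdown message from the restart.
my $whole_log = log_matches_from($log, $pattern, 0);

# Searching only what was written after the restart does not.
my $since_restart = log_matches_from($log, $pattern, $offset);

print "whole log: $whole_log, since restart: $since_restart\n";
```

With these sample lines the whole-log search matches and the post-restart search does not, which is the false positive described above.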

I added logging to confirm what's actually happening. The walreceiver
works correctly - same PID handles both timelines:

    2025-11-04 23:05:28.539 CST walreceiver[83824] LOG:  started streaming WAL from primary at 0/03000000 on timeline 1
    2025-11-04 23:05:28.543 CST startup[42764] LOG:  new target timeline is 2
    2025-11-04 23:05:28.544 CST walreceiver[83824] LOG:  restarted WAL streaming at 0/03000000 on timeline 2

That's PID 83824 throughout. Works fine.

Earlier in the same log, from the restart:

    2025-11-04 23:05:27.261 CST walreceiver[52440] FATAL:  terminating walreceiver process due to administrator command

Different PID (52440), expected shutdown. This is what the test finds.

The fix is obvious: check that the walreceiver PID stays constant.
That's what we actually care about anyway.
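
As a rough illustration of the PID idea, the sketch below pulls walreceiver PIDs out of the log lines quoted in this message. It is plain Perl, not the attached patch, which is not reproduced here and may well obtain the PID some other way. walreceiver_pids is a hypothetical helper, and the "walreceiver[PID]" prefix format is assumed from the excerpts above (it depends on the log_line_prefix in use).

```perl
use strict;
use warnings;

# Hypothetical helper: return the distinct walreceiver PIDs, in order of
# first appearance, among the log lines that match $filter.
sub walreceiver_pids
{
    my ($log, $filter) = @_;
    my (@pids, %seen);
    for my $line (split /\n/, $log)
    {
        next unless $line =~ $filter;
        next unless $line =~ /walreceiver\[(\d+)\]/;
        push @pids, $1 unless $seen{$1}++;
    }
    return @pids;
}

# Log lines copied from this message, in file order.
my $log = join("\n",
    "2025-11-04 23:05:27.261 CST walreceiver[52440] FATAL:  terminating walreceiver process due to administrator command",
    "2025-11-04 23:05:28.539 CST walreceiver[83824] LOG:  started streaming WAL from primary at 0/03000000 on timeline 1",
    "2025-11-04 23:05:28.543 CST startup[42764] LOG:  new target timeline is 2",
    "2025-11-04 23:05:28.544 CST walreceiver[83824] LOG:  restarted WAL streaming at 0/03000000 on timeline 2",
) . "\n";

# PIDs that streamed WAL before and after the timeline switch.
my @streaming = walreceiver_pids($log,
    qr/started streaming WAL|restarted WAL streaming/);

# Every walreceiver PID in the file, including the one stopped by restart.
my @all = walreceiver_pids($log, qr/walreceiver/);

print "streaming PIDs: @streaming\n";    # streaming PIDs: 83824
print "all PIDs: @all\n";                # all PIDs: 52440 83824
```

A single streaming PID means the same walreceiver survived the timeline jump, regardless of what an earlier restart left in the log.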

This matters because changes to I/O behavior elsewhere in the code can
make this test fail spuriously. I hit it while working on O_CLOEXEC
handling for Windows.

Patch attached.
-- 
Bryan Green
EDB: https://www.enterprisedb.com

Attachment

0001-Fix-timing-dependent-failure-in-recovery-004_timelin.patch
