Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication - Mailing list pgsql-hackers

From Mircea Cadariu
Subject Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication
Date
Msg-id d616631a-6621-4221-9452-4dd832d013cd@gmail.com
Whole thread Raw
In response to Re: pg_recvlogical: Prevent flushed data from being re-sent after restarting replication  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-hackers
Hi,

On 25/11/2025 17:16, Fujii Masao wrote:
> Thanks for writing the test case and turning it into a patch. I agree that
> we should add a regression test to ensure the reported issue doesn't recur.
Thanks for your feedback, updated patch is attached. Again, I checked 
that it fails in master, but passes with your patch.
> It looks like the v1 patch you attached accidentally includes
> the patch file itself. Could you remove that?
Whoops, not sure what happened there, fixed.
> +    '--fsync-interval', '1',
> +    '--status-interval', '1',
>
> Wouldn't it be safer to use a larger value (e.g., 100) for --status-interval?
> With a very small interval, the status feedback might happen before
> the walsender is terminated and pg_recvlogical reconnects, which could
> prevent the duplicate data from appearing even without the patch.
Yes indeed good one. I actually had it set to 60 in the previous version 
I sent earlier.

> +use IPC::Run qw(start);
> <snip>
> +my $recv = start [
>
> For simplicity, would it be better to avoid "use IPC::Run qw(start)" and
> just call "IPC::Run::start" directly?
Indeed, done.

> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> +    "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
>          SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?

> +# Wait only for initial connection
> +$node->poll_query_until('postgres',
> +    "SELECT active_pid IS NOT NULL FROM pg_replication_slots WHERE
> slot_name = 'reconnect_test'");
>
> This is unlikely, but pg_recvlogical's connection could be terminated
> immediately after connecting, before receiving any data. If that happens,
> the test might behave unexpectedly. To make the test more robust,
> should we instead poll on:
>
>          SELECT pg_read_file('$outfile') ~ 'INSERT'
>
> instead, to ensure that some data has actually been received before
> terminating the backend?
>
>
>
>> Secondly I noticed in function sendFeedback at line 166, the startpos is
>> set to output_written_lsn. This seems to counter conceptually the change
>> you made in the patch, however it seems to not affect correctness. Shall
>> we remove this line as to avoid confusion?
> Isn't this necessary when - is specified for --file, causing 
> OutputFsync() to be skipped? Regards, 
Yes, for sure. Would really like to avoid introducing flake in CI due to 
this test.

> Isn't this necessary when - is specified for --file, causing 
> OutputFsync() to be skipped? 
Upon another look, indeed. When writing to a regular file (--file -) 
that assignment is redundant but harmless. But like you said, when 
writing to stdout, without that line, startpos would never be updated.

> Additionally, when the --no-loop option is used, I found that 
> pg_recvlogical
> could previously exit without flushing written data, risking data loss.
> The attached patch fixes this issue by also ensuring that all data is 
> flushed
> to disk before exiting with --no-loop.

Should we think of some kind of test also for this part?

-- 
Thanks,
Mircea Cadariu

Attachment

pgsql-hackers by date:

Previous
From: atma ram
Date:
Subject: Question on PostgreSQL Table Partitioning – Performance of Queries That Do Not Use the Partition Key
Next
From: Laurenz Albe
Date:
Subject: Re: System views for versions reporting