Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size - Mailing list pgsql-bugs
From | Kyotaro Horiguchi |
---|---|
Subject | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size |
Date | |
Msg-id | 20210719.111318.2042379313472032754.horikyota.ntt@gmail.com |
In response to | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size |
List | pgsql-bugs |
At Sat, 17 Jul 2021 10:28:09 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> On 2021-Jul-16, Alvaro Herrera wrote:
>
> > The buildfarm has remained green so far, but clearly this is something
> > we need to fix. Maybe it's as simple as adding the loop we use below,
> > starting at line 219.
>
> There are a few failures of this on buildfarm now, ..
> I am running the test in a loop with the attached; if it doesn't fail in
> a few more rounds I'll push it.
>
> There are two instances of a different failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-17%2013%3A39%3A43
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-07-16%2021%3A14%3A14
>
> # Failed test 'check that segments have been removed'
> # at t/019_replslot_limit.pl line 213.
> #      got: '000000010000000000000021'
> # expected: '000000010000000000000022'
> # Looks like you failed 1 test of 19.
> [23:55:14] t/019_replslot_limit.pl ..............
> Dubious, test returned 1 (wstat 256, 0x100)
>
> I'm afraid about this not being something we can fix with some
> additional wait points ...

Sorry for the mistake. It seems to me that the cause of the above is that
segment removal happens *after* invalidation. Since (at least currently) the
"slot is invalidated" warning is issued just before WAL removal, we should
expect the old segments to be gone only once the checkpoint-complete log
message has appeared, as that message is emitted after WAL removal. If they
are still there at that point, that indicates a bug.

What do you think about the attached?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From c52d7931e95cc24804f9aac4c9bf3a388c7e461f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 19 Jul 2021 10:58:01 +0900
Subject: [PATCH v1] Remove possible instability of new replication slot test code

The last fix for the same test left another possible timing instability
between actual segment removal and the invalidation log. Make the test
stable by waiting for the checkpoint-complete log message, which is
emitted after the segment removal.
---
 src/test/recovery/t/019_replslot_limit.pl | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 026da02ff1..a5d8140807 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -11,7 +11,7 @@ use TestLib;
 use PostgresNode;
 use File::Path qw(rmtree);
 
-use Test::More tests => $TestLib::windows_os ? 15 : 19;
+use Test::More tests => $TestLib::windows_os ? 16 : 20;
 use Time::HiRes qw(usleep);
 
 $ENV{PGDATABASE} = 'postgres';
@@ -201,6 +201,19 @@ $result = $node_primary->safe_psql(
 is($result, "rep1|f|t|lost|",
 	'check that the slot became inactive and the state "lost" persists');
 
+# Make sure the current checkpoint ended
+my $checkpoint_ended = 0;
+for (my $i = 0; $i < 10000; $i++)
+{
+	if (find_in_log($node_primary, "checkpoint complete: ", $logstart))
+	{
+		$checkpoint_ended = 1;
+		last;
+	}
+	usleep(100_000);
+}
+ok($checkpoint_ended, 'make sure checkpoint ended');
+
 # The invalidated slot shouldn't keep the old-segment horizon back;
 # see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
 # Test for this by creating a new slot and comparing its restart LSN
-- 
2.27.0
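[Editor's note] The wait loop added by the patch relies on a small log-polling helper, `find_in_log`, together with a log offset `$logstart` recorded earlier in the test. The sketch below only illustrates the idea (scan the node's server log from a saved offset for a pattern); it assumes the TestLib/PostgresNode TAP environment of this era and may differ in details from the helper actually defined in 019_replslot_limit.pl.

```perl
# Sketch of a log-polling helper in the style used by the patch above.
# Assumes TestLib::slurp_file() and PostgresNode's logfile() method; the
# real helper in the test file may differ.
sub find_in_log
{
	my ($node, $pattern, $offset) = @_;

	$offset = 0 unless defined $offset;

	# Read the whole server log, then ignore everything before $offset so
	# that only messages emitted after the caller noted the log position
	# are considered.
	my $log = TestLib::slurp_file($node->logfile);
	return 0 if (length($log) <= $offset);
	$log = substr($log, $offset);

	return $log =~ m/$pattern/;
}
```

With such a helper, the new loop in the patch polls for up to 10000 × 100 ms (about 1000 seconds) for a "checkpoint complete: " message to appear after `$logstart`, and only then proceeds to the check that the old WAL segments have been removed.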