Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size - Mailing list pgsql-bugs
From | Kyotaro Horiguchi |
---|---|
Subject | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size |
Date | |
Msg-id | 20210719.111318.2042379313472032754.horikyota.ntt@gmail.com |
In response to | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size (Alvaro Herrera <alvherre@alvh.no-ip.org>) |
Responses | Re: BUG #17103: WAL segments are not removed after exceeding max_slot_wal_keep_size |
List | pgsql-bugs |
At Sat, 17 Jul 2021 10:28:09 -0400, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
> On 2021-Jul-16, Alvaro Herrera wrote:
>
> > The buildfarm has remained green so far, but clearly this is something
> > we need to fix. Maybe it's as simple as adding the loop we use below,
> > starting at line 219.
>
> There are a few failures of this on buildfarm now, ..
> I am running the test in a loop with the attached; if it doesn't fail in
> a few more rounds I'll push it.
>
> There are two instances of a different failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=kittiwake&dt=2021-07-17%2013%3A39%3A43
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet&dt=2021-07-16%2021%3A14%3A14
>
> # Failed test 'check that segments have been removed'
> # at t/019_replslot_limit.pl line 213.
> #      got: '000000010000000000000021'
> # expected: '000000010000000000000022'
> # Looks like you failed 1 test of 19.
> [23:55:14] t/019_replslot_limit.pl ..............
> Dubious, test returned 1 (wstat 256, 0x100)
>
> I'm afraid about this not being something we can fix with some
> additional wait points ...

Sorry for the mistake. It seems to me that the cause of the above is that
segment removal happens *after* invalidation. Since (at least currently) the
"slot is invalidated" warning is issued just before WAL removal, we should
expect the old segments to be gone only once the checkpoint-complete log
message has appeared, as that message is emitted after WAL removal. If they
are still there at that point, that indicates a bug.

What do you think about the attached?

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

From c52d7931e95cc24804f9aac4c9bf3a388c7e461f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Date: Mon, 19 Jul 2021 10:58:01 +0900
Subject: [PATCH v1] Remove possible instability of new replication slot test code

The last fix for the same test left another possible timing instability
between actual segment removal and the invalidation log. Make the test
stable by waiting for the checkpoint-complete log message, which is
emitted after the segment removal.
---
 src/test/recovery/t/019_replslot_limit.pl | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index 026da02ff1..a5d8140807 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -11,7 +11,7 @@ use TestLib;
 use PostgresNode;
 use File::Path qw(rmtree);
 
-use Test::More tests => $TestLib::windows_os ? 15 : 19;
+use Test::More tests => $TestLib::windows_os ? 16 : 20;
 use Time::HiRes qw(usleep);
 
 $ENV{PGDATABASE} = 'postgres';
@@ -201,6 +201,19 @@ $result = $node_primary->safe_psql(
 is($result, "rep1|f|t|lost|",
 	'check that the slot became inactive and the state "lost" persists');
 
+# Make sure the current checkpoint ended
+my $checkpoint_ended = 0;
+for (my $i = 0; $i < 10000; $i++)
+{
+	if (find_in_log($node_primary, "checkpoint complete: ", $logstart))
+	{
+		$checkpoint_ended = 1;
+		last;
+	}
+	usleep(100_000);
+}
+ok($checkpoint_ended, 'make sure checkpoint ended');
+
 # The invalidated slot shouldn't keep the old-segment horizon back;
 # see bug #17103: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org
 # Test for this by creating a new slot and comparing its restart LSN
-- 
2.27.0
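[Editor's note] The wait loop added by the patch relies on a small log-polling helper, `find_in_log`, together with a log offset `$logstart` recorded earlier in the test. The sketch below only illustrates the idea (scan the node's server log from a saved offset for a pattern); it assumes the TestLib/PostgresNode TAP environment of this era and may differ in details from the helper actually defined in 019_replslot_limit.pl.

```perl
# Sketch of a log-polling helper in the style used by the patch above.
# Assumes TestLib::slurp_file() and PostgresNode's logfile() method; the
# real helper in the test file may differ.
sub find_in_log
{
	my ($node, $pattern, $offset) = @_;

	$offset = 0 unless defined $offset;

	# Read the whole server log, then ignore everything before $offset so
	# that only messages emitted after the caller noted the log position
	# are considered.
	my $log = TestLib::slurp_file($node->logfile);
	return 0 if (length($log) <= $offset);
	$log = substr($log, $offset);

	return $log =~ m/$pattern/;
}
```

With such a helper, the new loop in the patch polls for up to 10000 × 100 ms (about 1000 seconds) for a "checkpoint complete: " message to appear after `$logstart`, and only then proceeds to the check that the old WAL segments have been removed.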