Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers
From | Kyotaro Horiguchi |
---|---|
Subject | Re: [HACKERS] Restricting maximum keep segments by repslots |
Date | |
Msg-id | 20200428.171815.1687900483771598932.horikyota.ntt@gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Restricting maximum keep segments by repslots (Alvaro Herrera <alvherre@2ndquadrant.com>) |
Responses |
Re: [HACKERS] Restricting maximum keep segments by repslots
|
List | pgsql-hackers |
At Mon, 27 Apr 2020 19:40:07 -0400, Alvaro Herrera <alvherre@2ndquadrant.com> wrote in > On 2020-Apr-08, Kyotaro Horiguchi wrote: > > > I understand how it happens. > > > > The latch triggered by checkpoint request by CHECKPOINT command has > > been absorbed by ConditionVariableSleep() in > > InvalidateObsoleteReplicationSlots. The attached allows checkpointer > > use MyLatch for other than checkpoint request while a checkpoint is > > running. > > Hmm, that explanation makes sense, but I couldn't reproduce it with the > steps you provided. Perhaps I'm missing something. Sorry for the incomplete reproducer. A checkpoint needs to be running simultaneously for the manual checkpoint to hang up. The following is the complete sequence. 1. Build a primary database cluster with the following setup, then start it. max_slot_wal_keep_size=0 max_wal_size=32MB min_wal_size=32MB 2. Build a replica from the primary creating a slot, then start it. $ pg_basebackup -R -C -S s1 -D... 3. Try the following commands. Try several times if it succeeds. =# create table tt(); drop table tt; select pg_switch_wal();checkpoint; It is evidently stochastic, but it works quite reliably for me. > Anyway I think this patch should fix it also -- instead of adding a new > flag, we just rely on the existing flags (since do_checkpoint must have > been set correctly from the flags earlier in that block.) Since the added (!do_checkpoint) check is reached with do_checkpoint=false at server start and at archive_timeout intervals, the patch makes checkpointer run a busy-loop at that timings, and that loop lasts until a checkpoint is actually executed. What we need to do here is not forgetting the fact that the latch has been set even if the latch itself gets reset before reaching to WaitLatch. > I think it'd be worth to verify this bugfix in a new test. Would you > have time to produce that? I could try in a couple of days ... The attached patch on 019_replslot_limit.pl does the commands above automatically. It sometimes succeed but fails in most cases, at least for me. regards. -- Kyotaro Horiguchi NTT Open Source Software Center diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl index 32dce54522..c8ec4bb363 100644 --- a/src/test/recovery/t/019_replslot_limit.pl +++ b/src/test/recovery/t/019_replslot_limit.pl @@ -8,7 +8,7 @@ use TestLib; use PostgresNode; use File::Path qw(rmtree); -use Test::More tests => 13; +use Test::More tests => 14; use Time::HiRes qw(usleep); $ENV{PGDATABASE} = 'postgres'; @@ -181,6 +181,36 @@ ok($failed, 'check that replication has been broken'); $node_standby->stop; +my $node_master2 = get_new_node('master2'); +$node_master2->init(allows_streaming => 1); +$node_master2->append_conf('postgresql.conf', qq( +min_wal_size = 32MB +max_wal_size = 32MB +log_checkpoints = yes +)); +$node_master2->start; +$node_master2->safe_psql('postgres', "SELECT pg_create_physical_replication_slot('rep1')"); +$backup_name = 'my_backup2'; +$node_master2->backup($backup_name); + +$node_master2->stop; +$node_master2->append_conf('postgresql.conf', qq( +max_slot_wal_keep_size = 0 +)); +$node_master2->start; + +$node_standby = get_new_node('standby_2'); +$node_standby->init_from_backup($node_master2, $backup_name, has_streaming => 1); +$node_standby->append_conf('postgresql.conf', "primary_slot_name = 'rep1'"); +$node_standby->start; +my @result = + split('\n', $node_master2->safe_psql('postgres', " + CREATE TABLE tt(); DROP TABLE tt; + SELECT pg_switch_wal(); + CHECKPOINT; + SELECT 'finished';", timeout=>'5')); +is($result[1], 'finished', 'check if checkpoint command is not blocked'); + ##################################### # Advance WAL of $node by $n segments sub advance_wal
pgsql-hackers by date: