Re: [HACKERS] Restricting maximum keep segments by repslots - Mailing list pgsql-hackers
| From | Alvaro Herrera |
|---|---|
| Subject | Re: [HACKERS] Restricting maximum keep segments by repslots |
| Date | |
| Msg-id | 20200427223342.GA23152@alvherre.pgsql Whole thread Raw |
| In response to | Re: [HACKERS] Restricting maximum keep segments by repslots (Kyotaro Horiguchi <horikyota.ntt@gmail.com>) |
| Responses |
Re: [HACKERS] Restricting maximum keep segments by repslots
|
| List | pgsql-hackers |
On 2020-Apr-08, Kyotaro Horiguchi wrote:
> At Wed, 08 Apr 2020 09:37:10 +0900 (JST), Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote in
>
> Just avoiding starting replication when restart_lsn is invalid is
> sufficient (the attached, which is equivalent to a part of what the
> invalidated flag did). I thing that the error message needs a Hint but
> it looks on the subscriber side as:
>
> [22086] 2020-04-08 10:35:04.188 JST ERROR: could not receive data from WAL stream: ERROR: replication slot "s1" is
invalidated
> HINT: The slot exceeds the limit by max_slot_wal_keep_size.
>
> I don't think it is not clean.. Perhaps the subscriber should remove
> the trailing line of the message from the publisher?
Thanks for the fix! I propose two changes:
1. reword the error like this:
ERROR: replication slot "regression_slot3" cannot be advanced
DETAIL: This slot has never previously reserved WAL, or has been invalidated
2. use the same error in one other place, to wit
pg_logical_slot_get_changes() and pg_replication_slot_advance(). I
made the DETAIL part the same in all places, but the ERROR line is
adjusted to what each callsite is doing.
I do think that this change in test_decoding is a bit unpleasant:
-ERROR: cannot use physical replication slot for logical decoding
+ERROR: cannot get changes from replication slot "repl"
The test is
-- check that we're detecting a streaming rep slot used for logical decoding
SELECT 'init' FROM pg_create_physical_replication_slot('repl');
SELECT data FROM pg_logical_slot_get_changes('repl', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1');
> > On the other hand, physical replication doesn't break by invlidation.
> > [...]
>
> If we don't mind that standby can reconnect after a walsender
> termination due to the invalidation, we don't need to do something for
> this. Restricting max_slot_wal_keep_size to be larger than a certain
> threshold would reduce the chance we see that behavior.
Yeah, I think you're referring to the fact that StartReplication()
doesn't verify the restart_lsn of the slot; and if we do add a check, a
few tests that rely on physical replication start to fail. This patch
only adds a comment in that spot. But I don't (yet) know what the
consequences of this are, or whether it can be fixed by setting a valid
restart_lsn ahead of time. This test in pg_basebackup fails, for
example:
# Running: pg_basebackup -D
/home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl-X stream -S slot1
pg_basebackup: error: could not send replication command "START_REPLICATION": ERROR: cannot read from replication slot
"slot1"
DETAIL: This slot has never previously reserved WAL, or has been invalidated
pg_basebackup: error: child process exited with exit code 1
pg_basebackup: removing data directory
"/home/alvherre/Code/pgsql-build/master/src/bin/pg_basebackup/tmp_check/tmp_test_EwIj/backupxs_sl"
not ok 95 - pg_basebackup -X stream with replication slot runs
# Failed test 'pg_basebackup -X stream with replication slot runs'
# at t/010_pg_basebackup.pl line 461.
Anyway I think the current patch can be applied as is -- and if we want
physical replication to have some other behavior, we can patch for that
afterwards.
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
pgsql-hackers by date: