RE: issue with synchronized_standby_slots - Mailing list pgsql-hackers
From | Zhijie Hou (Fujitsu) |
---|---|
Subject | RE: issue with synchronized_standby_slots |
Date | |
Msg-id | TY4PR01MB16907911406A0DED47818733C940FA@TY4PR01MB16907.jpnprd01.prod.outlook.com |
In response to | Re: issue with synchronized_standby_slots (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: issue with synchronized_standby_slots |
List | pgsql-hackers |
On Tuesday, September 9, 2025 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Mon, Sep 8, 2025 at 2:56 PM Alexander Kukushkin <cyberdemn@gmail.com> wrote:
> >
> > Recently we also hit this problem.
> >
> > I think in the current state the check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not very useful:
> > - When the hook is executed from the postmaster, it only checks that synchronized_standby_slots contains a valid list, but doesn't check that the replication slots exist, because MyProc is NULL. This happens both on start and on reload.
> > - When executed from other backends, set_config_with_handle() is called with elevel = 0, and therefore elevel becomes DEBUG3, which results in no useful error/warning messages.
> >
> > There are a couple of places where a check_synchronized_standby_slots() failure is not ignored:
> > 1. alter system set synchronized_standby_slots='invalid value';
> > 2. Parallel workers, because RestoreGUCState() calls set_config_option_ext()->set_config_with_handle() with elevel=ERROR. As a result, parallel workers fail to start with the error.
> >
> > With parallel workers it is actually even worse, because we get the error even on a standby:
> > 1. Start a standby with synchronized_standby_slots referring to non-existent slots.
> > 2. SET parallel_setup_cost, parallel_tuple_cost, and min_parallel_table_scan_size to 0.
> > 3. Run select * from pg_namespace; and observe the following error:
> > ERROR: invalid value for parameter "synchronized_standby_slots": "a1,b1"
> > DETAIL: replication slot "a1" does not exist
> > CONTEXT: while setting parameter "synchronized_standby_slots" to "a1,b1"
> > parallel worker
>
> I see the same behaviour for default_table_access_method and default_tablespace.
> For example, see failure cases:
> postgres=# Alter system set default_table_access_method='missing';
> ERROR: invalid value for parameter "default_table_access_method": "missing"
> DETAIL: Table access method "missing" does not exist.
>
> postgres=# SET parallel_setup_cost=0;
> SET
> postgres=# SET parallel_tuple_cost=0;
> SET
> postgres=# Set min_parallel_table_scan_size to 0;
> SET
> postgres=# select * from pg_namespace;
> ERROR: invalid value for parameter "default_table_access_method": "missing"
> DETAIL: Table access method "missing" does not exist.
> CONTEXT: while setting parameter "default_table_access_method" to "missing"
> parallel worker
>
> OTOH, there is no ERROR on reload or restart.
>
> It is fair to argue that invalid GUC values should be ignored in certain cases like parallel query, but we should have the same solution for other similar parameters as well.
>
> As for synchronized_standby_slots, we can follow the behavior similar to check_synchronous_standby_names and just give parsing ERRORs. Any non-existent slot related errors can be given when that parameter is later used.

I agree. For synchronized_standby_slots, I think it is acceptable to report only parsing errors, because slots could be dropped even after validating the slot existence during GUC loading. Additionally, we would report WARNINGs for non-existent slots during the wait function anyway (e.g., in StandbySlotsHaveCaughtup()).

Best Regards,
Hou zj
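The parse-only split discussed in this thread (syntax errors from the check hook, missing-slot problems reported only when the slot list is actually used) can be illustrated with a small standalone C sketch. It is not PostgreSQL's check_synchronized_standby_slots() or StandbySlotsHaveCaughtup(): the functions parse_standby_slot_list() and warn_missing_slots(), the SKETCH_NAMEDATALEN constant, and the WARNING wording are all hypothetical, and the name rules (1 to 63 bytes of lowercase letters, digits and underscores) are assumed to mirror PostgreSQL's replication slot naming rules.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed to mirror PostgreSQL's default NAMEDATALEN (names up to 63 bytes). */
#define SKETCH_NAMEDATALEN 64

/* Syntax-only check of one name: 1..63 bytes of [a-z0-9_]; no slot lookup. */
static bool
slot_name_is_valid(const char *name, size_t len)
{
    if (len == 0 || len >= SKETCH_NAMEDATALEN)
        return false;
    for (size_t i = 0; i < len; i++)
    {
        char c = name[i];
        if (!((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_'))
            return false;
    }
    return true;
}

/*
 * Hypothetical parse-only check: split on commas, trim whitespace and
 * validate each name's syntax.  It never asks whether a slot exists, so it
 * gives the same answer in the postmaster, a backend or a parallel worker.
 * Returns the number of names parsed, or -1 on a syntax error.
 */
static int
parse_standby_slot_list(const char *value,
                        char names[][SKETCH_NAMEDATALEN], int max_names)
{
    const char *p = value;
    int         n = 0;

    assert(value != NULL);
    while (isspace((unsigned char) *p))
        p++;
    if (*p == '\0')
        return 0;               /* empty setting: valid, means "no slots" */

    for (;;)
    {
        const char *start;
        const char *end;
        size_t      len;

        while (isspace((unsigned char) *p))
            p++;
        start = p;
        while (*p != ',' && *p != '\0')
            p++;
        end = p;
        while (end > start && isspace((unsigned char) end[-1]))
            end--;
        len = (size_t) (end - start);
        if (n >= max_names || !slot_name_is_valid(start, len))
            return -1;          /* a real hook would report an error detail */
        memcpy(names[n], start, len);
        names[n][len] = '\0';
        n++;
        if (*p == '\0')
            return n;
        p++;                    /* step past the comma */
    }
}

/*
 * Hypothetical use-time check: existence is verified only when the list is
 * consulted, since a slot may be dropped after the setting was accepted.
 * Returns how many configured slots are missing.
 */
static int
warn_missing_slots(char names[][SKETCH_NAMEDATALEN], int n,
                   const char *const existing[], int n_existing)
{
    int         missing = 0;

    for (int i = 0; i < n; i++)
    {
        bool        found = false;
        for (int j = 0; j < n_existing && !found; j++)
            found = (strcmp(names[i], existing[j]) == 0);
        if (!found)
        {
            /* Illustrative wording, not PostgreSQL's exact message. */
            fprintf(stderr, "WARNING:  replication slot \"%s\" does not exist\n",
                    names[i]);
            missing++;
        }
    }
    return missing;
}
```

Under this split, a value such as "a1,b1" would be accepted wherever the GUC is restored (including in a parallel worker) because it is syntactically valid, and the missing a1 would surface only as a WARNING when the list is consulted. The sketch ignores quoted identifiers and case folding, which a real implementation would still have to handle.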