RE: issue with synchronized_standby_slots - Mailing list pgsql-hackers

From Zhijie Hou (Fujitsu)
Subject RE: issue with synchronized_standby_slots
Msg-id TY4PR01MB16907911406A0DED47818733C940FA@TY4PR01MB16907.jpnprd01.prod.outlook.com
In response to Re: issue with synchronized_standby_slots  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: issue with synchronized_standby_slots
List pgsql-hackers
On Tuesday, September 9, 2025 1:30 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> On Mon, Sep 8, 2025 at 2:56 PM Alexander Kukushkin
> <cyberdemn@gmail.com> wrote:
> >
> > Recently we also hit this problem.
> >
> > I think in a current state check_synchronized_standby_slots() and validate_sync_standby_slots() functions are not very useful:
> > - When the hook is executed from postmaster it only checks that synchronized_standby_slots contains a valid list, but doesn't check that replication slots exists, because MyProc is NULL. It happens both, on start and on reload.
> > - When executed from other backends set_config_with_handle() is called with elevel = 0, and therefore elevel becomes DEBUG3, which results in no useful error/warning messages.
> >
> > There are a couple of places where check_synchronized_standby_slots() failure is not ignored:
> > 1. alter system set synchronized_standby_slots='invalid value';
> > 2. Parallel workers, because RestoreGUCState() calls set_config_option_ext()->set_config_with_handle() with elevel=ERROR. As a result, parallel workers fail to start with the error.
> >
> > With parallel workers it is actually even worse - we get the error even in case of standby:
> > 1. start standby with synchronized_standby_slots referring to non-existing slots
> > 2. SET parallel_setup_cost, parallel_tuple_cost, and min_parallel_table_scan_size to 0
> > 3. Run  select * from pg_namespace; and observe following error:
> > ERROR:  invalid value for parameter "synchronized_standby_slots": "a1,b1"
> > DETAIL:  replication slot "a1" does not exist
> > CONTEXT:  while setting parameter "synchronized_standby_slots" to "a1,b1"
> > parallel worker
> >
> 
> I see the same behaviour for default_table_access_method and
> default_tablespace. For example, see failure cases:
> postgres=# Alter system set default_table_access_method='missing';
> ERROR:  invalid value for parameter "default_table_access_method": "missing"
> DETAIL:  Table access method "missing" does not exist.
> 
> postgres=# SET parallel_setup_cost=0;
> SET
> postgres=# SET parallel_tuple_cost=0;
> SET
> postgres=# Set min_parallel_table_scan_size to 0;
> SET
> postgres=# select * from pg_namespace;
> ERROR:  invalid value for parameter "default_table_access_method": "missing"
> DETAIL:  Table access method "missing" does not exist.
> CONTEXT:  while setting parameter "default_table_access_method" to "missing"
> parallel worker
> 
> OTOH, there is no ERROR on reload or restart.
> 
> It is fair to argue that invalid GUC values should be ignored in certain cases like
> parallel query but we should have the same solution for other similar
> parameters as well.
> 
> As for the synchronized_standby_slots, we can follow the behavior similar to
> check_synchronous_standby_names and just give parsing ERRORs. Any
> non-existent slot related errors can be given when that parameter is later used.

I agree. For synchronized_standby_slots, I think it is acceptable to report only
parsing errors, because a slot can still be dropped after its existence has been
validated during GUC loading, so the existence check guarantees nothing.
Additionally, we would report WARNINGs for non-existent slots during the wait
function anyway (e.g., in StandbySlotsHaveCaughtup()).
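
To illustrate the parse-only idea, here is a rough standalone sketch. It is not
PostgreSQL code and not a proposed patch: the sketch_* names are made up for
this mail, and the real hook would go through SplitIdentifierString() rather
than hand-rolled splitting (so quoting and case folding are ignored here). It
assumes the usual slot naming rule (lower-case letters, digits, underscore) and
a 63-character limit. The point is only that the check looks at syntax and
never at whether a slot exists.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed limit for this sketch (NAMEDATALEN - 1 in a default build). */
#define SKETCH_MAX_SLOT_NAME_LEN 63

/* Hypothetical helper: is name[0..len) a syntactically valid slot name? */
static bool
sketch_slot_name_is_wellformed(const char *name, size_t len)
{
    if (len == 0 || len > SKETCH_MAX_SLOT_NAME_LEN)
        return false;

    for (size_t i = 0; i < len; i++)
    {
        char c = name[i];

        if (!((c >= 'a' && c <= 'z') || (c >= '0' && c <= '9') || c == '_'))
            return false;
    }
    return true;
}

/*
 * Hypothetical parse-only check: accept the value if it is empty or a
 * comma-separated list of well-formed names. Slot existence is deliberately
 * not consulted; that is left to the code that later waits on the slots.
 */
static bool
sketch_check_slot_list_syntax(const char *value)
{
    const char *p = value;

    if (*p == '\0')
        return true;            /* empty setting: nothing to validate */

    for (;;)
    {
        const char *start;
        const char *end;

        while (*p == ' ' || *p == '\t')
            p++;                /* skip whitespace before the name */
        start = p;

        while (*p != '\0' && *p != ',')
            p++;                /* scan to the end of this element */
        end = p;

        while (end > start && (end[-1] == ' ' || end[-1] == '\t'))
            end--;              /* trim whitespace after the name */

        if (!sketch_slot_name_is_wellformed(start, (size_t) (end - start)))
            return false;

        if (*p == '\0')
            return true;        /* consumed the last element */
        p++;                    /* step over the comma */
    }
}
```

With a check of this shape, the "a1,b1" value from Alexander's report would be
accepted at GUC load time and in parallel workers, and only a malformed list
(for example "a1,,b1") would be rejected.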

Best Regards,
Hou zj
