Re: issue with synchronized_standby_slots - Mailing list pgsql-hackers

From Fabrice Chapuis
Subject Re: issue with synchronized_standby_slots
Date
Msg-id CAA5-nLCSgJTUHQRg=m41uniTYsRkxWjNjry7BdxzBh-1q0kf7g@mail.gmail.com
In response to RE: issue with synchronized_standby_slots  ("Zhijie Hou (Fujitsu)" <houzj.fnst@fujitsu.com>)
Responses Re: issue with synchronized_standby_slots
List pgsql-hackers
Thanks for your reply, Zhijie,

I understand that the "invalid value for parameter" error will be displayed if synchronized_standby_slots is set to a bad value or if a configured standby node is not up and running.
But the problem I noticed is that statements could not execute normally and an error was returned to the application.
This happened after an upgrade from PG 14 to PG 17.
I could try to reproduce the issue.

Regards,

Fabrice

On Fri, Sep 5, 2025 at 6:07 AM Zhijie Hou (Fujitsu) <houzj.fnst@fujitsu.com> wrote:
On Thursday, September 4, 2025 9:27 PM Fabrice Chapuis <fabrice636861@gmail.com>  wrote:
> With PG 17.5 and logical replication failover slots: when trying to
> change the value of synchronized_standby_slots while node2 was not running, the
> error invalid value for parameter "synchronized_standby_slots": "node1,node2"
> was generated. The problem is that statements were affected by this and
> could not execute.
>
> STATEMENT:  select service_period,sp1_0.address_line_1 from tbl1  where sp1_0.vn=$1 order by sp1_0.start_of_period
> 2025-08-24 13:14:29.417 CEST [848477]: [1-1] user=,db=,client=,application= ERROR:  invalid value for parameter "synchronized_standby_slots": "node1,node2"
> 2025-08-24 13:14:29.417 CEST [848477]: [2-1] user=,db=,client=,application= DETAIL:  replication slot "s029054a" does not exist
> 2025-08-24 13:14:29.417 CEST [848477]: [3-1] user=,db=,client=,application= CONTEXT:  while setting parameter "synchronized_standby_slots" to "node1,node2"
> 2025-08-24 13:14:29.418 CEST [777453]: [48-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848476) exited with exit code 1
> 2025-08-24 13:14:29.418 CEST [777453]: [49-1] user=,db=,client=,application= LOG:  background worker "parallel worker" (PID 848477) exited with exit code 1
>
> Has this issue already been observed?

Thank you for reporting this issue. It seems you've added a nonexistent slot to
synchronized_standby_slots before server startup. The server does not verify
the existence of slots at startup because slot information is not yet available
in shared memory, so the server starts successfully. However, when a parallel
worker starts, it re-verifies the GUC setting, resulting in the ERROR you saw.
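
To make the two phases concrete, here is a minimal Python sketch that models the behavior described above. It is not PostgreSQL's implementation (the real check is C code inside the server); the function `model_slot_list_check`, the `GucValidationError` exception, and passing `None` to mean "slot information is not yet in shared memory" are all hypothetical, and the comma splitting is a simplification of the server's list parsing.

```python
from __future__ import annotations

from typing import Optional


class GucValidationError(Exception):
    """Hypothetical stand-in for the 'invalid value for parameter' ERROR."""

    def __init__(self, value: str, detail: str) -> None:
        super().__init__(
            f'invalid value for parameter "synchronized_standby_slots": "{value}"')
        self.detail = detail


def model_slot_list_check(value: str, slots: Optional[dict[str, str]]) -> None:
    """Hypothetical model of validating synchronized_standby_slots.

    value -- comma-separated slot names, e.g. "node1,node2"
    slots -- slot name -> slot type ("physical" or "logical"), or None when
             slot information is not yet in shared memory (server startup)
    """
    # Simplified parsing; the server uses identifier-list parsing rules.
    names = [n.strip() for n in value.split(",") if n.strip()]

    if slots is None:
        # Startup: existence cannot be verified, so the value is accepted
        # and the server starts normally.
        return

    # Any later re-check (e.g. in a worker process) sees the real slots.
    for name in names:
        if name not in slots:
            raise GucValidationError(
                value, f'replication slot "{name}" does not exist')
        if slots[name] != "physical":
            raise GucValidationError(
                value, f'"{name}" is not a physical replication slot')


# Startup: accepted even though node2 has no slot.
model_slot_list_check("node1,node2", None)

# Re-check once slots are loaded: rejected because node2 is missing.
try:
    model_slot_list_check("node1,node2", {"node1": "physical"})
except GucValidationError as err:
    print(err)
    print("DETAIL:", err.detail)
```

In this model the same value passes the first call and fails the second, which mirrors the "server starts, queries fail later" pattern in the log above.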

I think this scenario is not necessarily a bug, as adding nonexistent slots to the GUC is
disallowed. Such slots can block the advancement of logical failover slots,
increasing the risk of disk bloat from retained WAL or dead rows, which is why we added
the ERROR. There are precedents for this kind of behavior, such as
default_table_access_method and default_tablespace, which prevent queries from running if
invalid values are set before server startup.

To resolve the issue, you can remove the invalid slot from the GUC and add it
back after creating the physical slot.
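
The suggested fix can be laid out as an ordered list of SQL statements to run on the primary. The Python sketch below only builds those statements as strings for review and does not connect to a server. `remediation_plan` is a hypothetical helper; it assumes slot names are plain identifiers that need no quoting, and the standby still has to be configured to use the new slot (for example via `primary_slot_name`). `ALTER SYSTEM`, `pg_reload_conf()` and `pg_create_physical_replication_slot()` are standard PostgreSQL commands and functions.

```python
from __future__ import annotations


def split_slot_list(value: str) -> list[str]:
    """Simplified comma split of a synchronized_standby_slots value."""
    return [n.strip() for n in value.split(",") if n.strip()]


def remediation_plan(configured: str, existing_physical: set[str]) -> list[str]:
    """Hypothetical helper: SQL statements for the remove/create/re-add fix.

    configured        -- current synchronized_standby_slots value
    existing_physical -- names of physical slots that exist on the primary
    Returns an empty list when every configured slot already exists.
    """
    wanted = split_slot_list(configured)
    present = [n for n in wanted if n in existing_physical]
    missing = [n for n in wanted if n not in existing_physical]
    if not missing:
        return []

    reload = "SELECT pg_reload_conf();"
    plan = [
        # 1. Keep only slots that exist so the GUC check passes again.
        f"ALTER SYSTEM SET synchronized_standby_slots = '{','.join(present)}';",
        reload,
    ]
    # 2. Create the missing physical slots (assumes plain identifiers).
    plan += [f"SELECT pg_create_physical_replication_slot('{n}');" for n in missing]
    # 3. Put the full list back now that every slot exists.
    plan += [
        f"ALTER SYSTEM SET synchronized_standby_slots = '{','.join(wanted)}';",
        reload,
    ]
    return plan


for stmt in remediation_plan("node1,node2", {"node1"}):
    print(stmt)
```

Generating the plan instead of executing it keeps the sketch free of driver dependencies and lets the statements be reviewed before anything touches the primary.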

I also thought about how to improve the user experience here, but it's not
feasible to verify slot existence at startup because replication slots have not been
restored to shared memory when the GUC checks run. Another option might be to simply
remove the slot existence/type checks from GUC validation.

Best Regards,
Hou zj
