I'm working with logical replication in a PostgreSQL 17 setup, and I'm exploring the new synchronized_standby_slots parameter to make replication slots failover-safe in a highly available environment that uses physical standby nodes managed by Patroni.
While testing this feature, I encountered blocking behavior: when a standby's slot is listed in synchronized_standby_slots and that standby goes offline, logical replication on the primary stops progressing. From what I understand, the primary waits for the standby to acknowledge the WAL records it has received, effectively stalling WAL decoding for the logical slot. I also noticed that the failover slot on the standby continues to be synced.
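For context, the kind of setup described above can be sketched roughly as follows. This is an illustration, not my exact configuration: the slot name `standby1_slot`, the host name, and the user are hypothetical, and Patroni normally manages some of these settings (such as primary_conninfo and primary_slot_name) itself.

```conf
# --- primary: postgresql.conf ---
# Logical walsenders wait until these physical slots confirm WAL receipt
# before sending decoded changes for failover-enabled logical slots.
synchronized_standby_slots = 'standby1_slot'   # hypothetical physical slot name

# --- standby: postgresql.conf ---
sync_replication_slots = on                    # run the slot sync worker
hot_standby_feedback = on                      # required for slot synchronization
primary_slot_name = 'standby1_slot'            # must match the slot listed on the primary
# dbname is required in primary_conninfo for slot synchronization
primary_conninfo = 'host=primary.example port=5432 user=replicator dbname=postgres'
```

The logical slot itself also has to be created with failover enabled, for example through `CREATE SUBSCRIPTION ... WITH (failover = true)` on the subscriber.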
This raises several questions about the tradeoffs and implications of using this feature:
What are the risks or limitations if synchronized_standby_slots is left empty (the default)? Is there a risk of data loss or inconsistency for logical subscribers in such cases?
Is it expected behavior that any failure of a standby listed in synchronized_standby_slots stalls logical decoding on the primary? If so, is there a way to avoid blocking WAL decoding while still keeping slot synchronization?
Does Patroni manage failover slots better than the native PostgreSQL implementation?