Thread: Make wal_receiver_timeout configurable per subscription
Hi,

When multiple subscribers connect to different publisher servers,
it can be useful to set different wal_receiver_timeout values for
each connection to better detect failures. However, this isn't
currently possible, which limits flexibility in managing subscriptions.

To address this, I'd like to propose making wal_receiver_timeout
configurable per subscription.

One approach is to add wal_receiver_timeout as a parameter to
CREATE SUBSCRIPTION command, storing it in pg_subscription
so each logical replication worker can use its specific value.

Another option is to change the wal_receiver_timeout's GUC context
from PGC_SIGHUP to PGC_USERSET. This would allow setting different
values via ALTER ROLE SET command for each subscription owner -
effectively enabling per-subscription configuration. Since this
approach is simpler and likely sufficient, I'd prefer starting with this.
Thought?

BTW, this could be extended in the future to other GUCs used by
logical replication workers, such as wal_retrieve_retry_interval.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

On Fri, May 16, 2025 at 9:11 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Hi,
> When multiple subscribers connect to different publisher servers,
> it can be useful to set different wal_receiver_timeout values for
> each connection to better detect failures. However, this isn't
> currently possible, which limits flexibility in managing subscriptions.

Hi, +1 for the idea.

> One approach is to add wal_receiver_timeout as a parameter to
> CREATE SUBSCRIPTION command, storing it in pg_subscription
> so each logical replication worker can use its specific value.
> Another option is to change the wal_receiver_timeout's GUC context
> from PGC_SIGHUP to PGC_USERSET. This would allow setting different
> values via ALTER ROLE SET command for each subscription owner -
> effectively enabling per-subscription configuration. Since this
> approach is simpler and likely sufficient, I'd prefer starting with this.
> Thought?

Both ways LGTM; for starters, we can go with changing the GUC's context.

> BTW, this could be extended in the future to other GUCs used by
> logical replication workers, such as wal_retrieve_retry_interval.

+1 for extending this idea to other GUCs as well.

On Fri, May 16, 2025 at 9:11 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> When multiple subscribers connect to different publisher servers,
> it can be useful to set different wal_receiver_timeout values for
> each connection to better detect failures. However, this isn't
> currently possible, which limits flexibility in managing subscriptions.
>
> To address this, I'd like to propose making wal_receiver_timeout
> configurable per subscription.
>
> One approach is to add wal_receiver_timeout as a parameter to
> CREATE SUBSCRIPTION command, storing it in pg_subscription
> so each logical replication worker can use its specific value.
>
> Another option is to change the wal_receiver_timeout's GUC context
> from PGC_SIGHUP to PGC_USERSET. This would allow setting different
> values via ALTER ROLE SET command for each subscription owner -
> effectively enabling per-subscription configuration. Since this
> approach is simpler and likely sufficient, I'd prefer starting with this.
> Thought?
>

The GUC wal_receiver_interval is also used for physical replication
and logical launcher, so won't making it userset can impact those
cases as well, but maybe that is okay. However, for the specific case
you are worried about, isn't it better to make it a subscription
option as that won't have a chance to impact any other cases?

IIUC, the reason you are worried is because different publishers can
have different network latencies with subscribers, so they may want
different timing for feedback/keepalive messages.

--
With Regards,
Amit Kapila.

On Mon, May 19, 2025 at 2:48 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
> The GUC wal_receiver_interval is also used for physical replication
> and logical launcher, so won't making it userset can impact those
> cases as well, but maybe that is okay. However, for the specific case
> you are worried about, isn't it better to make it a subscription
> option as that won't have a chance to impact any other cases?

The advantage of Fujii-san's proposal is that it is very simple to
implement. A subscription option would indeed be better, but it would
also be considerably more complex. Why not start simple and if someone
wants to do the work to add something more complicated, that is fine?

--
Robert Haas
EDB: http://www.enterprisedb.com

On Tue, 20 May 2025 at 03:16, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, May 19, 2025 at 11:19:48AM -0400, Robert Haas wrote:
> > The advantage of Fujii-san's proposal is that it is very simple to
> > implement. A subscription option would indeed be better, but it would
> > also be considerably more complex. Why not start simple and if someone
> > wants to do the work to add something more complicated, that is fine?
>
> Logically, adding that as an option of CREATE SUBSCRIPTION would just
> be a duplication of what connection strings are already able to do
> with "options='-c foo=fooval'", isn't it?

Although the value is set in the session that creates the
subscription, it will not be used by the apply worker because the
launcher process, which starts the apply worker after subscription
creation, is unaware of session-specific settings.

> It seems to me that the issue of downgrading wal_receiver_timeout to
> become user-settable is if we're OK to allow non-superusers to play with
> it in the code path where it's used currently. Knowing that physical
> WAL receivers are only spawned in a controlled manner by the startup
> process, this does not sound like an issue.

If we set the wal_receiver_timeout configuration using ALTER ROLE for
the subscription owner's role, the apply worker will start with that
value. However, any changes made via ALTER ROLE ... SET
wal_receiver_timeout will not take effect for an already running apply
worker unless the subscription is disabled and re-enabled. In
contrast, this is handled automatically during CREATE SUBSCRIPTION,
where parameter changes are detected.

Regards,
Vignesh

On 2025/05/20 18:13, vignesh C wrote:
> If we set the wal_receiver_timeout configuration using ALTER ROLE for
> the subscription owner's role, the apply worker will start with that
> value. However, any changes made via ALTER ROLE ... SET
> wal_receiver_timeout will not take effect for an already running apply
> worker unless the subscription is disabled and re-enabled. In
> contrast, this is handled automatically during CREATE SUBSCRIPTION,
> where parameter changes are detected.

Yes, this is one of the limitations of the user-settable wal_receiver_timeout
approach. If we want to change the timeout used by the apply worker without
restarting it, storing the value in pg_subscription (similar to how
synchronous_commit is handled) would be a better solution.

In that case, for example, we could set the default value of
pg_subscription.wal_receiver_timeout to -1, meaning the apply worker should
use the global wal_receiver_timeout from postgresql.conf. If the value is 0
or higher, the apply worker would use the value stored in pg_subscription.

On further thought, another downside of the user-settable approach is that
it doesn't work for parameters like wal_retrieve_retry_interval, which is
used by the logical replication launcher not the apply worker. So if we
want to support per-subscription control for non-apply workers, storing
the settings in pg_subscription might be more appropriate.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

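The fallback rule proposed in the message above (-1 defers to the global
setting, 0 or higher overrides it) can be sketched as follows. This is an
illustration in Python only, not PostgreSQL code: the function name, the
sentinel constant, and the use of milliseconds for the stored value are
assumptions, since the thread does not specify them.

```python
# Hypothetical sentinel: "no per-subscription override, use the global GUC".
USE_GLOBAL = -1


def effective_wal_receiver_timeout(sub_timeout_ms: int, guc_timeout_ms: int) -> int:
    """Return the timeout (in ms) an apply worker would use under the proposal.

    sub_timeout_ms models the proposed pg_subscription.wal_receiver_timeout value.
    guc_timeout_ms models the global wal_receiver_timeout from postgresql.conf.
    """
    if sub_timeout_ms < USE_GLOBAL:
        raise ValueError("timeout must be -1 or a non-negative number of milliseconds")
    if sub_timeout_ms == USE_GLOBAL:
        # Default case: the subscription defers to the server-wide setting.
        return guc_timeout_ms
    # 0 or higher: the per-subscription value wins. Note that 0 is passed
    # through unchanged; for the existing GUC, 0 disables the timeout.
    return sub_timeout_ms
```

With a global setting of 60000 ms, a subscription storing -1 would use
60000, one storing 5000 would use 5000, and one storing 0 would run with
the timeout disabled.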
On Wed, May 21, 2025 at 6:04 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
> On 2025/05/20 18:13, vignesh C wrote:
> > If we set the wal_receiver_timeout configuration using ALTER ROLE for
> > the subscription owner's role, the apply worker will start with that
> > value. However, any changes made via ALTER ROLE ... SET
> > wal_receiver_timeout will not take effect for an already running apply
> > worker unless the subscription is disabled and re-enabled. In
> > contrast, this is handled automatically during CREATE SUBSCRIPTION,
> > where parameter changes are detected.
>
> Yes, this is one of the limitations of the user-settable wal_receiver_timeout
> approach. If we want to change the timeout used by the apply worker without
> restarting it, storing the value in pg_subscription (similar to how
> synchronous_commit is handled) would be a better solution.
>
> In that case, for example, we could set the default value of
> pg_subscription.wal_receiver_timeout to -1, meaning the apply worker should
> use the global wal_receiver_timeout from postgresql.conf. If the value is 0
> or higher, the apply worker would use the value stored in pg_subscription.
>

Yeah, I had a similar idea in my mind.

>
> On further thought, another downside of the user-settable approach is that
> it doesn't work for parameters like wal_retrieve_retry_interval, which is
> used by the logical replication launcher not the apply worker. So if we
> want to support per-subscription control for non-apply workers, storing
> the settings in pg_subscription might be more appropriate.
>

Yeah, that could be an option, but one might not want to keep such
variables different for each subscription. Do you think one would like
to prefer specifying variables that only apply to the subscriber-node
in a way other than GUC? I always have this question whenever I see
GUCs like max_sync_workers_per_subscription, which are specific to
only subscriber nodes.

--
With Regards,
Amit Kapila.