Re: High CPU consumption in cascade replication with large number of walsenders - Mailing list pgsql-hackers

From	Alexey Makhmutov
Subject	Re: High CPU consumption in cascade replication with large number of walsenders
Date	September 16, 2025 18:20:31
Msg-id	cab50f41-d335-426d-b313-8a28abc7febc@postgrespro.ru
In response to	Re: High CPU consumption in cascade replication with large number of walsenders (Alexander Korotkov <aekorotkov@gmail.com>)
Responses	Re: High CPU consumption in cascade replication with large number of walsenders
List	pgsql-hackers

Hi, Alexander!

Thank you very much for looking at the patch and providing valuable 
feedback!

 > This approach makes sense to me.  Do you think it might have corner 
cases?  I suggest the test scenario might include some delay between 
"UPDATE" queries.  Then we can see how changing of this delay interacts 
with cascade_replication_batch_delay.

The effect of the 'cascade_replication_batch_delay' setting can be 
observed more easily by manually changing a single row in the primary 
database (the 'A' instance in the test) and then measuring the delay 
before the change becomes visible on the 'C' instance. Something like 
the following:
On C instance:
  select c0 from test_repli_test_t1 where id=0 \watch 1
On A instance, first set the initial value:
  update test_repli_test_t1 set c0=0 where id=0;
... and then update the row and wait for it to become visible on C instance:
  update test_repli_test_t1 set c0=c0+1 where id=0;

In my tests with batching enabled and without a delay limit (i.e. with 
'cascade_replication_batch_delay' set to 0), the change became visible 
in about 5-6 seconds (as the walsender on the B instance seems to wake 
up by itself anyway). With 'cascade_replication_batch_delay' set to 
500 (ms), the value became visible almost immediately.
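
To illustrate why the delay limit matters for a single isolated change, here is a toy model of the batching decision. It is only a sketch and not the code from the patch: the class and method names are hypothetical, and I'm assuming that a batch size of 0 or 1 means "notify on every record", that a delay of 0 means "no time limit", and that something calls on_timer() periodically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BatchNotifier:
    """Toy model of batched walsender wakeups; not the patch's actual code."""
    batch_size: int        # assumed: 0 or 1 means notify on every record
    batch_delay_ms: int    # assumed: 0 means no time limit on a pending batch
    pending: int = 0
    first_pending_ms: Optional[int] = None

    def _flush(self) -> bool:
        # Reset the batch state and report that walsenders should be woken.
        self.pending = 0
        self.first_pending_ms = None
        return True

    def on_record(self, now_ms: int) -> bool:
        """Called after a WAL record is applied; True means 'wake walsenders'."""
        if self.batch_size <= 1:
            return True
        if self.pending == 0:
            self.first_pending_ms = now_ms
        self.pending += 1
        if self.pending >= self.batch_size:
            return self._flush()
        return self.on_timer(now_ms)

    def on_timer(self, now_ms: int) -> bool:
        """Hypothetical periodic check; flushes a batch that waited too long."""
        if self.pending == 0 or self.batch_delay_ms == 0:
            return False
        if now_ms - self.first_pending_ms >= self.batch_delay_ms:
            return self._flush()
        return False


# One isolated record never fills a batch of 100 by itself.
no_limit = BatchNotifier(batch_size=100, batch_delay_ms=0)
with_limit = BatchNotifier(batch_size=100, batch_delay_ms=500)
no_limit.on_record(now_ms=0)
with_limit.on_record(now_ms=0)
print(no_limit.on_timer(now_ms=500), with_limit.on_timer(now_ms=500))
```

In this model the notifier without a delay limit keeps the single record pending indefinitely, which corresponds to the case where the change only shows up once the walsender wakes up on its own, while the notifier with a 500 ms limit releases it at the first check after the limit expires.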

 > This comment tells about logical walsenders, but they same will be 
applied to physical walsenders, right?

Yes, this item probably needs some clarification. In this code path we 
are dealing with logical walsenders, as physical walsenders are notified 
in XLogWalRcvFlush. However, when TLI changes, this code will notify 
both physical and logical walsenders. So, I've changed the comment now 
to describe this behavior more clearly.

Another question is whether we really need to notify physical walsenders 
at this point. This was the logic of the original code, so I kept it 
when adding batching support. However, it seems that a physical 
walsender should not be very interested in knowing that logical decoding 
has discovered a change in the timeline ID, as it should either have 
already been notified by the walreceiver or discover the change by 
itself in the stored WAL data if recovery was invoked at startup. So, 
maybe the better approach here is to keep notifications for logical 
walsenders only.

 > I see these two GUCs are both PGC_POSTMASTER.  Could they be PGC_SIGHUP?

This is a good suggestion. I've tried to implement support for the 
PGC_SIGHUP context in the new patch version. Now the current batch 
should be flushed immediately when the parameters are changed, and the 
new values will be used for processing once the next WAL record is 
applied. This also makes testing a little simpler: if we run the test 
script for a longer interval (e.g. 300 seconds instead of 60), then it's 
possible to see how the CPU load changes on the fly as batching is 
enabled or disabled.
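
The intended reload behavior can be sketched with another toy model. Again, this is not the patch code: the names are hypothetical, only the batch size is modeled, and I'm assuming that values of 0 or 1 disable batching.

```python
class ReloadableBatcher:
    """Toy model of config reload handling; not the patch's actual code."""

    def __init__(self, batch_size: int) -> None:
        self.batch_size = batch_size   # assumed: values <= 1 disable batching
        self.pending = 0               # records applied but not yet announced
        self.wakeups = 0               # number of walsender notifications sent

    def _flush(self) -> None:
        # Send one notification covering everything collected so far.
        if self.pending > 0:
            self.wakeups += 1
            self.pending = 0

    def on_record(self) -> None:
        self.pending += 1
        if self.batch_size <= 1 or self.pending >= self.batch_size:
            self._flush()

    def on_reload(self, new_batch_size: int) -> None:
        # Flush whatever was collected under the old settings first...
        self._flush()
        # ...so that the next applied record is handled with the new value.
        self.batch_size = new_batch_size


batcher = ReloadableBatcher(batch_size=100)
for _ in range(3):
    batcher.on_record()          # three records stay pending
batcher.on_reload(0)             # reload flushes them and disables batching
batcher.on_record()              # now notified right away
print(batcher.wakeups, batcher.pending)
```

The point of flushing before switching values is that records collected under the old batch size are not left waiting for a threshold that no longer applies.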

 > Also I think there is a typo in the the description of 
cascade_replication_batch_size, it must say "0 disables".

Sure, thanks for catching this!

 > I also think these GUCs should be in the sample file, possibly 
disabled by default because it only make sense to set up them with high 
number of cascaded walsenders.

Yes, it was my intention to have 'cascade_replication_batch_size' 
disabled by default, as described in the mail message, but I 
forgot to actually set it to '0' in the previous patch version. Thank 
you for noticing this! The 'cascade_replication_batch_delay' setting 
takes effect only if batching is enabled (i.e. the batch size is set to 
a value greater than 1), so a value of 500 (ms) seems to be a reasonable 
default setting. I've also added both values to the sample configuration 
in the new patch version, as suggested.

The new patch version with changes described above and rebased on top of 
current master is attached.

Thank you again for looking at this proposal!

Thanks,
Alexey

Attachment

v2-0001-Implement-batching-for-WAL-records-notification-duri.patch
