Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From	Drouvot, Bertrand
Subject	Re: Synchronizing slots from primary to standby
Date	August 4, 2023 09:14:33
Msg-id	d0372b74-d7c5-0216-bc0f-23439eb56579@gmail.com
In response to	Re: Synchronizing slots from primary to standby (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>)
Responses	Re: Synchronizing slots from primary to standby
List	pgsql-hackers

Hi,

On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>> 2. All candidate standbys will start one slot sync worker per logical
>>> slot which might not be scalable.
>>
>> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch
>> is using one worker per database (for all slots corresponding to a
>> database).
> 
> Right. It's based on one worker for each database.
> 
>>> Is having one (or a few more - not
>>> necessarily one for each logical slot) worker for all logical slots
>>> enough?
>>
>> I guess for a large number of slots there is a possibility of a large
>> gap in syncing the slots which probably means we need to retain
>> corresponding WAL for a much longer time on the primary. If we can
>> prove that the gap won't be large enough to matter, then this would
>> probably be worth considering; otherwise, I think we should find a way to
>> scale the number of workers to avoid the large gap.
> 
> I think the gap is largely determined by the time taken to advance
> each slot and the amount of WAL that each logical slot moves ahead on
> primary. 

Sorry for the late reply, but I gave this a second thought and I wonder whether we really
need this design (i.e., starting a logical replication background worker on the standby to
sync the slots).

Wouldn't it be simpler to "just" update the synced slots' "metadata",
as the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by Peter
up-thread) does?
(It makes use of LogicalConfirmReceivedLocation(), LogicalIncreaseXminForSlot()
and LogicalIncreaseRestartDecodingForSlot(), if I read synchronize_one_slot() correctly.)
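
To make the "metadata only" idea concrete, here is a minimal sketch in Python. It is a toy
model under stated assumptions, not the pg_failover_slots code: the field names mirror
columns of pg_replication_slots, while SlotMeta and sync_slot_metadata() are hypothetical
names, and xid wraparound and all safety checks are ignored.

```python
from dataclasses import dataclass, replace


def parse_lsn(text: str) -> int:
    """Convert a textual LSN such as '16/B374D848' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


@dataclass(frozen=True)
class SlotMeta:
    # Field names follow pg_replication_slots; the class itself is hypothetical.
    name: str
    restart_lsn: int
    confirmed_flush_lsn: int
    catalog_xmin: int


def sync_slot_metadata(local: SlotMeta, remote: SlotMeta) -> SlotMeta:
    """Hypothetical helper: copy the primary's positions onto the standby slot.

    No WAL is decoded here; only the stored positions change. The slot is
    never moved backwards. xid wraparound is ignored in this toy model.
    """
    if remote.confirmed_flush_lsn <= local.confirmed_flush_lsn:
        return local  # nothing new on the primary
    return replace(
        local,
        restart_lsn=max(local.restart_lsn, remote.restart_lsn),
        confirmed_flush_lsn=remote.confirmed_flush_lsn,
        catalog_xmin=remote.catalog_xmin,
    )


if __name__ == "__main__":
    standby = SlotMeta("sub1", parse_lsn("0/1000"), parse_lsn("0/2000"), 700)
    primary = SlotMeta("sub1", parse_lsn("0/3000"), parse_lsn("0/4000"), 750)
    print(sync_slot_metadata(standby, primary))
```

The point of the sketch is that the cost per slot does not depend on how much WAL lies
between the old and the new position.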

> I've measured the time it takes for
> pg_logical_replication_slot_advance with different amounts of WAL on my
> system. It took 2595ms/5091ms/31238ms to advance the slot by
> 3.7GB/7.3GB/13GB respectively. To put things into perspective here,
> imagine there are 3 logical slots to sync for a single slot sync
> worker and each of them needs to advance by
> 3.7GB/7.3GB/13GB of WAL. The slot sync worker gets to slot 1 again
> after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after
> advance time of slot 1 with amount of WAL that the slot has moved
> ahead on primary during 40sec, gets to slot 3 again after advance time
> of slot 1 and slot 2 with amount of WAL that the slot has moved ahead
> on primary and so on. If WAL generation on the primary is pretty fast,
> and if the logical slot moves pretty fast on the primary, the time it
> takes for a single sync worker to sync a slot can increase.

Updating only the metadata would be way "faster", and we would probably not need to
worry that much about the number of "sync" workers (if they "just" have
to sync the slots' "metadata"), as proposed above.
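
For reference, the arithmetic behind the ~40 sec figure quoted above can be reproduced
with a few lines of Python. The numbers are the ones measured by Bharath; the function
names are only for illustration, and the per-GB cost is a derived figure, not something
that was measured separately.

```python
# Measurements quoted above: (WAL to advance in GB, advance time in ms).
measurements = [(3.7, 2595), (7.3, 5091), (13.0, 31238)]


def round_time_ms(samples):
    """Time for a single worker to advance every slot once, one after another."""
    return sum(ms for _, ms in samples)


def ms_per_gb(samples):
    """Derived advance cost per GB of WAL for each measurement."""
    return [ms / gb for gb, ms in samples]


if __name__ == "__main__":
    total = round_time_ms(measurements)
    print(f"one full round over 3 slots: {total} ms")
    for (gb, ms), rate in zip(measurements, ms_per_gb(measurements)):
        print(f"{gb} GB in {ms} ms -> {rate:.0f} ms/GB")
```

The total is 38924 ms, i.e. the time before the single worker gets back to slot 1.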

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
