Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | Drouvot, Bertrand |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | d0372b74-d7c5-0216-bc0f-23439eb56579@gmail.com |
In response to | Re: Synchronizing slots from primary to standby (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
Responses | Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
Hi,

On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> On Mon, Jul 24, 2023 at 9:00 AM Amit Kapila <amit.kapila16@gmail.com> wrote:
>>
>>> 2. All candidate standbys will start one slot sync worker per logical
>>> slot which might not be scalable.
>>
>> Yeah, that doesn't sound like a good idea but IIRC, the proposed patch
>> is using one worker per database (for all slots corresponding to a
>> database).
>
> Right. It's based on one worker for each database.
>
>>> Is having one (or a few more - not
>>> necessarily one for each logical slot) worker for all logical slots
>>> enough?
>>
>> I guess for a large number of slots there is a possibility of a large
>> gap in syncing the slots which probably means we need to retain
>> corresponding WAL for a much longer time on the primary. If we can
>> prove that the gap won't be large enough to matter then this would be
>> probably worth considering; otherwise, I think we should find a way to
>> scale the number of workers to avoid the large gap.
>
> I think the gap is largely determined by the time taken to advance
> each slot and the amount of WAL that each logical slot moves ahead on
> primary.

Sorry to be late, but I gave this a second thought and I wonder if we
really need this design (i.e., starting a logical replication background
worker on the standby to sync the slots).

Wouldn't it be simpler to "just" update the synced slots' "metadata", as
the https://github.com/EnterpriseDB/pg_failover_slots module (mentioned by
Peter up-thread) is doing? (It makes use of LogicalConfirmReceivedLocation(),
LogicalIncreaseXminForSlot() and LogicalIncreaseRestartDecodingForSlot(),
if I read synchronize_one_slot() correctly.)

> I've measured the time it takes for
> pg_logical_replication_slot_advance with different amounts of WAL on my
> system. It took 2595ms/5091ms/31238ms to advance the slot by
> 3.7GB/7.3GB/13GB respectively. To put things into perspective here,
> imagine there are 3 logical slots to sync for a single slot sync
> worker and each of them are in need of advancing the slot by
> 3.7GB/7.3GB/13GB of WAL. The slot sync worker gets to slot 1 again
> after 2595ms+5091ms+31238ms (~40sec), gets to slot 2 again after
> advance time of slot 1 with amount of WAL that the slot has moved
> ahead on primary during 40sec, gets to slot 3 again after advance time
> of slot 1 and slot 2 with amount of WAL that the slot has moved ahead
> on primary and so on. If WAL generation on the primary is pretty fast,
> and if the logical slot moves pretty fast on the primary, the time it
> takes for a single sync worker to sync a slot can increase.

Updating only the metadata would be way "faster", and we would probably
not need to worry that much about the number of "sync" workers (if it/they
"just" has/have to sync the slots' "metadata") as proposed above.

Thoughts?

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
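
For reference, the round-trip arithmetic quoted above (one worker advancing three slots one after another) can be checked with a small sketch. It only uses the three timings reported in the thread; the function names are hypothetical, and the model assumes a strictly sequential worker with no overhead between slots, which is a simplification and not how any patch is known to behave.

```python
# Timings quoted in the thread: (WAL to advance in GB, advance time in ms).
MEASURED = [(3.7, 2595), (7.3, 5091), (13.0, 31238)]


def round_time_ms(advance_times_ms):
    """Time for a single worker to advance every slot once, sequentially.

    Hypothetical helper: assumes no idle time or overhead between slots.
    """
    return sum(advance_times_ms)


def time_on_other_slots_ms(advance_times_ms, slot_index):
    """Time the worker spends on the other slots before returning to slot_index."""
    return round_time_ms(advance_times_ms) - advance_times_ms[slot_index]


times = [ms for _, ms in MEASURED]
total = round_time_ms(times)
print(f"one full round over {len(times)} slots: {total} ms")

for i, (gb, ms) in enumerate(MEASURED):
    away = time_on_other_slots_ms(times, i)
    print(f"slot {i + 1}: {gb} GB advanced in {ms} ms, "
          f"then left unsynced for {away} ms while the other slots are handled")
```

While the worker is busy with the other slots, each slot keeps moving ahead on the primary, so the next advance for that slot has more WAL to cover. That is the growth effect the quoted text describes, and it is the cost a metadata-only sync would avoid.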