Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers
From | shveta malik |
---|---|
Subject | Re: Synchronizing slots from primary to standby |
Date | |
Msg-id | CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6=g@mail.gmail.com |
In response to | Re: Synchronizing slots from primary to standby ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>) |
Responses |
Re: Synchronizing slots from primary to standby
Re: Synchronizing slots from primary to standby |
List | pgsql-hackers |
On Tue, Aug 8, 2023 at 11:11 AM Drouvot, Bertrand <bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 8/8/23 7:01 AM, shveta malik wrote:
> > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 8/4/23 1:32 PM, shveta malik wrote:
> >>> On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand
> >>> <bertranddrouvot.pg@gmail.com> wrote:
> >>>> On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> >>
> >
> > Agreed. That is why in v10,v11 patches, we have different infra for
> > sync-slot worker i.e. it is not relying on "logical replication
> > background worker" anymore.
>
> yeah saw that, looks like the right way to go to me.
>
> >> Maybe we should start some tests/benchmark with only one sync worker to get numbers
> >> and start from there?
> >
> > Yes, we can do that performance testing to figure out the difference
> > between the two modes. I will try to get some statistics on this.
> >
>
> Great, thanks!
>

We (myself and Ajin) performed tests to compute the lag of the standby
slots relative to the primary slots with different numbers of slot-sync
workers configured. 3 DBs were created, each with 30 tables, and each
table had one logical pub/sub configured. This made a total of 90
logical replication slots to be synced. The workload was then run for
approx. 10 mins. During this workload, at regular intervals, the primary
and standby slots' lsns were captured (from pg_replication_slots) and
compared. At each capture, the intent was to know how far each standby
slot was lagging behind the corresponding primary slot, by taking the
distance between the confirmed_flush_lsn of the primary and standby slot.
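
For anyone who wants to reproduce the distance computation, here is a minimal sketch of it, not the script we actually used. It assumes the confirmed_flush_lsn values have already been read from pg_replication_slots on both nodes as text (the usual 'hi/lo' hexadecimal pg_lsn form); the function names and the sample slot values are hypothetical.

```python
# Sketch only: per-slot lag between primary and standby at one capture.
# Assumes LSNs are given in PostgreSQL's textual pg_lsn form "XXXXXXXX/YYYYYYYY",
# i.e. the high and low 32 bits of a 64-bit WAL position, both in hex.

def lsn_to_int(lsn: str) -> int:
    """Convert a textual pg_lsn such as '0/3000060' to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_lag(primary: dict, standby: dict) -> dict:
    """Return {slot_name: primary_lsn - standby_lsn} for slots present on both.

    `primary` and `standby` map slot_name -> confirmed_flush_lsn (text).
    The result is a distance in bytes of WAL.
    """
    return {
        name: lsn_to_int(primary[name]) - lsn_to_int(standby[name])
        for name in primary
        if name in standby
    }


if __name__ == "__main__":
    # Hypothetical capture, for illustration only.
    primary_capture = {"sub_1": "0/3000060", "sub_2": "1/0"}
    standby_capture = {"sub_1": "0/3000000", "sub_2": "0/FFFFFF00"}
    print(slot_lag(primary_capture, standby_capture))
```

On the server side the same distance can be obtained by subtracting two pg_lsn values directly; the conversion above is only needed when the comparison is done outside the database, across two nodes.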
Then we took the average (integer value) of this distance over the span
of the 10 min workload, and this is what we got:

With max_slot_sync_workers=1, average-lag = 42290.3563
With max_slot_sync_workers=2, average-lag = 24585.1421
With max_slot_sync_workers=3, average-lag = 14964.9215

This shows that more workers have a better chance of keeping the logical
replication slots in sync in this case.

Another statistic, if it interests you: we ran a frequency test as well
(by changing the code, a unit-test sort of thing) to figure out the
'total number of times synchronization done' with different numbers of
slot-sync workers configured. It used the same 3-DB setup, with each DB
having 30 logical replication slots. With 'max_slot_sync_workers' set to
1, 2 and 3, the total number of times synchronization was done was
15874, 20205 and 23414 respectively.

Note: this was not on the same machine where we captured the lsn-gap
data; it was on a slightly less powerful machine, but it gives almost
the same picture.

Next we are planning to capture this data for a smaller number of slots,
like 10, 30, 50 etc. It may turn out that the benefit of multiple
workers over a single worker is smaller in such cases, but let's have
the data to verify that.

Thanks Ajin for jointly working on this.

thanks
Shveta
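
PS: to make the comparison across worker counts easier to read, the figures above can be expressed relative to the single-worker run. This is only a rough sketch of that arithmetic over the numbers already reported in this mail; the helper names are hypothetical and nothing here is new measurement data.

```python
# Sketch only: express the reported figures relative to max_slot_sync_workers=1.
# The dictionaries below simply restate the numbers from this mail.

avg_lag = {1: 42290.3563, 2: 24585.1421, 3: 14964.9215}
sync_counts = {1: 15874, 2: 20205, 3: 23414}


def percent_change(baseline: float, value: float) -> float:
    """Signed percentage change of `value` with respect to `baseline`."""
    return (value - baseline) / baseline * 100.0


def relative_to_single_worker(series: dict) -> dict:
    """Map worker count -> percentage change versus the 1-worker figure."""
    baseline = series[1]
    return {workers: percent_change(baseline, v) for workers, v in series.items()}


if __name__ == "__main__":
    for workers, change in relative_to_single_worker(avg_lag).items():
        print(f"workers={workers}: average lag change {change:+.1f}%")
    for workers, change in relative_to_single_worker(sync_counts).items():
        print(f"workers={workers}: sync count change {change:+.1f}%")
```

A negative change in average lag and a positive change in sync count both point the same way: more workers kept the slots closer to the primary in this 90-slot setup. Keep in mind that the two series came from different machines.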