Re: Synchronizing slots from primary to standby - Mailing list pgsql-hackers

From	shveta malik
Subject	Re: Synchronizing slots from primary to standby
Date	August 14, 2023 09:52:24
Msg-id	CAJpy0uD2F43avuXy_yQv7Wa3kpUwioY_Xn955xdmd6vX0ME6=g@mail.gmail.com
In response to	Re: Synchronizing slots from primary to standby ("Drouvot, Bertrand" <bertranddrouvot.pg@gmail.com>)
Responses	Re: Synchronizing slots from primary to standby Re: Synchronizing slots from primary to standby
List	pgsql-hackers

On Tue, Aug 8, 2023 at 11:11 AM Drouvot, Bertrand
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On 8/8/23 7:01 AM, shveta malik wrote:
> > On Mon, Aug 7, 2023 at 3:17 PM Drouvot, Bertrand
> > <bertranddrouvot.pg@gmail.com> wrote:
> >>
> >> Hi,
> >>
> >> On 8/4/23 1:32 PM, shveta malik wrote:
> >>> On Fri, Aug 4, 2023 at 2:44 PM Drouvot, Bertrand
> >>> <bertranddrouvot.pg@gmail.com> wrote:
> >>>> On 7/28/23 4:39 PM, Bharath Rupireddy wrote:
> >>
> >
> > Agreed. That is why in v10,v11 patches, we have different infra for
> > sync-slot worker i.e. it is not relying on "logical replication
> > background worker" anymore.
>
> yeah saw that, looks like the right way to go to me.
>
> >> Maybe we should start some tests/benchmark with only one sync worker to get numbers
> >> and start from there?
> >
> > Yes, we can do that performance testing to figure out the difference
> > between the two modes. I will try to get some statistics on this.
> >
>
> Great, thanks!
>

Ajin and I ran tests to measure how far the standby's slots lag behind
the corresponding primary slots with different numbers of slot-sync
workers configured.

We created 3 DBs, each with 30 tables, and configured one logical
publication/subscription per table, giving a total of 90 logical
replication slots to be synced. We then ran a workload for approximately
10 minutes. During the workload, the LSNs of the primary and standby
slots were captured at regular intervals (from pg_replication_slots)
and compared. At each capture, we measured how far each standby slot
lagged behind its corresponding primary slot, taken as the distance
between the confirmed_flush_lsn of the primary slot and that of the
standby slot. We then averaged this distance (computed on the integer
values of the LSNs) over the 10-minute workload, and this is what we got:

With max_slot_sync_workers=1, average-lag =  42290.3563
With max_slot_sync_workers=2, average-lag =  24585.1421
With max_slot_sync_workers=3, average-lag =  14964.9215

This shows that, for this setup, more workers keep the standby's
logical replication slots more closely in sync with the primary.
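For anyone wanting to reproduce the lag measurement, here is a minimal Python sketch of the computation described above. It assumes each sample is a pair of confirmed_flush_lsn strings in PostgreSQL's textual pg_lsn format "X/Y" (upper and lower 32 bits in hex), one pair per slot per capture. The helper names and sample values are hypothetical and not taken from our test scripts; on a server, the distance between two known LSNs can also be obtained with pg_wal_lsn_diff().

```python
from statistics import mean


def lsn_to_int(lsn: str) -> int:
    """Convert a textual pg_lsn such as '0/16B3748' to a 64-bit integer.

    The part before '/' holds the upper 32 bits and the part after it
    holds the lower 32 bits, both in hexadecimal.
    """
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def slot_lag(primary_lsn: str, standby_lsn: str) -> int:
    """Bytes of WAL by which the standby slot trails the primary slot."""
    return lsn_to_int(primary_lsn) - lsn_to_int(standby_lsn)


def average_lag(samples: list[tuple[str, str]]) -> float:
    """Average lag over (primary, standby) confirmed_flush_lsn pairs.

    Hypothetical helper: one pair per slot per capture over the workload.
    """
    return mean(slot_lag(p, s) for p, s in samples)


if __name__ == "__main__":
    # Hypothetical sample values, not from the actual test run.
    samples = [
        ("0/3000148", "0/3000000"),
        ("0/3001000", "0/3000800"),
        ("1/00000010", "0/FFFFFFF0"),  # lag crossing a 4 GB boundary
    ]
    print(average_lag(samples))
```

Converting to integers first, rather than comparing the strings, keeps the distance correct when the lower 32 bits wrap around, as in the last sample.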

Another statistic, in case it interests you: we also ran a frequency
test (by changing the code, as a sort of unit test) to find the total
number of times synchronization was performed with different numbers
of slot-sync workers configured. We used the same setup of 3 DBs, each
with 30 logical replication slots. With max_slot_sync_workers set to
1, 2 and 3, synchronization was performed 15874, 20205 and 23414 times,
respectively. Note: this was not run on the same machine where we
captured the LSN-gap data, but on a slightly less powerful machine; it
still gives almost the same picture.
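To put both sets of figures in relative terms, the short Python snippet below normalizes each series against the single-worker run. It only reuses the numbers reported in this mail; the helper name is illustrative. Since the two series were collected on different machines, only the trends, not the absolute values, should be compared across them.

```python
def relative_to_baseline(values: list[float]) -> list[float]:
    """Express each value as a ratio of the first (single-worker) value."""
    baseline = values[0]
    return [v / baseline for v in values]


# Figures reported above, for max_slot_sync_workers = 1, 2, 3.
lag_values = [42290.3563, 24585.1421, 14964.9215]
sync_counts = [15874, 20205, 23414]

for workers, lag_ratio, sync_ratio in zip(
    (1, 2, 3), relative_to_baseline(lag_values), relative_to_baseline(sync_counts)
):
    # A lag ratio below 1.0 means less lag; a sync ratio above 1.0 means more syncs.
    print(f"workers={workers}: lag x{lag_ratio:.2f}, syncs x{sync_ratio:.2f}")
```

With these figures, two workers bring the average lag down to roughly 58% of the single-worker value and three workers to roughly 35%, while the number of synchronizations grows by about 27% and 47% respectively.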

Next, we plan to capture this data for smaller numbers of slots, such
as 10, 30, 50, etc. The benefit of multiple workers over a single
worker may be smaller in such cases, but let's gather the data to
verify that.

Thanks Ajin for jointly working on this.

thanks
Shveta
