RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher - Mailing list pgsql-hackers
From: shiy.fnst@fujitsu.com
Subject: RE: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Date:
Msg-id: OSZPR01MB6310F013D78D0C67EF7544DAFDB29@OSZPR01MB6310.jpnprd01.prod.outlook.com
In response to: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher (Önder Kalacı <onderkalaci@gmail.com>)
Responses: Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
List: pgsql-hackers
On Wed, Mar 1, 2023 9:22 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
>
> Hi Andres, Amit, Shi Yu, all
>
> Andres Freund <andres@anarazel.de> wrote on Tue, 28 Feb 2023, 21:39:
> Hi,
>
> On 2023-02-25 16:00:05 +0530, Amit Kapila wrote:
> > On Tue, Feb 21, 2023 at 7:55 PM Önder Kalacı <onderkalaci@gmail.com> wrote:
> > >> I think this overhead seems to be mostly due to the need to perform
> > >> tuples_equal multiple times for duplicate values.
>
> I think more work needs to be done to determine the source of the
> overhead. It's not clear to me why there'd be an increase in tuples_equal()
> calls in the tests upthread.
>
> You are right; looking closely, in fact, most of the time we do far fewer
> tuples_equal() calls with the index scan.
>
> I've done some profiling with perf and created flame graphs for the apply
> worker, with the test described above (case 1, all values are duplicated).
> I used the following commands:
> - perf record -F 99 -p 122555 -g -- sleep 60
> - perf script | ./stackcollapse-perf.pl > out.perf-folded
> - ./flamegraph.pl out.perf-folded > perf_[index|seq]_scan.svg
>
> I attached both flame graphs. I do not see anything specific regarding what
> the patch does; instead, the difference mostly seems to come down to index
> scan vs. sequential scan related functions. As I continue to investigate, I
> thought it might be useful to share the flame graphs so that more
> experienced hackers could comment on the difference.
>
> Regarding my own end-to-end tests: in some runs, the sequential scan is
> indeed faster for case-1. But when I execute "update tbl set a=a+1;" 50
> consecutive times and measure end-to-end performance, I see much better
> results for index scan; only case-1 is on par, which is mostly what I'd
> expect.
>
> Case-1, running the update 50 times and waiting for all changes to be applied
> • index scan: 2 minutes 36 seconds
> • sequential scan: 2 minutes 30 seconds
> Case-2, running the update 50 times and waiting for all changes to be applied
> • index scan: 1 minute 2 seconds
> • sequential scan: 2 minutes 30 seconds
> Case-7, running the update 50 times and waiting for all changes to be applied
> • index scan: 6 seconds
> • sequential scan: 2 minutes 26 seconds
>
> # Result
> The time executing update (the average of 3 runs is taken, the unit is
> milliseconds):
>
> Shi Yu, could it be possible for you to re-run the tests with some more
> runs and share the average? I suspect your test results have a very small
> sample size, and some runs are making the average slightly problematic.
>
> In my tests, I shared the total time, which is probably also fine.

Thanks for your reply. I re-tested (based on
v25_0001_use_index_on_subs_when_pub_rep_ident_full.patch) and took the
average of 100 runs. The results are as follows; the unit is milliseconds.

case1
sequential scan: 1348.57
index scan: 3785.15

case2
sequential scan: 1350.26
index scan: 1754.01

case3
sequential scan: 1350.13
index scan: 1340.97

There was still some degradation in the first two cases, and there are some
differences between our test results. Some information about my test is as
follows.

a. Some parameters specified in postgresql.conf:
shared_buffers = 8GB
checkpoint_timeout = 30min
max_wal_size = 20GB
min_wal_size = 10GB
autovacuum = off

b. Executed SQL.
I executed TRUNCATE and INSERT before each UPDATE. I am not sure whether you
did the same or just executed 50 consecutive UPDATEs. If it is the latter,
there would be lots of old tuples, and this might have a bigger impact on
the sequential scan.
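To be concrete, here is a minimal sketch of one round of my "TRUNCATE and
INSERT before each UPDATE" loop; the table name, columns, row count, and the
i % 10 value distribution below are only illustrative assumptions, not the
exact ones from my benchmark script:

-- Illustrative setup: the publisher table uses REPLICA IDENTITY FULL, so the
-- subscriber matches rows by value (sequential scan, or an index scan with
-- this patch).
CREATE TABLE IF NOT EXISTS tbl (a int, b int);

-- Reset the table before each timed UPDATE so old tuples do not accumulate.
TRUNCATE tbl;
-- case1-like data: column "a" holds heavily duplicated values
-- (the distribution and row count are assumptions for illustration).
INSERT INTO tbl SELECT i % 10, i FROM generate_series(1, 1000000) AS i;

-- The UPDATE whose replication time is measured.
UPDATE tbl SET a = a + 1;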
I tried this case (executing 50 consecutive UPDATEs without the TRUNCATE and
INSERT) and also saw that the overhead is smaller than before.

Besides, I looked into the regression of this patch with `gprof`; some
results are below. I think that with a single buffer lock, a sequential scan
can read multiple tuples (see heapgettup()), while an index scan fetches only
one tuple at a time. So in case1, which has lots of duplicate values and
therefore many more tuples to scan, the index scan takes longer.

- results of `gprof`

case1:

master
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 1.37      0.66      0.01   654312     0.00     0.00  LWLockAttemptLock
 0.00      0.73      0.00   573358     0.00     0.00  LockBuffer
 0.00      0.73      0.00    10014     0.00     0.06  heap_getnextslot

patched
  %   cumulative   self              self     total
 time   seconds   seconds     calls  ms/call  ms/call  name
 9.70      1.27      0.36  50531459     0.00     0.00  LWLockAttemptLock
 3.23      2.42      0.12 100259200     0.00     0.00  LockBuffer
 6.20      1.50      0.23  50015101     0.00     0.00  heapam_index_fetch_tuple
 4.04      2.02      0.15  50015101     0.00     0.00  index_fetch_heap
 1.35      3.21      0.05     10119     0.00     0.00  index_getnext_slot

case7:

master
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 2.67      0.60      0.02   654582     0.00     0.00  LWLockAttemptLock
 0.00      0.75      0.00   573488     0.00     0.00  LockBuffer
 0.00      0.75      0.00    10014     0.00     0.06  heap_getnextslot

patched
  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 0.00      0.12      0.00   241979     0.00     0.00  LWLockAttemptLock
 0.00      0.12      0.00   180884     0.00     0.00  LockBuffer
 0.00      0.12      0.00    10101     0.00     0.00  heapam_index_fetch_tuple
 0.00      0.12      0.00    10101     0.00     0.00  index_fetch_heap
 0.00      0.12      0.00    10119     0.00     0.00  index_getnext_slot

Regards,
Shi Yu