[HACKERS] Multiple table synchronizations are processed serially - Mailing list pgsql-hackers
From | Masahiko Sawada |
---|---|
Subject | [HACKERS] Multiple table synchronizations are processed serially |
Date | |
Msg-id | CAD21AoC2KJdavS7MFffmSsRc1dn3Vg_0xmuc=UpBrZ-_MUxh-Q@mail.gmail.com Whole thread Raw |
Responses |
Re: [HACKERS] Multiple table synchronizations are processed serially
|
List | pgsql-hackers |
Hi, While testing table synchronization in logical replication, I found that multiple table synchronizations for a subscription are processed serially due to lock wait. I setup pgbench tables on publisher (scalefactor = 1000) and on subscriber, and truncated these tables on subscriber, and then created publication for all pgbench tables. When I created a subscription I got the following log messages. 2017-05-19 13:12:56.311 JST [54970] LOG: logical replication apply for subscription hoge_sub started 2017-05-19 13:12:56.314 JST [54970] LOG: starting logical replication worker for subscription "hoge_sub" 2017-05-19 13:12:56.317 JST [54972] LOG: logical replication sync for subscription hoge_sub, table pgbench_accounts started 2017-05-19 13:12:57.315 JST [54970] LOG: starting logical replication worker for subscription "hoge_sub" 2017-05-19 13:12:57.318 JST [54974] LOG: logical replication sync for subscription hoge_sub, table pgbench_branches started 2017-05-19 13:12:58.317 JST [54970] LOG: starting logical replication worker for subscription "hoge_sub" 2017-05-19 13:12:58.320 JST [54976] LOG: logical replication sync for subscription hoge_sub, table pgbench_history started 2017-05-19 13:12:59.319 JST [54970] LOG: starting logical replication worker for subscription "hoge_sub" 2017-05-19 13:12:59.322 JST [54978] LOG: logical replication sync for subscription hoge_sub, table pgbench_tellers started Seems all four table sync workers are launched at the almost same time, but three workers of them get stuck in idle transaction state when creating replication slot. That is these waiting workers cannot proceed its work until first connected table sync worker finishes. ps command shows the followings. 54898 ? Ss 0:00 postgres: bgworker: logical replication launcher 54970 ? Ss 0:00 postgres: bgworker: logical replication worker for subscription 16432 54971 ? Ss 0:00 postgres: wal sender process masahiko [local] idle 54972 ? Rs 10:38 postgres: bgworker: logical replication worker for subscription 16432 sync 16395 54973 ? Ss 1:52 postgres: wal sender process masahiko [local] COPY 54974 ? Ss 0:00 postgres: bgworker: logical replication worker for subscription 16432 sync 16398 54975 ? Ss 0:00 postgres: wal sender process masahiko [local] idle in transaction waiting 54976 ? Ss 0:00 postgres: bgworker: logical replication worker for subscription 16432 sync 16389 54977 ? Ss 0:00 postgres: wal sender process masahiko [local] idle in transaction waiting 54978 ? Ss 0:00 postgres: bgworker: logical replication worker for subscription 16432 sync 16392 54979 ? Ss 0:00 postgres: wal sender process masahiko [local] idle in transaction waiting Also, here is backtrace of the waiting wal sender process. #0 0x00007f24b4030783 in __epoll_wait_nocancel () from /lib64/libc.so.6 #1 0x000000000084c58d in WaitEventSetWaitBlock (set=0x25ed8e0, cur_timeout=-1, occurred_events=0x7fff1ea322f0, nevents=1) at latch.c:1045 #2 0x000000000084c454 in WaitEventSetWait (set=0x25ed8e0, timeout=-1, occurred_events=0x7fff1ea322f0, nevents=1, wait_event_info=50331652) at latch. c:997 #3 0x000000000084bbe5 in WaitLatchOrSocket (latch=0x7f24ad385024, wakeEvents=1, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:385 #4 0x000000000084bac6 in WaitLatch (latch=0x7f24ad385024, wakeEvents=1, timeout=0, wait_event_info=50331652) at latch.c:339 #5 0x0000000000863632 in ProcSleep (locallock=0x25f2fb0, lockMethodTable=0xbdb020) at proc.c:1255 #6 0x000000000085c83c in WaitOnLock (locallock=0x25f2fb0, owner=0x2604dd8) at lock.c:1702 #7 0x000000000085b541 in LockAcquireExtended (locktag=0x7fff1ea32800, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000', reportMemoryError=1 '\001\ ') at lock.c:998 #8 0x000000000085ab46 in LockAcquire (locktag=0x7fff1ea32800, lockmode=5, sessionLock=0 '\000', dontWait=0 '\000') at lock.c:688 #9 0x0000000000859a16 in XactLockTableWait (xid=598, rel=0x0, ctid=0x0, oper=XLTW_None) at lmgr.c:587 #10 0x000000000080d87a in SnapBuildWaitSnapshot (running=0x25ed8a8, cutoff=599) at snapbuild.c:1400 #11 0x000000000080d64c in SnapBuildFindSnapshot (builder=0x26ab530, lsn=14202458264, running=0x25ed8a8) at snapbuild.c:1311 #12 0x000000000080d1bb in SnapBuildProcessRunningXacts (builder=0x26ab530, lsn=14202458264, running=0x25ed8a8) at snapbuild.c:1106 #13 0x00000000007fb779 in DecodeStandbyOp (ctx=0x25f7090, buf=0x7fff1ea32930) at decode.c:314 #14 0x00000000007fb28f in LogicalDecodingProcessRecord (ctx=0x25f7090, record=0x25f7350) at decode.c:117 #15 0x00000000007ff0d5 in DecodingContextFindStartpoint (ctx=0x25f7090) at logical.c:458 #16 0x00000000008141c8 in CreateReplicationSlot (cmd=0x25ecba8) at walsender.c:941 #17 0x0000000000814ec5 in exec_replication_command (cmd_string=0x26511e0 "CREATE_REPLICATION_SLOT \"hoge_sub_16432_sync_16398\" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT") at walsender.c:1508 #18 0x000000000087d2d5 in PostgresMain (argc=1, argv=0x25fb2b8, dbname=0x25fb100 "postgres", username=0x25fb0d8 "masahiko") at postgres.c:4071 #19 0x00000000007e0ffd in BackendRun (port=0x25f37f0) at postmaster.c:4334 #20 0x00000000007e077d in BackendStartup (port=0x25f37f0) at postmaster.c:4006 #21 0x00000000007dcd86 in ServerLoop () at postmaster.c:1743 #22 0x00000000007dc44a in PostmasterMain (argc=5, argv=0x25cceb0) at postmaster.c:1351 #23 0x00000000007184a2 in main (argc=5, argv=0x25cceb0) at main.c:228 It also blocks subscriptions that will be created later. I'd say it's not an expected behavior. Any thoughts? Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center
pgsql-hackers by date: