BUG #17580: use pg_terminate_backend to terminate a wal sender process may wait a long time - Mailing list pgsql-bugs
From | PG Bug reporting form |
---|---|
Subject | BUG #17580: use pg_terminate_backend to terminate a wal sender process may wait a long time |
Date | |
Msg-id | 17580-849c1d5b6d7eb422@postgresql.org Whole thread Raw |
Responses |
Re: BUG #17580: use pg_terminate_backend to terminate a wal sender process may wait a long time
|
List | pgsql-bugs |
The following bug has been logged on the website: Bug reference: 17580 Logged by: whale song Email address: 13207961@qq.com PostgreSQL version: 10.5 Operating system: centos Description: 1) Create a table for test create table test666(id int primary key, number int); 2) Use pg_recvlogical to subscribe 2.1) Create a replicate slot pg_recvlogical -h 9.223.28.176 -p 11002 -d test -U tbase --slot wal2json_pg_test02 --create-slot -P wal2json 2.2) Start to subscribe, and use filter-tables to filter test table test666 pg_recvlogical -h 9.223.28.176 -p 11002 -d test -U tbase --slot wal2json_pg_test02 --start -o pretty-print=1 -o format-version=2 -o filter-tables="public.test666" -f - 3) Insert a lot of data to the table in one transaction insert into test666 values (generate_series(1000000001,2000000000), generate_series(1000000001,2000000000)); It may cause a long time, maybe 50 minutes. 4) Use pg_replication_slots to get wal sender process pid(active_pid) select *,pg_current_wal_lsn() from pg_replication_slots ; 5) When step 3 finish, use pstack to watch wal sender process stack, when it is loop in ReorderBufferCommit, kill pg_recvlogical which start in step 2.2 pstack 80743 #0 0x0000000000c400d9 in ResourceArrayAdd (resarr=0x1576970, value=139941599240896) at resowner.c:359 #1 0x0000000000c3fef4 in ResourceArrayEnlarge (resarr=0x1576970) at resowner.c:252 #2 0x0000000000c411f4 in ResourceOwnerEnlargeRelationRefs (owner=0x15768f0) at resowner.c:1129 #3 0x0000000000be0899 in RelationIncrementReferenceCount (rel=0x7f46b14fe2c0) at relcache.c:2589 #4 0x0000000000be05da in RelationIdGetRelation (relationId=1177785) at relcache.c:2451 #5 0x00007f4e0d649b9e in pg_decode_write_tuple (ctx=0x13fc5a8, relation=0x7f46b14fcf38, tuple=0x315b910, kind=PGOUTPUTJSON_IDENTITY) at wal2json.c:1503 #6 0x00007f4e0d64a514 in pg_decode_write_change (ctx=0x13fc5a8, txn=0x15957a8, relation=0x7f46b14fcf38, change=0x33bc9f8) at wal2json.c:1747 #7 0x00007f4e0d64a9db in pg_decode_change_v2 (ctx=0x13fc5a8, txn=0x15957a8, relation=0x7f46b14fcf38, change=0x33bc9f8) at wal2json.c:1865 #8 0x00007f4e0d648b2b in pg_decode_change (ctx=0x13fc5a8, txn=0x15957a8, relation=0x7f46b14fcf38, change=0x33bc9f8) at wal2json.c:1098 #9 0x00000000009cf3d1 in change_cb_wrapper (cache=0x1410fb8, txn=0x15957a8, relation=0x7f46b14fcf38, change=0x33bc9f8) at logical.c:730 #10 0x00000000009d75df in ReorderBufferCommit (rb=0x1410fb8, xid=174028801, commit_lsn=44579135148872, end_lsn=44579135148936, commit_time=712222669188947, origin_id=0, origin_lsn=0) at reorderbuffer.c:1445 #11 0x00000000009cb15b in DecodeCommit (ctx=0x13fc5a8, buf=0x7fff2574d600, parsed=0x7fff2574d530, xid=174028801) at decode.c:713 #12 0x00000000009ca76b in DecodeXactOp (ctx=0x13fc5a8, buf=0x7fff2574d600) at decode.c:335 #13 0x00000000009ca401 in LogicalDecodingProcessRecord (ctx=0x13fc5a8, record=0x13fc880) at decode.c:175 #14 0x00000000009ead45 in XLogSendLogical () at walsender.c:3152 #15 0x00000000009ea09f in WalSndLoop (send_data=0x9eaca9 <XLogSendLogical>) at walsender.c:2520 #16 0x00000000009e8aed in StartLogicalReplication (cmd=0x151bb48) at walsender.c:1445 #17 0x00000000009e944e in exec_replication_command (cmd_string=0x1520408 "START_REPLICATION SLOT subs_f524p217b7_dn0001 LOGICAL 285C/F8FFEA50 ( \"format-version\" '2' )") at walsender.c:1887 #18 0x0000000000a71ed8 in PostgresMain (argc=1, argv=0x1419f58, dbname=0x1419f28 "prd_jxcard_lingwang", username=0x1419f08 "mingchen_read") at postgres.c:6118 #19 0x00000000009b0faa in BackendRun (port=0x14135b0) at postmaster.c:5071 #20 0x00000000009b0753 in BackendStartup (port=0x14135b0) at postmaster.c:4743 #21 0x00000000009ac3ae in ServerLoop () at postmaster.c:1995 #22 0x00000000009aba35 in PostmasterMain (argc=5, argv=0x13dc770) at postmaster.c:1603 #23 0x0000000000878c44 in main (argc=5, argv=0x13dc770) at main.c:233 6) Use pg_replication_slots to query the replication slots infomation, the active_pid is still active. Use pg_terminate_backend to terminate active_pid, it returns true, but the active_pid is still active. 7) The active_pid wal sender process exit after a long time waiting I think maybe we should add a CHECK_FOR_INTERRUPTS in the loop code of the ReorderBufferCommit function. Likes: void ReorderBufferCommit(…………) { ………… PG_TRY(); { ………… while ((change = ReorderBufferIterTXNNext(rb, iterstate)) != NULL) { Relation relation = NULL; Oid reloid; /* Add a CHECK_FOR_INTERRUPTS here? */ ………… } ………… }
pgsql-bugs by date: