BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query - Mailing list pgsql-bugs
From:      PG Bug reporting form
Subject:   BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
Date:
Msg-id:    15585-324ff6a93a18da46@postgresql.org
Responses: Re: BUG #15585: infinite DynamicSharedMemoryControlLock waiting in parallel query
List:      pgsql-bugs
The following bug has been logged on the website:

Bug reference:      15585
Logged by:          Sergei Kornilov
Email address:      sk@zsrv.org
PostgreSQL version: 10.6
Operating system:   Ubuntu 16.04.2 LTS

Description:

Hello

I have encountered an issue with a parallel query hanging indefinitely: many processes (a client backend and several background workers) are waiting on the LWLock DynamicSharedMemoryControlLock, and one client backend has been waiting on IPC BgWorkerShutdown forever (some hours) and is unkillable. I first spotted this issue on 1 January on version 10.5; I then shut the database down (immediate mode, since I was unable to shut it down cleanly), installed gdb and the dbg packages, and upgraded to 10.6. Yesterday I hit the issue again and saved some backtraces (the query had been hanging for about half an hour). It looks somewhat similar to BUG #15290 [1], but this server was already on 10.5 and a different LWLock is involved.

Parallel leader:

datid            | 16400
datname          | db
pid              | 28927
usesysid         | 852050794
usename          | user
application_name | ...
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 58326
backend_start    | 2019-01-09 10:54:02.50649+00
xact_start       | 2019-01-09 10:56:55.802265+00
query_start      | 2019-01-09 10:56:55.80524+00
state_change     | 2019-01-09 10:56:55.805243+00
wait_event_type  | IPC
wait_event       | BgWorkerShutdown
state            | active
backend_xid      |
backend_xmin     | 2047764791
query            | SELECT ...
backend_type     | client backend

#0  0x00007f198e58b9f3 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:84
#1  0x0000555d7fe393d1 in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffe64b3c320, cur_timeout=-1, set=0x555d8108fec8) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/storage/ipc/latch.c:1048
#2  WaitEventSetWait (set=set@entry=0x555d8108fec8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffe64b3c320, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=134217728) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/storage/ipc/latch.c:1000
#3  0x0000555d7fe39844 in WaitLatchOrSocket (latch=0x7f15ab5a80b4, wakeEvents=wakeEvents@entry=17, sock=sock@entry=-1, timeout=-1, timeout@entry=0, wait_event_info=wait_event_info@entry=134217728) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/storage/ipc/latch.c:385
#4  0x0000555d7fe398f5 in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=17, timeout=timeout@entry=0, wait_event_info=wait_event_info@entry=134217728) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/storage/ipc/latch.c:339
#5  0x0000555d7fddce3b in WaitForBackgroundWorkerShutdown (handle=0x555d810b2db0) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/bgworker.c:1147
#6  0x0000555d7fc0211d in WaitForParallelWorkersToExit (pcxt=0x555d810b2938, pcxt=0x555d810b2938) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/access/transam/parallel.c:659
#7  0x0000555d7fc02e47 in DestroyParallelContext (pcxt=0x555d810b2938) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/access/transam/parallel.c:741
#8  0x0000555d7fc03458 in AtEOXact_Parallel (isCommit=isCommit@entry=0 '\000') at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/access/transam/parallel.c:1010
#9  0x0000555d7fc0e807 in AbortTransaction () at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/access/transam/xact.c:2566
#10 0x0000555d7fc0ef65 in AbortCurrentTransaction () at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/access/transam/xact.c:3125
#11 0x0000555d7fe5c7cd in PostgresMain (argc=1, argv=argv@entry=0x555d81009a08, dbname=0x555d810370f8 "shard_sw_production", username=0x555d81037128 "startwire_background") at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/tcop/postgres.c:3908
#12 0x0000555d7fb93509 in BackendRun (port=0x555d81032d80) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4405
#13 BackendStartup (port=0x555d81032d80) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4077
#14 ServerLoop () at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1755
#15 0x0000555d7fdeb6bb in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1363
#16 0x0000555d7fb94905 in main (argc=5, argv=0x555d81007850) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/main/main.c:228

Another locked process:

datid            | 16400
datname          | db
pid              | 22070
usesysid         | 852050794
usename          | user
application_name | ...
client_addr      | 127.0.0.1
client_hostname  |
client_port      | 57762
backend_start    | 2019-01-09 10:16:45.773862+00
xact_start       | 2019-01-09 10:57:38.902604+00
query_start      | 2019-01-09 10:57:38.90572+00
state_change     | 2019-01-09 10:57:38.905724+00
wait_event_type  | LWLock
wait_event       | DynamicSharedMemoryControlLock
state            | active
backend_xid      |
backend_xmin     | 2047768195
query            | SELECT ...
backend_type     | client backend

#0  0x00007f199087a827 in futex_abstimed_wait_cancelable (private=128, abstime=0x0, expected=0, futex_word=0x7f11d620c438) at ../sysdeps/unix/sysv/linux/futex-internal.h:205
#1  do_futex_wait (sem=sem@entry=0x7f11d620c438, abstime=0x0) at sem_waitcommon.c:111
#2  0x00007f199087a8d4 in __new_sem_wait_slow (sem=0x7f11d620c438, abstime=0x0) at sem_waitcommon.c:181
#3  0x00007f199087a97a in __new_sem_wait (sem=sem@entry=0x7f11d620c438) at sem_wait.c:29
#4  0x0000555d7fdd9782 in PGSemaphoreLock (sema=0x7f11d620c438) at pg_sema.c:316
#5  0x0000555d7fe4ca0c in LWLockAcquire (lock=0x7f11d620e600, mode=mode@entry=LW_EXCLUSIVE) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/storage/lmgr/lwlock.c:1233
#6  0x0000555d7fe37717 in dsm_create (size=size@entry=380920, flags=flags@entry=1) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/storage/ipc/dsm.c:493
#7  0x0000555d7fc0248f in InitializeParallelDSM (pcxt=pcxt@entry=0x555d81388958) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/access/transam/parallel.c:268
#8  0x0000555d7fd266c7 in ExecInitParallelPlan (planstate=0x555d81105440, estate=0x555d811047e8, nworkers=4) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/executor/execParallel.c:470
#9  0x0000555d7fd36260 in ExecGather (pstate=0x555d81105120) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/executor/nodeGather.c:158
#10 0x0000555d7fd2e70c in ExecProcNode (node=0x555d81105120) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/include/executor/executor.h:250
#11 fetch_input_tuple (aggstate=aggstate@entry=0x555d81104a78) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/executor/nodeAgg.c:695
#12 0x0000555d7fd309cb in agg_retrieve_direct (aggstate=0x555d81104a78) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/executor/nodeAgg.c:2347
#13 ExecAgg (pstate=0x555d81104a78) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/executor/nodeAgg.c:2158
#14 0x0000555d7fd2250b in ExecProcNode (node=0x555d81104a78) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/include/executor/executor.h:250
#15 ExecutePlan (execute_once=<optimized out>, dest=0x555d810b5fc8, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x555d81104a78, estate=0x555d811047e8) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/executor/execMain.c:1723
#16 standard_ExecutorRun (queryDesc=0x555d81111338, direction=<optimized out>, count=0, execute_once=<optimized out>) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/executor/execMain.c:364
#17 0x00007f1987338ea5 in pgss_ExecutorRun (queryDesc=0x555d81111338, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../contrib/pg_stat_statements/pg_stat_statements.c:889
#18 0x0000555d7fe5ebfb in PortalRunSelect (portal=portal@entry=0x555d81026d48, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x555d810b5fc8) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/tcop/pquery.c:932
#19 0x0000555d7fe60210 in PortalRun (portal=portal@entry=0x555d81026d48, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', run_once=<optimized out>, dest=dest@entry=0x555d810b5fc8, altdest=altdest@entry=0x555d810b5fc8, completionTag=0x7ffe64b3c780 "") at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/tcop/pquery.c:773
#20 0x0000555d7fe5cd89 in exec_execute_message (max_rows=9223372036854775807, portal_name=0x555d810b5bb8 "") at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/tcop/postgres.c:2007
#21 PostgresMain (argc=<optimized out>, argv=argv@entry=0x555d81009a08, dbname=0x555d810370f8 "shard_sw_production", username=<optimized out>) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/tcop/postgres.c:4180
#22 0x0000555d7fb93509 in BackendRun (port=0x555d81033e30) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4405
#23 BackendStartup (port=0x555d81033e30) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:4077
#24 ServerLoop () at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1755
#25 0x0000555d7fdeb6bb in PostmasterMain (argc=5, argv=<optimized out>) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/postmaster/postmaster.c:1363
#26 0x0000555d7fb94905 in main (argc=5, argv=0x555d81007850) at /build/postgresql-10-4UiiBA/postgresql-10-10.6/build/../src/backend/main/main.c:228

I also found relevant (and strange) messages in the logs:

2019-01-09 10:56:55.947 UTC 31156 @ from  [vxid:16/1632 txid:0] [] ERROR:  dsa_area could not attach to segment
2019-01-09 10:56:55.947 UTC 31156 @ from  [vxid:16/1632 txid:0] [] STATEMENT:  SELECT ...
2019-01-09 10:56:55.950 UTC 14433 @ from  [vxid: txid:0] [] LOG:  worker process: parallel worker for PID 28927 (PID 31156) exited with exit code 1
2019-01-09 10:56:55.950 UTC 28927 user@db from 127.0.0.1 [vxid:102/913 txid:0] [SELECT] ERROR:  dsa_area could not attach to segment
2019-01-09 10:56:55.950 UTC 28927 user@db from 127.0.0.1 [vxid:102/913 txid:0] [SELECT] STATEMENT:  SELECT ...
2019-01-09 10:56:55.954 UTC 31157 @ from  [vxid:32/1740 txid:0] [] FATAL:  terminating connection due to administrator command
2019-01-09 10:56:55.954 UTC 31157 @ from  [vxid:32/1740 txid:0] [] STATEMENT:  SELECT ...
2019-01-09 10:56:55.954 UTC 31157 @ from  [vxid:32/0 txid:0] [] FATAL:  cannot unpin a segment that is not pinned

log_line_prefix = "%m %p %u@%d from %h [vxid:%v txid:%x] [%i]"; there were no other log messages within +- 5 minutes.
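In case it helps with triage: a query along the following lines (a sketch only, against the pg_stat_activity view with the same PostgreSQL 10 column names shown in the rows above) is how one can enumerate the backends involved in this kind of hang, i.e. the leader stuck in BgWorkerShutdown plus everything queued on the DSM control lock:

```sql
-- Sketch: list backends waiting on the DSM control LWLock or on
-- background-worker shutdown, oldest first. Column names are the
-- standard PostgreSQL 10 pg_stat_activity columns quoted above.
SELECT pid,
       backend_type,
       state,
       wait_event_type,
       wait_event,
       now() - query_start AS waiting_for,
       left(query, 60)     AS query
FROM pg_stat_activity
WHERE wait_event IN ('DynamicSharedMemoryControlLock', 'BgWorkerShutdown')
ORDER BY query_start;
```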
Pid 28927 is the parallel leader stuck in BgWorkerShutdown, 31156 is a parallel worker for this query, and 31157 seems to be another parallel worker for the same query. Unfortunately I noticed it too late and have no backtrace for it. But I do have the pg_stat_activity row for this process:

datid            | 16400
datname          | db
pid              | 22070
usesysid         | 852050794
usename          | user
application_name | ...
client_addr      |
client_hostname  |
client_port      |
backend_start    | 2019-01-09 10:56:55.839719+00
xact_start       |
query_start      | 2019-01-09 10:56:55.80524+00
state_change     | 2019-01-09 10:56:55.840737+00
wait_event_type  | LWLock
wait_event       | DynamicSharedMemoryControlLock
state            | active
backend_xid      |
backend_xmin     |
query            | SELECT ... (same query as the stuck leader)
backend_type     | background worker

PostgreSQL is installed from the PGDG repo; no external modules are installed. The queries use one native partitioned table with a Parallel Index Scan or Parallel Bitmap Heap Scan. The query itself normally works fine, and I cannot reproduce this issue by hand. The same log output ("dsa_area could not attach to segment" + "cannot unpin a segment that is not pinned") also appeared on 1 January.

Also, is the CHECK_FOR_INTERRUPTS in WaitForBackgroundWorkerShutdown expected to do anything, given that its only caller, DestroyParallelContext, holds interrupts around it?

    /*
     * We can't finish transaction commit or abort until all of the workers
     * have exited.  This means, in particular, that we can't respond to
     * interrupts at this stage.
     */
    HOLD_INTERRUPTS();
    WaitForParallelWorkersToExit(pcxt);
    RESUME_INTERRUPTS();

Can a parallel worker be waiting for some response from the leader?

Thanks in advance!

regards, Sergei

[1] https://www.postgresql.org/message-id/flat/153228422922.1395.1746424054206154747%40wrigleys.postgresql.org