[sqlsmith] Failed assertions on parallel worker shutdown - Mailing list pgsql-hackers
| From | Andreas Seltenreich |
|---|---|
| Subject | [sqlsmith] Failed assertions on parallel worker shutdown |
| Date | |
| Msg-id | 87r3cu78qo.fsf@elite.ansel.ydns.eu |
| List | pgsql-hackers |
There's another class of parallel worker core dumps when testing master
with sqlsmith. In these cases, the following assertion fails for all
workers simultaneously:
TRAP: FailedAssertion("!(mqh->mqh_partial_bytes <= nbytes)", File: "shm_mq.c", Line: 386)
The backtrace of the controlling process is always in
ExecShutdownGatherWorkers. The queries always work fine on re-running,
so I guess there is some race condition on worker shutdown? Backtraces
below.
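To make the asserted invariant concrete, here is a toy model. It is not
the code in shm_mq.c. Only the field name mqh_partial_bytes and the shape
of the check are taken from the TRAP line above; ToySender, toy_send,
toy_reset, the "room" parameter and the detach scenario are all made up
for illustration. The sketch shows one hypothetical way a byte count left
over from an interrupted send could exceed the length of a later, shorter
message. Whether this is what actually happens here is not established.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sender state; NOT PostgreSQL's shm_mq_handle. */
typedef struct ToySender
{
    size_t partial_bytes;   /* bytes of the current message already written */
    bool   detached;        /* the receiving side has gone away */
} ToySender;

typedef enum { TOY_SUCCESS, TOY_WOULD_BLOCK, TOY_DETACHED } ToyResult;

/* Same shape as the failing check: partial count must not exceed nbytes. */
static bool
toy_invariant_holds(const ToySender *s, size_t nbytes)
{
    return s->partial_bytes <= nbytes;
}

/*
 * Try to send a message of nbytes when at most `room` bytes fit into the
 * queue during this call. A send that does not fit records its progress
 * and expects to be retried later with the same message.
 */
static ToyResult
toy_send(ToySender *s, size_t nbytes, size_t room)
{
    assert(toy_invariant_holds(s, nbytes));

    if (s->detached)
        return TOY_DETACHED;    /* assumption: progress is left untouched */

    size_t remaining = nbytes - s->partial_bytes;

    if (room < remaining)
    {
        s->partial_bytes += room;
        return TOY_WOULD_BLOCK;
    }

    s->partial_bytes = 0;       /* message complete */
    return TOY_SUCCESS;
}

/* Forget a half-sent message, e.g. once the queue is known to be dead. */
static void
toy_reset(ToySender *s)
{
    s->partial_bytes = 0;
}
```

In this model, a 100-byte send that only gets 40 bytes through leaves
partial_bytes at 40. If the receiver then detaches and the sender later
tries an 8-byte message without calling toy_reset, the check at the top
of toy_send would fail in the same way as the TRAP above.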
regards
andreas
Core was generated by `postgres: bgworker: parallel worker for PID 30525 '.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007f5a3df91067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007f5a3df92448 in __GI_abort () at abort.c:89
#2 0x00000000007eabe1 in ExceptionalCondition (conditionName=conditionName@entry=0x984e10 "!(mqh->mqh_partial_bytes <= nbytes)", errorType=errorType@entry=0x82a75d "FailedAssertion", fileName=fileName@entry=0x984b8c "shm_mq.c", lineNumber=lineNumber@entry=386) at assert.c:54
#3 0x00000000006d8042 in shm_mq_sendv (mqh=0x25f17b8, iov=iov@entry=0x7ffc6352af00, iovcnt=iovcnt@entry=1, nowait=<optimized out>) at shm_mq.c:386
#4 0x00000000006d807d in shm_mq_send (mqh=<optimized out>, nbytes=<optimized out>, data=<optimized out>, nowait=<optimized out>) at shm_mq.c:327
#5 0x00000000005d96b9 in ExecutePlan (dest=0x25f1850, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x2612da8, estate=0x2612658) at execMain.c:1596
#6 standard_ExecutorRun (queryDesc=0x261a660, direction=<optimized out>, count=0) at execMain.c:338
#7 0x00000000005dc7cf in ParallelQueryMain (seg=<optimized out>, toc=0x7f5a3ea6c000) at execParallel.c:735
#8 0x00000000004e617b in ParallelWorkerMain (main_arg=<optimized out>) at parallel.c:1035
#9 0x0000000000683862 in StartBackgroundWorker () at bgworker.c:726
#10 0x000000000068e9a2 in do_start_bgworker (rw=0x2590760) at postmaster.c:5531
#11 maybe_start_bgworker () at postmaster.c:5706
#12 0x000000000046cbba in ServerLoop () at postmaster.c:1762
#13 0x000000000069081e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x256d580) at postmaster.c:1298
#14 0x000000000046d80d in main (argc=4, argv=0x256d580) at main.c:228
(gdb) attach 30525
0x00007f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
81 ../sysdeps/unix/syscall-template.S: No such file or directory.
(gdb) bt
#0 0x00007f5a3e044e33 in __epoll_wait_nocancel () at ../sysdeps/unix/syscall-template.S:81
#1 0x00000000006d1b4e in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffc6352aec0, cur_timeout=-1, set=0x44251c0) at latch.c:981
#2 WaitEventSetWait (set=set@entry=0x44251c0, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffc6352aec0, nevents=nevents@entry=1) at latch.c:935
#3 0x00000000006d1f96 in WaitLatchOrSocket (latch=0x7f5a3d898494, wakeEvents=wakeEvents@entry=1, sock=sock@entry=-1, timeout=timeout@entry=-1) at latch.c:347
#4 0x00000000006d205d in WaitLatch (latch=<optimized out>, wakeEvents=wakeEvents@entry=1, timeout=timeout@entry=-1) at latch.c:302
#5 0x00000000004e6d64 in WaitForParallelWorkersToFinish (pcxt=0x442d4e8) at parallel.c:537
#6 0x00000000005dcf84 in ExecParallelFinish (pei=0x441cab8) at execParallel.c:541
#7 0x00000000005eeead in ExecShutdownGatherWorkers (node=node@entry=0x3e3a070) at nodeGather.c:416
#8 0x00000000005ef389 in ExecShutdownGather (node=0x3e3a070) at nodeGather.c:430
#9 0x00000000005dd03d in ExecShutdownNode (node=0x3e3a070) at execProcnode.c:807
#10 0x000000000061ad73 in planstate_tree_walker (planstate=0x3e361a8, walker=0x5dd010 <ExecShutdownNode>, context=0x0) at nodeFuncs.c:3442
#11 0x000000000061ad73 in planstate_tree_walker (planstate=0xf323c30, walker=0x5dd010 <ExecShutdownNode>, context=0x0) at nodeFuncs.c:3442
#12 0x000000000061ad73 in planstate_tree_walker (planstate=0xf323960, walker=0x5dd010 <ExecShutdownNode>, context=0x0) at nodeFuncs.c:3442
#13 0x00000000005d96da in ExecutePlan (dest=0xb826868, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0xf323960, estate=0xf322b28) at execMain.c:1576
#14 standard_ExecutorRun (queryDesc=0xddca888, direction=<optimized out>, count=0) at execMain.c:338
#15 0x00000000006f6e88 in PortalRunSelect (portal=portal@entry=0x258ccc8, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0xb826868) at pquery.c:946
#16 0x00000000006f83ae in PortalRun (portal=0x258ccc8, count=9223372036854775807, isTopLevel=<optimized out>, dest=0xb826868, altdest=0xb826868, completionTag=0x7ffc6352b3d0 "") at pquery.c:787
#17 0x00000000006f5c63 in exec_simple_query (query_string=<optimized out>) at postgres.c:1094
#18 PostgresMain (argc=39374024, argv=0x25ed130, dbname=0x256e480 "regression", username=0x25ed308 "0\321^\002") at postgres.c:4059
#19 0x000000000046c8b2 in BackendRun (port=0x25935d0) at postmaster.c:4258
#20 BackendStartup (port=0x25935d0) at postmaster.c:3932
#21 ServerLoop () at postmaster.c:1690
#22 0x000000000069081e in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x256d580) at postmaster.c:1298
#23 0x000000000046d80d in main (argc=4, argv=0x256d580) at main.c:228