Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process - Mailing list pgsql-bugs
From | Keith Fiske |
---|---|
Subject | Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process |
Date | |
Msg-id | CAG1_KcB-L6AMka5NZ+JU8=0o7T9i-dqW0d90rgCnmoaSswBtaA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum process (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: BUG #10533: 9.4 beta1 assertion failure in autovacuum
process
|
List | pgsql-bugs |
Not sure if this is the same issue originally reported, but the description sounded the same. I can get this to happen pretty reliably in 9.4beta1 with the pgtap test suite for one of my extensions - https://github.com/omniti-labs/mimeo Doesn't happen at the same test or with the same table every time (usually after test08 when large write batches complete), but the error in my postgres logs is the same every time. I've got shared_buffers set fairly low (128mb) since this is just my test instance I run in a VM (saw you mention running out of memory). My tap tests pass without any issues in 9.3, 9.2 and 9.1. So far every time I've run them with 9.4 it's failed at some point when autovacuum kicked in. If you need anymore info let me know. From postgresql logs: TRAP: FailedAssertion("!(CritSectionCount == 0 || (CurrentMemoryContext) == ErrorContext || (MyAuxProcType == CheckpointerProcess))", File: "mcxt.c", Line: 670) 2014-06-06 15:54:29 EDT [] [23575] [53921aa3.5c17-4] db=,user=,e=00000 LOG: server process (PID 25555) was terminated by signal 6: Aborted 2014-06-06 15:54:29 EDT [] [23575] [53921aa3.5c17-5] db=,user=,e=00000 DETAIL: Failed process was running: autovacuum: VACUUM ANALYZE mimeo.mimeo_source_logdel_test_source_condition_q 2014-06-06 15:54:29 EDT [] [23575] [53921aa3.5c17-6] db=,user=,e=00000 LOG: terminating any other active server processes 2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-38] db=keith,user=keith,e=57P02 WARNING: terminating connection because of crash of another server process 2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-39] db=keith,user=keith,e=57P02 DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-40] db=keith,user=keith,e=57P02 HINT: In a moment you should be able to reconnect to the database and repeat your command. 2014-06-06 15:54:29 EDT [local] [[local]] [25559] [53921bee.63d7-41] db=keith,user=keith,e=57P02 CONTEXT: SQL statement "SELECT dblink_exec('mimeo_logdel_refresh_mimeo_dest.logdel_test_dest','WITH a AS (SELECT col2,col1 FROM mimeo.mimeo_source_logdel_test_source2_q ORDER BY col2,col1 LIMIT ALL) UPDATE mimeo.mimeo_source_logdel_test_source2_q b SET processed = true FROM a WHERE a.col2 = b.col2 AND a.col1 = b.col1')" PL/pgSQL function refresh_logdel(text,integer,boolean,boolean,boolean) line 171 at EXECUTE statement Back trace: #0 0x00007f61fd14f037 in raise () from /lib/x86_64-linux-gnu/libc.so.6 #1 0x00007f61fd152698 in abort () from /lib/x86_64-linux-gnu/libc.so.6 #2 0x000000000079ea87 in ExceptionalCondition (conditionName=conditionName@entry=0x9554b0 "!(CritSectionCount == 0 || (CurrentMemoryContext) == ErrorContext || (MyAuxProcType == CheckpointerProcess))", errorType=errorType@entry=0x7d94b0 "FailedAssertion", fileName=fileName@entry=0x95514e "mcxt.c", lineNumber=lineNumber@entry=670) at assert.c:54 #3 0x00000000007c15e0 in palloc (size=128) at mcxt.c:670 #4 0x00000000007d8923 in psprintf (fmt=0x975589 "base/%u/%u") at psprintf.c:60 #5 0x00000000006b4c17 in mdopen (behavior=EXTENSION_FAIL, forknum=MAIN_FORKNUM, reln=0x1107360) at md.c:562 #6 mdopen (reln=0x1107360, forknum=MAIN_FORKNUM, behavior=EXTENSION_FAIL) at md.c:552 #7 0x00000000006b5d47 in _mdfd_getseg (reln=reln@entry=0x1107360, forknum=forknum@entry=MAIN_FORKNUM, blkno=blkno@entry=432, skipFsync=0 '\000', behavior=behavior@entry=EXTENSION_FAIL) at md.c:1707 #8 0x00000000006b66b8 in mdwrite (reln=0x1107360, forknum=MAIN_FORKNUM, blocknum=432, buffer=0x7f61f4cf75c0 "=", skipFsync=<optimized out>) at md.c:742 #9 0x000000000068dcad in FlushBuffer (reln=0x1107360, buf=0x7f61f3c9a540) at bufmgr.c:2002 #10 FlushBuffer (buf=0x7f61f3c9a540, reln=<optimized out>) at bufmgr.c:1917 #11 0x000000000068e621 in BufferAlloc (foundPtr=0x7fffef8daa1e "\272\001\177\006", strategy=0x0, blockNum=0, forkNum=VISIBILITYMAP_FORKNUM, relpersistence=112 'p', smgr=0x1105aa8) at bufmgr.c:687 #12 ReadBuffer_common (smgr=0x1105aa8, relpersistence=<optimized out>, forkNum=forkNum@entry=VISIBILITYMAP_FORKNUM, blockNum=blockNum@entry=0, mode=RBM_ZERO_ON_ERROR, strategy=0x0, hit=hit@entry=0x7fffef8daacf "") at bufmgr.c:335 #13 0x000000000068ecab in ReadBufferExtended (reln=reln@entry=0x7f61fe3e6e70, forkNum=forkNum@entry=VISIBILITYMAP_FORKNUM, blockNum=blockNum@entry=0, mode=mode@entry=RBM_ZERO_ON_ERROR, strategy=strategy@entry=0x0) at bufmgr.c:254 #14 0x00000000004a6ef5 in vm_readbuf (rel=rel@entry=0x7f61fe3e6e70, blkno=blkno@entry=0, extend=extend@entry=0 '\000') at visibilitymap.c:563 #15 0x00000000004a7a94 in visibilitymap_test (rel=rel@entry=0x7f61fe3e6e70, heapBlk=heapBlk@entry=0, buf=buf@entry=0x7fffef8dbc4c) at visibilitymap.c:354 #16 0x00000000005b3892 in lazy_vacuum_page (onerel=onerel@entry=0x7f61fe3e6e70, blkno=blkno@entry=0, buffer=buffer@entry=2144, tupindex=<optimized out>, tupindex@entry=0, vacrelstats=vacrelstats@entry=0x10eeaf8, vmbuffer=vmbuffer@entry=0x7fffef8dbc4c) at vacuumlazy.c:1245 #17 0x00000000005b3b4f in lazy_vacuum_heap (onerel=onerel@entry=0x7f61fe3e6e70, vacrelstats=vacrelstats@entry=0x10eeaf8) at vacuumlazy.c:1152 #18 0x00000000005b42ee in lazy_scan_heap (scan_all=0 '\000', nindexes=2, Irel=<optimized out>, vacrelstats=0x10eeaf8, onerel=0x7f61fe3e6e70) at vacuumlazy.c:1080 #19 lazy_vacuum_rel (onerel=onerel@entry=0x7f61fe3e6e70, vacstmt=vacstmt@entry=0x7fffef8dc360, bstrategy=<optimized out>) at vacuumlazy.c:239 #20 0x00000000005b1e3a in vacuum_rel (relid=relid@entry=1158010, vacstmt=vacstmt@entry=0x7fffef8dc360, do_toast=do_toast@entry=0 '\000', for_wraparound=for_wraparound@entry=0 '\000') at vacuum.c:1202 #21 0x00000000005b2c66 in vacuum (vacstmt=vacstmt@entry=0x7fffef8dc360, relid=<optimized out>, do_toast=do_toast@entry=0 '\000', bstrategy=<optimized out>, bstrategy@entry=0x114a380, for_wraparound=0 '\000', isTopLevel=isTopLevel@entry=1 '\001') at vacuum.c:234 #22 0x00000000006515d3 in autovacuum_do_vac_analyze (bstrategy=0x114a380, tab=0x114a498) at autovacuum.c:2796 #23 do_autovacuum () at autovacuum.c:2326 #24 0x0000000000651b15 in AutoVacWorkerMain (argv=0x0, argc=0) at autovacuum.c:1678 #25 0x00000000006521c2 in StartAutoVacWorker () at autovacuum.c:1463 ---Type <return> to continue, or q <return> to quit--- #26 0x000000000065f89d in StartAutovacuumWorker () at postmaster.c:5188 #27 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:4842 #28 <signal handler called> #29 0x00007f61fd20ae03 in select () from /lib/x86_64-linux-gnu/libc.so.6 #30 0x000000000045f985 in ServerLoop () at postmaster.c:1530 #31 0x00000000006608ae in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1085910) at postmaster.c:1223 #32 0x0000000000460c28 in main (argc=3, argv=0x1085910) at main.c:225 -- Keith Fiske Database Administrator OmniTI Computer Consulting, Inc. http://www.keithf4.com On Thu, Jun 5, 2014 at 5:34 PM, Andres Freund <andres@2ndquadrant.com> wrote: > Hi, > > On 2014-06-05 21:42:23 +0300, Heikki Linnakangas wrote: > > I think we'll need a variant of GetMultiXactIdMembers() that only returns > > the update XID, avoiding the palloc(). The straight-forward fix would be > to > > copy-paste contents of GetMultiXactIdMembers() into > > MultiXactIdGetUpdateXid(), and instead of returning the members in an > array, > > only return the update-xid. But it's a long and complicated function, so > > copy-pasting is not a good option. I think it needs to be refactored into > > some kind of a helper function that both MultiXactIdGetUpdateXid() and > > GetMultiXactIdMembers() could call. > > While it certainly would be nice to have such a function I have my doubt > it's realistic to get it for 9.4 and the backbranches. > > I haven't thought particularly much about this, but I don't really see > why the heap_page_is_all_visible() bit needs to be in a critical > section? Can't we just do that entire bit after the log_heap_clean()? > Then the heap_page_is_all_visible() can be done outside a critical > section. > > Greetings, > > Andres Freund > > -- > Andres Freund http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > > -- > Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-bugs >
pgsql-bugs by date: