Thread: Assertion failure in REL9_5_STABLE
Hi, Running one specific test from our application against REL9_5_STABLE (current as of today) gives me this: #2 0x00007effb59595be in ExceptionalCondition ( conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0 || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) && ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? HeapTupleGetUpdateXid(tup) : ( (tup)-"..., errorType=errorType@entry=0x7effb599b74b "FailedAssertion", fileName=fileName@entry=0x7effb5b2796c "combocid.c", lineNumber=lineNumber@entry=132) at assert.c:54 #3 0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at combocid.c:131 #4 0x00007effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90, tuple=tuple@entry=0x7fffcee73690, cid=346, mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock, follow_updates=follow_updates@entry=1 '\001', buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680) at heapam.c:4813 #5 0x00007effb5753e82 in ExecLockRows (node=node@entry=0x7effb6cebb00) at nodeLockRows.c:179 #6 0x00007effb573cbc8 in ExecProcNode (node=node@entry=0x7effb6cebb00) at execProcnode.c:516 #7 0x00007effb5739432 in ExecutePlan (dest=0x7effb5dd8160 <spi_printtupDR>, direction=<optimized out>, numberTuples=0, sendTuples=1 '\001', operation=CMD_SELECT, planstate=0x7effb6cebb00, estate=0x7effb6ceb8f8) at execMain.c:1549 #8 standard_ExecutorRun (queryDesc=0x7effb6ae3b40, direction=<optimized out>, count=0) at execMain.c:337 #9 0x00007effb57661db in _SPI_pquery (tcount=0, fire_triggers=1 '\001', queryDesc=0x7effb6ae3b40) at spi.c:2411 The failure is a number of levels down a call stack of PL/PgSQL functions, but I can reproduce it at will by calling the function. I unfortunately can't narrow it down to a smaller test case, but attached is an xlogdump of the session. The query that finally breaks the elephant's back is a SELECT .. FOR UPDATE from relid=54511. Any ideas on how to debug this? .m
Attachment
Marko Tiikkaja <marko@joh.to> writes: > The failure is a number of levels down a call stack of PL/PgSQL > functions, but I can reproduce it at will by calling the function. I > unfortunately can't narrow it down to a smaller test case, but attached > is an xlogdump of the session. The query that finally breaks the > elephant's back is a SELECT .. FOR UPDATE from relid=54511. > Any ideas on how to debug this? If this test case used to pass, it'd be pretty useful to git bisect to find where it started to fail. regards, tom lane
Marko Tiikkaja wrote: > Hi, > > Running one specific test from our application against REL9_5_STABLE > (current as of today) gives me this: > > #2 0x00007effb59595be in ExceptionalCondition ( > conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0 > || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) && > ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? > HeapTupleGetUpdateXid(tup) : ( (tup)-"..., > errorType=errorType@entry=0x7effb599b74b "FailedAssertion", > fileName=fileName@entry=0x7effb5b2796c "combocid.c", > lineNumber=lineNumber@entry=132) at assert.c:54 > #3 0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at > combocid.c:131 > #4 0x00007effb55fb0c1 in heap_lock_tuple (relation=0x7effb5bf5d90, > tuple=tuple@entry=0x7fffcee73690, cid=346, > mode=mode@entry=LockTupleExclusive, wait_policy=LockWaitBlock, > follow_updates=follow_updates@entry=1 '\001', > buffer=buffer@entry=0x7fffcee7367c, hufd=hufd@entry=0x7fffcee73680) at > heapam.c:4813 Umm. AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught the same assertion). Something is odd there ... Can you share the test case? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-10 19:28, Alvaro Herrera wrote: > Umm. AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after > already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught > the same assertion). Something is odd there ... HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case. HeapTupleSelfUpdated comes from here (line 4749): /* if there are updates, follow the update chain */ if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask)) { HTSU_Result res; res = heap_lock_updated_tuple(relation, tuple, &t_ctid, GetCurrentTransactionId(), mode); if (res != HeapTupleMayBeUpdated) { result = res; /* recovery code expects to have buffer lock held */ LockBuffer(*buffer,BUFFER_LOCK_EXCLUSIVE); goto failed; } } > Can you share the test case? Not at this time, sorry. .m
Marko Tiikkaja wrote: > On 2016-08-10 19:28, Alvaro Herrera wrote: > >Umm. AFAICS HeapTupleSatisfiesUpdate() only returns SelfUpdated after > >already calling HeapTupleHeaderGetCmax() (which obviously hasn't caught > >the same assertion). Something is odd there ... > > HeapTupleSatisfiesUpdate() returns HeapTupleBeingUpdated in this case. > HeapTupleSelfUpdated comes from here (line 4749): > > /* if there are updates, follow the update chain */ > if (follow_updates && !HEAP_XMAX_IS_LOCKED_ONLY(infomask)) > { > HTSU_Result res; > res = heap_lock_updated_tuple(relation, tuple, &t_ctid, > GetCurrentTransactionId(), > mode); > if (res != HeapTupleMayBeUpdated) > { > result = res; > /* recovery code expects to have buffer lock held */ > LockBuffer(*buffer, BUFFER_LOCK_EXCLUSIVE); > goto failed; > } > } Oh, I see ... so there's an update chain, and you're updating a earlier tuple. But the later tuple has a multixact and one of the members is the current transaction. I wonder how can you lock a tuple that's not the latest, where that update chain was produced by your own transaction. I suppose this is because of plpgsql use of cursors. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
BTW this is not a regression, right? It's been broken all along. Or am I mistaken? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2016-08-10 11:01 PM, Alvaro Herrera wrote: > Oh, I see ... so there's an update chain, and you're updating a earlier > tuple. But the later tuple has a multixact and one of the members is > the current transaction. > > I wonder how can you lock a tuple that's not the latest, where that > update chain was produced by your own transaction. I suppose this is > because of plpgsql use of cursors. There's a rolled back subtransaction that also did some magic on the rows AFAICT. I can try and spend some time producing a smaller test case over the weekend. No hurry since this missed the today's point release anyway. .m
On 2016-08-11 12:09 AM, Alvaro Herrera wrote: > BTW this is not a regression, right? It's been broken all along. Or am > I mistaken? You're probably right. I just hadn't realized I could run our app against 9.5 with --enable-cassert until last week. .m
On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote: > On 2016-08-11 12:09 AM, Alvaro Herrera wrote: >> >> BTW this is not a regression, right? It's been broken all along. Or am >> I mistaken? > > > You're probably right. I just hadn't realized I could run our app against > 9.5 with --enable-cassert until last week. Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still see a problem? -- Michael
On 11/08/16 8:48 AM, Michael Paquier wrote: > On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote: >> On 2016-08-11 12:09 AM, Alvaro Herrera wrote: >>> >>> BTW this is not a regression, right? It's been broken all along. Or am >>> I mistaken? >> >> You're probably right. I just hadn't realized I could run our app against >> 9.5 with --enable-cassert until last week. > > Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still > see a problem? Yeah, no effect. .m
On 2016-08-11 11:01:18 +0200, Marko Tiikkaja wrote: > On 11/08/16 8:48 AM, Michael Paquier wrote: > > On Thu, Aug 11, 2016 at 8:09 AM, Marko Tiikkaja <marko@joh.to> wrote: > > > On 2016-08-11 12:09 AM, Alvaro Herrera wrote: > > > > > > > > BTW this is not a regression, right? It's been broken all along. Or am > > > > I mistaken? > > > > > > You're probably right. I just hadn't realized I could run our app against > > > 9.5 with --enable-cassert until last week. > > > > Just wondering... If you revert 1f9534b4 and/or b33e81cb do you still > > see a problem? > > Yeah, no effect. Phew. Alvaro, are you tackling this?
Andres Freund wrote: > Phew. Alvaro, are you tackling this? Sure. Marko Tiikkaja wrote: > There's a rolled back subtransaction that also did some magic on the rows > AFAICT. I can try and spend some time producing a smaller test case over > the weekend. No hurry since this missed the today's point release anyway. Marko, any luck? -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Marko Tiikkaja wrote: > Running one specific test from our application against REL9_5_STABLE > (current as of today) gives me this: > > #2 0x00007effb59595be in ExceptionalCondition ( > conditionName=conditionName@entry=0x7effb5b27a88 "!(CritSectionCount > 0 > || TransactionIdIsCurrentTransactionId(( (!((tup)->t_infomask & 0x0800) && > ((tup)->t_infomask & 0x1000) && !((tup)->t_infomask & 0x0080)) ? > HeapTupleGetUpdateXid(tup) : ( (tup)-"..., > errorType=errorType@entry=0x7effb599b74b "FailedAssertion", > fileName=fileName@entry=0x7effb5b2796c "combocid.c", > lineNumber=lineNumber@entry=132) at assert.c:54 > #3 0x00007effb598e19b in HeapTupleHeaderGetCmax (tup=0x7effa85714c8) at > combocid.c:131 Can you please do frame 3 print tup->t_ctid in gdb? That would print the tuple address with which to grep the xlog sequence. -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Marko Tiikkaja wrote: > Hi, > > Running one specific test from our application against REL9_5_STABLE > (current as of today) gives me this: Just pushed the fix. Thanks for the report! -- Álvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services