Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Date	May 10, 2016 21:00:20
Msg-id	20160510210013.2akn4iee7gl4ycen@alap3.anarazel.de
In response to	Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) (Andres Freund <andres@anarazel.de>)
Responses	Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
List	pgsql-hackers

On 2016-05-10 09:19:16 -0700, Andres Freund wrote:
> On 2016-05-10 08:09:02 -0400, Robert Haas wrote:
> > On Tue, May 10, 2016 at 3:05 AM, Andres Freund <andres@anarazel.de> wrote:
> > > The easy way to trigger this problem would be to have an oid wraparound
> > > - but the WAL shows that that's not the case here.  I've not figured
> > > that one out entirely (and won't tonight). But I do see WAL records
> > > like:
> > > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
> > > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
> > > i.e. two NEXTOID records allocating the same range, which obviously
> > > doesn't seem right.  There's also every now and then close by ranges:
> > > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 1/9A404DB8, prev 1/9A404270, desc: NEXTOID 3311455
> > > rmgr: XLOG        len (rec/tot):      4/    30, tx:    7814505, lsn: 1/9A4EC888, prev 1/9A4EB9D0, desc: NEXTOID 3311461

> > It seems to me that the real question
> > here is how you're getting two calls to XLogPutNextOid() with the same
> > value of ShmemVariableCache->nextOid, and the answer, as it seems to
> > me, must be that LWLocks are broken.
> 
> There likely were a bunch of crashes in between, Jeff's test suite
> triggers them at a high rate. It seems a lot more likely than that an
> lwlock bug only materializes in the oid counter.  Investigating.

void
CreateCheckPoint(int flags)
{
...
    /*
     * An end-of-recovery checkpoint is really a shutdown checkpoint, just
     * issued at a different time.
     */
    if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY))
        shutdown = true;
    else
        shutdown = false;
...
    LWLockAcquire(OidGenLock, LW_SHARED);
    checkPoint.nextOid = ShmemVariableCache->nextOid;
    if (!shutdown)
        checkPoint.nextOid += ShmemVariableCache->oidCount;
    LWLockRelease(OidGenLock);
...
    recptr = XLogInsert(RM_XLOG_ID,
                        shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
                        XLOG_CHECKPOINT_ONLINE);
...
}

I think that's to blame here.  Looking at the relevant WAL records shows:

pg_xlogdump -p /data/freund/jj -s 2/12004018 -e 2/1327EA28|grep -E 'CHECKPOINT|NEXTOID'
rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
rmgr: XLOG        len (rec/tot):     80/   106, tx:          0, lsn: 2/12023C38, prev 2/12023C00, desc: CHECKPOINT_ONLINE redo 2/12000120; /* ... */ oid 4294501; /* ... */ online
rmgr: XLOG        len (rec/tot):     80/   106, tx:          0, lsn: 2/1327A798, prev 2/1327A768, desc: CHECKPOINT_SHUTDOWN redo 2/1327A798; /* ... */ oid 4294501; /* ... */ shutdown
rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693

(note that end-of-recovery checkpoints are logged as shutdown
checkpoints, pretty annoying imo)

So I think the issue is that the 2/12023C38 checkpoint *starts* before
the first NEXTOID, and thus gets an earlier nextoid.  The second -
shutdown/end-of-recovery - checkpoint then hits the above !shutdown and
doesn't add oidCount.  Thus after the crash we continue with a repeated
NEXTOID.
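
To make that sequence concrete, here is a minimal sketch of it as a toy
model.  This is not PostgreSQL code: the sim_* names and SimOidState are
made up for illustration, the prefetch size of 8192 is an assumption (it
matches the gap between checkpoint oid 4294501 and NEXTOID 4302693 above),
and replay is assumed to take both NEXTOID and checkpoint records at face
value, as the xlog_redo() comment quoted further down describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_OID_PREFETCH 8192u	/* assumed prefetch size, see lead-in */

/* Hypothetical stand-in for the OID fields of ShmemVariableCache. */
typedef struct SimOidState
{
	uint32_t	nextOid;
	uint32_t	oidCount;
} SimOidState;

/*
 * Allocate one OID.  Returns the value a NEXTOID record would carry if
 * one has to be logged for this allocation, or 0 if none is logged.
 */
uint32_t
sim_alloc_oid(SimOidState *s)
{
	uint32_t	logged = 0;

	assert(s != NULL);
	if (s->oidCount == 0)
	{
		logged = s->nextOid + SIM_OID_PREFETCH;
		s->oidCount = SIM_OID_PREFETCH;
	}
	s->nextOid++;
	s->oidCount--;
	return logged;
}

/* Value stored in a checkpoint record, per the CreateCheckPoint() excerpt. */
uint32_t
sim_checkpoint_oid(const SimOidState *s, bool shutdown)
{
	return shutdown ? s->nextOid : s->nextOid + s->oidCount;
}

/* Replay of either record type: believe the recorded value exactly. */
void
sim_redo_set_oid(SimOidState *s, uint32_t recorded)
{
	s->nextOid = recorded;
	s->oidCount = 0;
}

/*
 * Run the sequence described above.  Stores the NEXTOID logged before the
 * crash in *first_logged and returns the NEXTOID logged after recovery.
 */
uint32_t
sim_repeated_nextoid(uint32_t *first_logged)
{
	SimOidState live = {4294501u, 0};
	SimOidState replay = {0, 0};
	uint32_t	ckpt_online;

	/* The online checkpoint captures its oid before the NEXTOID is logged. */
	ckpt_online = sim_checkpoint_oid(&live, false);
	*first_logged = sim_alloc_oid(&live);

	/* Crash.  Replay in WAL order: NEXTOID first, then the checkpoint record. */
	sim_redo_set_oid(&replay, *first_logged);
	sim_redo_set_oid(&replay, ckpt_online);

	/* End-of-recovery checkpoint: shutdown is true, so oidCount is not added. */
	(void) sim_checkpoint_oid(&replay, true);

	/* The first allocation after recovery logs a NEXTOID again. */
	return sim_alloc_oid(&replay);
}
```

Calling sim_repeated_nextoid() in this model gives the same NEXTOID value
before and after the crash, which is the pattern in the dump above.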

There's this remark in xlog_redo():
    /*
     * We used to try to take the maximum of ShmemVariableCache->nextOid
     * and the recorded nextOid, but that fails if the OID counter wraps
     * around.  Since no OID allocation should be happening during replay
     * anyway, better to just believe the record exactly.  We still take
     * OidGenLock while setting the variable, just in case.
     */

I think that was perhaps not the best fix :(

I guess what we should do is to only use checkPoint.nextOid when
starting up from a checkpoint, and entirely rely on NEXTOID otherwise?
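
A rough sketch of what that rule could look like, again as a toy model
rather than a patch: the SimRecord type and sim_replay_next_oid() are
hypothetical, and the oid of the checkpoint that recovery starts from is
simply passed in as a parameter.  Checkpoint records encountered later
during replay are ignored for the OID counter; only NEXTOID records move
it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified WAL record for this sketch. */
typedef enum SimRecType
{
	SIM_REC_NEXTOID,
	SIM_REC_CHECKPOINT
} SimRecType;

typedef struct SimRecord
{
	SimRecType	type;
	uint32_t	oid;
} SimRecord;

/*
 * Replay nrecs records and return the resulting nextOid.
 *
 * start_checkpoint_oid is the nextOid of the checkpoint recovery starts
 * from; it is the only checkpoint value that is used.
 */
uint32_t
sim_replay_next_oid(uint32_t start_checkpoint_oid,
					const SimRecord *recs, size_t nrecs)
{
	uint32_t	nextOid = start_checkpoint_oid;
	size_t		i;

	assert(recs != NULL || nrecs == 0);
	for (i = 0; i < nrecs; i++)
	{
		/* Only NEXTOID records advance the counter during replay. */
		if (recs[i].type == SIM_REC_NEXTOID)
			nextOid = recs[i].oid;
		/* Checkpoint records seen mid-replay do not touch nextOid. */
	}
	return nextOid;
}
```

Fed the NEXTOID 4302693 record followed by the online checkpoint with oid
4294501, this keeps nextOid at 4302693 instead of falling back to the
checkpoint's older value.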

Regards,

Andres
