Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Date	May 10, 2016 23:03:04
Msg-id	20160510230259.a2fojqtv6d76arbn@alap3.anarazel.de
In response to	Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) (Jeff Janes <jeff.janes@gmail.com>)
Responses	Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
List	pgsql-hackers

On 2016-05-10 15:53:38 -0700, Jeff Janes wrote:
> On Tue, May 10, 2016 at 2:00 PM, Andres Freund <andres@anarazel.de> wrote:
> > I think that's to blame here.  Looking at the relevant WAL record shows:
> >
> > pg_xlogdump -p /data/freund/jj -s 2/12004018 -e 2/1327EA28|grep -E 'CHECKPOINT|NEXTOID'
> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693
> > rmgr: XLOG        len (rec/tot):     80/   106, tx:          0, lsn: 2/12023C38, prev 2/12023C00, desc: CHECKPOINT_ONLINE redo 2/12000120; /* ... */ oid 4294501; /* ... */ online
>
> By my understanding, this is the point at which the crash occurred.

Right.

> > rmgr: XLOG        len (rec/tot):     80/   106, tx:          0, lsn: 2/1327A798, prev 2/1327A768, desc: CHECKPOINT_SHUTDOWN redo 2/1327A798; /* ... */ oid 4294501; /* ... */ shutdown
> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693
> >
> > (note that end-of-recovery checkpoints are logged as shutdown
> > checkpoints, pretty annoying imo)
> >
> > So I think the issue is that the 2/12023C38 checkpoint *starts* before
> > the first NEXTOID, and thus gets an earlier nextoid.
>
>
> But isn't CreateCheckPoint called at the end of the checkpoint, not
> the start of it?

No, CreateCheckPoint() does it all.

CreateCheckPoint(int flags)
{
...
    /* 1) determine redo pointer */
    WALInsertLockAcquireExclusive();
    curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos);
    prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos);
    WALInsertLockRelease();
...
    /* 2) determine oid */
    LWLockAcquire(OidGenLock, LW_SHARED);
    checkPoint.nextOid = ShmemVariableCache->nextOid;
    if (!shutdown)
        checkPoint.nextOid += ShmemVariableCache->oidCount;
    LWLockRelease(OidGenLock);
...
    /* 3) actually checkpoint shared_buffers et al. */
    CheckPointGuts(checkPoint.redo, flags);
...
    /* 4) log the checkpoint */
    recptr = XLogInsert(RM_XLOG_ID,
                        shutdown ? XLOG_CHECKPOINT_SHUTDOWN :
                        XLOG_CHECKPOINT_ONLINE);

...
}

> I don't understand how it could be out-of-date at that point.  But
> obviously it is.

A checkpoint logically "starts" at 1) in the above abbreviated
CreateCheckPoint(); that's where recovery starts when starting up from
that checkpoint. But in between 1) and 4) all other backends can continue
to insert WAL, and it'll be replayed *before* the checkpoint's record
itself.  That means that if some backend generates a NEXTOID record
between 2) and 4) (with larger checkpoints we're looking at minutes to
an hour of time), its update will temporarily take effect (as in
ShmemVariableCache->nextOid is updated), but XLOG_CHECKPOINT_ONLINE's
replay will overwrite it unconditionally:
void
xlog_redo(XLogReaderState *record)
{
...
    else if (info == XLOG_CHECKPOINT_ONLINE)
    {
...
        /* ... but still treat OID counter as exact */
        LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
        ShmemVariableCache->nextOid = checkPoint.nextOid;
        ShmemVariableCache->oidCount = 0;
        LWLockRelease(OidGenLock);
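
To make the ordering problem concrete, here is a minimal toy model of
replay, not PostgreSQL code. ToyRecType, ToyRecord and toy_replay() are
hypothetical names invented for this illustration. The model assumes
that replaying a NEXTOID record simply sets the counter to the logged
value, and that replaying an online checkpoint overwrites it as in the
excerpt above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: every name here is hypothetical, none is PostgreSQL's. */
typedef enum
{
    TOY_NEXTOID,            /* stands in for an XLOG_NEXTOID record */
    TOY_CHECKPOINT_ONLINE   /* stands in for XLOG_CHECKPOINT_ONLINE */
} ToyRecType;

typedef struct
{
    ToyRecType  type;
    uint32_t    oid;        /* nextOid value carried by the record */
} ToyRecord;

/*
 * Replay records in WAL order, starting from start_oid, and return the
 * resulting OID counter.  Both record types overwrite the counter
 * unconditionally (an assumption for NEXTOID; taken from the excerpt
 * above for online checkpoints).
 */
uint32_t
toy_replay(const ToyRecord *recs, size_t nrecs, uint32_t start_oid)
{
    uint32_t    next_oid = start_oid;
    size_t      i;

    assert(recs != NULL || nrecs == 0);

    for (i = 0; i < nrecs; i++)
    {
        switch (recs[i].type)
        {
            case TOY_NEXTOID:
                /* a backend logged a new OID range */
                next_oid = recs[i].oid;
                break;
            case TOY_CHECKPOINT_ONLINE:
                /* value was sampled at step 2), possibly before the NEXTOID */
                next_oid = recs[i].oid;
                break;
        }
    }
    return next_oid;
}
```

Fed the two records from the dump in their WAL order, { TOY_NEXTOID,
4302693 } followed by { TOY_CHECKPOINT_ONLINE, 4294501 }, toy_replay()
returns 4294501, i.e. the counter ends up below the 4302693 that had
already been logged.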

Makes sense?

Regards,

Andres
