Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) - Mailing list pgsql-hackers

From Andres Freund
Subject Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Date
Msg-id 20160510230259.a2fojqtv6d76arbn@alap3.anarazel.de
Whole thread Raw
In response to Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)  (Jeff Janes <jeff.janes@gmail.com>)
Responses Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
List pgsql-hackers
On 2016-05-10 15:53:38 -0700, Jeff Janes wrote:
> On Tue, May 10, 2016 at 2:00 PM, Andres Freund <andres@anarazel.de> wrote:
> > I think that's to blame here.  Looking at the relevant WAL record shows:
> >
> > pg_xlogdump -p /data/freund/jj -s 2/12004018 -e 2/1327EA28|grep -E 'CHECKPOINT|NEXTOID'
> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID
4302693
> > rmgr: XLOG        len (rec/tot):     80/   106, tx:          0, lsn: 2/12023C38, prev 2/12023C00, desc:
CHECKPOINT_ONLINEredo 2/12000120; /* ... */ oid 4294501; /* ... */ online
 
>
> By my understanding, this is the point at which the crash occurred.

Right.

> > rmgr: XLOG        len (rec/tot):     80/   106, tx:          0, lsn: 2/1327A798, prev 2/1327A768, desc:
CHECKPOINT_SHUTDOWNredo 2/1327A798; /* ... */ oid 4294501; /* ... */ shutdown
 
> > rmgr: XLOG        len (rec/tot):      4/    30, tx:          0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID
4302693
> >
> > (note that end-of-recovery checkpoints are logged as shutdown
> > checkpoints, pretty annoying imo)
> >
> > So I think the issue is that the 2/12023C38 checkpoint *starts* before
> > the first NEXTOID, and thus gets an earlier nextoid.
>
>
> But isn't CreateCheckPoint called at the end of the checkpoint, not
> the start of it?

No, CreateCheckPoint() does it all.


CreateCheckPoint(int flags)
{
...   /* 1) determine redo pointer */   WALInsertLockAcquireExclusive();   curInsert =
XLogBytePosToRecPtr(Insert->CurrBytePos);  prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos);
WALInsertLockRelease();
...   /* 2) determine oid */   LWLockAcquire(OidGenLock, LW_SHARED);   checkPoint.nextOid =
ShmemVariableCache->nextOid;  if (!shutdown)       checkPoint.nextOid += ShmemVariableCache->oidCount;
LWLockRelease(OidGenLock);
...   /* 3) actually checkpoint shared_buffers et al. */   CheckPointGuts(checkPoint.redo, flags);
...   /* 4) log the checkpoint */   recptr = XLogInsert(RM_XLOG_ID,                       shutdown ?
XLOG_CHECKPOINT_SHUTDOWN:                       XLOG_CHECKPOINT_ONLINE);
 
...
}


> I don't understand how it could be out-of-date at that point.  But
> obviously it is.

A checkpoint logically "starts" at 1) in the above abbreviated
CreateCheckPoint(), that's where recovery starts when starting up from
that checkpoint. But inbetween 1) and 4) all other backends can continue
to insert WAL, and it'll be replayed *before* the checkpoint's record
itself.  That means that if some backend generates a NEXTOID record
between 2) and 4) (with largers checkpoints we're looking at minutes to
an hour of time), it's effects will temporarily take effect (as in
ShmemVariableCache->nextOid is updated), but XLOG_CHECKPOINT_ONLINE's
replay will overwrite it unconditionally:
void
xlog_redo(XLogReaderState *record)
{   else if (info == XLOG_CHECKPOINT_ONLINE)   {
...       /* ... but still treat OID counter as exact */       LWLockAcquire(OidGenLock, LW_EXCLUSIVE);
ShmemVariableCache->nextOid= checkPoint.nextOid;       ShmemVariableCache->oidCount = 0;
LWLockRelease(OidGenLock);

Makes sense?

Regards,

Andres



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors)
Next
From: Andres Freund
Date:
Subject: Re: asynchronous and vectorized execution