Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) |
Date | |
Msg-id | 20160510230259.a2fojqtv6d76arbn@alap3.anarazel.de Whole thread Raw |
In response to | Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin
causing errors)
|
List | pgsql-hackers |
On 2016-05-10 15:53:38 -0700, Jeff Janes wrote: > On Tue, May 10, 2016 at 2:00 PM, Andres Freund <andres@anarazel.de> wrote: > > I think that's to blame here. Looking at the relevant WAL record shows: > > > > pg_xlogdump -p /data/freund/jj -s 2/12004018 -e 2/1327EA28|grep -E 'CHECKPOINT|NEXTOID' > > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693 > > rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 2/12023C38, prev 2/12023C00, desc: CHECKPOINT_ONLINEredo 2/12000120; /* ... */ oid 4294501; /* ... */ online > > By my understanding, this is the point at which the crash occurred. Right. > > rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 2/1327A798, prev 2/1327A768, desc: CHECKPOINT_SHUTDOWNredo 2/1327A798; /* ... */ oid 4294501; /* ... */ shutdown > > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693 > > > > (note that end-of-recovery checkpoints are logged as shutdown > > checkpoints, pretty annoying imo) > > > > So I think the issue is that the 2/12023C38 checkpoint *starts* before > > the first NEXTOID, and thus gets an earlier nextoid. > > > But isn't CreateCheckPoint called at the end of the checkpoint, not > the start of it? No, CreateCheckPoint() does it all. CreateCheckPoint(int flags) { ... /* 1) determine redo pointer */ WALInsertLockAcquireExclusive(); curInsert = XLogBytePosToRecPtr(Insert->CurrBytePos); prevPtr = XLogBytePosToRecPtr(Insert->PrevBytePos); WALInsertLockRelease(); ... /* 2) determine oid */ LWLockAcquire(OidGenLock, LW_SHARED); checkPoint.nextOid = ShmemVariableCache->nextOid; if (!shutdown) checkPoint.nextOid += ShmemVariableCache->oidCount; LWLockRelease(OidGenLock); ... /* 3) actually checkpoint shared_buffers et al. */ CheckPointGuts(checkPoint.redo, flags); ... /* 4) log the checkpoint */ recptr = XLogInsert(RM_XLOG_ID, shutdown ? XLOG_CHECKPOINT_SHUTDOWN: XLOG_CHECKPOINT_ONLINE); ... } > I don't understand how it could be out-of-date at that point. But > obviously it is. A checkpoint logically "starts" at 1) in the above abbreviated CreateCheckPoint(), that's where recovery starts when starting up from that checkpoint. But inbetween 1) and 4) all other backends can continue to insert WAL, and it'll be replayed *before* the checkpoint's record itself. That means that if some backend generates a NEXTOID record between 2) and 4) (with largers checkpoints we're looking at minutes to an hour of time), it's effects will temporarily take effect (as in ShmemVariableCache->nextOid is updated), but XLOG_CHECKPOINT_ONLINE's replay will overwrite it unconditionally: void xlog_redo(XLogReaderState *record) { else if (info == XLOG_CHECKPOINT_ONLINE) { ... /* ... but still treat OID counter as exact */ LWLockAcquire(OidGenLock, LW_EXCLUSIVE); ShmemVariableCache->nextOid= checkPoint.nextOid; ShmemVariableCache->oidCount = 0; LWLockRelease(OidGenLock); Makes sense? Regards, Andres
pgsql-hackers by date: