Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) |
Date | |
Msg-id | 20160510210013.2akn4iee7gl4ycen@alap3.anarazel.de Whole thread Raw |
In response to | Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin causing errors) (Andres Freund <andres@anarazel.de>) |
Responses |
Re: HeapTupleSatisfiesToast() busted? (was atomic pin/unpin
causing errors)
|
List | pgsql-hackers |
On 2016-05-10 09:19:16 -0700, Andres Freund wrote: > On 2016-05-10 08:09:02 -0400, Robert Haas wrote: > > On Tue, May 10, 2016 at 3:05 AM, Andres Freund <andres@anarazel.de> wrote: > > > The easy way to trigger this problem would be to have an oid wraparound > > > - but the WAL shows that that's not the case here. I've not figured > > > that one out entirely (and won't tonight). But I do see WAL records > > > like: > > > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693 > > > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693 > > > i.e. two NEXTOID records allocating the same range, which obviously > > > doesn't seem right. There's also every now and then close by ranges: > > > rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 1/9A404DB8, prev 1/9A404270, desc: NEXTOID 3311455 > > > rmgr: XLOG len (rec/tot): 4/ 30, tx: 7814505, lsn: 1/9A4EC888, prev 1/9A4EB9D0, desc: NEXTOID 3311461 > > It seems to me that the real question > > here is how you're getting two calls to XLogPutNextOid() with the same > > value of ShmemVariableCache->nextOid, and the answer, as it seems to > > me, must be that LWLocks are broken. > > There likely were a bunch of crashes in between, Jeff's test suite > triggers them at a high rate. It seems a lot more likely than that an > lwlock bug only materializes in the oid counter. Investigating. void CreateCheckPoint(int flags) { ... /* * An end-of-recovery checkpoint is really a shutdown checkpoint, just * issued at a different time. */ if (flags & (CHECKPOINT_IS_SHUTDOWN | CHECKPOINT_END_OF_RECOVERY)) shutdown = true; else shutdown = false; ... LWLockAcquire(OidGenLock, LW_SHARED); checkPoint.nextOid = ShmemVariableCache->nextOid; if (!shutdown) checkPoint.nextOid+= ShmemVariableCache->oidCount; LWLockRelease(OidGenLock); ... recptr = XLogInsert(RM_XLOG_ID, shutdown ? XLOG_CHECKPOINT_SHUTDOWN : XLOG_CHECKPOINT_ONLINE); ... } I think that's to blame here. Looking at the relevant WAL record shows: pg_xlogdump -p /data/freund/jj -s 2/12004018 -e 2/1327EA28|grep -E 'CHECKPOINT|NEXTOID' rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/12004018, prev 2/12003288, desc: NEXTOID 4302693 rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 2/12023C38, prev 2/12023C00, desc: CHECKPOINT_ONLINEredo 2/12000120; /* ... */ oid 4294501; /* ... */ online rmgr: XLOG len (rec/tot): 80/ 106, tx: 0, lsn: 2/1327A798, prev 2/1327A768, desc: CHECKPOINT_SHUTDOWNredo 2/1327A798; /* ... */ oid 4294501; /* ... */ shutdown rmgr: XLOG len (rec/tot): 4/ 30, tx: 0, lsn: 2/1327EA08, prev 2/1327DC60, desc: NEXTOID 4302693 (note that end-of-recovery checkpoints are logged as shutdown checkpoints, pretty annoying imo) So I think the issue is that the 2/12023C38 checkpoint *starts* before the first NEXTOID, and thus gets an earlier nextoid. The second - shutdown/end-of-recovery - checkpoint then hits the above !shutdown and doesn't add oidCount. Thus after the crash we continue with a repeated NEXOID. There's this remark in xlog_redo(): /* * We used to try to take the maximum of ShmemVariableCache->nextOid * andthe recorded nextOid, but that fails if the OID counter wraps * around. Since no OID allocation should be happeningduring replay * anyway, better to just believe the record exactly. We still take * OidGenLock while settingthe variable, just in case. */ I think that was perhaps not the best fix :( I guess what we should do is to only use checkPoint.nextOid when starting up from a checkpoint, and entirely rely on NEXTOID otherwise? Regards, Andres
pgsql-hackers by date: