Re: amcheck assert failure - Mailing list pgsql-bugs
From | Grigory Smolkin |
---|---|
Subject | Re: amcheck assert failure |
Date | |
Msg-id | 39998e3e-816b-f9ed-2a69-6ac7f05dea4a@postgrespro.ru Whole thread Raw |
In response to | Re: amcheck assert failure (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: amcheck assert failure
|
List | pgsql-bugs |
On 04/22/2019 11:31 PM, Peter Geoghegan wrote: > On Mon, Apr 22, 2019 at 12:51 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Nonetheless ... do we really want an assertion failure rather than >> some more-mundane error report? Seems like the point of amcheck >> is to detect data corruption, so I'd rather get an ERROR than an >> assertion (which would not fire in production builds anyway). > (Apologies for sending this a second time, Tom -- somehow failed to CC > the list the first time) > > I think that the size cross-check must have failed to catch the problem: > > /* > * lp_len should match the IndexTuple reported length exactly, since > * lp_len is completely redundant in indexes, and both sources of > * tuple length are MAXALIGN()'d. nbtree does not use lp_len all that > * frequently, and is surprisingly tolerant of corrupt lp_len fields. > */ > if (tupsize != ItemIdGetLength(itemid)) > ereport(ERROR, > (errcode(ERRCODE_INDEX_CORRUPTED), > errmsg("index tuple size does not equal lp_len in > index \"%s\"", > RelationGetRelationName(state->rel)), > errdetail_internal("Index tid=(%u,%u) tuple > size=%zu lp_len=%u page lsn=%X/%X.", > state->targetblock, offset, > tupsize, ItemIdGetLength(itemid), > (uint32) (state->targetlsn >> 32), > (uint32) state->targetlsn), > errhint("This could be a torn page problem."))); > > It seems like the length from tupsize (IndexTupleSize()) happened to > match the redundant ItemIdGetLength()-reported size from the line > pointer. No idea how that could actually be possible, since the value > returned by ItemIdGetLength() would be 12592. What are the chances of > that accidentally matching what the wild itup pointer reported as its > IndexTupleSize()? > > The only explanation I can think of is that the assertion failure was > from the core code -- _bt_check_natts() is itself called within core > code assertions, so it's just about possible that that was how this > happened, before amcheck was even called. Doesn't seem like a very > good explanation. Do you think that that's possible, Grigory? No, it was definitely amcheck: https://pastebin.postgrespro.ru/?2c8ddf5254b2c625#PTbr+jbxFPu2wfy5SFDWdiqaqPN/N5WX8t6mo5rDpmo= > > In any case, you're clearly right that amcheck could be more defensive > here. My pg_hexedit tool detected that there was an LP_UNUSED item > pointer with storage (i.e. whose lp_len > 0), which is a check that > amcheck can and should perform, but doesn't. Especially because it > would prevent a deference of a likely-wild pointer (IndexTuple). > > We could: > > * Throw an error when any line item reports lp_len == 0. > > * Throw an error when there is a line item that's either LP_UNUSED, or > LP_REDIRECT. The corrupt page sent by Grigory had both. > > What do you think of the idea of committing some simple checks to do > this with contrib/amcheck on v12? > -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
pgsql-bugs by date: