pgstattuple "unexpected zero page" for gist and hash indexes - Mailing list pgsql-hackers
From | Nitin Motiani |
---|---|
Subject | pgstattuple "unexpected zero page" for gist and hash indexes |
Date | |
Msg-id | CAH5HC95gT1J3dRYK4qEnaywG8RqjbwDdt04wuj8p39R=HukayA@mail.gmail.com Whole thread Raw |
List | pgsql-hackers |
Hi Hackers, $SUBJECT happens if we crash just after extending the index. Noah Misch looked into it and came up with the steps to simulate this by patching pgstattuple to extend each rel it visits: ==== --- a/contrib/pgstattuple/pgstattuple.c +++ b/contrib/pgstattuple/pgstattuple.c @@ -254,6 +254,8 @@ pgstat_relation(Relation rel, FunctionCallInfo fcinfo) (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), errmsg("cannot access temporary tables of other sessions"))); + ReleaseBuffer(ExtendBufferedRel(BMR_REL(rel), MAIN_FORKNUM, NULL, EB_CLEAR_SIZE_CACHE)); + if (RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind) || ==== Then, in the regression database, this reaches the error for each hash index and each gist index: ==== CREATE EXTENSION pgstattuple; CREATE FUNCTION pgstattuple_safe(IN reloid regclass) RETURNS text LANGUAGE plpgsql AS $$ BEGIN RETURN pgstattuple($1); EXCEPTION WHEN OTHERS THEN RETURN sqlerrm; END $$; SELECT * from ( SELECT pgstattuple_safe(oid), relkind, oid::regclass FROM pg_class WHERE pg_relation_filenode(oid) <> 0 ) WHERE pgstattuple_safe NOT LIKE '%supported%' AND pgstattuple_safe NOT LIKE '(%' ORDER BY 1; ==== There have been prior reports: 2016-09 https://www.postgresql.org/message-id/CAA4eK1%2BzT16bxVpF3J9UsiZxExCqFa6QxzUfazmJthW6dO78ww%40mail.gmail.com 2016-09 https://www.postgresql.org/message-id/CAE9k0PnCaBkMgsDGuuPnPPTrQUc%3Dy9NiQvvsFFQkDNGcjYSajg%40mail.gmail.com 2021-02 docs https://www.postgresql.org/message-id/161280049644.664.3414087059452030397@wrigleys.postgresql.org For hash indexes, this error comes from pgstat_hash_page() -> _hash_getbuf_with_strategy() -> _hash_checkpage() for a zero page. Similarly gistcheckpage() is the cause for gist indexes. Since after a crash recovery a zero page is normal (as long as not referenced from the rest of the index), emitting ERRCODE_INDEX_CORRUPTED is a false alarm. To fix this, we propose that we remove the checks from pgstat_hash_page() and pgstat_gist_page(). This is something which pgstat_btree_page() already does. It uses a lower-level function ReadBufferExtended() and avoids using _bt_getbuf() which would check for the "unexpected zero page". I'm attaching a patch for the above which does the following : 1. Replaces _hash_getbuf_with_strategy() with ReadBufferExtended() in pgstat_hash_page(). Then for non-PageIsNew() pages, we explicitly check the PageGetSpecialSize(). 2. Similarly gistcheckpage() is removed in pgstat_gist_page(). And for non-PageIsNew(), we explicitly check the PageGetSpecialSize() before checking if it's a leaf page. I confirmed that this was working by running the above mentioned simulation steps along with the patch. Note that _hash_getbuf_with_strategy() also checks if blkno is P_NEW. But that check seems unnecessary here as P_NEW is the same as InvalidBlockNo and pgstat_hash_page() is called by pgstat_index() for the block nos starting from HASH_METAPAGE + 1 and remains < RelationGetNumberOfBlocks(). An alternative might be that we still call the check functions for the non-PageIsNew() cases. Following were Noah Misch's comments on these options : I mildly lean toward not calling these *check*page functions, for two reasons. First, pgstattuple() is not a general integrity checker. It should try its best to count things, and it shouldn't go out of its way to report corruption. Second, pgstattuple doesn't depend on access-method-specific index integrity. pgstat_gist_page() is a thin wrapper around the generic pgstat_index_page(). pgstat_hash_page() does slightly more, but a page failing the _hash_checkpage() checks still wouldn't elicit intolerable behavior from pgstat_hash_page(). Let me know what you folks think of this. Thanks & Regards, Nitin Motiani Google
Attachment
pgsql-hackers by date: