Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Non-deterministic IndexTuple toast compression fromindex_form_tuple() + amcheck false positives |
Date | |
Msg-id | CAH2-WznrVd9ie+TTJ45nDT+v2nUt6YJwQrT9SebCdQKtAvfPZw@mail.gmail.com Whole thread Raw |
Responses |
Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives
|
List | pgsql-hackers |
While testing my nbtree heap TID keyspace patch, I came across a case where amcheck reliably reports corruption. It appeared that a 4 byte varlena index entry that was expected in an index was not actually present. However, index scan queries with the "missing" value in their qual didn't actually give wrong answers. This was reproducible on the master branch, too. It turned out that the problem has existed since the heapallindexed enhancement made it into Postgres 11 was committed. The heapallindexed enhancement that made it into Postgres 11 assumes that the representation of index tuples produced by index_form_tuple() (or all relevant index_form_tuple() callers) is deterministic: for every possible heap tuple input there must be a single possible (bitwise) output. There is no real corruption present with the test case, though it's not entirely clear that this is best thought of as a bug in amcheck -- I'd prefer to make sure that amcheck's expectations are actually met here, rather than have amcheck normalize its input to eliminate the difference in bitwise representation. Steps to reproduce are rather delicate -- I stumbled upon the problem entirely by accident. I can share the full test case if that helps, but will hold off for now, since it involves a pg_dump that's a few megabytes in size. Here is an outline of what I'm doing: pg_restore -d postgres /home/pg/code/suffix_truncation_test/bib_refs_small.dump pg@postgres:5432 [9532]=# \d mgd.bib_refs Table "mgd.bib_refs" Column │ Type │ Collation │ Nullable │ Default ───────────────────┼─────────────────────────────┼───────────┼──────────┼───────── _refs_key │ integer │ │ not null │ _reviewstatus_key │ integer │ │ not null │ reftype │ character(4) │ │ not null │ authors │ text │ │ │ _primary │ character varying(60) │ │ │ title │ text │ │ │ journal │ character varying(100) │ │ │ vol │ character varying(20) │ │ │ issue │ character varying(25) │ │ │ date │ character varying(30) │ │ │ year │ integer │ │ │ pgs │ character varying(30) │ │ │ nlmstatus │ character(1) │ │ not null │ abstract │ text │ │ │ isreviewarticle │ smallint │ │ not null │ _createdby_key │ integer │ │ not null │ 1001 _modifiedby_key │ integer │ │ not null │ 1001 creation_date │ timestamp without time zone │ │ not null │ now() modification_date │ timestamp without time zone │ │ not null │ now() Indexes: "bib_refs_pkey" PRIMARY KEY, btree (_refs_key) "bib_refs_idx_authors" btree (authors) "bib_refs_idx_createdby_key" btree (_createdby_key) "bib_refs_idx_isprimary" btree (_primary) "bib_refs_idx_journal" btree (journal) "bib_refs_idx_modifiedby_key" btree (_modifiedby_key) "bib_refs_idx_reviewstatus_key" btree (_reviewstatus_key) "bib_refs_idx_title" btree (title) "bib_refs_idx_year" btree (year) psql -d postgres -c "create table bug (like mgd.bib_refs);" psql -d postgres -c "create index on bug (title);" psql -d postgres -c "insert into bug select * from mgd.bib_refs;" psql -d postgres -c "create extension if not exists amcheck;" psql -d postgres -c "analyze; set maintenance_work_mem='128MB';" psql -d postgres -c "select bt_index_parent_check('bug_title_idx', true);" ERROR: heap tuple (579,4) from table "bug" lacks matching index tuple within index "bug_title_idx" Here are details of the offending datum in the heap: pg@postgres:5432 [9532]=# select title, length(title), pg_column_size(title) from bug where ctid = '(579,4)'; ─[ RECORD 1 ]──┬──── title │ Final report on the safety assessment of trilaurin, triarachidin, tribehenin, tricaprin, tricaprylin, trierucin, triheptanoin, triheptylundecanoin, triisononanoin, triisopalmitin, triisostearin, trilinolein, trimyristin, trioctanoin, triolein, tripalmitin, tripalmitolein, triricinolein, tristearin, triundecanoin, glyceryl triacetyl hydroxystearate, glyceryl triacetyl ricinoleate, and gl. length │ 390 pg_column_size │ 234 Does anyone have any idea why the 4 byte varlena (text) datum in the single attribute index "bug_title_idx" is uncompressed, while the value in the heap is compressed? No other value in any other index happens to trip the problem, though this is complicated real-world database with many similar indexes over tens of gigabytes of data (I have quite a number of these "INSERT ... SELECT" tests for my nbtree patch). What you see here is a partially boiled-down test case. I've started some preliminary debugging work. A "REINDEX index bug_title_idx" makes amcheck happy, since the index tuple that points to heap tuple '(579,4)' ends up being compressed in exactly the same way as it is in the heap. The initial "INSERT ... SELECT" clearly makes the executor produce compressed values for heap_insert(), though not for btinsert() in this one instance. I've been able to confirm this from gdb. -- Peter Geoghegan
pgsql-hackers by date: