Home > mailing lists

Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs

From	Peter Geoghegan
Subject	Re: BUG #17245: Index corruption involving deduplicated entries
Date	October 26, 2021 20:36:25
Msg-id	CAH2-WzmD_mZDm1Sb1C=nqUWj3cLTDoamx9EL_Tpwws_Ur9eFww@mail.gmail.com Whole thread Raw
In response to	Re: BUG #17245: Index corruption involving deduplicated entries (Peter Geoghegan <pg@bowt.ie>)
Responses	Re: BUG #17245: Index corruption involving deduplicated entries Re: BUG #17245: Index corruption involving deduplicated entries
List	pgsql-bugs

Tree view

On Mon, Oct 25, 2021 at 2:29 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Mon, Oct 25, 2021 at 2:08 PM K. R. <iijima.yun@koumakan.jp> wrote:
> > There have been no crashes since; there was one reload (pg_hba.conf
> > edits) and several restarts (to snapshot the file structure with the
> > corrupted index, plus another enabling WAL archiving today in the morning).
>
> Thank you for your help.

Thank you for sharing page images with me privately -- that was very
helpful. I can see the same basic issue in all 3 of the pages that you
said amcheck reports as corrupt (the 3 pages that you shared). As I
suspected, there are posting lists that contain duplicate TIDs, but
look normal in every other way. This is very subtle.

A few more questions for you, if you don't mind:

1. Can you tell me which tables the 4 known-corrupt indexes
(page_redirect_namespace_len, page_len, transcode_key_idx, and
page_main_title) are defined against?
2. Can you show me the definition of the table or tables, using \d from psql?
3. Do you notice any consistent pattern here? For example, are foreign
keys involved?
4. How do the table or tables tend to get updated?

Thanks again

Executive summary on my progress:

This now looks like it might be related to code on the heapam side,
perhaps in heapam_index_build_range_scan(), which has a history of
getting confused about duplicate TIDs with HOT chains (e.g., see
Alvaro's commit a811ea5bde from late 2020).

Here is why I now suspect heapam_index_build_range_scan():

It disturbed me that even "heapallindexed" amcheck verification did
not detect the original reported problem, where we see index scans
that give wrong answers (a scan of the page_main_title index shows an
extra non-matching row, per Andrew's original message). We saw that
amcheck does notice a few corrupt posting list tuples in other nearby
indexes, but not this bigger problem. That eventually made me think
about table_index_build_scan()/heapam_index_build_range_scan(), which
is used by amcheck for heapallindexed verification -- most of the
important work is outsourced to that core code. amcheck more or less
performs heapallindexed verification by comparing the tuples that a
REINDEX would see against the actual tuples it fingerprinting from the
index.

I wonder if heapallindexed verification doesn't detect corruption in
the page_main_title index because "the table seems to agree with the
index", even though the actual details are clearly wrong. In other
words, perhaps even the table_index_build_scan() call inside amcheck's
verify_ntree.c file says that there should be duplicate TIDs in the
index, even though that's always wrong, by definition. What if
table_index_build_scan() is actually the problem?

If my hypothesis is true, then we might expect problems to not even go
away following a REINDEX. Or maybe they would go away sometimes, but
not other times.

This idea is supported by remarks from Herman verschooten on Slack
[1], when discussing a similar problem case on Postgres 14 [2]. I'll
quote him directly:

"""
tranman_production=# CREATE INDEX index_freights_on_cmr_received ON
public.freights USING btree (cmr_received) with (deduplicate_items =
off);
CREATE INDEX
tranman_production=# \set VERBOSITY verbose
tranman_production=# update freights set cmr_received=false where id=49632;
ERROR: XX000: posting list tuple with 20 items cannot be split at offset 168
LOCATION: _bt_swap_posting, nbtdedup.c:1037
tranman_production=# drop index index_freights_on_cmr_received ;
DROP INDEX
tranman_production=# update freights set cmr_received=false where id=49632;
UPDATE 1
"""

Notice that the problem seems to be confined to one index, which has
deduplication explicitly disabled. We see an error message about a
posting list tuple all the same. I think that that's just a downstream
consequence of duplicate TIDs appearing in the index. It looks like
the corruption exists and is independent of whether the index is
present or not. When the index is present the problem becomes obvious,
but it's still there either way.

The supposed posting list tuple with 20 items that the error message
goes on about is actually just a plain tuple. The number 20 comes from
its IndexTupleData.t_tid offset number. In other words, it's an offset
number from the index tuple's pointed-to heap TID, which has been
misinterpreted as something else by BTreeTupleGetNPosting().
BTreeTupleGetNPosting() expects to never be called against such a
tuple, but it can happen once we assume duplicate TIDs are possible.
(You'd see an assertion failure if asserts were enabled, but of course
they're not enabled here.)

Note that this independently reported "freights" case from Herman has
the same amcheck issue we see with the mediawiki indexes on this
thread: amcheck won't actually complain about the clearly corrupt
index_freights_on_cmr_received index, but will complain about other
indexes on the same table. This time around amcheck complains about
out-of-order index tuples, but I strongly suspect that that's
practically the same condition as the "posting list contains misplaced
TID" error messages we see here. In other words, I think that both
cases just have duplicate heap TIDs, and that the variation in error
messages is totally unrelated to the true underlying problem.

[1] https://postgresteam.slack.com/archives/C0FS3UTAP/p1635161173202100?thread_ts=1635154585.197300&cid=C0FS3UTAP
[2] https://postgr.es/m/8CDB73C1-E3AF-40A6-BA81-8AFE174C6402@verschooten.net
--
Peter Geoghegan

pgsql-bugs by date:

From: Tom Lane
Date: 26 October 2021, 19:41:58
Subject: Re: conchuela timeouts since 2021-10-09 system upgrade

From: Thomas Munro
Date: 26 October 2021, 20:45:38
Subject: Re: conchuela timeouts since 2021-10-09 system upgrade

Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs

Previous

Next