Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs
From | Peter Geoghegan |
---|---|
Subject | Re: BUG #17245: Index corruption involving deduplicated entries |
Date | |
Msg-id | CAH2-WzmD_mZDm1Sb1C=nqUWj3cLTDoamx9EL_Tpwws_Ur9eFww@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #17245: Index corruption involving deduplicated entries (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: BUG #17245: Index corruption involving deduplicated entries
Re: BUG #17245: Index corruption involving deduplicated entries |
List | pgsql-bugs |
On Mon, Oct 25, 2021 at 2:29 PM Peter Geoghegan <pg@bowt.ie> wrote: > On Mon, Oct 25, 2021 at 2:08 PM K. R. <iijima.yun@koumakan.jp> wrote: > > There have been no crashes since; there was one reload (pg_hba.conf > > edits) and several restarts (to snapshot the file structure with the > > corrupted index, plus another enabling WAL archiving today in the morning). > > Thank you for your help. Thank you for sharing page images with me privately -- that was very helpful. I can see the same basic issue in all 3 of the pages that you said amcheck reports as corrupt (the 3 pages that you shared). As I suspected, there are posting lists that contain duplicate TIDs, but look normal in every other way. This is very subtle. A few more questions for you, if you don't mind: 1. Can you tell me which tables the 4 known-corrupt indexes (page_redirect_namespace_len, page_len, transcode_key_idx, and page_main_title) are defined against? 2. Can you show me the definition of the table or tables, using \d from psql? 3. Do you notice any consistent pattern here? For example, are foreign keys involved? 4. How do the table or tables tend to get updated? Thanks again Executive summary on my progress: This now looks like it might be related to code on the heapam side, perhaps in heapam_index_build_range_scan(), which has a history of getting confused about duplicate TIDs with HOT chains (e.g., see Alvaro's commit a811ea5bde from late 2020). Here is why I now suspect heapam_index_build_range_scan(): It disturbed me that even "heapallindexed" amcheck verification did not detect the original reported problem, where we see index scans that give wrong answers (a scan of the page_main_title index shows an extra non-matching row, per Andrew's original message). We saw that amcheck does notice a few corrupt posting list tuples in other nearby indexes, but not this bigger problem. That eventually made me think about table_index_build_scan()/heapam_index_build_range_scan(), which is used by amcheck for heapallindexed verification -- most of the important work is outsourced to that core code. amcheck more or less performs heapallindexed verification by comparing the tuples that a REINDEX would see against the actual tuples it fingerprinting from the index. I wonder if heapallindexed verification doesn't detect corruption in the page_main_title index because "the table seems to agree with the index", even though the actual details are clearly wrong. In other words, perhaps even the table_index_build_scan() call inside amcheck's verify_ntree.c file says that there should be duplicate TIDs in the index, even though that's always wrong, by definition. What if table_index_build_scan() is actually the problem? If my hypothesis is true, then we might expect problems to not even go away following a REINDEX. Or maybe they would go away sometimes, but not other times. This idea is supported by remarks from Herman verschooten on Slack [1], when discussing a similar problem case on Postgres 14 [2]. I'll quote him directly: """ tranman_production=# CREATE INDEX index_freights_on_cmr_received ON public.freights USING btree (cmr_received) with (deduplicate_items = off); CREATE INDEX tranman_production=# \set VERBOSITY verbose tranman_production=# update freights set cmr_received=false where id=49632; ERROR: XX000: posting list tuple with 20 items cannot be split at offset 168 LOCATION: _bt_swap_posting, nbtdedup.c:1037 tranman_production=# drop index index_freights_on_cmr_received ; DROP INDEX tranman_production=# update freights set cmr_received=false where id=49632; UPDATE 1 """ Notice that the problem seems to be confined to one index, which has deduplication explicitly disabled. We see an error message about a posting list tuple all the same. I think that that's just a downstream consequence of duplicate TIDs appearing in the index. It looks like the corruption exists and is independent of whether the index is present or not. When the index is present the problem becomes obvious, but it's still there either way. The supposed posting list tuple with 20 items that the error message goes on about is actually just a plain tuple. The number 20 comes from its IndexTupleData.t_tid offset number. In other words, it's an offset number from the index tuple's pointed-to heap TID, which has been misinterpreted as something else by BTreeTupleGetNPosting(). BTreeTupleGetNPosting() expects to never be called against such a tuple, but it can happen once we assume duplicate TIDs are possible. (You'd see an assertion failure if asserts were enabled, but of course they're not enabled here.) Note that this independently reported "freights" case from Herman has the same amcheck issue we see with the mediawiki indexes on this thread: amcheck won't actually complain about the clearly corrupt index_freights_on_cmr_received index, but will complain about other indexes on the same table. This time around amcheck complains about out-of-order index tuples, but I strongly suspect that that's practically the same condition as the "posting list contains misplaced TID" error messages we see here. In other words, I think that both cases just have duplicate heap TIDs, and that the variation in error messages is totally unrelated to the true underlying problem. [1] https://postgresteam.slack.com/archives/C0FS3UTAP/p1635161173202100?thread_ts=1635154585.197300&cid=C0FS3UTAP [2] https://postgr.es/m/8CDB73C1-E3AF-40A6-BA81-8AFE174C6402@verschooten.net -- Peter Geoghegan
pgsql-bugs by date: