Re: BUG #17245: Index corruption involving deduplicated entries - Mailing list pgsql-bugs
From | Peter Geoghegan |
---|---|
Subject | Re: BUG #17245: Index corruption involving deduplicated entries |
Date | |
Msg-id | CAH2-Wz=s5EMTPeXYO9vS9Pdtq8UkY77qAdmEQ4sL1UMODK8zEA@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #17245: Index corruption involving deduplicated entries (Kamigishi Rei <iijima.yun@koumakan.jp>) |
Responses |
Re: BUG #17245: Index corruption involving deduplicated entries
Re: BUG #17245: Index corruption involving deduplicated entries |
List | pgsql-bugs |
On Thu, Oct 28, 2021 at 1:04 AM Kamigishi Rei <iijima.yun@koumakan.jp> wrote: > I forgot to mention: I have the data directory from the PostgreSQL 13.3 > instance that ran inside the same jail on the same FreeBSD system since > May, so if that might be useful for the investigation, I can pull it > into another VM. I know I keep saying it's the last thing, but...I need more pages! Can you please provide me with additional pages from the heap relation "page", as follows: 5, 7, 12, 14, 16, 21, 22, 26 Your heap page (page 0 of the "page" heap relation) was very interesting. I saw that the TID that is incorrectly repeated in the index "page_len" is marked LP_UNUSED in the heap page. It's one of only two such TIDs/line pointers on your heap/table page, out of 61 total, so I doubt that this is a coincidence. That's why I'm asking for more heap pages -- I would like to verify that this is a consistent thing, and not just a remarkable coincidence. Now this is looking like a problem in VACUUM (pruning?), not a CREATE INDEX thing. It looks like somehow an item that should be LP_DEAD ends up being LP_UNUSED during pruning. I have CC'd Andres, to get his thoughts on this. This explanation fits quite well, because: 1. Even with every verification option enabled, amcheck only ever complains about duplicate TIDs (the exact details vary in not-very-interesting ways). I would expect to only see this if my new hypothesis was correct, because in practice, with many workloads, unused line pointers tend to get reused by later non-HOT updates that create new versions without changing the value of indexes columns. Because amcheck doesn't exhaustively verify that the index has no duplicate TIDs, you're only going to see complaints about duplicates that happen to be duplicated on all key values, up to and including the heap TID itself. 2. If this theory was correct, then we'd occasionally notice TIDs that are recycled by totally unrelated logical rows (or maybe the key values do change). This is what we see in the original problem report, where the index seems to think that 'Isokaze' and 'Ying_Swei' are the same value. The TID originally *did* point to some version that had the correct value, but since VACUUM never removed the TID from the index (because it was never made LP_DEAD in the heap by pruning when it should have been), the "pointed-to value" changed without the index having the obsolete TID removed. I am suggesting that 1 and 2 are the same thing, really. They look a little different, but that's only due to limitations in our tooling. In addition to the "amcheck cannot reliably detect duplicate TIDs in an index" issue I mentioned, there are also heapallindexed limitations to consider. We should not expect the heapallindexed stuff to detect the case where an index has TIDs that currently point to LP_UNUSED items in the heap -- heapallindexed verification can only detect the absence of a needed index tuple, not the presence of a faulty/corrupt index tuple. That's probably why heapallindexed doesn't detect one single problem here, even though the original complaint describes exactly the kind of scenario you might expect heapallindexed to catch. 3. Multiple indexes on the same table "independently" have very similar corruption. I noticed that the heap TID (21,41) is duplicated in posting lists in both the page_redirect_namespace_len index (on block 14, posting list tuple at page offset 13), as well as the page_len index (on block 1, posting list tuple at page offset 15). If this was just a coincidence (if the problem was just in nbtree) then it would be a remarkable coincidence. -- Peter Geoghegan
pgsql-bugs by date: