Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs
From | Peter Geoghegan |
---|---|
Subject | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date | |
Msg-id | CAH2-WzkvYQDXBone0vvYH8BJY-EEyRenMvztaDv0ZdoZHp7acw@mail.gmail.com Whole thread Raw |
In response to | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
|
List | pgsql-bugs |
On Sat, Jul 17, 2021 at 2:27 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > Right. Looking at the code, I don't think that can happen. The code that > deals with the pending list uses lock coupling, so it should never > follow a pointer to a page that was concurrently deleted. Might > something funny happen if a pending list page is deleted and immediately > reused as another kind of page? I don't see a problem with that. I don't immediately see a problem either, but that hardly means much -- I can't honestly claim that I understand the code here. Why does the code in ginInsertCleanup() use lock coupling? It's probably true that if you don't couple pending list pages then the tests would break in an obvious way, but that tells us nothing about the design itself. It was probably self-evident what was going on in earlier versions of the code, back when it could only run in VACUUM anyway, when it didn't eagerly recycle pending list pages, etc. But it's far from obvious now that we have all that stuff too. Let's say processPendingPage() finds a page that is fundamentally of the wrong kind (i.e. not a pending page). Why shouldn't the result be something scary (some kind of data corruption), that doesn't come with an error or hard crash? I can see no reason. Small differences in the format of the tuples seem unlikely to be caught anywhere, at least offhand. Another possible consequence of the lack of sanity checking is that whole *groups* of sibling pages (say from a posting tree or the entry tree) that look like they're a group from the pending list (but are actually from some tree) could get corrupted, one by one. This could happen as a secondary effect of the initial corruption, which might have only affected one page at first. Why shouldn't ginInsertCleanup() do that? Almost all GIN pages use "GinPageGetOpaque(page)->rightlink" in one way or another. If this is what happened then there really is no telling what you'll end up with. You may have a very hard time zeroing in on the truly relevant corruption. > And a > gin btree page won't be reused prematurely as a pending list page, > because ginNewBuffer() checks GinPageIsRecyclable(), even if the page is > being reused as a pending list page. It doesn't matter that ginNewBuffer() checks GinPageIsRecyclable(), since the page header's pd_prune_xid field will be 0 for a pending list page (same as other index AMs, though not the same as other GIN code). Note that GinPageIsRecyclable() returns "true" (safe to recycle) when it sees any "pd_prune_xid = 0" page, by design. Also, if ginInsertCleanup() ever tries to process/delete a posting tree or entry tree page, then it's still not going to set pd_prune_xid in the first place (because it believes that they're pending list pages). > While playing with the trace Pawel sent, I loaded some of the index > pages into a local instance so that I could step through the code with > gdb. I filled the rest of the pages with zeros. I managed to get the gin > btree descend code into an infinite loop with that. When it followed a > pointer to an entry leaf page, but that entry leaf page was substituted > with zeros, it interpreted the all-zeros page as a page with a > right-link to block 0. It then stepped right to read block 0 - which is > the metapage! - and incorrectly interpreted it as an internal page. > Amazingly, it managed to perform the binary search on it as if it > contained index tuples, and came up wth an another block number, which I > had also replaced with all-zeros. That doesn't actually surprise me. I have deliberately corrupted a lot of test data in all kinds of ways. I have some idea how broken things can be internally without breaking in a very obvious way. > Long story short, we should sprinkle a lot more sanity checks to the > functions in ginbtree.c. It should check that the page is of expected > kind; no stepping from internal entry page to posting tree page. And > seeing a pointer to block 0 at any step should be an error. Definitely. > Yeah, I wouldn't be surprised if this turns out to be a bug in pending > list cleanup. But I haven't been able to come up with a concrete > sequence of steps that would cause it. I also wouldn't be surprised if it was some unrelated issue, despite everything. As broken as I think the code in ginInsertCleanup() is, it has managed to be broken without anybody noticing for quite a while on a few occasions already. So unfortunately all of my observations about ginInsertCleanup() may in the end not help you to find the true underlying bug. -- Peter Geoghegan
pgsql-bugs by date: