Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs
From | Alexander Korotkov |
---|---|
Subject | Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date | |
Msg-id | CAPpHfdu41xL4z6LXt6-0USA8cZ0Nm8YtEA7B8ZZwrBWcc+mA5w@mail.gmail.com Whole thread Raw |
In response to | IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows (Heikki Linnakangas <hlinnaka@iki.fi>) |
Responses |
Re: IRe: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
|
List | pgsql-bugs |
(On Thu, Jun 24, 2021 at 3:59 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > On 17/06/2021 22:49, Tom Lane wrote: > > I wrote: > >> Pawel Kudzia <kudzia@gmail.com> writes: > >>> with help from IRC we've found that decreasing work_mem from 1MB to 256kB > >>> or less makes the problem go away: > > > >> Hmm. So that suggests that the index itself is *not* corrupt, > >> but the problem is associated with a bug in the indexscan > >> algorithms. > > > > After staring at the code I think there is at least one bug, and > > possibly two, in shimTriConsistentFn. That's likely to be implicated > > here because intarray's GIN opclass only provides a bool consistent > > function. I'm not very clear on the circumstances that lead to the scan > > code inventing GIN_MAYBE inputs, so I haven't been able to construct a > > test case. > > > > The definite bug is triggered because intarray relies on the API > > specification that says that if the search mode is > > GIN_SEARCH_MODE_DEFAULT, then the consistentFn will only be called > > when there's at least one TRUE key: > > > > case RTOverlapStrategyNumber: > > /* result is not lossy */ > > *recheck = false; > > /* at least one element in check[] is true, so result = true */ > > res = true; > > break; > > > > shimTriConsistentFn ignores this and calls it on all-FALSE inputs, > > for which it'll incorrectly get a TRUE result, as it will also for > > all the following checks. The upshot is that shimTriConsistentFn > > will convert any case with a MAYBE input to a hard TRUE result, > > with no recheck requirement. This'd easily explain the reported > > misbehavior. > > That's subtle. The documentation says: > > > If <literal>*searchMode</literal> is set to > > <literal>GIN_SEARCH_MODE_DEFAULT</literal> (which is the value it is > > initialized to before call), only items that match at least one of > > the returned keys are considered candidate matches. > > I guess you can read that as "consistentFn will only be called when > there is at least one matching item", but that didn't occur to me when I > read that at first. But I agree we need to fix shimTriConsistentFn to > respect that. > > > The other thing that I'm unsure about, because the code is horribly > > underdocumented, is that it's not very clear whether > > shimTriConsistentFn is accurately converting between the bool and > > the tristate APIs. That's because it's not very clear what the > > tristate API actually *is*. In particular, is the end state of > > key->recheckCurItem meaningful in the tristate case? If it's not, > > then the short-circuit case for no MAYBE inputs is broken, because > > it'll return TRUE when the bool consistentFn is trying to tell us > > to recheck. But if it is meaningful, then the multiway case is broken, > > because it will return the recheckCurItem result set by the last call to > > the bool consistentfn; which might be false even though other calls said > > true. (So in that case I think we'd need "key->recheckCurItem = recheck" > > at the end.) I also wonder how any of that works correctly for real > > triconsistent functions, which don't have access to the recheckCurItem > > flag. > > > > Anyway, I'm punting this to the code authors. I'd like to see > > some comments clarifying what the API really is, not just a > > quick-n-dirty code fix. > > I've been trying to create a test case for this, but no luck so far. I > cannot find a codepath that would hit these bugs with the kind of query > that Pawel used. The search condition is very simple: "column && > '{constant}'", with only one key and one constant to search for. There > are three calls to triConsistentFn: > > 1. In startScanKey, but that's only reached if (key->nentries > 1), so > not reachable with Pawel's query. > > 2. In keyGetItem, when haveLossyEntry==true. But AFAICS, haveLossyEntry > is never true with Pawel's query. That would require curItem to be a > lossy pointer, and that in turn can only happen when matchBitmap is > used, and that's only used with partial match queries and with queries > that need a full-index scan. > > 3. The second call in keyGetItem is reached, but it is only passed > GIN_MAYBE when curItem is lossy. Which isn't possible with this query, > see point 2. > > AFAICS the recheckCurItem bug should also not cause wrong results with > that query. I must be missing something. > > I could probably construct a test case with a different query, one > involving multiple elements in the search key, but I'd like to have a > solid explanation for the original report. +1, I see query provided by Pawel is too simple. Bugs, which Tom digged in shimTriConsistentFn() don't seem to explain the wrong query results provided by Pawel. > Pawel, is this a production system, or would it be possible for you to > build from sources with a patch with some extra debugging output? Do you think it also worth checking whether bug persists when set fastupdate = off. Then we could localize whether bug is related to pending lists. ------ Regards, Alexander Korotkov
pgsql-bugs by date: