Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows - Mailing list pgsql-bugs
From | Alexander Korotkov |
---|---|
Subject | Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows |
Date | |
Msg-id | CAPpHfdsYBc=vPsF4+5aqrk4DaF5XNvMuDCJkg5DWt-GXXj1qPg@mail.gmail.com Whole thread Raw |
In response to | Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: BUG #16792: silent corruption of GIN index resulting in SELECTs returning non-matching rows
|
List | pgsql-bugs |
( . () w On Tue, Jun 22, 2021 at 7:02 PM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Fri, Jun 18, 2021 at 12:55 AM Alexander Korotkov > <aekorotkov@gmail.com> wrote: > > On Thu, Jun 17, 2021 at 10:49 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > Anyway, I'm punting this to the code authors. I'd like to see > > > some comments clarifying what the API really is, not just a > > > quick-n-dirty code fix. > > > > Yep, I'm going to have a closer look at this tomorrow. > > Sorry for the delay. I'm going to take a closer look in the next > couple of days. I've closer look at shimTriConsistentFn() function. It looks to me like the function itself has inconsistencies. But the way it's currently used in GIN shouldn't produce the wrong query answers. shimTriConsistentFn() is one of implementation of GinScanKey.triConsistentFn. In turn, GinScanKey.triConsistentFn have 3 callers: 1) startScanKey() to determine required keys 2) keyGetItem() for lossy item check 3) keyGetItem() for normal item check Let's see shimTriConsistentFn() inconsistencies and how callers handle them. 1) shimTriConsistentFn() returns result of directBoolConsistentFn() for zero maybe entries without examining key->recheckCurItem. That may result in returning GIN_TRUE instead of GIN_MAYBE 1.1) startScanKey() doesn't care about recheck, just looking for GIN_FALSE result. 1.2) keyGetItem() for lossy item always implies recheck. 1.3) keyGetItem() for a normal item does the trick. Whether a current item is rechecked is controlled by key->recheckCurItem variable (the same which is set by directBoolConsistentFn(). The following switch sets key->recheckCurItem for GIN_MAYBE, but leaves it untouched for GIN_TRUE. Thus, GIN_TRUE with key->recheckCurItem works here just like GIN_MAYBE. I think this is inconsistent by itself, but this inconsistency compensates for inconsistency in shimTriConsistentFn(). 2) shimTriConsistentFn() might call directBoolConsistentFn() with all false inputs for GIN_SEARCH_MODE_DEFAULT. The result is intended to be false, but opclass consistent method isn't intended to handle this situation correctly. For instance, ginint4_consistent() returns true without checking the input array. That could make shimTriConsistentFn() return GIN_TRUE instead of GIN_MAYBE, or GIN_MAYBE instead of GIN_FALSE. 2.1) In principle, this could lead startScanKey() to select less required entries than possible. But this is definitely not the case of ginint4_consistent() when meeting any of entries is enough for match. 2.2) In principle, keyGetItem() could get false positives for lossy items. But that wouldn't lead to wrong query answers. Again, this is not the case of ginint4_consistent(). 2.3) keyGetItem() for a normal item doesn't call triConsistentFn() with any GIN_MAYBE or all GIN_FALSE. To sum up, I don't see how inconsistencies in shimTriConsistentFn() could lead to wrong query answers, especially in intarray GIN index. Nevertheless, these inconsistencies should be fixed. I'm going to propose a patch soon. ------ Regards, Alexander Korotkov
pgsql-bugs by date: