Re: Incorrect behaviour when using a GiST index on points - Mailing list pgsql-hackers
From | Noah Misch |
---|---|
Subject | Re: Incorrect behaviour when using a GiST index on points |
Date | |
Msg-id | 20121018191828.GB10844@tornado.leadboat.com Whole thread Raw |
In response to | Re: Incorrect behaviour when using a GiST index on points (Noah Misch <noah@leadboat.com>) |
Responses |
Re: Incorrect behaviour when using a GiST index on points
|
List | pgsql-hackers |
On Thu, Oct 11, 2012 at 07:17:28AM -0400, Noah Misch wrote: > On Tue, Oct 02, 2012 at 01:58:40PM -0400, Noah Misch wrote: > > > > On Mon, Aug 27, 2012 at 7:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > >> There's also the big-picture question of whether we should just get rid > > > >> of fuzzy comparisons in the geometric types instead of trying to hack > > > >> indexes to work around them. > > > In any event, I think we should entertain a patch to make the GiST operator > > class methods bug-compatible with corresponding operators. Even if we decide > > to change operator behavior in HEAD, the back branches could use it. > > We have broad agreement that the specific implementation of fuzz in geometric > comparison operators is shoddy, but nobody has voiced interest in designing a > concrete improvement. I propose adding a TODO item "Remove or improve > rounding in geometric comparison operators", endorsing Alexander's design, and > reviewing his patch. Objections? TODO added, and here's a review: The patch adds no regression tests; it should add tests illustrating the problems it fixes. I audited the other indexable geometric operators for similar problems. This passage in gist_point_consistent_internal(), which handles (point,point) operators, caught my suspicion: case RTSameStrategyNumber: if (isLeaf) { result = FPeq(key->low.x, query->x) && FPeq(key->low.y, query->y); } else { result = (query->x <= key->high.x && query->x >=key->low.x && query->y <= key->high.y && query->y >= key->low.y); } break; A leaf entry reachable from an internal entry may fall exactly on the internal-entry bounding box. Since we would accept a fuzzy match at the leaf level, I think we must also accept a fuzzy match at the internal level. > *** a/src/backend/access/gist/gistproc.c > --- b/src/backend/access/gist/gistproc.c > *************** > *** 1326,1331 **** gist_point_consistent(PG_FUNCTION_ARGS) > --- 1327,1333 ---- > bool *recheck = (bool *) PG_GETARG_POINTER(4); > bool result; > StrategyNumber strategyGroup = strategy / GeoStrategyNumberOffset; > + BOX *query, *key; This function now has "query" variables within subsidiary blocks redundant with and masking this one. Avoid doing that. > > switch (strategyGroup) > { > *************** > *** 1337,1348 **** gist_point_consistent(PG_FUNCTION_ARGS) > *recheck = false; > break; > case BoxStrategyNumberGroup: > ! result = DatumGetBool(DirectFunctionCall5( > ! gist_box_consistent, > ! PointerGetDatum(entry), > ! PG_GETARG_DATUM(1), > ! Int16GetDatum(RTOverlapStrategyNumber), > ! 0, PointerGetDatum(recheck))); > break; > case PolygonStrategyNumberGroup: > { > --- 1339,1356 ---- > *recheck = false; > break; > case BoxStrategyNumberGroup: > ! /* > ! * This code repeats logic of on_ob which uses simple comparison > ! * rather than FP* functions. > ! */ > ! query = PG_GETARG_BOX_P(1); > ! key = DatumGetBoxP(entry->key); > ! > ! *recheck = false; > ! result = key->high.x >= query->low.x && > ! key->low.x <= query->high.x && > ! key->high.y >= query->low.y && > ! key->low.y <= query->high.y; For leaf entries, this correctly degenerates to on_pb(). For internal entries, it must, but does not, implement box_overlap(). (The fuzzy box_overlap() would be fine.) I recommend making gist_point_consistent()'s treatment of boxes resemble its treatment of circles and polygons; that eases verifying their correctness. Call gist_box_consistent. Then, for leaf entries, call box_contain_pt(). GiST "consistent" functions often validate the strategy number, but the circle, polygon and box branches of gist_point_consistent silently assume strategy % GeoStrategyNumberOffset == RTContainedByStrategyNumber. Should they verify that assumption? I raise this as an incidental question; it is not new with your patch. Thanks, nm
pgsql-hackers by date: