Re: SP-GiST for ranges based on 2d-mapping and quad-tree - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: SP-GiST for ranges based on 2d-mapping and quad-tree |
Date | |
Msg-id | 50145A9C.7080400@enterprisedb.com Whole thread Raw |
In response to | Re: SP-GiST for ranges based on 2d-mapping and quad-tree (Alexander Korotkov <aekorotkov@gmail.com>) |
Responses |
Re: SP-GiST for ranges based on 2d-mapping and quad-tree
|
List | pgsql-hackers |
On 23.07.2012 10:37, Alexander Korotkov wrote: > On Fri, Jul 20, 2012 at 3:48 PM, Heikki Linnakangas< > heikki.linnakangas@enterprisedb.com> wrote: > >> It would be nice to have an introduction, perhaps as a file comment at the >> top of rangetypes_spgist.c, explaining how the quad tree works. I have a >> general idea of what a quad tree is, but it's not immediately obvious how >> it maps to SP-GiST. What is stored on a leaf node and an internal node? >> What is the 'prefix' (seems to be the centroid)? How are ranges mapped to >> 2D points? (the function comment of getQuadrant() is a good start for that >> last one) > > I've added some comments at the top of rangetypes_spgist.c. Thanks, much better. I think the handling of empty ranges needs some further explanation. If I understand the code correctly, the root node can contain a centroid like usual, and empty ranges are placed in the magic 5th quadrant. Alternatively, the root node has no centroid, and contains only two subnodes: all empty ranges are placed under one subnode, and non-empty ranges under the other. It seems it would be simpler if we always stored empty nodes the latter way, with a no-centroid root node, and nodes with a centroid would always only have 4 subnodes. When the first empty range is added to an index that already contains non-empty values, the choose-function could return spgSplitTuple to create a new root node that divides the space into empty and non-empty values. Then again, I don't think it would actually simplify the code much. Handling the 5th quadrant doesn't require much code in spg_range_quad_inner_consistent() as it is. So maybe it's better to leave it the way it is. Or perhaps we should stipulate that a root node with no centroid can only contain empty ranges. When you add the first non-empty range, have choose function return spgSplitTuple, and create a new root node with a centroid, and the empty ranges in the 5th quadrant. >> In spg_range_quad_inner_**consistent(), if in->hasPrefix == true, ISTM that >> in all cases where 'empty' is true, 'which' is set to 0, meaning that there >> can be no matches in any of the quadrants. In most of the case-branches, >> you explicitly check for 'empty', but even in the ones where you don't, I >> think you end up setting which=0 if empty==true. I'm not 100% sure about >> the RANGESTRAT_ADJACENT case, though. Am I missing something? > > Ops., it was a bug: RANGESTRAT_ADJACENT shoud set which=0 if empty==true, > while RANGESTRAT_CONTAINS and RANGESTRAT_CONTAINED_BY not. Corrected. Ok. I did some copy-editing, rewording some comments, and fixing whitespace. Patch attached, hope I didn't break anything. I think the most difficult part of the patch is the spg_range_quad_inner_consistent() function. It's hard to understand how the various strategies are implemented. I started to expand the comments in for each strategy, explaining how each strategy maps to a bounding box in the 2D space, but I'm not sure how much that actually helps. Perhaps it would help to also restructure the code so that you first have a switch-case statement that maps the scan key to bounding box, without worrying about the centroid yet, and then check which quadrants you need to descend to to find points within the box. The adjacent-strategy is more complicated than a simple bounding-box search, though. I'm having trouble understanding how exactly the RANGESTRAT_ADJACENT works. The geometric interpretation of 'adjacent' is that the scan key defines two lines, one horizontal and one vertical. Any points that lie on either of the lines match the query. Looking at the implementation, it's not clear how the code implements the search for along those two lines. Also, I wonder if we really need to reconstruct the "previous" value in a RANGESTRAT_ADJACENT search. ISTM we only need to remember which of the two lines we are chasing. For example, if you descend to quadrant 2 because there might be a point there that lies on the horizontal line, but we already know that there can't be any points there lie on the vertical line, you only need to remember that, not the whole centroid from the previous level. Does the SP-GiST API require the "reconstructed" values stored by inner_consistent to be of the correct datatype, or can it store any Datums in the array? -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Attachment
pgsql-hackers by date: