Re: [HACKERS] Range Merge Join v1 - Mailing list pgsql-hackers
From | Jeff Davis |
---|---|
Subject | Re: [HACKERS] Range Merge Join v1 |
Date | |
Msg-id | CAMp0ubfKjj6H39iAX=zoPZ0ke_MAyTatQPwMrfw5dkSSztuV4Q@mail.gmail.com Whole thread Raw |
Responses |
Re: [HACKERS] Range Merge Join v1
|
List | pgsql-hackers |
Version 2 attached. Fixed a few issues, expanded tests, added docs. A simple performance test (script attached) shows about a 5X improvement when comparing against a nested loop with an inner index-only scan over a gist index. Even better, this doesn't require an index, so it will work even if the input relations are subqueries. Regards, Jeff Davis On Thu, Apr 6, 2017 at 1:43 AM, Jeff Davis <pgsql@j-davis.com> wrote: > ======== > Example: > ======== > > Find different people using the same website at the same time: > > create table session(sessionid text, username text, during tstzrange); > SELECT s1.username, s2.username, s1.during * s2.during > FROM session s1, session s2 > WHERE s1.during && s2.during AND s1.username < s2.username > > > ===================================== > Brief summary of previous discussion: > ===================================== > > http://www.postgresql.org/message-id/1334554850.10878.52.camel@jdavis > > - can indexes solve it already (parameterized paths, etc.)? > - planner complexity (track path keys, mergejoinable equality, etc.) > - spatial join algorithm choice > - compare range merge join against existing indexes to see if any > indexing approach is viable > > ========== > Externals: > ========== > > No new syntax or other externals. Just improved execution strategy for > joining ranges using the overlaps operator (&&). > > New syntax is possible, but I don't see a strong reason to support > special range join syntax at this time. > > ============= > Optimizer Design > ============= > > I took the path of least resistance, so if I am doing something wrong > please let me know. I hardwired it to look for the range overlaps > operator when checking if it could do a merge join, and then add > range_ops to restrictinfo->mergeopfamilies. > > Then, I have to suppress adding it as an equivalence class, because > overlaps is not equality. It still adds single-member ECs to > restrictinfo->right_ec and left_ec, but (I think) that's OK. > > Also, I have to prevent generating paths for right/full join, because > the range join algorithm can't (currently) handle those. > > ============= > Costing > ============= > > Needs more consideration. Seems to do reasonable things in the few > examples I tried. > > ============= > Executor Design > ============= > > See detailed comments in nodeMergejoin.c > > ============= > Performance > ============= > > Seems much better than index nest loop join when the join is not > selective. I will post detailed numbers soon. > > If no index is available, range merge join is the only reasonable way > to execute a range join. For instance, if the inner is not a leaf in > the plan. > > ============= > Alternatives: > ============= > > It was suggested that I approach it as a general spatial-join > problem. That has merit, but I rejected it for now because the > algorithm that I think has the most promise is range-oriented > anyway. By that I mean we would need to extract all of the dimensions > into ranges first, and then perform the join. With that in mind, it > makes sense to implement range joins first; and then later provide the > APIs to get at the spatial dimensions so that we can implement other > spatial joins as range joins. > > Another suggestion was to base it on hash join, but I never understood > that proposal and it didn't seem to map very well to a spatial join. > > Yet another suggestion was to use some kind of temporary index. Some > brief experiments I did indicated that it would be fairly slow (though > more investigation might be useful here). Also, it doesn't provide any > alternative to the nestloop-with-inner-index we already offer at the > leaf level today. > > Regards, > Jeff Davis -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: