Thread: Rangejoin rebased
New rangejoin patch attached. I had previously attempted to make this work well for multiple range join keys, but this patch implements single rangejoin keys only, and the rest of the rangejoin clauses are effectively just rechecks. I believe it can be made effective for multiple rangejoin keys, but at the cost of additional complexity which is neither justified nor implemented at this point. Regards, Jeff Davis
Attachment
On 29 December 2017 at 18:25, Jeff Davis <pgsql@j-davis.com> wrote: > New rangejoin patch attached. > > I had previously attempted to make this work well for multiple range > join keys, but this patch implements single rangejoin keys only, and > the rest of the rangejoin clauses are effectively just rechecks. I > believe it can be made effective for multiple rangejoin keys, but at > the cost of additional complexity which is neither justified nor > implemented at this point. For this to be useful, it needs to include some details of how to use it when people have NOT used range datatypes in their tables. If we can see some examples with StartDate and EndDate cast to a tzrange, plus some "don't write it like this" anti-patterns that would help. We can then make it clear that this is a huge performance benefit for these important cases. Just to emphasise why we want this, it might be better for the EXPLAIN to say "Time Range Join" when the ranges being joined are Time Ranges, and for other cases to just say "Range Join". The use of the word Merge doesn't help much there. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Sat, Jan 6, 2018 at 10:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > For this to be useful, it needs to include some details of how to use > it when people have NOT used range datatypes in their tables. Good idea. I added an example that doesn't have range types in the base table. > If we can see some examples with StartDate and EndDate cast to a > tzrange, plus some "don't write it like this" anti-patterns that would > help. By "anti-patterns", I assume you mean implementing range intersection by hand in SQL over non-range types. Such an example would be quite a long query, and might add to the confusion -- it seems strange to spend more text explaining what *not* to do than what to do. I see what you are saying, but perhaps I'm just not coming up with the right text to explain it succinctly in the manual, so for now I just added one example. Let me know if you think it needs more. > We can then make it clear that this is a huge performance benefit for > these important cases. Done. > Just to emphasise why we want this, it might be better for the EXPLAIN > to say "Time Range Join" when the ranges being joined are Time Ranges, > and for other cases to just say "Range Join". The use of the word > Merge doesn't help much there. I don't care for special-casing the word "time" in there, because no other type would benefit. It seems a little too magical. I also do like leaving "merge" in there because it helps the user understand why their inputs are being sorted. Regards, Jeff Davis
Attachment
On 10 January 2018 at 04:24, Jeff Davis <pgsql@j-davis.com> wrote: > On Sat, Jan 6, 2018 at 10:38 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> For this to be useful, it needs to include some details of how to use >> it when people have NOT used range datatypes in their tables. > > Good idea. I added an example that doesn't have range types in the base table. Cool, thanks ... It would be useful to consider any related use cases. Are there applications for range operators other than &&? Do we optimize for TIMESTAMP <@ RANGE as well? Does this link in nicely with partition-aware joins? Does it allow partition exclusion if you join a daterange to a time range partitioned table? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 10 January 2018 at 04:24, Jeff Davis <pgsql@j-davis.com> wrote: > Done. I think you need to make changes to other parts of the docs also, so that it is clear what will now be possible https://www.postgresql.org/docs/devel/static/using-explain.html https://www.postgresql.org/docs/devel/static/xoper-optimization.html#id-1.8.3.17.9 https://www.postgresql.org/docs/devel/static/planner-optimizer.html -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Hi Jeff, Just a quick comment -- I ran a slightly modified version of a query from the regression tests, and got an assertion failure: select i1, ir1, i2, ir2 from (select * from rangejoin_left order by ir1 desc) as a1 inner join (select * from rangejoin_right order by ir2 desc) as a2 on (i1 = i2 and ir1 && ir2) order by ir1 desc, i1; TRAP: FailedAssertion("!(!ssup->ssup_reverse)", File: "/home/akuzmenkov/pgp-old/build/../postgrespro/src/backend/executor/nodeMergejoin.c", Line: 492) The sort order isn't right for the join, it seems. I remember having similar troubles with my full merge join implementation. I tried filtering unsuitable paths in try_mergejoin_path, but that was not quite enough. The planner tries to use only one sort direction to limit the number of path, so the path we need might not be there at all. This optimization was added in commit 834ddc62, see right_merge_direction(). Sadly, I have no idea how to solve this. -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Tue, Jan 9, 2018 at 11:24 PM, Jeff Davis <pgsql@j-davis.com> wrote: >> Just to emphasise why we want this, it might be better for the EXPLAIN >> to say "Time Range Join" when the ranges being joined are Time Ranges, >> and for other cases to just say "Range Join". The use of the word >> Merge doesn't help much there. > > I don't care for special-casing the word "time" in there, because no > other type would benefit. It seems a little too magical. I also do > like leaving "merge" in there because it helps the user understand why > their inputs are being sorted. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jan 12, 2018 at 11:02 AM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > The sort order isn't right for the join, it seems. I remember having similar > troubles with my full merge join implementation. I tried filtering > unsuitable paths in try_mergejoin_path, but that was not quite enough. The > planner tries to use only one sort direction to limit the number of path, so > the path we need might not be there at all. This optimization was added in > commit 834ddc62, see right_merge_direction(). Sadly, I have no idea how to > solve this. Interesting problem, thank you. My first reaction was to hack right_merge_direction() to recognize the range opfamily in the ASC direction as always potentially useful. But what's happening is that, when the inputs are already forced to be sorted DESC, as in your example, there is still no appropriate pathkey. So the problem is reproducible with no ORDER BY clause at all. My proposed fix is to make an internal opfamily identical to the external one, such that it's not recognized as part of the same EC, and the planner won't try to eliminate it. It loses out on potential optimizations, but those are mostly theoretical since the btree opclass ordering for ranges is not very interesting to a user. I notice that window functions seem to handle these cases better, maybe that approach would work for your full join patch? I haven't investigated that yet. Regards, Jeff Davis
On Wed, Jan 10, 2018 at 7:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Do we optimize for TIMESTAMP <@ RANGE as well? Not currently. It requires a little extra complexity because empty ranges will match anything and need special handling. > Does this link in nicely with partition-aware joins? Yes: if the partitioning is on a non-range column, and the join key includes both the partition key and a range column, it can do partition-wise joins. It does not try to invent a concept of partitioning on a spatial key. > Does it allow partition exclusion if you join a daterange to a time > range partitioned table? I'm a little unclear what you mean here. Are you talking about spatial partitioning? Or are you talking about joining a daterange column to a timestamptz column (I suppose using @>)? I think the answer to your question is "no", but let me know if I am missing an important case. Regards, Jeff Davis
On 17 January 2018 at 05:49, Jeff Davis <pgsql@j-davis.com> wrote: > On Wed, Jan 10, 2018 at 7:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> Do we optimize for TIMESTAMP <@ RANGE as well? > > Not currently. It requires a little extra complexity because empty > ranges will match anything and need special handling. TIMESTAMP <@ RANGE is arguably more important than RANGE && RANGE Trying to cast timestamp to range to make that work is a bit hokey If the problem is just empty ranges, it seems like we should do that here also. I'd be happy with the optimization only working if ranges are provably non-empty, e.g. CHECK (NOT isempty(col)) Or perhaps we need non-empty types: e.g. tsrangene -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 19 January 2018 at 08:25, Simon Riggs <simon@2ndquadrant.com> wrote: > On 17 January 2018 at 05:49, Jeff Davis <pgsql@j-davis.com> wrote: >> On Wed, Jan 10, 2018 at 7:49 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> Do we optimize for TIMESTAMP <@ RANGE as well? >> >> Not currently. It requires a little extra complexity because empty >> ranges will match anything and need special handling. err... that isn't correct. An empty range matches nothing, so can be ignored in joins. So probably best to explain some more, please. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 19, 2018 at 2:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > err... that isn't correct. An empty range matches nothing, so can be > ignored in joins. > > So probably best to explain some more, please. The semantics of R1 @> R2 will return true if R1 is a non-NULL range and R2 is empty. It's set semantics, and all sets contain the empty set. But I understand @> is an important case so I am looking into it. Regards, Jeff Davis
On 23 January 2018 at 05:08, Jeff Davis <pgsql@j-davis.com> wrote: > On Fri, Jan 19, 2018 at 2:07 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> err... that isn't correct. An empty range matches nothing, so can be >> ignored in joins. >> >> So probably best to explain some more, please. > > The semantics of R1 @> R2 will return true if R1 is a non-NULL range > and R2 is empty. > > It's set semantics, and all sets contain the empty set. Understood > But I understand @> is an important case so I am looking into it. Perhaps we are misunderstanding each other TIMESTAMP <@ RANGE1 doesn't match if RANGE1 is empty and that is the most important case RANGE OP RANGE is important also. It would be useful for OP to be more than just && It's certainly weird that R1 @> EMPTY is true, but R1 && EMPTY is not. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Tue, Jan 23, 2018 at 2:04 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Perhaps we are misunderstanding each other > > TIMESTAMP <@ RANGE1 doesn't match if RANGE1 is empty > and that is the most important case When <@ is supported, that case should be fine if range1 is on the outer. The case I was concerned about is with a R1 <@ R2 join where R1 is on the inner side and could have empty ranges. One option would be to try to force R2 to be on the inner. But that doesn't quite solve other related issues, like if R2 has a few large ranges that contain almost everything. Right now I'm considering an approach where we use some counters to determine that a few ranges are preventing us from moving the mark forward, and then move those few ranges into a separate tuplestore so that we can move the mark forward. > RANGE OP RANGE is important also. It would be useful for OP to be more > than just && I agree that contains/contained-by are useful; do you have other operators in mind as well? > > It's certainly weird that R1 @> EMPTY is true, but R1 && EMPTY is not. This was discussed back in 9.2, and there were no obviously better semantics available. I chose to follow set semantics: X contains Y means that Y is a subset of X; X overlaps Y means that the intersection of X and Y is nonempty. I understand it can be surprising, but other definitions can be surprising, too. Regards, Jeff Davis
On 16.01.2018 10:49, Jeff Davis wrote: > My proposed fix is to make an internal opfamily identical to the > external one, such that it's not recognized as part of the same EC, > and the planner won't try to eliminate it. It loses out on potential > optimizations, but those are mostly theoretical since the btree > opclass ordering for ranges is not very interesting to a user. I think I figured out what to do with missing sort directions. We can change select_outer_pathkeys_for_merge() to generate the pathkeys we need. Also, find_mergeclauses_for_outer_pathkeys() has to be changed too, so that it knows which pathkeys are compatible to which range join clauses. About the patch, do I understand it right that you are working on the next version now? -- Alexander Kuzmenkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
On Fri, Mar 2, 2018 at 11:12 AM, Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> wrote: > On 16.01.2018 10:49, Jeff Davis wrote: >> My proposed fix is to make an internal opfamily identical to the >> external one, such that it's not recognized as part of the same EC, >> and the planner won't try to eliminate it. It loses out on potential >> optimizations, but those are mostly theoretical since the btree >> opclass ordering for ranges is not very interesting to a user. > > I think I figured out what to do with missing sort directions. We can change > select_outer_pathkeys_for_merge() to generate the pathkeys we need. Also, > find_mergeclauses_for_outer_pathkeys() has to be changed too, so that it > knows which pathkeys are compatible to which range join clauses. > > About the patch, do I understand it right that you are working on the next > version now? I think we are quite clearly past the deadline to submit a new patch for inclusion in v11 at this point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 3/2/18 11:44 AM, Robert Haas wrote: > On Fri, Mar 2, 2018 at 11:12 AM, Alexander Kuzmenkov > <a.kuzmenkov@postgrespro.ru> wrote: >> On 16.01.2018 10:49, Jeff Davis wrote: >>> My proposed fix is to make an internal opfamily identical to the >>> external one, such that it's not recognized as part of the same EC, >>> and the planner won't try to eliminate it. It loses out on potential >>> optimizations, but those are mostly theoretical since the btree >>> opclass ordering for ranges is not very interesting to a user. >> >> I think I figured out what to do with missing sort directions. We can change >> select_outer_pathkeys_for_merge() to generate the pathkeys we need. Also, >> find_mergeclauses_for_outer_pathkeys() has to be changed too, so that it >> knows which pathkeys are compatible to which range join clauses. >> >> About the patch, do I understand it right that you are working on the next >> version now? > > I think we are quite clearly past the deadline to submit a new patch > for inclusion in v11 at this point. It seems that a new patch is needed here but none has been presented. I've marked this Waiting on Author for the moment, but I really think it should be marked Returned with Feedback and submitted to the next CF when a new patch is ready. Regards, -- -David david@pgmasters.net
Hi, On 2018-03-28 14:18:42 -0400, David Steele wrote: > It seems that a new patch is needed here but none has been presented. > I've marked this Waiting on Author for the moment, but I really think it > should be marked Returned with Feedback and submitted to the next CF > when a new patch is ready. I'd just do so now. There's not been any progress for months, and there's been an update request weeks ago... Greetings, Andres Freund
On 3/28/18 2:23 PM, Andres Freund wrote: > On 2018-03-28 14:18:42 -0400, David Steele wrote: >> It seems that a new patch is needed here but none has been presented. >> I've marked this Waiting on Author for the moment, but I really think it >> should be marked Returned with Feedback and submitted to the next CF >> when a new patch is ready. > > I'd just do so now. There's not been any progress for months, and > there's been an update request weeks ago... Done. -- -David david@pgmasters.net