Re: Performance improvement for joins where outer side is unique - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: Performance improvement for joins where outer side is unique |
Date | |
Msg-id | CAKJS1f_jRki1PQ4X-9UGKa-wnBhECQLnrxCX5haQzu4SDR_r2Q@mail.gmail.com Whole thread Raw |
In response to | Re: Performance improvement for joins where outer side is unique (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Performance improvement for joins where outer side is unique
Re: Performance improvement for joins where outer side is unique Re: [HACKERS] Performance improvement for joins where outer side is unique |
List | pgsql-hackers |
On 12 March 2016 at 11:43, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> I wondered why, instead of inventing an extra semantics-modifying flag, >> we couldn't just change the jointype to *be* JOIN_SEMI when we've >> discovered that the inner side is unique. > > BTW, to clarify: I'm not imagining that we'd make this change in the > query jointree, as for example prepjointree.c might do. That would appear > to add join order constraints, which we don't want. But I'm thinking that > at the instant where we form a join Path, we could change the Path's > jointype to be JOIN_SEMI or JOIN_SEMI_OUTER if we're able to prove the > inner side unique, rather than annotating the Path with a separate flag. > Then that representation is what propagates forward. Thanks for looking at this. Yes that might work, since we'd just be changing the jointype in the JoinPath, if that path was discarded if favour of, say the commutative variant, which was not "unique inner", then it shouldn't matter, as the join type for that path would be the original one. The thing that might matter is that, this; explain (costs off) select * from t1 inner join t2 on t1.id=t2.id QUERY PLAN ------------------------------Hash Join Hash Cond: (t1.id = t2.id) -> Seq Scan on t1 -> Hash -> Seq Scan ont2 could become; QUERY PLAN ------------------------------Hash Semi Join Hash Cond: (t1.id = t2.id) -> Seq Scan on t1 -> Hash -> Seq Scanon t2 Wouldn't that cause quite a bit of confusion? People browsing EXPLAIN output might be a bit confused at the lack of EXISTS/IN clause in a query which is showing a Semi Join. Now, we could get around that by adding JOIN_SEMI_INNER I guess, and just displaying that as a normal inner join, yet it'll behave exactly like JOIN_SEMI! And if we do that, then the join node code changes from; if (node->js.match_first_tuple_only) node->nl_NeedNewOuter = true; to; if (node->js.jointype == JOIN_SEMI || node->js.jointype == JOIN_SEMI_INNER || node->js.jointype == JOIN_SEMI_LEFT) node->nl_NeedNewOuter = true; which is kinda horrid and adds a few cycles and code without a real need for it. If we kept the match_first_tuples_only and set it in the node startup similar to how it is now, or changed JoinType to be some bitmask mask that had a bit for each piece of a venn diagram, and an extra for the unique inner... but I'm not so sure I like that idea... To me it seems neater just to keep the match_first_tuple_only and just update the planner stuff to use the new jointypes. Thoughts? > > It seems like the major intellectual complexity here is to figure out > how to detect inner-side-unique at reasonable cost. I see that for > LEFT joins you're caching that in the SpecialJoinInfos, which is probably > fine. But for INNER joins it looks like you're just doing it over again > for every candidate join, and that seems mighty expensive. hmm yeah, perhaps something can be done to cache that, I'm just not all that sure how much reuse the cache will get used since it generates the Paths for nestloop/merge/has join methods all at once, once the unique_inner has been determined for that inner rel, and the restrict info for whatever the outer rel(s) are. I didn't expect that to happen twice, so I'm not all that sure if a cache will get reused... ... thinks more ... Perhaps maybe if rel x was found to be unique with the join conditions of rel y, then we can be sure that x is unique against y, z, as that could only possible add more to the join condition, never less. Is this what you meant? ... I'll look into that. The other thing I thought of was to add a dedicated list for unique indexes in RelOptInfo, this would also allow rel_supports_distinctness() to do something a bit smarter than just return false if there's no indexes. That might not buy us much though, but at least relations tend to have very little unique indexes, even when they have lots of indexes. -- David Rowley http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
pgsql-hackers by date: