Thread: IN joining
Hi I have a problem understanding the code to make certain in join are performed properly. Specifically I have problems understading when IN_UNIQUE_{INNER,OUTER} is a valid jointype. Its in joinrels.c:make_join_rel. Consider this example: SELECT * FROM a,b WHERE a.id = b.id AND (a.id) IN (SELECT c.id FROM c) the possible execution trees are {{a,b}, {c}}, {{a,c},{b}} and the code seems to also permit {{b,c},{a}}. It is the latter I'm having problems with. When joining {b} and {c} it will fall through and suggest a IN_UNIQUE_{INNER,OUTER} jointype. My logic is this: {c} \in {c,b} so it is a valid plan according to the first check. We have an issue according to the second check and we haven't done the work before according to the 3rd and 4th checks. Since the lefthand of the IN {a} is not in either {b} or {c} we skip the IN_JOIN{_REVERSE}. But since one of the relations is equal to the right side {c} of the IN we determine that IN_UNIQUE_{INNER,OUTER} is a valid jointype. Now, the next join between {a} and {b,c} is the one I fail to understand when it can ever happen... {c} \in {a,b,c} so it is a valid plan according to the first check. We have an issue according to the second check. Since we have no trace of the IN's left hand {a} in {b,c} 3rd and 4th check says we have not done the work?!? The final checks fail because {c} != {b,c}, thus we determine it is an invalid plan. My question is: When is it ever a valid jointype to use IN_UNIQUE_{INNER,OUTER}? Or am I missing something? -- Dennis
Dennis Haney <davh@diku.dk> writes: > Consider this example: > SELECT * FROM a,b WHERE a.id = b.id AND (a.id) IN (SELECT c.id FROM c) > the possible execution trees are {{a,b}, {c}}, {{a,c},{b}} and the code > seems to also permit {{b,c},{a}}. No, it does not --- as you say, that would give wrong answers. That case is eliminated by the tests following this comment: * JOIN_IN technique will work if outerrel includes LHS and * innerrel is exactly RHS; conversely JOIN_REVERSE_INhandles * RHS/LHS. * * JOIN_UNIQUE_OUTER will work if outerrel is exactlyRHS; * conversely JOIN_UNIQUE_INNER will work if innerrel is * exactly RHS. Joining {b,c} to {a} does not meet any of those four allowed cases. regards, tom lane
Tom Lane wrote: <blockquote cite="mid11205.1078520699@sss.pgh.pa.us" type="cite"><pre wrap="">Dennis Haney <a class="moz-txt-link-rfc2396E"href="mailto:davh@diku.dk"><davh@diku.dk></a> writes: </pre><blockquote type="cite"><prewrap="">Consider this example: SELECT * FROM a,b WHERE a.id = b.id AND (a.id) IN (SELECT c.id FROM c) the possible execution trees are {{a,b}, {c}}, {{a,c},{b}} and the code seems to also permit {{b,c},{a}}. </pre></blockquote><pre wrap=""> No, it does not --- as you say, that would give wrong answers. That case is eliminated by the tests following this comment: * JOIN_IN technique will work if outerrel includes LHS and * innerrel is exactly RHS; conversely JOIN_REVERSE_INhandles * RHS/LHS. * * JOIN_UNIQUE_OUTER will work if outerrel is exactlyRHS; * conversely JOIN_UNIQUE_INNER will work if innerrel is * exactly RHS. Joining {b,c} to {a} does not meet any of those four allowed cases. </pre></blockquote> Exactly my point... So why ever bothercreating the {b,c} node which is legal by the above definition?<br /><br /><br /><pre class="moz-signature" cols="72">-- Dennis use Inline C => q{void p(char*g){ printf("Just Another %s Hacker\n",g);}};p("Perl"); </pre>
Dennis Haney <davh@diku.dk> writes: >> Joining {b,c} to {a} does not meet any of those four allowed cases. >> > Exactly my point... So why ever bother creating the {b,c} node which is > legal by the above definition? We don't, because there is no such join clause. regards, tom lane
Tom Lane wrote: <blockquote cite="mid12170.1078528135@sss.pgh.pa.us" type="cite"><pre wrap="">Dennis Haney <a class="moz-txt-link-rfc2396E"href="mailto:davh@diku.dk"><davh@diku.dk></a> writes: </pre><blockquote type="cite"><blockquotetype="cite"><pre wrap="">Joining {b,c} to {a} does not meet any of those four allowed cases. </pre></blockquote><pre wrap="">Exactly my point... So why ever bother creating the {b,c} node which is legal by the above definition? </pre></blockquote><pre wrap=""> We don't, because there is no such join clause. </pre></blockquote> No, but we create the equality via the implied equality mechanism...<br /><br /> select * from a, bwhere a.id = b.id3 and a.id in (select c.id2 from c);<br /><br /> rtable is (after in-optimization):<br /> resno refname relid inFromCl<br /> ----- --------- ----- --------<br /> 1 a 17143 inFromCl<br/> 2 b 17151 inFromCl<br /> 3 IN_subquery [subquery]<br /> 4 c 17147 inFromCl<br /><br /> in gdb:<br /> break joinrels.c:563<br /> commands<br /> call bms_is_subset(ininfo->lefthand,rel1->relids)<br /> call bms_equal(ininfo->righthand, rel2->relids)<br /> callbms_is_subset(ininfo->lefthand, rel2->relids)<br /> call bms_equal(ininfo->righthand, rel1->relids)<br />x/t rel1->relids.words<br /> x/t rel2->relids.words<br /> x/t joinrelids.words<br /> p jointype<br /> printf "%s\n",pretty_format_node_dump(nodeToString(((RestrictInfo*)((RestrictInfo*)restrictlist)->clause)->clause))<br />end<br /><br /> then we get this join:<br /><br /> Breakpoint 4, make_join_rel (root=0x8307bc8, rel1=0x8316920, rel2=0x8316b10,jointype=JOIN_UNIQUE_INNER)<br /> at joinrels.c:563<br /> 563 switch (jointype)<br /> $92= 0 '\0'<br /> $93 = 1 '\001'<br /> $94 = 0 '\0'<br /> $95 = 0 '\0'<br /> 0x83169ac: 00000000000000000000000000000100<br/> 0x8316b9c: 00000000000000000000000000010000<br /> 0x832670c: 00000000000000000000000000010100<br/> $96 = JOIN_UNIQUE_INNER<br /> {OPEXPR<br /> :opno 96<br /> :opfuncid 0<br/> :opresulttype 16<br /> :opretset false<br /> :args (<br /> {VAR<br /> :varno 4<br /> :varattno1<br /> :vartype 23<br /> :vartypmod -1<br /> :varlevelsup 0<br /> :varnoold 4<br /> :varoattno 1<br /> }<br /><br /> {VAR<br /> :varno 2<br /> :varattno 1<br /> :vartype23<br /> :vartypmod -1<br /> :varlevelsup 0<br /> :varnoold 2<br /> :varoattno 1<br /> }<br /> )<br /> }<br /><br /><br /><pre class="moz-signature" cols="72">-- Dennis </pre>
Dennis Haney <davh@diku.dk> writes: >>> Exactly my point... So why ever bother creating the {b,c} node which is >>> legal by the above definition? >> >> We don't, because there is no such join clause. >> > No, but we create the equality via the implied equality mechanism... > select * from a, b where a.id = b.id3 and a.id in (select c.id2 from c); Oh, I had forgotten that your original example involved an implied equality. I don't see that anything is wrong though. The join path that will result from considering the implied equality will be like ((UNIQUE-ified subselect) INNER JOIN b) INNER JOIN a which is perfectly legal and perhaps even a winner. Once you stick a UNIQUE on top of the IN's subselect, you can treat the IN as exactly like a plain equality join. [ thinks a bit... ] Actually I guess there is a problem here: we won't actually generate that plan, because this test is too strict: /* * If we already joined IN's RHS to any part of its LHS in * either input path, then thisjoin is not constrained (the * necessary work was done at a lower level). */ if (bms_overlap(ininfo->lefthand,rel1->relids) && bms_is_subset(ininfo->righthand, rel1->relids)) continue; if (bms_overlap(ininfo->lefthand, rel2->relids) && bms_is_subset(ininfo->righthand, rel2->relids)) continue; I think it should be /* * If we already joined IN's RHS to anything else in * either input path, then this joinis not constrained (the * necessary work was done at a lower level). */ if (bms_is_subset(ininfo->righthand,rel1->relids) && !bms_equal(ininfo->righthand, rel1->relids)) continue; if (bms_is_subset(ininfo->righthand, rel2->relids) && !bms_equal(ininfo->righthand, rel2->relids)) continue; Comments? regards, tom lane
Tom Lane wrote: [SNIP: a repetion of my first post ;) ] >I think it should be > > /* > * If we already joined IN's RHS to anything else in > * either input path, then this join is not constrained (the > * necessary work was done at a lower level). > */ > if (bms_is_subset(ininfo->righthand, rel1->relids) && > !bms_equal(ininfo->righthand, rel1->relids)) > continue; > if (bms_is_subset(ininfo->righthand, rel2->relids) && > !bms_equal(ininfo->righthand, rel2->relids)) > continue; > >Comments? > > It's good. It was pretty much what I was thinking was wrong to begin with. Whether the generated plans are valid is a different issue ;) -- Dennis