Thread: Fixing Grittner's planner issues
I looked into the planning problems in HEAD discussed here: http://archives.postgresql.org/pgsql-hackers/2009-02/msg00120.php As I suspected, there are really two independent bugs there, though both are in the new semi/antijoin planning code. The first problem is the one already mentioned: cost_mergejoin fails to penalize semi/anti joins for nonunique merge keys, although nonunique keys require rescanning of parts of the inner relation and can result in huge runtime increases. After unsuccessfully fooling with some quick and dirty fixes, I remembered why I'd tried to dodge the problem: the correct way to estimate the amount of rescanning done involves estimating the merge condition selectivity as if it were a plain inner join condition, not a semi or anti join. We have code to do that, certainly, but this means that we will sometimes demand the selectivity of the same RestrictInfo under JOIN_INNER rules and sometimes under SEMI or ANTI rules. And there's only space to cache one selectivity there. And we *really* need to cache both because cost_mergejoin tends to get done a large number of times in a complex join problem. The only good solution I can see is to add another selectivity cache field to RestrictInfo so that we can cache both the JOIN_INNER selectivity and the "native" selectivity of the qual's context. This is slightly annoying from a space consumption point of view, but it's not fatal. The second problem is specific to the particular structure of Kevin's query: SELECT ... FROM "Case" "C" LEFT OUTER JOIN "CaseDispo" "CD" ON ("CD"."caseNo" = "C"."caseNo") AND ("CD"."countyNo" = "C"."countyNo") AND (NOT (EXISTS (SELECT 1 FROM "CaseDispo" "CD2" WHERE ("CD2"."caseNo" = "CD"."caseNo") AND ("CD2"."countyNo" = "CD"."countyNo") AND ("CD2"."dispoDate"> "CD"."dispoDate")))) WHERE some-clause-that-selects-just-a-few-C-rows that is, the EXISTS clause is part of the ON condition of an outer join. If it referred to any variables of the left side of the upper join (ie, "C" here) then we couldn't convert it to a separate join at all. Since it refers only to "CD", it's semantically legitimate to push it down into the right-hand side of the outer join, and then we can convert it to a lower outer join. The problem is that having done so, the resulting anti join doesn't commute with the upper outer join. So we're forced to evaluate it first, and in this particular example this results in a much worse plan than leaving it as a SubLink would produce. A SubLink isn't evaluated "after" the upper join, exactly, but from a performance point of view we get that effect because it only gets evaluated after all the other join conditions have succeeded for a particular pair of C and CD rows. In this case we only need the answer for a few C/CD rows so that's a win. (If we needed the answer for most CD rows, it's not a win; but I don't think we can optimize that case at the expense of getting miserably slower for cases we used to be good at.) I think the only fix for this that's practical in the 8.4 time frame is to give up trying to flatten EXISTS/NOT EXISTS that occur within the ON condition of an outer join, ie, revert to 8.3's level of intelligence about this case. It seems like a general purpose solution would involve somehow being able to separate the semantic effects of an outer join (in particular, null-insertion) from the time at which it's performed, and I've really got no idea how to do that or what consequences it would have for both the planner and the executor. Reflecting on this further, I suspect there are also some bugs in the planner's rules about when semi/antijoins can commute with other joins; but that's not directly causing Kevin's problem, because the rules do make the correct decision for this case. So I've got a few "must fix" items here, but before I go off to start doing that, I wondered if anyone had any comments or better ideas. regards, tom lane
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: > SELECT ... FROM > "Case" "C" > LEFT OUTER JOIN "CaseDispo" "CD" > ON ("CD"."caseNo" = "C"."caseNo") AND ("CD"."countyNo" = "C"."countyNo") > AND (NOT (EXISTS (SELECT 1 FROM "CaseDispo" "CD2" > WHERE ("CD2"."caseNo" = "CD"."caseNo") > AND ("CD2"."countyNo" = "CD"."countyNo") > AND ("CD2"."dispoDate" > "CD"."dispoDate")))) > WHERE some-clause-that-selects-just-a-few-C-rows > > that is, the EXISTS clause is part of the ON condition of an outer join. > If it referred to any variables of the left side of the upper join > (ie, "C" here) then we couldn't convert it to a separate join at all. > I wondered if anyone had any comments The only thing that comes to mind for me that seems possibly helpful is that we have typically considered it obvious that in the context of the NOT EXISTS clause we have already established that ("CD"."caseNo" = "C"."caseNo") AND ("CD"."countyNo" = "C"."countyNo") and have not been at all consistent about whether we used C or CD to compare to CD2. Our operating assumption has been that it didn't matter in that context. -Kevin
"Kevin Grittner" <Kevin.Grittner@wicourts.gov> writes: > Tom Lane <tgl@sss.pgh.pa.us> wrote: >> If it referred to any variables of the left side of the upper join >> (ie, "C" here) then we couldn't convert it to a separate join at > all. > The only thing that comes to mind for me that seems possibly helpful > is that we have typically considered it obvious that in the context of > the NOT EXISTS clause we have already established that ("CD"."caseNo" > = "C"."caseNo") AND ("CD"."countyNo" = "C"."countyNo") and have not > been at all consistent about whether we used C or CD to compare to > CD2. Our operating assumption has been that it didn't matter in that > context. Yeah, I had thought about suggesting that you could use that to determine which type of plan got chosen, but immediately dismissed it as too bletcherous for words. In any case the same type of problem could occur in scenarios where the upper and lower join conditions aren't so neatly related. regards, tom lane
> I think the only fix for this that's practical in the 8.4 time frame is > to give up trying to flatten EXISTS/NOT EXISTS that occur within the ON > condition of an outer join, ie, revert to 8.3's level of intelligence > about this case. It seems like a general purpose solution would involve > somehow being able to separate the semantic effects of an outer join > (in particular, null-insertion) from the time at which it's performed, > and I've really got no idea how to do that or what consequences it would > have for both the planner and the executor. I think that A LEFT JOIN (B ANTI JOIN C) is equivalent to (A LEFT JOIN B) LEFT JOIN (UNIQUE C) if you rewrite each attribute provided by b as CASE WHEN (no matching tuple found in C) THEN b.attribute END. On a related note, I have some vague unease about planning A SEMI JOIN B as A INNER JOIN (UNIQUE B), as make_one_rel currently attempts to do. For a merge join or nested loop, I don't see how this can ever be a win over teaching the executor to just not rescan B. For a hash join, it can be a win if B turns out to have duplicates, but then again you could also just teach the executor to skip the insertion of the duplicate into the table in the first place (it has to hash 'em anyway...). I think maybe I'm not understanding something about the logic here. It seems like you might want some infrastructure for cases where we know we are just probing the inner side of the relation for a match. If you find at least one, you either make a decision as to whether or not to filter the tuple (regular semi/anti-join) or whether or not to force certain fields to NULL (semi/anti-join under ON-clause). But all these cases can share the we-don't-care-about-multiple-inner-matches logic. > Reflecting on this further, I suspect there are also some bugs in the > planner's rules about when semi/antijoins can commute with other joins; > but that's not directly causing Kevin's problem, because the rules do > make the correct decision for this case. One thing I notice is that src/backend/optimizer/README should probably be updated with the rules for commuting SEMI and ANTI joins; it currently only mentions INNER, LEFT, RIGHT, and FULL. ...Robert
[ forgot to respond to this earlier, sorry ] Robert Haas <robertmhaas@gmail.com> writes: > On a related note, I have some vague unease about planning A SEMI JOIN > B as A INNER JOIN (UNIQUE B), as make_one_rel currently attempts to > do. For a merge join or nested loop, I don't see how this can ever be > a win over teaching the executor to just not rescan B. For a hash > join, it can be a win if B turns out to have duplicates, but then > again you could also just teach the executor to skip the insertion of > the duplicate into the table in the first place (it has to hash 'em > anyway...). I think maybe I'm not understanding something about the > logic here. The case where this is a win is where B is small (say a few rows) and not unique, and A is large, and there's a relevant index on A. Then considering this join approach lets us produce a plan that looks like NestLoop HashAggregate (or GroupAggregate) Scan B IndexScan A Index Condition : A.x = B.y Every other possible plan for this join involves reading all of A. If B produces too many rows for the nestloop indexscan to be a win, then one of the other join approaches will beat out this one in the cost comparisons. > One thing I notice is that src/backend/optimizer/README should > probably be updated with the rules for commuting SEMI and ANTI joins; > it currently only mentions INNER, LEFT, RIGHT, and FULL. Yeah, I noticed that too. How embarrassing. Will fix it as part of the patch, which I hope to start on tomorrow. regards, tom lane
On Wed, Feb 11, 2009 at 8:24 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ forgot to respond to this earlier, sorry ] Thanks for responding now. > Robert Haas <robertmhaas@gmail.com> writes: >> On a related note, I have some vague unease about planning A SEMI JOIN >> B as A INNER JOIN (UNIQUE B), as make_one_rel currently attempts to >> do. For a merge join or nested loop, I don't see how this can ever be >> a win over teaching the executor to just not rescan B. For a hash >> join, it can be a win if B turns out to have duplicates, but then >> again you could also just teach the executor to skip the insertion of >> the duplicate into the table in the first place (it has to hash 'em >> anyway...). I think maybe I'm not understanding something about the >> logic here. > > The case where this is a win is where B is small (say a few rows) and > not unique, and A is large, and there's a relevant index on A. Then > considering this join approach lets us produce a plan that looks like > > NestLoop > HashAggregate (or GroupAggregate) > Scan B > IndexScan A > Index Condition : A.x = B.y Right, so maybe I wasn't as clear as I could have been in asking the question. I do understand how it can be a win to unique B and use it as the OUTER relation (jointype JOIN_UNIQUE_OUTER). What I don't understand is how it can ever be a win to unique B and use it as the INNER relation (jointype JOIN_UNIQUE_INNER). >> One thing I notice is that src/backend/optimizer/README should >> probably be updated with the rules for commuting SEMI and ANTI joins; >> it currently only mentions INNER, LEFT, RIGHT, and FULL. > > Yeah, I noticed that too. How embarrassing. Will fix it as part of > the patch, which I hope to start on tomorrow. Cool. On the topic of documentation, I find the following comment in joinrels.c rather impenetrable: /* * Do these steps only if we actually have a regular semijoin, * as opposed to a case where we should unique-ify the RHS. */ I don't think "regular semijoin" is a term of art, so I'm somewhat at a loss to understand what this means. And why "as opposed to" a case where we should unique-ify the RHS? ISTM the code will sometimes try both... ...Robert
[ back to planner stuff after a hiatus ] Robert Haas <robertmhaas@gmail.com> writes: > Right, so maybe I wasn't as clear as I could have been in asking the > question. I do understand how it can be a win to unique B and use it > as the OUTER relation (jointype JOIN_UNIQUE_OUTER). What I don't > understand is how it can ever be a win to unique B and use it as the > INNER relation (jointype JOIN_UNIQUE_INNER). Hmm, well, maybe B is *really* nonunique and unique'ifying it makes it small enough to fit in a single-batch hash table? Also, seriously nonunique RHS data is pretty awful for mergejoining (too much rescanning) so I could imagine wanting to do it for a mergejoin too. regards, tom lane
[ after re-reading the code a bit ] Robert Haas <robertmhaas@gmail.com> writes: > Cool. On the topic of documentation, I find the following comment in > joinrels.c rather impenetrable: > /* > * Do these steps only if we actually have a > regular semijoin, > * as opposed to a case where we should > unique-ify the RHS. > */ The point here is that a semijoin ordinarily requires forming the whole LHS relation (ie, min_lefthand) before applying the special join rule. However, if you unique-ify the RHS then it's a regular inner join and you don't have to obey that restriction, ie, you can join to just part of min_lefthand. Now ordinarily that's not an amazingly good idea but there are important special cases where it is. IIRC the case that motivated this was SELECT FROM a, b WHERE (a.x, b.y) IN (SELECT c1, c2 FROM c) If you do this as a semijoin then you are forced to form the cartesian product of a*b before you can semijoin to c. If you uniqueify c then you can join it to a first and then b using regular joins (possibly indexscans on a.x and then b.y), or b and then a. So join_is_legal allows such a join order, and the code in make_join_rel has to be careful not to claim that "a semijoin c" is a valid way of forming that join. I'll change the comment. Does this help? /* * We might have a normal semijoin, or a case where we don't have * enough rels to dothe semijoin but can unique-ify the RHS and * then do an innerjoin. In the latter case we can't apply * JOIN_SEMI joining. */ regards, tom lane
On Thu, Feb 19, 2009 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ after re-reading the code a bit ] > > Robert Haas <robertmhaas@gmail.com> writes: >> Cool. On the topic of documentation, I find the following comment in >> joinrels.c rather impenetrable: > >> /* >> * Do these steps only if we actually have a >> regular semijoin, >> * as opposed to a case where we should >> unique-ify the RHS. >> */ > > The point here is that a semijoin ordinarily requires forming the whole > LHS relation (ie, min_lefthand) before applying the special join rule. > However, if you unique-ify the RHS then it's a regular inner join and > you don't have to obey that restriction, ie, you can join to just part > of min_lefthand. Now ordinarily that's not an amazingly good idea but > there are important special cases where it is. IIRC the case that > motivated this was > > SELECT FROM a, b WHERE (a.x, b.y) IN (SELECT c1, c2 FROM c) > > If you do this as a semijoin then you are forced to form the cartesian > product of a*b before you can semijoin to c. If you uniqueify c > then you can join it to a first and then b using regular joins (possibly > indexscans on a.x and then b.y), or b and then a. > > So join_is_legal allows such a join order, and the code in make_join_rel > has to be careful not to claim that "a semijoin c" is a valid way of > forming that join. Gotcha. > I'll change the comment. Does this help? > > /* > * We might have a normal semijoin, or a case where we don't have > * enough rels to do the semijoin but can unique-ify the RHS and > * then do an innerjoin. In the latter case we can't apply > * JOIN_SEMI joining. > */ It's an improvement, but your example above is so helpful in understanding what is going on here that it might be worth explicitly mentioning it in the comment, maybe something like this: /** In a case like the following, we don't have enough rels to plan this as a semijoin,* but we don't give up completely, because it might be possible to unique-ify the* RHS and perform part of the join at this level.** SELECT FROM a, b WHERE (a.x, b.y) IN (SELECT c1, c2 FROMc)*/ ...Robert
On Thu, Feb 19, 2009 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > [ back to planner stuff after a hiatus ] > > Robert Haas <robertmhaas@gmail.com> writes: >> Right, so maybe I wasn't as clear as I could have been in asking the >> question. I do understand how it can be a win to unique B and use it >> as the OUTER relation (jointype JOIN_UNIQUE_OUTER). What I don't >> understand is how it can ever be a win to unique B and use it as the >> INNER relation (jointype JOIN_UNIQUE_INNER). > > Hmm, well, maybe B is *really* nonunique and unique'ifying it makes it > small enough to fit in a single-batch hash table? > > Also, seriously nonunique RHS data is pretty awful for mergejoining > (too much rescanning) so I could imagine wanting to do it for a > mergejoin too. Well, as I wrote upthread: # For a merge join or nested loop, I don't see how this can ever be a win over teaching the executor to just not rescan B. For a hash # join, it can be a win if B turns out to have duplicates, but then again you could also just teach the executor to skip the insertion of # the duplicate into the table in the first place (it has to hash 'em anyway...). A nestjoin seems like the clearest example. If the inner path is not unique and not sorted, you'll need to either sort it or hash it. That seems like a lot of trouble considering that you could just scan the unsorted inner path until you hit the first match, and then move on to the next outer tuple (and I think this is pretty much what JOIN_SEMI does anyway). If the inner path is not unique but does happen to be sorted, then unique-ifying should be cheap, but I would think it would still be faster to do it the JOIN_SEMI way rather than insert a separate unique node to do basically the same work. If add a materialize node for the inner path after you unique-ify it, you might reduce the number of tuples by enough to pay for the unique-ify and materialize steps, but if that's the case you should probably be doing it that way for JOIN_SEMI too. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 19, 2009 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> [ back to planner stuff after a hiatus ] > Well, as I wrote upthread: What you're actually suggesting is modifying the executor to incorporate the unique-fication logic into hashjoin and/or mergejoin. Maybe, but that code is way too complex already for my taste (especially mergejoin) and what we'd save is, hmm, four lines in the planner. regards, tom lane
On Thu, Feb 19, 2009 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Feb 19, 2009 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> [ back to planner stuff after a hiatus ] > >> Well, as I wrote upthread: > > What you're actually suggesting is modifying the executor to incorporate > the unique-fication logic into hashjoin and/or mergejoin. Maybe, but > that code is way too complex already for my taste (especially mergejoin) > and what we'd save is, hmm, four lines in the planner. What you'd save is planning time. ...Robert
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Feb 19, 2009 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I'll change the comment. Does this help? > It's an improvement, but your example above is so helpful in > understanding what is going on here that it might be worth explicitly > mentioning it in the comment, maybe something like this: Done, see http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/joinrels.c?r1=1.97&r2=1.98 regards, tom lane
On Thu, Feb 19, 2009 at 3:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Feb 19, 2009 at 1:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I'll change the comment. Does this help? > >> It's an improvement, but your example above is so helpful in >> understanding what is going on here that it might be worth explicitly >> mentioning it in the comment, maybe something like this: > > Done, see > http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/optimizer/path/joinrels.c?r1=1.97&r2=1.98 Thanks, that's much clearer IMO. ...Robert
On Thu, Feb 19, 2009 at 7:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Thu, Feb 19, 2009 at 1:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> [ back to planner stuff after a hiatus ] > >> Well, as I wrote upthread: > > What you're actually suggesting is modifying the executor to incorporate > the unique-fication logic into hashjoin and/or mergejoin. Maybe, but > that code is way too complex already for my taste (especially mergejoin) > and what we'd save is, hmm, four lines in the planner. I'm not entirely following the implications for semijoins but I know I've noticed more than a few cases where an option to Hash to only gather unique values seems like it would be a win. Consider cases like this where we hash the values twice: postgres=# explain select * from generate_series(1,1000) as a(i) where i in (select * from generate_series(1,100) as b(i)); QUERY PLAN --------------------------------------------------------------------------------------------Hash Join (cost=19.50..45.75rows=1000 width=4) Hash Cond: (a.i = b.i) -> Function Scan on generate_series a (cost=0.00..12.50 rows=1000width=4) -> Hash (cost=17.00..17.00 rows=200 width=4) -> HashAggregate (cost=15.00..17.00 rows=200 width=4) -> Function Scan on generate_series b (cost=0.00..12.50 rows=1000 width=4) (6 rows) It's tempting to have Hash cheat and just peek at the node beneath it to see if it's a HashAggregate, in which case it could call a special method to request the whole hash. But it would have to know that it's just a plain uniquify and not implementing a GROUP BY. -- greg
Greg Stark <stark@enterprisedb.com> writes: > It's tempting to have Hash cheat and just peek at the node beneath it > to see if it's a HashAggregate, in which case it could call a special > method to request the whole hash. But it would have to know that it's > just a plain uniquify and not implementing a GROUP BY. More to the point, it would have to check if it's unique-ifying on the same columns and with the same operators as the upper hash is using. If we were going to do something like this, making it a real option to the Hash node and teaching the planner about that would be *much* easier, and would also allow saner cost estimation. I agree that doing something like this on the inner side of a hashjoin might not be too unreasonable --- it was the mergejoin case that really seemed ugly when I thought about it. regards, tom lane
On Thu, Feb 19, 2009 at 4:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Stark <stark@enterprisedb.com> writes: >> It's tempting to have Hash cheat and just peek at the node beneath it >> to see if it's a HashAggregate, in which case it could call a special >> method to request the whole hash. But it would have to know that it's >> just a plain uniquify and not implementing a GROUP BY. > > More to the point, it would have to check if it's unique-ifying on the > same columns and with the same operators as the upper hash is using. > If we were going to do something like this, making it a real option to > the Hash node and teaching the planner about that would be *much* > easier, and would also allow saner cost estimation. > > I agree that doing something like this on the inner side of a hashjoin > might not be too unreasonable --- it was the mergejoin case that really > seemed ugly when I thought about it. Hmm, for some reason I thought hash join would be the harder case (since the logic to de-dupe the hash table would be all new). In the merge-join and nest-join cases, isn't this pretty much what JOIN_SEMI already does? ...Robert