Thread: Why could GEQO produce plans with lower costs than thestandard_join_search?
Hi, I was expecting the plans generated by standard_join_search to have lower costs than the plans from GEQO. But after the results I have from a join order benchmark show that GEQO produces plans with lower costs most of the time! I wonder what is causing this observation? From my understanding, standard_join_search is doing a complete search. So I'm not sure how the GEQO managed to do better than that. Thank you, Donald Dong
Re: Why could GEQO produce plans with lower costs than the standard_join_search?
From
Tom Lane
Date:
Donald Dong <xdong@csumb.edu> writes: > I was expecting the plans generated by standard_join_search to have lower costs > than the plans from GEQO. But after the results I have from a join order > benchmark show that GEQO produces plans with lower costs most of the time! > I wonder what is causing this observation? From my understanding, > standard_join_search is doing a complete search. So I'm not sure how the GEQO > managed to do better than that. standard_join_search is *not* exhaustive; there's a heuristic that causes it not to consider clauseless joins unless it has to. For the most part, GEQO uses the same heuristic (cf desirable_join()), but given the right sort of query shape you can probably trick it into situations where it will be forced to use a clauseless join when the core code wouldn't. It'd still be surprising for that to come out with a lower cost estimate than a join order that obeys the heuristic, though. Clauseless joins are generally pretty awful. I'm a tad suspicious about the representativeness of your benchmark queries if you find this is happening "most of the time". regards, tom lane
Re: Why could GEQO produce plans with lower costs than thestandard_join_search?
From
Donald Dong
Date:
Hi, Thank you very much for the explanation! I think the join order benchmark I used [1] is somewhat representative, however, I probably didn't use the most accurate cost estimation. I find the cost from cheapest_total_path->total_cost is different from the cost from queryDesc->planstate->total_cost. What I saw was that GEQO tends to form paths with lower cheapest_total_path->total_cost (aka the fitness of the children). However, standard_join_search is more likely to produce a lower queryDesc->planstate->total_cost, which is the cost we get using explain. I wonder why those two total costs are different? If the total_cost from the planstate is more accurate, could we use that instead as the fitness in geqo_eval? [1] https://github.com/gregrahn/join-order-benchmark Regards, Donald Dong > On May 7, 2019, at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Donald Dong <xdong@csumb.edu> writes: >> I was expecting the plans generated by standard_join_search to have lower costs >> than the plans from GEQO. But after the results I have from a join order >> benchmark show that GEQO produces plans with lower costs most of the time! > >> I wonder what is causing this observation? From my understanding, >> standard_join_search is doing a complete search. So I'm not sure how the GEQO >> managed to do better than that. > > standard_join_search is *not* exhaustive; there's a heuristic that causes > it not to consider clauseless joins unless it has to. > > For the most part, GEQO uses the same heuristic (cf desirable_join()), > but given the right sort of query shape you can probably trick it into > situations where it will be forced to use a clauseless join when the > core code wouldn't. It'd still be surprising for that to come out with > a lower cost estimate than a join order that obeys the heuristic, > though. Clauseless joins are generally pretty awful. > > I'm a tad suspicious about the representativeness of your benchmark > queries if you find this is happening "most of the time". > > regards, tom lane
Re: Why could GEQO produce plans with lower costs than the standard_join_search?
From
Tom Lane
Date:
Donald Dong <xdong@csumb.edu> writes: > I find the cost from cheapest_total_path->total_cost is different > from the cost from queryDesc->planstate->total_cost. What I saw was > that GEQO tends to form paths with lower > cheapest_total_path->total_cost (aka the fitness of the children). > However, standard_join_search is more likely to produce a lower > queryDesc->planstate->total_cost, which is the cost we get using > explain. > I wonder why those two total costs are different? If the total_cost > from the planstate is more accurate, could we use that instead as the > fitness in geqo_eval? You're still asking us to answer hypothetical questions unsupported by evidence. In what case does that really happen? regards, tom lane
Re: Why could GEQO produce plans with lower costs than thestandard_join_search?
From
Donald Dong
Date:
On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Donald Dong <xdong@csumb.edu> writes: >> I find the cost from cheapest_total_path->total_cost is different >> from the cost from queryDesc->planstate->total_cost. What I saw was >> that GEQO tends to form paths with lower >> cheapest_total_path->total_cost (aka the fitness of the children). >> However, standard_join_search is more likely to produce a lower >> queryDesc->planstate->total_cost, which is the cost we get using >> explain. > >> I wonder why those two total costs are different? If the total_cost >> from the planstate is more accurate, could we use that instead as the >> fitness in geqo_eval? > > You're still asking us to answer hypothetical questions unsupported > by evidence. In what case does that really happen? Hi, My apologies if this is not the minimal necessary set up. But here's more information about what I saw using the following query (JOB/1a.sql): SELECT MIN(mc.note) AS production_note, MIN(t.title) AS movie_title, MIN(t.production_year) AS movie_year FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info_idx AS mi_idx, title AS t WHERE ct.kind = 'production companies' AND it.info = 'top 250 rank' AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%' AND (mc.note LIKE '%(co-production)%' OR mc.note LIKE '%(presents)%') AND ct.id = mc.company_type_id AND t.id = mc.movie_id AND t.id = mi_idx.movie_id AND mc.movie_id = mi_idx.movie_id AND it.id = mi_idx.info_type_id; I attached the query plan and debug_print_rel output for GEQO and standard_join_search. planstate->total_cost cheapest_total_path GEQO 54190.13 54239.03 STD 54179.02 54273.73 Here I observe GEQO produces a lower cheapest_total_path->total_cost, but its planstate->total_cost is higher than what standard_join_search produces. Regards, Donald Dong
Attachment
Re: Why could GEQO produce plans with lower costs than thestandard_join_search?
From
"Finnerty, Jim"
Date:
Fwiw, I had an intern do some testing on the JOB last year, and he reported that geqo sometimes produced plans of lower costthan the standard planner (we were on PG10 at the time). I filed it under "unexplained things that we need to investigatewhen we have time", but alas... In any case, Donald isn't the only one who has noticed this behavior. On 5/22/19, 3:54 PM, "Donald Dong" <xdong@csumb.edu> wrote: On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Donald Dong <xdong@csumb.edu> writes: >> I find the cost from cheapest_total_path->total_cost is different >> from the cost from queryDesc->planstate->total_cost. What I saw was >> that GEQO tends to form paths with lower >> cheapest_total_path->total_cost (aka the fitness of the children). >> However, standard_join_search is more likely to produce a lower >> queryDesc->planstate->total_cost, which is the cost we get using >> explain. > >> I wonder why those two total costs are different? If the total_cost >> from the planstate is more accurate, could we use that instead as the >> fitness in geqo_eval? > > You're still asking us to answer hypothetical questions unsupported > by evidence. In what case does that really happen? Hi, My apologies if this is not the minimal necessary set up. But here's more information about what I saw using the following query (JOB/1a.sql): SELECT MIN(mc.note) AS production_note, MIN(t.title) AS movie_title, MIN(t.production_year) AS movie_year FROM company_type AS ct, info_type AS it, movie_companies AS mc, movie_info_idx AS mi_idx, title AS t WHERE ct.kind = 'production companies' AND it.info = 'top 250 rank' AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%' AND (mc.note LIKE '%(co-production)%' OR mc.note LIKE '%(presents)%') AND ct.id = mc.company_type_id AND t.id = mc.movie_id AND t.id = mi_idx.movie_id AND mc.movie_id = mi_idx.movie_id AND it.id = mi_idx.info_type_id; I attached the query plan and debug_print_rel output for GEQO and standard_join_search. planstate->total_cost cheapest_total_path GEQO 54190.13 54239.03 STD 54179.02 54273.73 Here I observe GEQO produces a lower cheapest_total_path->total_cost, but its planstate->total_cost is higher than what standard_join_search produces. Regards, Donald Dong
Re: Why could GEQO produce plans with lower costs than the standard_join_search?
From
Andrew Gierth
Date:
>>>>> "Finnerty" == Finnerty, Jim <jfinnert@amazon.com> writes: Finnerty> planstate-> total_cost cheapest_total_path Finnerty> GEQO 54190.13 54239.03 Finnerty> STD 54179.02 54273.73 These differences aren't significant - the standard join search has a "fuzz factor" built into it, such that paths have to be more than 1% better in cost in order to actually be considered as being better than an existing path. -- Andrew (irc:RhodiumToad)
Re: Why could GEQO produce plans with lower costs than the standard_join_search?
From
Tom Lane
Date:
Donald Dong <xdong@csumb.edu> writes: > On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> You're still asking us to answer hypothetical questions unsupported >> by evidence. In what case does that really happen? > I attached the query plan and debug_print_rel output for GEQO and > standard_join_search. > planstate->total_cost cheapest_total_path > GEQO 54190.13 54239.03 > STD 54179.02 54273.73 > Here I observe GEQO produces a lower > cheapest_total_path->total_cost, but its planstate->total_cost is higher > than what standard_join_search produces. Well, (1) the plan selected by GEQO is in fact more expensive than the one found by the standard search. Not by much --- as Andrew observes, this difference is less than what the planner considers "fuzzily the same" --- but nonetheless 54190.13 > 54179.02. (2) the paths you show do not correspond to the finally selected plans --- they aren't even the same shape. (The Gathers are in different places, to start with.) I'm not sure where you were capturing the path data, but it looks like you missed top-level parallel-aggregation planning, and that managed to find some plans that were marginally cheaper than the ones you captured. Keep in mind that GEQO only considers join planning, not grouping/aggregation. Andrew's point about fuzzy cost comparison is also a good one, though we needn't invoke it to explain these particular numbers. regards, tom lane
Re: Why could GEQO produce plans with lower costs than thestandard_join_search?
From
Donald Dong
Date:
On May 23, 2019, at 9:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Donald Dong <xdong@csumb.edu> writes: >> On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> You're still asking us to answer hypothetical questions unsupported >>> by evidence. In what case does that really happen? > >> I attached the query plan and debug_print_rel output for GEQO and >> standard_join_search. > >> planstate->total_cost cheapest_total_path >> GEQO 54190.13 54239.03 >> STD 54179.02 54273.73 > >> Here I observe GEQO produces a lower >> cheapest_total_path->total_cost, but its planstate->total_cost is higher >> than what standard_join_search produces. > > Well, > > (1) the plan selected by GEQO is in fact more expensive than > the one found by the standard search. Not by much --- as Andrew > observes, this difference is less than what the planner considers > "fuzzily the same" --- but nonetheless 54190.13 > 54179.02. > > (2) the paths you show do not correspond to the finally selected > plans --- they aren't even the same shape. (The Gathers are in > different places, to start with.) I'm not sure where you were > capturing the path data, but it looks like you missed top-level > parallel-aggregation planning, and that managed to find some > plans that were marginally cheaper than the ones you captured. > Keep in mind that GEQO only considers join planning, not > grouping/aggregation. > > Andrew's point about fuzzy cost comparison is also a good one, > though we needn't invoke it to explain these particular numbers. Oh, that's very good to know! I captured the path at the end of the join_search_hook. If I understood correctly, top-level parallel-aggregation will be applied later, so GEQO is not taking it into consideration during the join searching? By looking at the captured costs, I thought GEQO found a better join order than the standard_join_search. However, the final plan using the join order produced by GEQO turns out to be more expansive. Would that imply if GEQO sees a join order which is identical to the one produced by standard_join_search, it will discard it since the cheapest_total_path has a higher cost, though the final plan may be cheaper? Here is another query (JOB/27a.sql) which has more significant cost differences: planstate->total_cost cheapest_total_path GEQO 343016.77 343016.75 STD 342179.13 344137.33 Regards, Donald Dong
Re: Why could GEQO produce plans with lower costs than the standard_join_search?
From
Tom Lane
Date:
Donald Dong <xdong@csumb.edu> writes: > On May 23, 2019, at 9:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> (2) the paths you show do not correspond to the finally selected >> plans --- they aren't even the same shape. (The Gathers are in >> different places, to start with.) I'm not sure where you were >> capturing the path data, but it looks like you missed top-level >> parallel-aggregation planning, and that managed to find some >> plans that were marginally cheaper than the ones you captured. >> Keep in mind that GEQO only considers join planning, not >> grouping/aggregation. > By looking at the captured costs, I thought GEQO found a better join > order than the standard_join_search. However, the final plan using > the join order produced by GEQO turns out to be more expansive. Would > that imply if GEQO sees a join order which is identical to the one > produced by standard_join_search, it will discard it since the > cheapest_total_path has a higher cost, though the final plan may be > cheaper? I suspect what's really going on is that you're looking at the wrong paths. The planner remembers more paths for each rel than just the cheapest-total-cost one, the reason being that total cost is not the only figure of merit. The plan that is winning in the end, it looks like, is parallelized aggregation on top of a non-parallel join plan, but the cheapest_total_path uses up the opportunity for a Gather on a parallelized scan/join. If we were just doing a scan/join and no aggregation, that path would have been the basis for the final plan, but it's evidently not being chosen here; the planner is going to some other scan/join path that is not parallelized. I haven't looked closely at whether the parallel-query hacking has paid any attention to GEQO. It's entirely likely that GEQO is still choosing its join order on the basis of cheapest-total scan/join cost without regard to parallelizability, which would lead to an apparently better cost for the cheapest_total_path even though the path that will end up being used is some other one. regards, tom lane
Re: Why could GEQO produce plans with lower costs than thestandard_join_search?
From
Donald Dong
Date:
On May 23, 2019, at 10:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Donald Dong <xdong@csumb.edu> writes: >> On May 23, 2019, at 9:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> (2) the paths you show do not correspond to the finally selected >>> plans --- they aren't even the same shape. (The Gathers are in >>> different places, to start with.) I'm not sure where you were >>> capturing the path data, but it looks like you missed top-level >>> parallel-aggregation planning, and that managed to find some >>> plans that were marginally cheaper than the ones you captured. >>> Keep in mind that GEQO only considers join planning, not >>> grouping/aggregation. > >> By looking at the captured costs, I thought GEQO found a better join >> order than the standard_join_search. However, the final plan using >> the join order produced by GEQO turns out to be more expansive. Would >> that imply if GEQO sees a join order which is identical to the one >> produced by standard_join_search, it will discard it since the >> cheapest_total_path has a higher cost, though the final plan may be >> cheaper? > > I suspect what's really going on is that you're looking at the wrong > paths. The planner remembers more paths for each rel than just the > cheapest-total-cost one, the reason being that total cost is not the > only figure of merit. The plan that is winning in the end, it looks > like, is parallelized aggregation on top of a non-parallel join plan, > but the cheapest_total_path uses up the opportunity for a Gather on > a parallelized scan/join. If we were just doing a scan/join and > no aggregation, that path would have been the basis for the final > plan, but it's evidently not being chosen here; the planner is going > to some other scan/join path that is not parallelized. Seems the paths in the final rel (path list, cheapest parameterized paths, cheapest startup path, and cheapest total path) are the same identical path for this particular query (JOB/1a.sql). Am I missing anything? Since the total cost of the cheapest-total-path is what GEQO is currently using to evaluate the fitness (minimizing), I'm expecting the cheapest-total-cost to measure how good is a join order. So a join order from standard_join_search, with higher cheapest-total-cost, ends up to be better is pretty surprising to me. Perhaps the cheapest-total-cost should not be the best/only choice for fitness? Regards, Donald Dong
Re: Why could GEQO produce plans with lower costs than the standard_join_search?
From
Tom Lane
Date:
Donald Dong <xdong@csumb.edu> writes: > Perhaps the cheapest-total-cost should not be the best/only choice > for fitness? Well, really the GEQO code should be thrown out and rewritten from the ground up ... but that hasn't quite gotten done yet. regards, tom lane