Thread: Why could GEQO produce plans with lower costs than thestandard_join_search?

Why could GEQO produce plans with lower costs than thestandard_join_search?

From

Donald Dong

Date:

07 May 2019, 23:29:04

Hi,

I was expecting the plans generated by standard_join_search to have lower costs
than the plans from GEQO. But after the results I have from a join order
benchmark show that GEQO produces plans with lower costs most of the time!

I wonder what is causing this observation? From my understanding,
standard_join_search is doing a complete search. So I'm not sure how the GEQO
managed to do better than that.

Thank you,
Donald Dong

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

From

Tom Lane

Date:

07 May 2019, 23:44:31

Donald Dong <xdong@csumb.edu> writes:
> I was expecting the plans generated by standard_join_search to have lower costs
> than the plans from GEQO. But after the results I have from a join order
> benchmark show that GEQO produces plans with lower costs most of the time!

> I wonder what is causing this observation? From my understanding,
> standard_join_search is doing a complete search. So I'm not sure how the GEQO
> managed to do better than that.

standard_join_search is *not* exhaustive; there's a heuristic that causes
it not to consider clauseless joins unless it has to.

For the most part, GEQO uses the same heuristic (cf desirable_join()),
but given the right sort of query shape you can probably trick it into
situations where it will be forced to use a clauseless join when the
core code wouldn't.  It'd still be surprising for that to come out with
a lower cost estimate than a join order that obeys the heuristic,
though.  Clauseless joins are generally pretty awful.

I'm a tad suspicious about the representativeness of your benchmark
queries if you find this is happening "most of the time".

            regards, tom lane

Re: Why could GEQO produce plans with lower costs than thestandard_join_search?

From

Donald Dong

Date:

22 May 2019, 18:35:07

Hi,

Thank you very much for the explanation! I think the join order
benchmark I used [1] is somewhat representative, however, I probably
didn't use the most accurate cost estimation.

I find the cost from cheapest_total_path->total_cost is different
from the cost from  queryDesc->planstate->total_cost. What I saw was
that GEQO tends to form paths with lower
cheapest_total_path->total_cost (aka the fitness of the children).
However, standard_join_search is more likely to produce a lower
queryDesc->planstate->total_cost, which is the cost we get using
explain.

I wonder why those two total costs are different? If the total_cost
from the planstate is more accurate, could we use that instead as the
fitness in geqo_eval?

[1] https://github.com/gregrahn/join-order-benchmark

Regards,
Donald Dong

> On May 7, 2019, at 4:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Donald Dong <xdong@csumb.edu> writes:
>> I was expecting the plans generated by standard_join_search to have lower costs
>> than the plans from GEQO. But after the results I have from a join order
>> benchmark show that GEQO produces plans with lower costs most of the time!
>
>> I wonder what is causing this observation? From my understanding,
>> standard_join_search is doing a complete search. So I'm not sure how the GEQO
>> managed to do better than that.
>
> standard_join_search is *not* exhaustive; there's a heuristic that causes
> it not to consider clauseless joins unless it has to.
>
> For the most part, GEQO uses the same heuristic (cf desirable_join()),
> but given the right sort of query shape you can probably trick it into
> situations where it will be forced to use a clauseless join when the
> core code wouldn't.  It'd still be surprising for that to come out with
> a lower cost estimate than a join order that obeys the heuristic,
> though.  Clauseless joins are generally pretty awful.
>
> I'm a tad suspicious about the representativeness of your benchmark
> queries if you find this is happening "most of the time".
>
>             regards, tom lane

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

From

Tom Lane

Date:

22 May 2019, 18:42:31

Donald Dong <xdong@csumb.edu> writes:
> I find the cost from cheapest_total_path->total_cost is different
> from the cost from  queryDesc->planstate->total_cost. What I saw was
> that GEQO tends to form paths with lower
> cheapest_total_path->total_cost (aka the fitness of the children).
> However, standard_join_search is more likely to produce a lower
> queryDesc->planstate->total_cost, which is the cost we get using
> explain.

> I wonder why those two total costs are different? If the total_cost
> from the planstate is more accurate, could we use that instead as the
> fitness in geqo_eval?

You're still asking us to answer hypothetical questions unsupported
by evidence.  In what case does that really happen?

            regards, tom lane

Re: Why could GEQO produce plans with lower costs than thestandard_join_search?

From

Donald Dong

Date:

22 May 2019, 19:53:56

On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> Donald Dong <xdong@csumb.edu> writes:
>> I find the cost from cheapest_total_path->total_cost is different
>> from the cost from  queryDesc->planstate->total_cost. What I saw was
>> that GEQO tends to form paths with lower
>> cheapest_total_path->total_cost (aka the fitness of the children).
>> However, standard_join_search is more likely to produce a lower
>> queryDesc->planstate->total_cost, which is the cost we get using
>> explain.
> 
>> I wonder why those two total costs are different? If the total_cost
>> from the planstate is more accurate, could we use that instead as the
>> fitness in geqo_eval?
> 
> You're still asking us to answer hypothetical questions unsupported
> by evidence.  In what case does that really happen?

Hi,

My apologies if this is not the minimal necessary set up. But here's
more information about what I saw using the following query
(JOB/1a.sql):

SELECT MIN(mc.note) AS production_note,
       MIN(t.title) AS movie_title,
       MIN(t.production_year) AS movie_year
FROM company_type AS ct,
     info_type AS it,
     movie_companies AS mc,
     movie_info_idx AS mi_idx,
     title AS t
WHERE ct.kind = 'production companies'
  AND it.info = 'top 250 rank'
  AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
  AND (mc.note LIKE '%(co-production)%'
       OR mc.note LIKE '%(presents)%')
  AND ct.id = mc.company_type_id
  AND t.id = mc.movie_id
  AND t.id = mi_idx.movie_id
  AND mc.movie_id = mi_idx.movie_id
  AND it.id = mi_idx.info_type_id;

I attached the query plan and debug_print_rel output for GEQO and
standard_join_search.

            planstate->total_cost    cheapest_total_path
GEQO        54190.13                54239.03
STD            54179.02                54273.73

Here I observe GEQO  produces a lower
cheapest_total_path->total_cost, but its planstate->total_cost is higher
than what standard_join_search produces.

Regards,
Donald Dong

Attachment

Re: Why could GEQO produce plans with lower costs than thestandard_join_search?

From

"Finnerty, Jim"

Date:

22 May 2019, 21:03:08

Fwiw, I had an intern do some testing on the JOB last year, and he reported that geqo sometimes produced plans of lower
costthan the standard planner (we were on PG10 at the time).  I filed it under "unexplained things that we need to
investigatewhen we have time", but alas...
 

In any case, Donald isn't the only one who has noticed this behavior. 

On 5/22/19, 3:54 PM, "Donald Dong" <xdong@csumb.edu> wrote:

    On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
    > 
    > Donald Dong <xdong@csumb.edu> writes:
    >> I find the cost from cheapest_total_path->total_cost is different
    >> from the cost from  queryDesc->planstate->total_cost. What I saw was
    >> that GEQO tends to form paths with lower
    >> cheapest_total_path->total_cost (aka the fitness of the children).
    >> However, standard_join_search is more likely to produce a lower
    >> queryDesc->planstate->total_cost, which is the cost we get using
    >> explain.
    > 
    >> I wonder why those two total costs are different? If the total_cost
    >> from the planstate is more accurate, could we use that instead as the
    >> fitness in geqo_eval?
    > 
    > You're still asking us to answer hypothetical questions unsupported
    > by evidence.  In what case does that really happen?
    
    Hi,
    
    My apologies if this is not the minimal necessary set up. But here's
    more information about what I saw using the following query
    (JOB/1a.sql):
    
    SELECT MIN(mc.note) AS production_note,
           MIN(t.title) AS movie_title,
           MIN(t.production_year) AS movie_year
    FROM company_type AS ct,
         info_type AS it,
         movie_companies AS mc,
         movie_info_idx AS mi_idx,
         title AS t
    WHERE ct.kind = 'production companies'
      AND it.info = 'top 250 rank'
      AND mc.note NOT LIKE '%(as Metro-Goldwyn-Mayer Pictures)%'
      AND (mc.note LIKE '%(co-production)%'
           OR mc.note LIKE '%(presents)%')
      AND ct.id = mc.company_type_id
      AND t.id = mc.movie_id
      AND t.id = mi_idx.movie_id
      AND mc.movie_id = mi_idx.movie_id
      AND it.id = mi_idx.info_type_id;
    
    I attached the query plan and debug_print_rel output for GEQO and
    standard_join_search.
    
                planstate->total_cost    cheapest_total_path
    GEQO        54190.13                54239.03
    STD            54179.02                54273.73
    
    Here I observe GEQO  produces a lower
    cheapest_total_path->total_cost, but its planstate->total_cost is higher
    than what standard_join_search produces.
    
    Regards,
    Donald Dong

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

From

Andrew Gierth

Date:

23 May 2019, 08:57:18

>>>>> "Finnerty" == Finnerty, Jim <jfinnert@amazon.com> writes:

 Finnerty> planstate-> total_cost    cheapest_total_path
 Finnerty>     GEQO        54190.13        54239.03
 Finnerty>     STD        54179.02        54273.73

These differences aren't significant - the standard join search has a
"fuzz factor" built into it, such that paths have to be more than 1%
better in cost in order to actually be considered as being better than
an existing path.

-- 
Andrew (irc:RhodiumToad)

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

From

Tom Lane

Date:

23 May 2019, 16:02:18

Donald Dong <xdong@csumb.edu> writes:
> On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> You're still asking us to answer hypothetical questions unsupported
>> by evidence.  In what case does that really happen?

> I attached the query plan and debug_print_rel output for GEQO and
> standard_join_search.

>             planstate->total_cost    cheapest_total_path
> GEQO        54190.13                54239.03
> STD            54179.02                54273.73

> Here I observe GEQO  produces a lower
> cheapest_total_path->total_cost, but its planstate->total_cost is higher
> than what standard_join_search produces.

Well,

(1) the plan selected by GEQO is in fact more expensive than
the one found by the standard search.  Not by much --- as Andrew
observes, this difference is less than what the planner considers
"fuzzily the same" --- but nonetheless 54190.13 > 54179.02.

(2) the paths you show do not correspond to the finally selected
plans --- they aren't even the same shape.  (The Gathers are in
different places, to start with.)  I'm not sure where you were
capturing the path data, but it looks like you missed top-level
parallel-aggregation planning, and that managed to find some
plans that were marginally cheaper than the ones you captured.
Keep in mind that GEQO only considers join planning, not
grouping/aggregation.

Andrew's point about fuzzy cost comparison is also a good one,
though we needn't invoke it to explain these particular numbers.

            regards, tom lane

Re: Why could GEQO produce plans with lower costs than thestandard_join_search?

From

Donald Dong

Date:

23 May 2019, 17:05:29

On May 23, 2019, at 9:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 
> Donald Dong <xdong@csumb.edu> writes:
>> On May 22, 2019, at 11:42 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> You're still asking us to answer hypothetical questions unsupported
>>> by evidence.  In what case does that really happen?
> 
>> I attached the query plan and debug_print_rel output for GEQO and
>> standard_join_search.
> 
>>             planstate->total_cost    cheapest_total_path
>> GEQO        54190.13                54239.03
>> STD            54179.02                54273.73
> 
>> Here I observe GEQO  produces a lower
>> cheapest_total_path->total_cost, but its planstate->total_cost is higher
>> than what standard_join_search produces.
> 
> Well,
> 
> (1) the plan selected by GEQO is in fact more expensive than
> the one found by the standard search.  Not by much --- as Andrew
> observes, this difference is less than what the planner considers
> "fuzzily the same" --- but nonetheless 54190.13 > 54179.02.
> 
> (2) the paths you show do not correspond to the finally selected
> plans --- they aren't even the same shape.  (The Gathers are in
> different places, to start with.)  I'm not sure where you were
> capturing the path data, but it looks like you missed top-level
> parallel-aggregation planning, and that managed to find some
> plans that were marginally cheaper than the ones you captured.
> Keep in mind that GEQO only considers join planning, not
> grouping/aggregation.
> 
> Andrew's point about fuzzy cost comparison is also a good one,
> though we needn't invoke it to explain these particular numbers.

Oh, that's very good to know! I captured the path at the end of the
join_search_hook. If I understood correctly, top-level
parallel-aggregation will be applied later, so GEQO is not taking it
into consideration during the join searching?

By looking at the captured costs, I thought GEQO found a better join
order than the standard_join_search. However, the final plan using
the join order produced by GEQO turns out to be more expansive. Would
that imply if GEQO sees a join order which is identical to the one
produced by standard_join_search, it will discard it since the
cheapest_total_path has a higher cost, though the final plan may be
cheaper?

Here is another query (JOB/27a.sql) which has more significant cost
differences:

        planstate->total_cost    cheapest_total_path
GEQO    343016.77            343016.75
STD        342179.13            344137.33

Regards,
Donald Dong

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

From

Tom Lane

Date:

23 May 2019, 17:43:11

Donald Dong <xdong@csumb.edu> writes:
> On May 23, 2019, at 9:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> (2) the paths you show do not correspond to the finally selected
>> plans --- they aren't even the same shape.  (The Gathers are in
>> different places, to start with.)  I'm not sure where you were
>> capturing the path data, but it looks like you missed top-level
>> parallel-aggregation planning, and that managed to find some
>> plans that were marginally cheaper than the ones you captured.
>> Keep in mind that GEQO only considers join planning, not
>> grouping/aggregation.

> By looking at the captured costs, I thought GEQO found a better join
> order than the standard_join_search. However, the final plan using
> the join order produced by GEQO turns out to be more expansive. Would
> that imply if GEQO sees a join order which is identical to the one
> produced by standard_join_search, it will discard it since the
> cheapest_total_path has a higher cost, though the final plan may be
> cheaper?

I suspect what's really going on is that you're looking at the wrong
paths.  The planner remembers more paths for each rel than just the
cheapest-total-cost one, the reason being that total cost is not the
only figure of merit.  The plan that is winning in the end, it looks
like, is parallelized aggregation on top of a non-parallel join plan,
but the cheapest_total_path uses up the opportunity for a Gather on
a parallelized scan/join.  If we were just doing a scan/join and
no aggregation, that path would have been the basis for the final
plan, but it's evidently not being chosen here; the planner is going
to some other scan/join path that is not parallelized.

I haven't looked closely at whether the parallel-query hacking has
paid any attention to GEQO.  It's entirely likely that GEQO is still
choosing its join order on the basis of cheapest-total scan/join cost
without regard to parallelizability, which would lead to an apparently
better cost for the cheapest_total_path even though the path that
will end up being used is some other one.

            regards, tom lane

Re: Why could GEQO produce plans with lower costs than thestandard_join_search?

From

Donald Dong

Date:

23 May 2019, 20:15:02

On May 23, 2019, at 10:43 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Donald Dong <xdong@csumb.edu> writes:
>> On May 23, 2019, at 9:02 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> (2) the paths you show do not correspond to the finally selected
>>> plans --- they aren't even the same shape.  (The Gathers are in
>>> different places, to start with.)  I'm not sure where you were
>>> capturing the path data, but it looks like you missed top-level
>>> parallel-aggregation planning, and that managed to find some
>>> plans that were marginally cheaper than the ones you captured.
>>> Keep in mind that GEQO only considers join planning, not
>>> grouping/aggregation.
> 
>> By looking at the captured costs, I thought GEQO found a better join
>> order than the standard_join_search. However, the final plan using
>> the join order produced by GEQO turns out to be more expansive. Would
>> that imply if GEQO sees a join order which is identical to the one
>> produced by standard_join_search, it will discard it since the
>> cheapest_total_path has a higher cost, though the final plan may be
>> cheaper?
> 
> I suspect what's really going on is that you're looking at the wrong
> paths.  The planner remembers more paths for each rel than just the
> cheapest-total-cost one, the reason being that total cost is not the
> only figure of merit.  The plan that is winning in the end, it looks
> like, is parallelized aggregation on top of a non-parallel join plan,
> but the cheapest_total_path uses up the opportunity for a Gather on
> a parallelized scan/join.  If we were just doing a scan/join and
> no aggregation, that path would have been the basis for the final
> plan, but it's evidently not being chosen here; the planner is going
> to some other scan/join path that is not parallelized.

Seems the paths in the final rel (path list, cheapest parameterized
paths, cheapest startup path, and cheapest total path)  are the same
identical path for this particular query (JOB/1a.sql). Am I missing
anything?

Since the total cost of the cheapest-total-path is what GEQO is
currently using to evaluate the fitness (minimizing), I'm expecting
the cheapest-total-cost to measure how good is a join order. So a
join order from standard_join_search, with higher
cheapest-total-cost, ends up to be better is pretty surprising to me.

Perhaps the cheapest-total-cost should not be the best/only choice
for fitness?

Regards,
Donald Dong

Re: Why could GEQO produce plans with lower costs than the standard_join_search?

From

Tom Lane

Date:

23 May 2019, 22:33:21

Donald Dong <xdong@csumb.edu> writes:
> Perhaps the cheapest-total-cost should not be the best/only choice
> for fitness?

Well, really the GEQO code should be thrown out and rewritten from
the ground up ... but that hasn't quite gotten done yet.

            regards, tom lane