Home > mailing lists

Re: Different plan for very similar queries - Mailing list pgsql-performance

From	Tomas Vondra
Subject	Re: Different plan for very similar queries
Date	May 29, 2015 23:47:59
Msg-id	5568FAA5.3080807@2ndquadrant.com Whole thread Raw
In response to	Re: Different plan for very similar queries ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Responses	Re: Different plan for very similar queries
List	pgsql-performance

Tree view

Hi,

On 05/29/15 11:51, Peter J. Holzer wrote:
> A couple of additional observations:
>
> The total cost of both queries is quite similar, so random variations
> might push into one direction or the other. Indeed, after dropping
> and recreating indexes (I tried GIN indexes as suggested by Heikki on
> [1]) and calling analyze after each change, I have now reached a
> state where both queries use the fast plan.

I don't think bitmap indexes are particularly good match for this use
case. The queries need to check an existence of a few records, and btree
indexes are great for that - the first plan is very fast.

Why exactly does the second query use a much slower plan I'm not sure. I
believe I've found an issue in planning semi joins (reported to
pgsql-hackers a few minutes ago), but may be wrong and the code is OK.

Can you try forcing the same plan for the second query, using "enable"
flags? E.g.

    SET enable_mergejoin = off;

will disable the merge join, and push the optimizer towards a different
join type. You may have to disable a few more node types until you get
the same plan as for the first query, i.e.

    nestloop semi join
      -> index scan
      -> index scan

See this for more info:

    http://www.postgresql.org/docs/9.1/static/runtime-config-query.html

Also, have you tuned the PostgreSQL configuration? How?

Can you provide the dataset? Not necessarily all the columns, it should
be sufficient to provide the columns used in the join/where clauses:

     term -> facttablename, columnname, term
     facttable_stat_fta4 -> einheit, berechnungsart

That'd make reproducing the problem much easier.

> In the first case the query planner seems to add the cost of the two
> index scans to get the total cost, despite the fact that for a semi
> join the second index scan can be aborted after the first hit (so
> either the cost of the second scan should be a lot less than
> 384457.80 or it needs to be divided by a large factor for the semi
> join).
>
> In the second case the cost of the second index scan (2545748.85) is
> either completely ignored or divided by a large factor: It doesn't
> seem to contribute much to the total cost.

I believe this is a consequence of the semi join semantics, because the
explain plan contains "total" costs and row counts, as if the whole
relation was scanned (in this case all the 43M rows), but the optimizer
only propagates fraction of the cost estimate (depending on how much of
the relation it expects to scan). In this case it expects to scan a tiny
part of the index scan, so the impact on the total cost is small.

A bit confusing, yeah.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

pgsql-performance by date:

From: Tomas Vondra
Date: 29 May 2015, 19:20:44
Subject: Re: Postgres is using 100% CPU

From: Ashik S L
Date: 30 May 2015, 13:46:31
Subject: Re: Postgres is using 100% CPU

Re: Different plan for very similar queries - Mailing list pgsql-performance

Previous

Next