Re: Different plan for very similar queries - Mailing list pgsql-performance
From | Tomas Vondra |
---|---|
Subject | Re: Different plan for very similar queries |
Date | |
Msg-id | 5568FAA5.3080807@2ndquadrant.com Whole thread Raw |
In response to | Re: Different plan for very similar queries ("Peter J. Holzer" <hjp-pgsql@hjp.at>) |
Responses |
Re: Different plan for very similar queries
|
List | pgsql-performance |
Hi, On 05/29/15 11:51, Peter J. Holzer wrote: > A couple of additional observations: > > The total cost of both queries is quite similar, so random variations > might push into one direction or the other. Indeed, after dropping > and recreating indexes (I tried GIN indexes as suggested by Heikki on > [1]) and calling analyze after each change, I have now reached a > state where both queries use the fast plan. I don't think bitmap indexes are particularly good match for this use case. The queries need to check an existence of a few records, and btree indexes are great for that - the first plan is very fast. Why exactly does the second query use a much slower plan I'm not sure. I believe I've found an issue in planning semi joins (reported to pgsql-hackers a few minutes ago), but may be wrong and the code is OK. Can you try forcing the same plan for the second query, using "enable" flags? E.g. SET enable_mergejoin = off; will disable the merge join, and push the optimizer towards a different join type. You may have to disable a few more node types until you get the same plan as for the first query, i.e. nestloop semi join -> index scan -> index scan See this for more info: http://www.postgresql.org/docs/9.1/static/runtime-config-query.html Also, have you tuned the PostgreSQL configuration? How? Can you provide the dataset? Not necessarily all the columns, it should be sufficient to provide the columns used in the join/where clauses: term -> facttablename, columnname, term facttable_stat_fta4 -> einheit, berechnungsart That'd make reproducing the problem much easier. > In the first case the query planner seems to add the cost of the two > index scans to get the total cost, despite the fact that for a semi > join the second index scan can be aborted after the first hit (so > either the cost of the second scan should be a lot less than > 384457.80 or it needs to be divided by a large factor for the semi > join). > > In the second case the cost of the second index scan (2545748.85) is > either completely ignored or divided by a large factor: It doesn't > seem to contribute much to the total cost. I believe this is a consequence of the semi join semantics, because the explain plan contains "total" costs and row counts, as if the whole relation was scanned (in this case all the 43M rows), but the optimizer only propagates fraction of the cost estimate (depending on how much of the relation it expects to scan). In this case it expects to scan a tiny part of the index scan, so the impact on the total cost is small. A bit confusing, yeah. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-performance by date: