Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower - Mailing list pgsql-bugs
| From | Andres Freund |
|---|---|
| Subject | Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower |
| Date | |
| Msg-id | 20220707043656.6tug2hfauxbizmi3@awork3.anarazel.de |
| In response to | Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower (Tom Lane <tgl@sss.pgh.pa.us>) |
| Responses | Re: BUG #17540: Prepared statement: PG switches to a generic query plan which is consistently much slower |
| List | pgsql-bugs |
Hi,
On 2022-07-06 23:13:18 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > I think the cost for the slow plan being so much cheaper can almost be
> > qualified as a bug.
> > The slow plan seems pretty nonsensical to me. ISTM that something in the
> > costing there is at least almost broken.
>
> I think this is probably an instance of the known problem that a generic
> plan is made without knowledge of the actual parameter values, and that
> can lead us to make statistical assumptions that are not valid for the
> actual values, but nonetheless make one plan look cheaper than another
> even though the opposite is true given the actual values. In essence,
> comparing the cost estimate for the generic plan to the cost estimate
> for a custom plan is not really logically valid, because those estimates
> are founded on different statistics. I don't know how to fix that :-(.
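(As an aside: with the reproducer below, the two estimates are easy to put side by side. The prepared statement here is just my reconstruction of what the OP presumably ran, not their exact statement, and plan_cache_mode - available since v12 - forces one plan type or the other:

PREPARE txn_minmax (timestamp) AS
  SELECT MIN(id), MAX(id)
  FROM relation_tuple_transaction
  WHERE "timestamp" >= $1;

SET plan_cache_mode = force_custom_plan;   -- plan with the actual parameter value
EXPLAIN EXECUTE txn_minmax(now()::timestamp - interval '1.5 days');

SET plan_cache_mode = force_generic_plan;  -- plan without knowledge of the value
EXPLAIN EXECUTE txn_minmax(now()::timestamp - interval '1.5 days');
)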
I think there's something more fundamentally wrong - somehow we end up
assuming > 50% selectivity on both the min and the max initplan, for the same
condition! And afaics (although it's a bit hard to see given the limited
precision with which explain prints floating point values) we don't charge
cpu_operator_cost / cpu_tuple_cost. And this is on a table where we can know,
despite not knowing the parameter value, that the column being compared has a
correlation of 1.
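The correlation bit is visible straight from pg_stats once the table from the reproducer below has been analyzed, roughly:

SELECT tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE tablename = 'relation_tuple_transaction'
  AND attname IN ('id', 'timestamp');

Both columns should come out with a correlation of essentially 1 here, since the rows are inserted in increasing order of both id and timestamp.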
In this case the whole generic plan part seems like a red herring. The generic
plan is *awful*, and it would still be awful if the value were known but
happened to fall somewhere around the middle of the value range.
Here's the OP's tables + query, but without the prepared statement part:
CREATE TABLE relation_tuple_transaction (
id BIGSERIAL NOT NULL UNIQUE,
timestamp TIMESTAMP WITHOUT TIME ZONE NOT NULL UNIQUE,
CONSTRAINT pk_rttx PRIMARY KEY (id)
);
CREATE INDEX ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction(timestamp);
INSERT INTO relation_tuple_transaction(timestamp) SELECT * FROM generate_series
( now() - interval '3 days'
, now()
, '1 second'::interval) dd
;
vacuum freeze analyze;
EXPLAIN ANALYZE SELECT MIN(id), MAX(id) from relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days');

postgres[631148][1]=# EXPLAIN ANALYZE SELECT MIN(id), MAX(id) from relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days');
 Result  (cost=1.01..1.02 rows=1 width=16) (actual time=113.379..113.381 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8) (actual time=113.347..113.348 rows=1 loops=1)
           ->  Index Scan using pk_rttx on relation_tuple_transaction  (cost=0.42..10741.45 rows=127009 width=8) (actual time=113.345..113.345 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
                 Rows Removed by Filter: 129746
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8) (actual time=0.024..0.024 rows=1 loops=1)
           ->  Index Scan Backward using pk_rttx on relation_tuple_transaction relation_tuple_transaction_1  (cost=0.42..10741.45 rows=127009 width=8) (actual time=0.023..0.023 rows=1 loops=1)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
 Planning Time: 0.370 ms
 Execution Time: 113.441 ms
(14 rows)
We're pretty much by definition scanning half the table via the index scans,
and yet end up with a cost of 1.02 (yes, I'm aware that the paths are costed
separately).
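If I'm reading the numbers right, the 0.50 on each Limit is just the usual fraction-of-the-subpath interpolation over the estimated row count, roughly

    0.42 + (10741.45 - 0.42) * (1 / 127009) ≈ 0.50

so the planner assumes the first qualifying tuple turns up almost immediately in id order - which is exactly the assumption the correlation of 1 should be telling it is wrong for the forward scan (hence the Rows Removed by Filter: 129746).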
FWIW, manually writing the min/max as ORDER BY timestamp ASC/DESC LIMIT 1
queries yields a *vastly* better plan:
EXPLAIN ANALYZE SELECT
  (SELECT id FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days') ORDER BY timestamp ASC LIMIT 1),
  (SELECT id FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days') ORDER BY timestamp DESC LIMIT 1);
 Result  (cost=0.92..0.93 rows=1 width=16) (actual time=0.110..0.111 rows=1 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.46 rows=1 width=16) (actual time=0.079..0.079 rows=1 loops=1)
           ->  Index Scan using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction  (cost=0.42..4405.46 rows=129602 width=16) (actual time=0.077..0.078 rows=1 loops=1)
                 Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
   InitPlan 2 (returns $1)
     ->  Limit  (cost=0.42..0.46 rows=1 width=16) (actual time=0.028..0.028 rows=1 loops=1)
           ->  Index Scan Backward using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction relation_tuple_transaction_1  (cost=0.42..4405.46 rows=129602 width=16) (actual time=0.027..0.027 rows=1 loops=1)
                 Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
 Planning Time: 0.270 ms
 Execution Time: 0.159 ms
(11 rows)
And it stays sane even if you add a (redundantly evaluated) AND id IS NOT NULL.
EXPLAIN SELECT id FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days') AND id IS NOT NULL ORDER BY timestamp ASC LIMIT 1;
QUERY PLAN
 Limit  (cost=0.42..0.46 rows=1 width=16)
   ->  Index Scan using ix_relation_tuple_transaction_by_timestamp on relation_tuple_transaction  (cost=0.42..4405.46 rows=129602 width=16)
         Index Cond: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
         Filter: (id IS NOT NULL)
(4 rows)
EXPLAIN SELECT min(id) FROM relation_tuple_transaction WHERE timestamp >= (now() - interval '1.5 days');
QUERY PLAN
 Result  (cost=0.50..0.51 rows=1 width=8)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.42..0.50 rows=1 width=8)
           ->  Index Scan using pk_rttx on relation_tuple_transaction  (cost=0.42..10741.45 rows=129602 width=8)
                 Index Cond: (id IS NOT NULL)
                 Filter: ("timestamp" >= (now() - '1 day 12:00:00'::interval))
(6 rows)
Greetings,
Andres Freund