between not propated into a simple equality join - Mailing list pgsql-hackers
From | Benedikt Grundmann |
---|---|
Subject | between not propated into a simple equality join |
Date | |
Msg-id | CADbMkNM7qiD6Hp5bAsa=Gc04e13a-=Ku4Uby=yaNG5WxZ=WVgA@mail.gmail.com Whole thread Raw |
Responses |
Re: between not propated into a simple equality join
|
List | pgsql-hackers |
We just run into a very simple query that the planner does much worse on than we thought it would (in production the table in question is ~ 100 GB). It surprised us given the planner is generally quite good, so I thought I share our surprise
Setup:
postgres_prod@proddb_testing=# select version();[1]
version
────────────────────────────────────────────────────────────────────────────────────────────────────────────────
PostgreSQL 9.2.16 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-16), 64-bit
(1 row)
Time: 69.246 ms
postgres_prod@proddb_testing=# create table toy_data3 (the_date date, i int);
CREATE TABLE
Time: 67.096 ms
postgres_prod@proddb_testing=# insert into toy_data3
(select current_date-(s.idx/1000), s.idx from generate_series(1,1000000) as s(idx));
INSERT 0 1000000
Time: 1617.483 ms
postgres_prod@proddb_testing=# create index toy_data_date3 on toy_data3(the_date);
CREATE INDEX
Time: 660.166 ms
postgres_prod@proddb_testing=# analyze toy_data3;
ANALYZE
Time: 294.984 ms
The bad behavior:
postgres_prod@proddb_testing=# explain analyze
select * from (
select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date and td1.i = td2.i
) foo
where the_date between current_date and current_date;
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Join (cost=55.49..21980.50 rows=1 width=8) (actual time=0.336..179.374 rows=999 loops=1)
Hash Cond: ((td2.the_date = td1.the_date) AND (td2.i = td1.i))
-> Seq Scan on toy_data3 td2 (cost=0.00..14425.00 rows=1000000 width=8) (actual time=0.007..72.510 rows=1000000 lo
-> Hash (cost=40.44..40.44 rows=1003 width=8) (actual time=0.321..0.321 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.01..40.44 rows=1003 width=8) (actual time=0.018.
Index Cond: ((the_date >= ('now'::cstring)::date) AND (the_date <= ('now'::cstring)::date))
Total runtime: 179.440 ms
(8 rows)
Time: 246.094 ms
Notice the red. Which is sad because one would like it to realize that it could propagate the index constraint onto td2. That is on both sides of the join do the green.
As it does correctly when one explicitly uses equality (bold below) (but of course we sometimes have multiple day ranges in production and we only used a single date range above to make it extra interesting for the planner to NOT do a seqscan):
postgres_prod@proddb_testing=# explain analyze
select * from (
select td1.the_date, td1.i
from toy_data3 td1, toy_data3 td2 where td1.the_date = td2.the_date and td1.i = td2.i ) foo
where the_date = current_date;
QUERY PLAN
───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Join (cost=50.47..92.17 rows=1 width=8) (actual time=0.300..0.652 rows=999 loops=1)
Hash Cond: (td1.i = td2.i)
-> Index Scan using toy_data_date3 on toy_data3 td1 (cost=0.00..37.93 rows=1003 width=8) (actual time=0.023..0.169
Index Cond: (the_date = ('now'::cstring)::date)
-> Hash (cost=37.93..37.93 rows=1003 width=8) (actual time=0.270..0.270 rows=999 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 40kB
-> Index Scan using toy_data_date3 on toy_data3 td2 (cost=0.00..37.93 rows=1003 width=8) (actual time=0.007.
Index Cond: (the_date = ('now'::cstring)::date)
Total runtime: 0.713 ms
(9 rows)
Time: 66.904 ms
Cheers,
Bene
pgsql-hackers by date: