Different Choices For Index/Sequential Scan With And Without A Join In 7.2 - Mailing list pgsql-sql
From | Mark kirkwood |
---|---|
Subject | Different Choices For Index/Sequential Scan With And Without A Join In 7.2 |
Date | |
Msg-id | 01082620361000.01707@spikey.slithery.org Whole thread Raw |
Responses |
Re: Different Choices For Index/Sequential Scan With And Without A Join In 7.2
|
List | pgsql-sql |
Dear List, I have been doing a little investigation on when the optimizer chooses a sequential scan over an index access. I have come accross what interesting behaviour in the current 7.2 sources ( 2001-08-17): The consider two types of query on my "usual" tables : SELECT f.d0key, count(f.val) FROM fact0 f WHERE f.d0key BETWEEN 270 AND <integer> GROUP BY f.d0key; and SELECT d0.f1, count(f.val) FROM dim0 d0, fact0 f WHERE d0.d0key = f.d0key AND d0.f1 BETWEEN '2000-01-26' AND <'date'> GROUP BY d0.f1; Note that 'f1' = '2000-01-26' corrosponds to 'd0key' = 270 in the table 'dim0'; I wanted to find the values for <integer> and <date> for which the optimizer changed from and index acess to a seq scan of the 'fact0' table. I used cpu_tuple_cost = 0.4, but everything else was fairly standard. For the first query the value of <integer> ( i.e : 'd0key' ) was 627 For the second the value of <date> (i.e 'f1' ) was '2000-02-05' ( corrosponds to d0key = 279 ) It guess I was expecting the value that made the first query change from index to seq scan to be "close" to the value that made the second query use a sequential scan....as the fact0 access of the second query is essentially the first query. However the results are vastly different - have I missed something obvious here ? The script and explain output are listed below. regards Mark <--script -------------------------------------------------------------------- SET cpu_tuple_cost=0.4; SHOW cpu_tuple_cost; -- show what keys are for what dates... -- SELECT d0.d0key, d0.f1 FROM dim0 d0 WHERE d0.d0key IN ('270','279','280','626','627') ; -- show when index scans change to sequential -- for the fact0 table alone... -- EXPLAIN SELECT f.d0key, count(f.val) FROM fact0 f WHERE f.d0key BETWEEN 270 AND 626 GROUP BY f.d0key ; EXPLAIN SELECT f.d0key, count(f.val) FROM fact0 f WHERE f.d0key BETWEEN 270 AND 627 GROUP BY f.d0key ; -- show when index scans change to sequential -- for the two table join --EXPLAIN SELECT d0.f1, count(f.val) FROM dim0 d0, fact0 f WHERE d0.d0key = f.d0key AND d0.f1 BETWEEN '2000-01-26' AND '2000-02-04' GROUP BY d0.f1 ; EXPLAIN SELECT d0.f1, count(f.val) FROM dim0 d0, fact0 f WHERE d0.d0key = f.d0key AND d0.f1 BETWEEN '2000-01-26' AND '2000-02-05' GROUP BY d0.f1 ; <--results -------------------------------------------------------------------- SET VARIABLE NOTICE: cpu_tuple_cost is 0.4 SHOW VARIABLEd0key | f1 -------+------------------------ 270 | 2000-01-26 00:00:00+13 279 | 2000-02-04 00:00:00+13 280 | 2000-02-05 00:00:00+13 626 | 2001-01-16 00:00:00+13 627 | 2001-01-17 00:00:00+13 (5 rows) NOTICE: QUERY PLAN: Aggregate (cost=0.00..1308177.10 rows=33453 width=8) -> Group (cost=0.00..1307340.77 rows=334533 width=8) -> IndexScan using fact0_pk on fact0 f (cost=0.00..1306504.44 rows=334533 width=8) EXPLAIN NOTICE: QUERY PLAN: Aggregate (cost=1308030.21..1309707.21 rows=33540 width=8) -> Group (cost=1308030.21..1308868.71 rows=335400 width=8) -> Sort (cost=1308030.21..1308030.21 rows=335400 width=8) -> Seq Scan on fact0 f (cost=0.00..1272693.00rows=335400 width=8) EXPLAIN NOTICE: QUERY PLAN: Aggregate (cost=0.00..1155870.07 rows=268 width=20) -> Group (cost=0.00..1155863.36 rows=2684 width=20) -> NestedLoop (cost=0.00..1155856.65 rows=2684 width=20) -> Index Scan using dim0_q1 on dim0 d0 (cost=0.00..6.63 rows=9 width=12) -> Index Scan using fact0_pk on fact0 f (cost=0.00..117117.99 rows=30000 width=8) EXPLAIN NOTICE: QUERY PLAN: Aggregate (cost=1281572.52..1281587.43 rows=298 width=20) -> Group (cost=1281572.52..1281579.97 rows=2982 width=20) -> Sort (cost=1281572.52..1281572.52 rows=2982 width=20) -> Hash Join (cost=7.06..1281400.41 rows=2982width=20) -> Seq Scan on fact0 f (cost=0.00..1257693.00 rows=3000000 width=8) -> Hash (cost=7.04..7.04 rows=10 width=12) -> Index Scanusing dim0_q1 on dim0 d0 (cost=0.00..7.04 rows=10 width=12) EXPLAIN