How does the query planner make its plan? - Mailing list pgsql-general
From | Christian Schröder |
---|---|
Subject | How does the query planner make its plan? |
Date | |
Msg-id | 47308755.2050509@deriva.de Whole thread Raw |
Responses |
Re: How does the query planner make its plan?
Re: How does the query planner make its plan? |
List | pgsql-general |
Hi list, once again I do not understand how the query planner works and why it apparently does not find the best result. I have a table with about 125 million rows. There is a char(5) column with a (non-unique) index. When I try to find the distinct values in this column using the following sql statement: select distinct exchange from foo the query planner chooses not to use the index, but performs a sequential scan. When I disfavour the use of sequential scans ("set enable_seqscan = off") the performance is more than 6 times better. Why does the query planner's plan go wrong? The table has been vacuum analyzed just before I ran the queries. Here is the plan when I let the query planner alone: QUERY PLAN -------------------------------------------------------------------------------- Unique (cost=23057876.40..23683350.48 rows=4 width=9) -> Sort (cost=23057876.40..23370613.44 rows=125094816 width=9) Sort Key: exchange -> Seq Scan on quotes (cost=0.00..3301683.16 rows=125094816 width=9) (4 rows) This is what really happens: QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=23057876.40..23683350.48 rows=4 width=9) (actual time=1577159.744..1968911.024 rows=4 loops=1) -> Sort (cost=23057876.40..23370613.44 rows=125094816 width=9) (actual time=1577159.742..1927400.118 rows=125094818 loops=1) Sort Key: exchange -> Seq Scan on quotes (cost=0.00..3301683.16 rows=125094816 width=9) (actual time=0.022..169744.162 rows=125094818 loops=1) Total runtime: 1969844.753 ms (5 rows) With "enable_seqscan = off" I get this plan: QUERY PLAN ------------------------------------------------------------------------------------------------------- Unique (cost=0.00..89811549.81 rows=4 width=9) -> Index Scan using quotes_exchange_key on quotes (cost=0.00..89498812.77 rows=125094816 width=9) (2 rows) And again with execution times: QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Unique (cost=0.00..89811549.81 rows=4 width=9) (actual time=0.079..313068.922 rows=4 loops=1) -> Index Scan using quotes_exchange_key on quotes (cost=0.00..89498812.77 rows=125094816 width=9) (actual time=0.078..273787.493 rows=125094818 loops=1) Total runtime: 313068.967 ms (3 rows) I understand that from looking at the estimations (89811549.81 with index scan vs. 23683350.48 with sequential scan) the query planner had to choose the sequential scan. So maybe I have to tune the planner cost constants? Indeed I did some changes to these values, but in my opinion this should make index scans preferable: #seq_page_cost = 1.0 # measured on an arbitrary scale #random_page_cost = 4.0 # same scale as above random_page_cost = 1.0 #cpu_tuple_cost = 0.01 # same scale as above #cpu_index_tuple_cost = 0.005 # same scale as above cpu_index_tuple_cost = 0.001 #cpu_operator_cost = 0.0025 # same scale as above #effective_cache_size = 128MB effective_cache_size = 4GB The machine is a dedicated database server with two dual-core xeon processors and 8 GB memory. Thanks for your help, Christian -- Deriva GmbH Tel.: +49 551 489500-42 Financial IT and Consulting Fax: +49 551 489500-91 Hans-Böckler-Straße 2 http://www.deriva.de D-37079 Göttingen Deriva CA Certificate: http://www.deriva.de/deriva-ca.cer
pgsql-general by date: