Re: strange estimate for number of rows - Mailing list pgsql-performance
From | Daniel Manley |
---|---|
Subject | Re: strange estimate for number of rows |
Date | |
Msg-id | 3FB3DD1E.2000509@libertyrms.info Whole thread Raw |
In response to | Re: strange estimate for number of rows (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: strange estimate for number of rows
|
List | pgsql-performance |
Hi, I'm the lead developer on the project this concerns (forgive my newbiness on this list). We tried a couple of scenarios with effective_cache_size=60000, cpu_index_tuple_cost=0.0001 and random_page_cost=2 with no change in the plan. explain analyse select * from tablename where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1'; ----------------------------------------------------------------------------------------------------------------------------------- Seq Scan on tablename (cost=0.00..348199.14 rows=1180724 width=91) (actual time=7727.668..36286.898 rows=579238 loops=1) Filter: ((transaction_date >= '2003-09-01 00:00:00+00'::timestamp with time zone) AND (transaction_date < '2003-10-01 00:00:00+00'::timestamp with time zone)) Total runtime: 36625.351 ms explain analyse select * from transactions_posted where product_id = 2; ----------------------------------------------------------------------------------------------------------------------------------- Seq Scan on transactions_posted (cost=0.00..319767.95 rows=6785237 width=91) (actual time=0.091..35596.328 rows=5713877 loops=1) Filter: (product_id = 2) Total runtime: 38685.373 ms The product_id alone gives a difference of a millions rows from estimate to actual, vs. the factor of 2 from the transaction_date. Dan Manley Tom Lane пишет: >Andrew Sullivan <andrew@libertyrms.info> writes: > > >>The statistics on transaction_date and product_id are set to 1000. >>Everything is all analysed nicely. But I'm getting a poor plan, >>because of an estimate that the number of rows to be returned is >>about double how many actually are: >> >> > > > >>explain analyse select * from transactions_posted where >>transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' and >>product_id = 2; >> >> > >Are the estimates accurate for queries on the two columns individually, >ie >... where transaction_date >= '2003-9-1' and transaction_date < '2003-10-1' >... where product_id = 2 > >If so, the problem is that there's a correlation between >transaction_date and product_id, which the system cannot model because >it has no multi-column statistics. > >However, given that the estimate is only off by about a factor of 2, >you'd still be getting the wrong plan even if the estimate were perfect, >because the estimated costs differ by nearly a factor of 3. > >Given the actual runtimes, I'm thinking maybe you want to reduce >random_page_cost. What are you using for that now? > > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 4: Don't 'kill -9' the postmaster > >
pgsql-performance by date: