Thread: why is postgres estimating so badly?
I have a query and estimations and results don´t look similar, here is explain analyze:
NOTICE: QUERY PLAN:
Sort (cost=12443.90..12443.90 rows=1 width=93) (actual time=505331.94..505332.67 rows=175 loops=1)
-> Aggregate (cost=12443.88..12443.89 rows=1 width=93) (actual time=472520.29..505326.48 rows=175 loops=1)
-> Group (cost=12443.88..12443.89 rows=1 width=93) (actual time=472307.31..485173.92 rows=325302 loops=1)
-> Sort (cost=12443.88..12443.88 rows=1 width=93) (actual time=472307.24..473769.79 rows=325302 loops=1)
-> Nested Loop (cost=12439.25..12443.87 rows=1 width=93) (actual time=103787.68..441614.43 rows=325302 loops=1)
-> Hash Join (cost=12439.25..12440.64 rows=1 width=85) (actual time=103733.76..120916.86 rows=325302 loops=1)
-> Seq Scan on nation (cost=0.00..1.25 rows=25 width=15) (actual time=7.81..8.72 rows=25 loops=1)
-> Hash (cost=12439.25..12439.25 rows=1 width=70) (actual time=103722.25..103722.25 rows=0 loops=1)
-> Nested Loop (cost=0.00..12439.25 rows=1 width=70) (actual time=95.43..100162.91 rows=325302 loops=1)
-> Nested Loop (cost=0.00..12436.23 rows=1 width=62) (actual time=84.91..47502.93 rows=325302 loops=1)
-> Nested Loop (cost=0.00..12412.93 rows=4 width=24) (actual time=66.86..8806.01 rows=43424 loops=1)
-> Seq Scan on part (cost=0.00..12399.00 rows=1 width=4) (actual time=24.88..4076.81 rows=10856 loops=1)
-> Index Scan using partsupp_pkey on partsupp (cost=0.00..13.89 rows=4 width=20) (actual time=0.20..0.34 rows=4 loops=10856)
-> Index Scan using l_partsupp_index on lineitem (cost=0.00..6.02 rows=1 width=38) (actual time=0.20..0.61 rows=7 loops=43424)
-> Index Scan using supplier_pkey on supplier (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.10 rows=1 loops=325302)
-> Index Scan using orders_pkey on orders (cost=0.00..3.22 rows=1 width=8) (actual time=0.85..0.87 rows=1 loops=325302)
Total runtime: 505563.85 msec
-> Aggregate (cost=12443.88..12443.89 rows=1 width=93) (actual time=472520.29..505326.48 rows=175 loops=1)
-> Group (cost=12443.88..12443.89 rows=1 width=93) (actual time=472307.31..485173.92 rows=325302 loops=1)
-> Sort (cost=12443.88..12443.88 rows=1 width=93) (actual time=472307.24..473769.79 rows=325302 loops=1)
-> Nested Loop (cost=12439.25..12443.87 rows=1 width=93) (actual time=103787.68..441614.43 rows=325302 loops=1)
-> Hash Join (cost=12439.25..12440.64 rows=1 width=85) (actual time=103733.76..120916.86 rows=325302 loops=1)
-> Seq Scan on nation (cost=0.00..1.25 rows=25 width=15) (actual time=7.81..8.72 rows=25 loops=1)
-> Hash (cost=12439.25..12439.25 rows=1 width=70) (actual time=103722.25..103722.25 rows=0 loops=1)
-> Nested Loop (cost=0.00..12439.25 rows=1 width=70) (actual time=95.43..100162.91 rows=325302 loops=1)
-> Nested Loop (cost=0.00..12436.23 rows=1 width=62) (actual time=84.91..47502.93 rows=325302 loops=1)
-> Nested Loop (cost=0.00..12412.93 rows=4 width=24) (actual time=66.86..8806.01 rows=43424 loops=1)
-> Seq Scan on part (cost=0.00..12399.00 rows=1 width=4) (actual time=24.88..4076.81 rows=10856 loops=1)
-> Index Scan using partsupp_pkey on partsupp (cost=0.00..13.89 rows=4 width=20) (actual time=0.20..0.34 rows=4 loops=10856)
-> Index Scan using l_partsupp_index on lineitem (cost=0.00..6.02 rows=1 width=38) (actual time=0.20..0.61 rows=7 loops=43424)
-> Index Scan using supplier_pkey on supplier (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.10 rows=1 loops=325302)
-> Index Scan using orders_pkey on orders (cost=0.00..3.22 rows=1 width=8) (actual time=0.85..0.87 rows=1 loops=325302)
Total runtime: 505563.85 msec
estimated 12000msec
here is the query:
SELECT
nation,
o_year,
CAST((sum(amount))AS NUMERIC(10,2))AS sum_profit
FROM(
SELECT
nation.name AS nation,
EXTRACT(year FROM orders.orderdate) AS o_year,
lineitem.extendedprice*(1-lineitem.discount)-partsupp.supplycost*lineitem.quantity AS amount
FROM
part,
supplier,
lineitem,
partsupp,
orders,
nation
WHERE
supplier.suppkey=lineitem.suppkey
AND partsupp.suppkey=lineitem.suppkey
AND partsupp.partkey=lineitem.partkey
AND part.partkey=lineitem.partkey
AND orders.orderkey=lineitem.orderkey
AND supplier.nationkey=nation.nationkey
AND part.name LIKE '%green%'
) AS profit
GROUP BY
nation,
o_year
ORDER BY
nation,
o_year DESC;
nation,
o_year,
CAST((sum(amount))AS NUMERIC(10,2))AS sum_profit
FROM(
SELECT
nation.name AS nation,
EXTRACT(year FROM orders.orderdate) AS o_year,
lineitem.extendedprice*(1-lineitem.discount)-partsupp.supplycost*lineitem.quantity AS amount
FROM
part,
supplier,
lineitem,
partsupp,
orders,
nation
WHERE
supplier.suppkey=lineitem.suppkey
AND partsupp.suppkey=lineitem.suppkey
AND partsupp.partkey=lineitem.partkey
AND part.partkey=lineitem.partkey
AND orders.orderkey=lineitem.orderkey
AND supplier.nationkey=nation.nationkey
AND part.name LIKE '%green%'
) AS profit
GROUP BY
nation,
o_year
ORDER BY
nation,
o_year DESC;
lineitem is about 6M rows
partsupp 800K rows
part 200K rows
any advice?
Thanks and regards
"Luis Alberto Amigo Navarro" <lamigo@atc.unican.es> writes: > -> Seq Scan on part (cost=0.00..12399.00 rows=1 width=4) (actual time=24.88..4076.81 rows=10856 loops=1) Seems like the major misestimation is above: the LIKE clause on part is estimated to select just one row, but it selects 10856 of 'em. Had the planner realized the number of returned rows would be in the thousands, it'd likely have used quite a different plan structure. > AND part.name LIKE '%green%' It's difficult for the planner to produce a decent estimate for the selectivity of an unanchored LIKE clause, since there are no statistics it can use for the purpose. We recently changed FIXED_CHAR_SEL in src/backend/utils/adt/selfuncs.c from 0.04 to 0.20, which would make this particular case come out better. (I believe the estimate would work out to about 320, if part is 200K rows; that should be enough to produce at least some change of plan.) You could try patching your local installation likewise. regards, tom lane
> > AND part.name LIKE '%green%' > > It's difficult for the planner to produce a decent estimate for the > selectivity of an unanchored LIKE clause, since there are no statistics > it can use for the purpose. We recently changed FIXED_CHAR_SEL in > src/backend/utils/adt/selfuncs.c from 0.04 to 0.20, which would make > this particular case come out better. (I believe the estimate would > work out to about 320, if part is 200K rows; that should be enough to > produce at least some change of plan.) You could try patching your > local installation likewise. Here are the results, worse than before: NOTICE: QUERY PLAN: Sort (cost=25209.88..25209.88 rows=1 width=93) (actual time=1836143.78..1836144.48 rows=175 loops=1) -> Aggregate (cost=25209.85..25209.87 rows=1 width=93) (actual time=1803559.97..1836136.47 rows=175 loops=1) -> Group (cost=25209.85..25209.86 rows=2 width=93) (actual time=1803348.04..1816093.89 rows=325302 loops=1) -> Sort (cost=25209.85..25209.85 rows=2 width=93) (actual time=1803347.97..1804795.41 rows=325302 loops=1) -> Hash Join (cost=25208.43..25209.84 rows=2 width=93) (actual time=1744714.61..1772790.19 rows=325302 loops=1) -> Seq Scan on nation (cost=0.00..1.25rows=25 width=15) (actual time=13.92..14.84 rows=25 loops=1) -> Hash (cost=25208.42..25208.42 rows=2 width=78) (actual time=1744603.74..1744603.74 rows=0 loops=1) -> Nested Loop (cost=0.00..25208.42rows=2 width=78) (actual time=139.21..1740110.04 rows=325302 loops=1) -> Nested Loop (cost=0.00..25201.19 rows=2 width=70) (actual time=122.37..1687895.49 rows=325302 loops=1) -> NestedLoop (cost=0.00..25187.93 rows=4 width=62) (actual time=121.75..856097.27 rows=325302 loops=1) -> Nested Loop (cost=0.00..17468.91 rows=1280 width=24) (actual time=78.43..19698.77 rows=43424 loops=1) -> Seq Scan on part (cost=0.00..12399.00 rows=320 width=4) (actual time=29.57..4179.70 rows=10856 loops=1) -> Index Scan using partsupp_pkey on partsupp (cost=0.00..15.79 rows=4 width=20) (actual time=1.17..1.33 rows=4 loops=10856) -> Index Scan using l_partsupp_index on lineitem (cost=0.00..6.02 rows=1 width=38) (actual time=2.83..18.97 rows=7 loops=43424) -> Index Scan using orders_pkey on orders (cost=0.00..3.23 rows=1 width=8) (actual time=2.47..2.50 rows=1 loops=325302) -> Index Scan using supplier_pkey on supplier (cost=0.00..3.01 rows=1 width=8) (actual time=0.08..0.09 rows=1 loops=325302) Total runtime: 1836375.16 msec It looks even worse, another advice?, or maybe a query change. here is the query again: SELECTnation,o_year,CAST((sum(amount))AS NUMERIC(10,2))AS sum_profit FROM(SELECT nation.name AS nation, EXTRACT(year FROM orders.orderdate) AS o_year, lineitem.extendedprice*(1-lineitem.discount)-partsupp.supplycost*lineitem.qu antity AS amountFROM part, supplier, lineitem, partsupp, orders, nationWHERE supplier.suppkey=lineitem.suppkey AND partsupp.suppkey=lineitem.suppkeyAND partsupp.partkey=lineitem.partkey AND part.partkey=lineitem.partkey AND orders.orderkey=lineitem.orderkeyAND supplier.nationkey=nation.nationkey AND part.name LIKE '%green%') AS profit GROUP BYnation,o_year ORDER BYnation,o_year DESC; Thanks and regards
The first thing to point out is that the estimated cost is measured in terms of page reads while the actual time is measured in milliseconds. So even if the cost estimate is accurate it is unlikely that those numbers will be the same. -N -- Nathan C. Burnett Research Assistant, Wisconsin Network Disks Department of Computer Sciences University of Wisconsin - Madison ncb@cs.wisc.edu On Wed, 17 Jul 2002, Luis Alberto Amigo Navarro wrote: > I have a query and estimations and results don�t look similar, here is explain analyze: > > NOTICE: QUERY PLAN: > > Sort (cost=12443.90..12443.90 rows=1 width=93) (actual time=505331.94..505332.67 rows=175 loops=1) > -> Aggregate (cost=12443.88..12443.89 rows=1 width=93) (actual time=472520.29..505326.48 rows=175 loops=1) > -> Group (cost=12443.88..12443.89 rows=1 width=93) (actual time=472307.31..485173.92 rows=325302 loops=1) > -> Sort (cost=12443.88..12443.88 rows=1 width=93) (actual time=472307.24..473769.79 rows=325302 loops=1) > -> Nested Loop (cost=12439.25..12443.87 rows=1 width=93) (actual time=103787.68..441614.43 rows=325302loops=1) > -> Hash Join (cost=12439.25..12440.64 rows=1 width=85) (actual time=103733.76..120916.86 rows=325302loops=1) > -> Seq Scan on nation (cost=0.00..1.25 rows=25 width=15) (actual time=7.81..8.72 rows=25loops=1) > -> Hash (cost=12439.25..12439.25 rows=1 width=70) (actual time=103722.25..103722.25 rows=0loops=1) > -> Nested Loop (cost=0.00..12439.25 rows=1 width=70) (actual time=95.43..100162.91rows=325302 loops=1) > -> Nested Loop (cost=0.00..12436.23 rows=1 width=62) (actual time=84.91..47502.93rows=325302 loops=1) > -> Nested Loop (cost=0.00..12412.93 rows=4 width=24) (actual time=66.86..8806.01rows=43424 loops=1) > -> Seq Scan on part (cost=0.00..12399.00 rows=1 width=4) (actualtime=24.88..4076.81 rows=10856 loops=1) > -> Index Scan using partsupp_pkey on partsupp (cost=0.00..13.89rows=4 width=20) (actual time=0.20..0.34 rows=4 loops=10856) > -> Index Scan using l_partsupp_index on lineitem (cost=0.00..6.02 rows=1width=38) (actual time=0.20..0.61 rows=7 loops=43424) > -> Index Scan using supplier_pkey on supplier (cost=0.00..3.01 rows=1 width=8)(actual time=0.08..0.10 rows=1 loops=325302) > -> Index Scan using orders_pkey on orders (cost=0.00..3.22 rows=1 width=8) (actual time=0.85..0.87rows=1 loops=325302) > Total runtime: 505563.85 msec > > estimated 12000msec > > here is the query: > SELECT > nation, > o_year, > CAST((sum(amount))AS NUMERIC(10,2))AS sum_profit > FROM( > SELECT > nation.name AS nation, > EXTRACT(year FROM orders.orderdate) AS o_year, > lineitem.extendedprice*(1-lineitem.discount)-partsupp.supplycost*lineitem.quantity AS amount > FROM > part, > supplier, > lineitem, > partsupp, > orders, > nation > WHERE > supplier.suppkey=lineitem.suppkey > AND partsupp.suppkey=lineitem.suppkey > AND partsupp.partkey=lineitem.partkey > AND part.partkey=lineitem.partkey > AND orders.orderkey=lineitem.orderkey > AND supplier.nationkey=nation.nationkey > AND part.name LIKE '%green%' > ) AS profit > GROUP BY > nation, > o_year > ORDER BY > nation, > o_year DESC; > > lineitem is about 6M rows > partsupp 800K rows > part 200K rows > > any advice? > Thanks and regards > > >