Thread: Query planning question
Doing the following query
select distinct m.id, m.name
from manufacturer_manufacturer m
join product_product p on (p.manufacturer_id=m.id)
join retailer_offer o on (o.product_id=p.id)
where o.retailer_id=XXX and o.active
from manufacturer_manufacturer m
join product_product p on (p.manufacturer_id=m.id)
join retailer_offer o on (o.product_id=p.id)
where o.retailer_id=XXX and o.active
results in one of 2 query plans depending upon the value of XXX.
The first ignores the index on products and does a hash join which is very slow, the second uses the index and does a nested loop which is fast.
The first ignores the index on products and does a hash join which is very slow, the second uses the index and does a nested loop which is fast.
Am I right in assuming the planner thinks a sequential scan is quicker than 10k index hits, would tweaking the costs fix this or would i be better updating the stats for the product_id and manufacturer_id fields?
"Unique (cost=318308.62..321110.94 rows=1029 width=13) (actual time=5057.271..5296.973 rows=699 loops=1)"
" -> Sort (cost=318308.62..319242.73 rows=373642 width=13) (actual time=5057.270..5196.780 rows=455733 loops=1)"
" Sort Key: m.id, m.name"
" Sort Method: external merge Disk: 11032kB"
" -> Hash Join (cost=110196.74..283725.63 rows=373642 width=13) (actual time=1706.287..3451.352 rows=455733 loops=1)"
" Hash Cond: (p.manufacturer_id = m.id)"
" -> Hash Join (cost=110163.59..278554.90 rows=373642 width=4) (actual time=1705.652..3230.879 rows=455733 loops=1)"
" Hash Cond: (o.product_id = p.id)"
" -> Bitmap Heap Scan on retailer_offer o (cost=9418.68..157960.21 rows=373642 width=4) (actual time=120.277..382.208 rows=455733 loops=1)"
" Recheck Cond: ((retailer_id = 1347) AND active)"
" -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..9325.27 rows=373642 width=0) (actual time=79.503..79.503 rows=455829 loops=1)"
" Index Cond: (retailer_id = 1347)"
" -> Hash (cost=59067.07..59067.07 rows=2540307 width=8) (actual time=1584.994..1584.994 rows=2540324 loops=1)"
" -> Seq Scan on product_product p (cost=0.00..59067.07 rows=2540307 width=8) (actual time=0.008..698.313 rows=2540324 loops=1)"
" -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.627..0.627 rows=1029 loops=1)"
" -> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.007..0.278 rows=1029 loops=1)"
"Total runtime: 5310.663 ms"
" -> Sort (cost=318308.62..319242.73 rows=373642 width=13) (actual time=5057.270..5196.780 rows=455733 loops=1)"
" Sort Key: m.id, m.name"
" Sort Method: external merge Disk: 11032kB"
" -> Hash Join (cost=110196.74..283725.63 rows=373642 width=13) (actual time=1706.287..3451.352 rows=455733 loops=1)"
" Hash Cond: (p.manufacturer_id = m.id)"
" -> Hash Join (cost=110163.59..278554.90 rows=373642 width=4) (actual time=1705.652..3230.879 rows=455733 loops=1)"
" Hash Cond: (o.product_id = p.id)"
" -> Bitmap Heap Scan on retailer_offer o (cost=9418.68..157960.21 rows=373642 width=4) (actual time=120.277..382.208 rows=455733 loops=1)"
" Recheck Cond: ((retailer_id = 1347) AND active)"
" -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..9325.27 rows=373642 width=0) (actual time=79.503..79.503 rows=455829 loops=1)"
" Index Cond: (retailer_id = 1347)"
" -> Hash (cost=59067.07..59067.07 rows=2540307 width=8) (actual time=1584.994..1584.994 rows=2540324 loops=1)"
" -> Seq Scan on product_product p (cost=0.00..59067.07 rows=2540307 width=8) (actual time=0.008..698.313 rows=2540324 loops=1)"
" -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.627..0.627 rows=1029 loops=1)"
" -> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.007..0.278 rows=1029 loops=1)"
"Total runtime: 5310.663 ms"
"Unique (cost=43237.52..43266.80 rows=1029 width=13) (actual time=190.978..196.625 rows=276 loops=1)"
" -> Sort (cost=43237.52..43247.28 rows=3903 width=13) (actual time=190.977..192.431 rows=11298 loops=1)"
" Sort Key: m.id, m.name"
" Sort Method: quicksort Memory: 1037kB"
" -> Hash Join (cost=134.14..43004.70 rows=3903 width=13) (actual time=5.006..155.188 rows=11298 loops=1)"
" Hash Cond: (p.manufacturer_id = m.id)"
" -> Nested Loop (cost=100.99..42917.88 rows=3903 width=4) (actual time=4.363..146.421 rows=11298 loops=1)"
" -> Bitmap Heap Scan on retailer_offer o (cost=100.99..13663.63 rows=3903 width=4) (actual time=4.345..29.871 rows=11298 loops=1)"
" Recheck Cond: ((retailer_id = 1710) AND active)"
" -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..100.02 rows=3903 width=0) (actual time=2.368..2.368 rows=11380 loops=1)"
" Index Cond: (retailer_id = 1710)"
" -> Index Scan using product_product_pkey on product_product p (cost=0.00..7.48 rows=1 width=8) (actual time=0.008..0.009 rows=1 loops=11298)"
" Index Cond: (p.id = o.product_id)"
" -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.634..0.634 rows=1029 loops=1)"
" -> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.275 rows=1029 loops=1)"
"Total runtime: 196.716 ms"
Thanks
--
Got needs? Get Goblin'! - http://www.pricegoblin.co.uk/
"John Lister" <john.lister-ps@kickstone.com> writes: > Am I right in assuming the planner thinks a sequential scan is quicker than 10k index hits, would tweaking the costs fixthis or would i be better updating the stats for the product_id and manufacturer_id fields? AFAICT the planner did exactly the right things here. Your first example is fetching 40 times as many rows from retailer_offer as the second one is. If the planner had stuck with the nestloop plan, it would've taken about 40x as long, and been significantly slower than the hash join. regards, tom lane
> "John Lister" <john.lister-ps@kickstone.com> writes: >> Am I right in assuming the planner thinks a sequential scan is quicker >> than 10k index hits, would tweaking the costs fix this or would i be >> better updating the stats for the product_id and manufacturer_id fields? > > AFAICT the planner did exactly the right things here. Your first > example is fetching 40 times as many rows from retailer_offer as > the second one is. If the planner had stuck with the nestloop plan, > it would've taken about 40x as long, and been significantly slower > than the hash join. Cheers for the quick reply, maybe not the best values, see the following 2 plans with approx the same number of product rows but different results and times. I forgot to mention that the product table has 2.5M rows although this is apparent from the plans: with hash join: "Unique (cost=199627.47..199900.51 rows=1029 width=13) (actual time=2226.358..2238.255 rows=49 loops=1)" " -> Sort (cost=199627.47..199718.48 rows=36406 width=13) (actual time=2226.356..2230.342 rows=37086 loops=1)" " Sort Key: m.name, m.id" " Sort Method: quicksort Memory: 3276kB" " -> Hash Join (cost=101700.78..196869.37 rows=36406 width=13) (actual time=1759.983..2193.453 rows=37086 loops=1)" " Hash Cond: (p.manufacturer_id = m.id)" " -> Hash Join (cost=101667.62..196335.64 rows=36406 width=4) (actual time=1759.338..2174.826 rows=37086 loops=1)" " Hash Cond: (o.product_id = p.id)" " -> Bitmap Heap Scan on retailer_offer o (cost=921.66..84697.06 rows=36406 width=4) (actual time=12.168..49.759 rows=37086 loops=1)" " Recheck Cond: ((retailer_id = 5149) AND active)" " -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..912.56 rows=36406 width=0) (actual time=7.136..7.136 rows=37089 loops=1)" " Index Cond: (retailer_id = 5149)" " -> Hash (cost=59067.54..59067.54 rows=2540354 width=8) (actual time=1746.670..1746.670 rows=2540383 loops=1)" " -> Seq Scan on product_product p (cost=0.00..59067.54 rows=2540354 width=8) (actual time=0.012..787.095 rows=2540383 loops=1)" " -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.635..0.635 rows=1029 loops=1)" " -> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.296 rows=1029 loops=1)" "Total runtime: 2244.036 ms" and without: "Unique (cost=43237.53..43266.80 rows=1029 width=13) (actual time=410.191..421.953 rows=332 loops=1)" " -> Sort (cost=43237.53..43247.29 rows=3903 width=13) (actual time=410.189..414.351 rows=32959 loops=1)" " Sort Key: m.name, m.id" " Sort Method: quicksort Memory: 3384kB" " -> Hash Join (cost=134.15..43004.71 rows=3903 width=13) (actual time=16.356..328.938 rows=32959 loops=1)" " Hash Cond: (p.manufacturer_id = m.id)" " -> Nested Loop (cost=100.99..42917.89 rows=3903 width=4) (actual time=15.716..308.037 rows=32959 loops=1)" " -> Bitmap Heap Scan on retailer_offer o (cost=100.99..13663.64 rows=3903 width=4) (actual time=15.693..67.479 rows=32959 loops=1)" " Recheck Cond: ((retailer_id = 2016) AND active)" " -> Bitmap Index Scan on idx_retaileroffer_retailerid (cost=0.00..100.02 rows=3903 width=0) (actual time=7.863..7.863 rows=33369 loops=1)" " Index Cond: (retailer_id = 2016)" " -> Index Scan using product_product_pkey on product_product p (cost=0.00..7.48 rows=1 width=8) (actual time=0.006..0.006 rows=1 loops=32959)" " Index Cond: (p.id = o.product_id)" " -> Hash (cost=20.29..20.29 rows=1029 width=13) (actual time=0.627..0.627 rows=1029 loops=1)" " -> Seq Scan on manufacturer_manufacturer m (cost=0.00..20.29 rows=1029 width=13) (actual time=0.009..0.270 rows=1029 loops=1)" "Total runtime: 422.058 ms" You can see that the sequential scan is significantly slower than the index scan (i've tried to mitigate any caching by the OS with these results). Postgresql 8.3.5 running on a Quad Core Xeon 2Ghz with 12Gb ram. All costs set to defaults, shared_buffers=4.2GB and effective_cache=6Gb. I thought with the later versions more shared_buffers was better, is this too much?? Thanks JOHN