Thread: Instances where enable_seqscan = false is good
Same query, executed twice, once using seqscan enabled and the other with it disabled. Difference is nearly night and day. How can I persuade PG to use the index w/o resorting to setting seqscan = false (actually, I don't know what are the pro or cons - I read posts from the archives far back as 2005 but that was dealing with inconsistencies in the text string eg: warwa(s/z/etc..) which caused it to pick seq scans.) PPl in IRC suggested setting default_statistics = 100 but I didn't find that useful nor helpful. Also, tables has been vacuum and analysed. Using Seq-scans --------------- QUERY PLAN -------------------------------------------------------------------------------------------------- Limit (cost=4430.53..50173.70 rows=1000 width=47) (actual time=21832.092..43771.536 rows=228 loops=1) -> Hash Join (cost=4430.53..260866.77 rows=5606 width=47) (actual time=21832.088..43770.927 rows=228 loops=1) Hash Cond: ((trz.number)::text = (drv.number)::text) -> Seq Scan on zone trz (cost=0.00..233254.27 rows=6148222 width=39) (actual time=22.807..31891.591 rows=6181910 loops=1) Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL)) -> Hash (cost=4414.39..4414.39 rows=1291 width=24) (actual time=171.911..171.911 rows=12591 loops=1) -> Bitmap Heap Scan on drv (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594 rows=12591 loops=1) Recheck Cond: ((code)::text = 'NRN15'::text) -> Bitmap Index Scan on idx_drv (cost=0.00..30.11 rows=1291 width=0) (actual time=62.199..62.199 rows=12649 loops=1) Index Cond: ((code)::text = 'NRN15'::text) Total runtime: 43772.045 ms (11 rows) set enable_seqscan = false; QUERY PLAN ----------------------------------------------------------------------------------------------------------- -- Limit (cost=0.00..69314.54 rows=1000 width=47) (actual time=122.920..553.538 rows=228 loops=1) -> Nested Loop (cost=0.00..388646.63 rows=5607 width=47) (actual time=122.915..552.956 rows=228 loops=1) -> Index Scan using idx_drv on drv (cost=0.00..5077.64 rows=1293 width=24) (actual time=38.164..110.933 rows=12591 loops=1) Index Cond: ((code)::text = 'NRN15'::text) -> Index Scan using idx_trz_sn on zone trz (cost=0.00..295.10 rows=120 width=39) (actual time=0.021..0.021 rows=0 loops=12591) Index Cond: ((drv.number)::text = (trz.number)::text) Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL)) Total runtime: 553.964 ms (8 rows)
Attachment
"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: > Same query, executed twice, once using seqscan enabled and the other > with it disabled. Difference is nearly night and day. > > > How can I persuade PG to use the index w/o resorting to setting seqscan > = false The usual knob to fiddle with is random_page_cost. If your database fits mostly in memory you may want to turn it down from the default of 4 to something closer to 1. Perhaps 2 or even 1.5 or so. But don't do it based on a single query under testing conditions, use a wide variety of queries under production conditions. > QUERY PLAN > -------------------------------------------------------------------------------------------------- > Limit (cost=4430.53..50173.70 rows=1000 width=47) (actual time=21832.092..43771.536 rows=228 loops=1) > -> Hash Join (cost=4430.53..260866.77 rows=5606 width=47) (actual time=21832.088..43770.927 rows=228 loops=1) The difference between the predicted and actual rows is suspicious. let's look lower down to see where it comes from. > Hash Cond: ((trz.number)::text = (drv.number)::text) > -> Seq Scan on zone trz (cost=0.00..233254.27 rows=6148222 width=39) (actual time=22.807..31891.591 rows=6181910loops=1) > Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL)) This part looks ok 615k versus 618k is pretty good. > -> Hash (cost=4414.39..4414.39 rows=1291 width=24) (actual time=171.911..171.911 rows=12591 loops=1) Ah, this is off by an order of magnitude, that's bad. > -> Bitmap Heap Scan on drv (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594 rows=12591loops=1) > Recheck Cond: ((code)::text = 'NRN15'::text) > -> Bitmap Index Scan on idx_drv (cost=0.00..30.11 rows=1291 width=0) (actual time=62.199..62.199rows=12649 loops=1) > Index Cond: ((code)::text = 'NRN15'::text) So you might want to increase the statistics target for the "code" column. Incidentally the way this is written makes me wonder what data type "code" is defined as. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote: > "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: > > > > How can I persuade PG to use the index w/o resorting to setting seqscan > > = false > > The usual knob to fiddle with is random_page_cost. If your database fits > mostly in memory you may want to turn it down from the default of 4 to > something closer to 1. I tried down to 0.4 before it resorted to using the index. The DB shouldn't fit into memory (I think) that table alone has ~8million rows at ~1.5G size > > -> Hash (cost=4414.39..4414.39 rows=1291 width=24) (actual time=171.911..171.911 rows=12591 loops=1) > > Ah, this is off by an order of magnitude, that's bad. > having said so, still don't understand why.. > > -> Bitmap Heap Scan on drv (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594 rows=12591loops=1) > > Recheck Cond: ((code)::text = 'NRN15'::text) > > -> Bitmap Index Scan on idx_drv (cost=0.00..30.11 rows=1291 width=0) (actual time=62.199..62.199rows=12649 loops=1) > > Index Cond: ((code)::text = 'NRN15'::text) > > So you might want to increase the statistics target for the "code" column. Err.. how come? (newbie) it's scanning the index there. What's bad is that it's using Seq_scans on the "zone" table. > Incidentally the way this is written makes me wonder what data type "code" is > defined as. code is defined as varchar(5) data type. I'm changing all of the normal char(n) to varchar(n) columns.. BTW, thanks for helping. Not using seq scans does really make a huge difference as you can clearly see from the timing. Total runtime: 43772.045 ms Total runtime: 553.964 ms
On Tue, 2007-09-04 at 10:06 +0800, Ow Mun Heng wrote: > On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote: > > "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: > > > -> Bitmap Heap Scan on drv (cost=30.44..4414.39 rows=1291 width=24) (actual time=62.980..142.594 rows=12591loops=1) > > > Recheck Cond: ((code)::text = 'NRN15'::text) > > > -> Bitmap Index Scan on idx_drv (cost=0.00..30.11 rows=1291 width=0) (actual time=62.199..62.199rows=12649 loops=1) > > > Index Cond: ((code)::text = 'NRN15'::text) > > > > So you might want to increase the statistics target for the "code" column. > Err.. how come? (newbie) it's scanning the index there. What's bad is > that it's using Seq_scans on the "zone" table. Code is now increased to Stat level of 100 ---------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=25113.04..30733.57 rows=1000 width=47) (actual time=19666.832..39961.032 rows=228 loops=1) -> Hash Join (cost=25113.04..324620.08 rows=53288 width=47) (actual time=19666.826..39960.437 rows=228 loops=1) Hash Cond: ((trz.number)::text = (drv.number)::text) -> Seq Scan on zone trz (cost=0.00..234363.75 rows=6394431 width=39) (actual time=17.635..29164.929 rows=6222984loops=1) Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL)) -> Hash (cost=24965.72..24965.72 rows=11785 width=24) (actual time=215.851..215.851 rows=12591 loops=1) -> Bitmap Heap Scan on drv (cost=243.76..24965.72 rows=11785 width=24) (actual time=50.910..188.894 rows=12591loops=1) Recheck Cond: ((code)::text = 'NRN15'::text) -> Bitmap Index Scan on idx_drv (cost=0.00..240.82 rows=11785 width=0) (actual time=49.180..49.180rows=12591 loops=1) Index Cond: ((code)::text = 'NRN15'::text) Total runtime: 39961.703 ms Does seem to be slightly better (from 43772ms) trz.number stat level increased to 100 & code to 100 QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=25113.04..30733.57 rows=1000 width=47) (actual time=22152.398..42430.820 rows=228 loops=1) -> Hash Join (cost=25113.04..324620.08 rows=53288 width=47) (actual time=22152.392..42430.212 rows=228 loops=1) Hash Cond: ((trz.number)::text = (drv.number)::text) -> Seq Scan on zone trz (cost=0.00..234363.75 rows=6394431 width=39) (actual time=11.840..28808.222 rows=6222984loops=1) Filter: ((zn_2 <> 0::numeric) AND (zn_2 IS NOT NULL)) -> Hash (cost=24965.72..24965.72 rows=11785 width=24) (actual time=2646.652..2646.652 rows=12591 loops=1) -> Bitmap Heap Scan on drv (cost=243.76..24965.72 rows=11785 width=24) (actual time=50.628..2600.132 rows=12591loops=1) Recheck Cond: ((code)::text = 'NRN15'::text) -> Bitmap Index Scan on idx_drvl (cost=0.00..240.82 rows=11785 width=0) (actual time=38.436..38.436rows=12591 loops=1) Index Cond: ((code)::text = 'NRN15'::text) Total runtime: 42431.358 ms hmm..not much difference.. What else can be done?? Many Thanks..
"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: > On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote: >> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: >> > >> > How can I persuade PG to use the index w/o resorting to setting seqscan >> > = false >> >> The usual knob to fiddle with is random_page_cost. If your database fits >> mostly in memory you may want to turn it down from the default of 4 to >> something closer to 1. > > I tried down to 0.4 before it resorted to using the index. The DB > shouldn't fit into memory (I think) that table alone has ~8million rows > at ~1.5G size Values under 1 are nonsensical. Basically being as low as 1 means you're telling the database that a random access i/o takes the same amount of time as a sequential i/o. (Actually we have sequential_page_cost now so I guess instead of "1" I should say "the same as sequential_page_cost" but I'm assuming you haven't modified sequential_page_cost from the default of 1 have you?) -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
On Tue, 2007-09-04 at 05:15 +0100, Gregory Stark wrote: > "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: > > > On Mon, 2007-09-03 at 11:31 +0100, Gregory Stark wrote: > >> "Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: > >> > > >> > How can I persuade PG to use the index w/o resorting to setting seqscan > >> > = false > >> > >> The usual knob to fiddle with is random_page_cost. If your database fits > >> mostly in memory you may want to turn it down from the default of 4 to > >> something closer to 1. > > > > I tried down to 0.4 before it resorted to using the index. The DB > > shouldn't fit into memory (I think) that table alone has ~8million rows > > at ~1.5G size > > Values under 1 are nonsensical. exactly, might as well use enable_seqscan=false. So it's still default at 4 > Basically being as low as 1 means you're > telling the database that a random access i/o takes the same amount of time as > a sequential i/o. (Actually we have sequential_page_cost now so I guess > instead of "1" I should say "the same as sequential_page_cost" but I'm > assuming you haven't modified sequential_page_cost from the default of 1 have > you?) Have not changed anything in that area. Question is.. Do I need to? or should I try out something just to see how it is? (any) Recommendations would be good.
"Ow Mun Heng" <Ow.Mun.Heng@wdc.com> writes: > Have not changed anything in that area. Question is.. Do I need to? or > should I try out something just to see how it is? > (any) Recommendations would be good. Sorry, I don't have all the original plans. Can you post the explain analyze with and without enable_seqscan now that the stats are giving good predictions? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com