Re: Query not using index pgsql 8.2.3 - Mailing list pgsql-general
From | Henrik Zagerholm |
---|---|
Subject | Re: Query not using index pgsql 8.2.3 |
Date | |
Msg-id | CF8A4C11-DB11-450E-AA64-0FC6537AFC7D@mac.se |
In response to | Re: Query not using index pgsql 8.2.3 (Michael Fuhr <mike@fuhr.org>) |
Responses | Re: Query not using index pgsql 8.2.3 |
List | pgsql-general |
On 23 Mar 2007, at 13:34, Michael Fuhr wrote:

> On Fri, Mar 23, 2007 at 12:41:58PM +0100, Henrik Zagerholm wrote:
>> On 23 Mar 2007, at 12:33, Michael Fuhr wrote:
>>> The row count estimate for fk_filetype_id = 83 is high by an order
>>> of magnitude:
>>>
>>>> Bitmap Index Scan on tbl_file_idx6 (cost=0.00..25.65 rows=1251
>>>> width=0) (actual time=21.958..21.958 rows=112 loops=1)
>>>>
>>>> Index Cond: (fk_filetype_id = 83)
>>>
>>> Have you run ANALYZE or VACUUM ANALYZE on these tables recently?
>>> If so then you might try increasing the statistics target for
>>> tbl_file.fk_filetype_id and perhaps some of the columns in the join
>>> conditions.
>>
>> I did a vacuum full and reindex on all tables.
>
> VACUUM FULL is seldom (if ever) necessary if you're running plain
> VACUUM (without FULL) often enough, either manually or via autovacuum.
>
>> Now I also did a vacuum analyze on tbl_acl (the biggest table with
>> about 4.5 million rows)
>>
>> Same result.
>
> I'd suggest analyzing all tables. The bad estimate I mentioned
> appears to be for a column in tbl_file so if you didn't analyze
> that table then the query plan probably won't improve.
>
>> But do you mean if the row_count estimate is big it can't use any
>> index on any other table within the JOINs?
>
> High row count estimates make the planner think that scanning entire
> tables would be faster than using indexes. The more of a table a
> query must fetch the less efficient an index scan becomes, to the
> point that a sequential scan is faster than an index scan.
>
>> Any specific parameters I should adjust?
>
> If analyzing the tables doesn't improve the row count estimates
> then try increasing some columns' statistics targets and re-analyze
> the table or just that column. Example:
>
> ALTER TABLE tbl_file ALTER fk_filetype_id SET STATISTICS 100;
> ANALYZE tbl_file (fk_filetype_id);

I analyzed all tables and found no difference, but after increasing the
statistics target for fk_filetype_id I got much better results.

There is still a big cost on tbl_file_structure, which is read with a
bitmap heap scan. Could I do something to improve this?

Thanks for all your help so far!
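A quick way to confirm what the planner now knows about that column (not part of the original mail; the table and column names are the ones used in the thread) is to look at its pg_stats row after re-analyzing:

SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'tbl_file'
  AND attname = 'fk_filetype_id';

With the statistics target raised to 100, the most_common_vals list can hold up to 100 entries, which is what lets the planner estimate fk_filetype_id = 83 more accurately. The new plan follows.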
Limit (cost=115168.22..115169.12 rows=20 width=173) (actual time=5138.401..5138.982 rows=20 loops=1)
  -> GroupAggregate (cost=115168.22..115341.60 rows=3853 width=173) (actual time=5138.394..5138.929 rows=20 loops=1)
     -> Sort (cost=115168.22..115177.85 rows=3853 width=173) (actual time=5138.339..5138.430 rows=73 loops=1)
        Sort Key: tbl_file.file_name, tbl_file.file_ctime, tbl_structure.structure_path, tbl_computer.pk_computer_id, tbl_filetype.filetype_icon, tbl_computer.computer_name, tbl_share.share_name, tbl_share.share_path, tbl_file_structure.fk_file_id
        -> Nested Loop (cost=8.48..114938.74 rows=3853 width=173) (actual time=185.588..5118.684 rows=730 loops=1)
           -> Seq Scan on tbl_filetype (cost=0.00..1.25 rows=1 width=18) (actual time=0.036..0.053 rows=1 loops=1)
              Filter: (83 = pk_filetype_id)
           -> Hash Join (cost=8.48..114898.96 rows=3853 width=171) (actual time=185.538..5116.063 rows=730 loops=1)
              Hash Cond: (tbl_archive.fk_share_id = tbl_share.pk_share_id)
              -> Nested Loop (cost=6.27..114843.77 rows=3853 width=146) (actual time=167.608..5093.255 rows=730 loops=1)
                 -> Nested Loop (cost=6.27..32079.13 rows=1167 width=146) (actual time=130.594..3448.927 rows=330 loops=1)
                    Join Filter: (tbl_archive.pk_archive_id = tbl_structure.fk_archive_id)
                    -> Nested Loop (cost=4.89..31185.00 rows=1167 width=138) (actual time=119.861..3402.527 rows=330 loops=1)
                       -> Nested Loop (cost=4.89..24413.49 rows=1167 width=56) (actual time=82.203..2097.142 rows=330 loops=1)
                          -> Index Scan using tbl_file_idx6 on tbl_file (cost=0.00..344.56 rows=114 width=40) (actual time=13.164..15.475 rows=106 loops=1)
                             Index Cond: (fk_filetype_id = 83)
                          -> Bitmap Heap Scan on tbl_file_structure (cost=4.89..210.27 rows=69 width=24) (actual time=8.736..19.606 rows=3 loops=106)
                             Recheck Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                             -> Bitmap Index Scan on tbl_file_structure_idx (cost=0.00..4.88 rows=69 width=0) (actual time=2.574..2.574 rows=3 loops=106)
                                Index Cond: (tbl_file.pk_file_id = tbl_file_structure.fk_file_id)
                       -> Index Scan using tbl_structure_pkey on tbl_structure (cost=0.00..5.79 rows=1 width=98) (actual time=3.942..3.945 rows=1 loops=330)
                          Index Cond: (tbl_structure.pk_structure_id = tbl_file_structure.fk_structure_id)
                    -> Materialize (cost=1.37..1.71 rows=34 width=16) (actual time=0.034..0.077 rows=34 loops=330)
                       -> Seq Scan on tbl_archive (cost=0.00..1.34 rows=34 width=16) (actual time=10.606..10.693 rows=34 loops=1)
                          Filter: archive_complete
                 -> Index Scan using tbl_acl_idx on tbl_acl (cost=0.00..70.47 rows=36 width=8) (actual time=4.964..4.971 rows=2 loops=330)
                    Index Cond: (tbl_acl.fk_file_structure_id = tbl_file_structure.pk_file_structure_id)
              -> Hash (cost=2.16..2.16 rows=4 width=41) (actual time=17.890..17.890 rows=4 loops=1)
                 -> Hash Join (cost=1.07..2.16 rows=4 width=41) (actual time=17.848..17.872 rows=4 loops=1)
                    Hash Cond: (tbl_share.fk_computer_id = tbl_computer.pk_computer_id)
                    -> Seq Scan on tbl_share (cost=0.00..1.04 rows=4 width=29) (actual time=0.093..0.099 rows=4 loops=1)
                    -> Hash (cost=1.03..1.03 rows=3 width=20) (actual time=17.724..17.724 rows=3 loops=1)
                       -> Seq Scan on tbl_computer (cost=0.00..1.03 rows=3 width=20) (actual time=17.697..17.706 rows=3 loops=1)
Total runtime: 5139.608 ms

> The default statistics target is 10; the maximum value is 1000.
>
> On some systems I've found that reducing random_page_cost from 4
> (the default) to 2 gives more realistic cost estimates for index
> scans.
>
> --
> Michael Fuhr
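Michael's random_page_cost suggestion can be tested without editing postgresql.conf; a minimal sketch (not from the thread, and the actual query is omitted here):

SET random_page_cost = 2;
-- re-run the EXPLAIN ANALYZE of the query above and compare plan and runtime
RESET random_page_cost;

If the lower setting keeps producing better plans under realistic caching, it can then be made permanent in postgresql.conf.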