Massive performance issues - Mailing list pgsql-performance
From | Matthew Sackman |
---|---|
Subject | Massive performance issues |
Date | |
Msg-id | 20050901174231.GE23970@pongo.lshift.net Whole thread Raw |
Responses |
Re: Massive performance issues
Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues |
List | pgsql-performance |
Hi, I'm having performance issues with a table consisting of 2,043,133 rows. The schema is: \d address Table "public.address" Column | Type | Modifiers ----------------------+------------------------+----------- postcode_top | character varying(2) | not null postcode_middle | character varying(4) | not null postcode_bottom | character varying(7) | not null postcode | character varying(10) | not null property_type | character varying(15) | not null sale_type | character varying(10) | not null flat_extra | character varying(100) | not null number | character varying(100) | not null street | character varying(100) | not null locality_1 | character varying(100) | not null locality_2 | character varying(100) | not null city | character varying(100) | not null county | character varying(100) | not null Indexes: "address_city_index" btree (city) "address_county_index" btree (county) "address_locality_1_index" btree (locality_1) "address_locality_2_index" btree (locality_2) "address_pc_bottom_index" btree (postcode_bottom) "address_pc_middle_index" btree (postcode_middle) "address_pc_top_index" btree (postcode_top) "address_pc_top_middle_bottom_index" btree (postcode_top, postcode_middle, postcode_bottom) "address_pc_top_middle_index" btree (postcode_top, postcode_middle) "address_postcode_index" btree (postcode) "address_property_type_index" btree (property_type) "address_street_index" btree (street) "street_prefix" btree (lower("substring"((street)::text, 1, 1))) This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a SATA harddrive. Queries such as: select locality_2 from address where locality_2 = 'Manchester'; are taking 14 seconds to complete, and this is only 2 years worth of data - we will have up to 15 years (so over 15 million rows). Interestingly, doing: explain select locality_2 from address where locality_2 = 'Manchester'; gives QUERY PLAN ---------------------------------------------------------------- Seq Scan on address (cost=0.00..80677.16 rows=27923 width=12) Filter: ((locality_2)::text = 'Manchester'::text) but: explain select locality_1 from address where locality_1 = 'Manchester'; gives QUERY PLAN ---------------------------------------------------------------- Index Scan using address_locality_1_index on address (cost=0.00..69882.18 rows=17708 width=13) Index Cond: ((locality_1)::text = 'Manchester'::text) Sadly, using the index makes things worse, the query taking 17 seconds. locality_1 has 16650 distinct values and locality_2 has 1156 distinct values. Whilst the locality_2 query is in progress, both the disk and the CPU are maxed out with the disk constantly reading at 60MB/s and the CPU rarely dropping under 100% load. With the locality_1 query in progress, the CPU is maxed out but the disk is reading at just 3MB/s. Obviously, to me, this is a problem, I need these queries to be under a second to complete. Is this unreasonable? What can I do to make this "go faster"? I've considered normalising the table but I can't work out whether the slowness is in dereferencing the pointers from the index into the table or in scanning the index in the first place. And normalising the table is going to cause much pain when inserting values and I'm not entirely sure if I see why normalising it should cause a massive performance improvement. I need to get to the stage where I can run queries such as: select street, locality_1, locality_2, city from address where (city = 'Nottingham' or locality_2 = 'Nottingham' or locality_1 = 'Nottingham') and upper(substring(street from 1 for 1)) = 'A' group by street, locality_1, locality_2, city order by street limit 20 offset 0 and have the results very quickly. Any help most gratefully received (even if it's to say that I should be posting to a different mailing list!). Many thanks, Matthew
pgsql-performance by date: