Home > mailing lists

Massive performance issues - Mailing list pgsql-performance

From	Matthew Sackman
Subject	Massive performance issues
Date	September 1, 2005 14:43:14
Msg-id	20050901174231.GE23970@pongo.lshift.net Whole thread
Responses	Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues Re: Massive performance issues
List	pgsql-performance

Tree view

Hi,

I'm having performance issues with a table consisting of 2,043,133 rows. The
schema is:

\d address
                  Table "public.address"
        Column        |          Type          | Modifiers
----------------------+------------------------+-----------
 postcode_top         | character varying(2)   | not null
 postcode_middle      | character varying(4)   | not null
 postcode_bottom      | character varying(7)   | not null
 postcode             | character varying(10)  | not null
 property_type        | character varying(15)  | not null
 sale_type            | character varying(10)  | not null
 flat_extra           | character varying(100) | not null
 number               | character varying(100) | not null
 street               | character varying(100) | not null
 locality_1           | character varying(100) | not null
 locality_2           | character varying(100) | not null
 city                 | character varying(100) | not null
 county               | character varying(100) | not null
Indexes:
    "address_city_index" btree (city)
    "address_county_index" btree (county)
    "address_locality_1_index" btree (locality_1)
    "address_locality_2_index" btree (locality_2)
    "address_pc_bottom_index" btree (postcode_bottom)
    "address_pc_middle_index" btree (postcode_middle)
    "address_pc_top_index" btree (postcode_top)
    "address_pc_top_middle_bottom_index" btree (postcode_top,
                             postcode_middle, postcode_bottom)
    "address_pc_top_middle_index" btree (postcode_top, postcode_middle)
    "address_postcode_index" btree (postcode)
    "address_property_type_index" btree (property_type)
    "address_street_index" btree (street)
    "street_prefix" btree (lower("substring"((street)::text, 1, 1)))

This is with postgresql 7.4 running on linux 2.6.11 with a 3GHz P4 and a
SATA harddrive.

Queries such as:

select locality_2 from address where locality_2 = 'Manchester';

are taking 14 seconds to complete, and this is only 2 years worth of
data - we will have up to 15 years (so over 15 million rows).

Interestingly, doing:
explain select locality_2 from address where locality_2 = 'Manchester';
gives
                           QUERY PLAN
----------------------------------------------------------------
 Seq Scan on address  (cost=0.00..80677.16 rows=27923 width=12)
   Filter: ((locality_2)::text = 'Manchester'::text)

but:
explain select locality_1 from address where locality_1 = 'Manchester';
gives
                           QUERY PLAN
----------------------------------------------------------------
 Index Scan using address_locality_1_index on address
(cost=0.00..69882.18 rows=17708 width=13)
   Index Cond: ((locality_1)::text = 'Manchester'::text)

Sadly, using the index makes things worse, the query taking 17 seconds.

locality_1 has 16650 distinct values and locality_2 has 1156 distinct
values.

Whilst the locality_2 query is in progress, both the disk and the CPU
are maxed out with the disk constantly reading at 60MB/s and the CPU
rarely dropping under 100% load.

With the locality_1 query in progress, the CPU is maxed out but the disk
is reading at just 3MB/s.

Obviously, to me, this is a problem, I need these queries to be under a
second to complete. Is this unreasonable? What can I do to make this "go
faster"? I've considered normalising the table but I can't work out
whether the slowness is in dereferencing the pointers from the index
into the table or in scanning the index in the first place. And
normalising the table is going to cause much pain when inserting values
and I'm not entirely sure if I see why normalising it should cause a
massive performance improvement.

I need to get to the stage where I can run queries such as:
select street, locality_1, locality_2, city from address
where (city = 'Nottingham' or locality_2 = 'Nottingham'
       or locality_1 = 'Nottingham')
  and upper(substring(street from 1 for 1)) = 'A'
group by street, locality_1, locality_2, city
order by street
limit 20 offset 0

and have the results very quickly.

Any help most gratefully received (even if it's to say that I should be
posting to a different mailing list!).

Many thanks,

Matthew

pgsql-performance by date:

From: "Luke Lonergan"
Date: 01 September 2005, 14:38:06
Subject: Re: Need for speed 3

From: "Merlin Moncure"
Date: 01 September 2005, 15:34:54
Subject: Re: Massive performance issues

Massive performance issues - Mailing list pgsql-performance

Previous

Next