Home > mailing lists

Re: Non-unique index performance - Mailing list pgsql-general

From	Sezai YILMAZ
Subject	Re: Non-unique index performance
Date	June 24, 2005 07:13:04
Msg-id	42BBDCAF.7080906@pro-g.com.tr Whole thread Raw
In response to	Re: Non-unique index performance (Richard Huxton <dev@archonet.com>)
Responses	Re: Non-unique index performance
List	pgsql-general

Tree view

Richard Huxton wrote:

> Sezai YILMAZ wrote:
>
>> Hello!
>>
>> I have a table eith name person as described below. It has an unique
>> index for id column (it is also primary key) and has an index for
>> parent column.
>>
>> If I run a query with where clause on id column it uses the index
>> (look at the first explain analyze result; it says "Index Scan
>> using...") and the query for 582856 rows table results in 225,893 ms.
>>
>> But, if I run another query with where clause on parent column it
>> does not use the index (look at the second explain analyze result; it
>> says "Seq Scan using...") and the query for 582856 rows table results
>> in 11192.913 ms.
>>
>> Why the difference of both queries is so dramatical for unique and
>> non-unique indexed columns? Why PostgreSQL does not use the
>> non-unique indexes (it says that it does sequential scan)?
>
>
> Because it thinks it will be faster/cheaper.
>
>> I have to use an index on non-unique column. What is the solution for
>> that? Is there a way to speed up non-unique indexes?
>
>
> You don't want to force it to use an index, you want it to make better
> estimates. Let's have a look...
>
>> *****************************************************************
>> test=> \d person
>>           Table "public.person"
>> Column  |         Type          | Modifiers
>> ---------+-----------------------+-----------
>> name    | character varying(30) |
>> surname | character varying(30) |
>> id      | integer               | not null
>> parent  | integer               |
>> Indexes:
>>    "person_pkey" primary key, btree (id)
>>    "parent_ndx" btree (parent)
>
>
> OK - all very simple. And you've said there are about 580,000 rows.
>
>> test=> explain analyze select id,name from person where id in
>> ('17201', '338191', '244319', '515209', '20415');
>
>
> Why are you quoting integers?

I qouted them to use indexes. The other method is type casting the
values to indexed column type. I prefer the quoting method.

>> test=> explain analyze select * from person where parent in ('17201',
>> '338191', '244319', '515209', '20415');
>>                                                   QUERY PLAN
>> -----------------------------------------------------------------------------------------------------------------
>>
>> Seq Scan on person  (cost=0.00..35871.26 rows=14427 width=32) (actual
>> time=0.063..11192.809 rows=5 loops=1)
>>   Filter: ((parent = 17201) OR (parent = 338191) OR (parent = 244319)
>> OR (parent = 515209) OR (parent = 20415))
>> Total runtime: 11192.913 ms
>
>
> Hmm - for some reason it's expecting 14427 rows to be returned. If
> there were that many matches, then it might well be a better choice
> than going back and fore between the index and the table all the time.
>
> So - we need to find out why it thinks there will be so many rows
> returned.
>
> 1. VACUUM FULL ANALYSE person;
> 2. re-run the explain

This solved the problem. Now it takes about 213 ms.

Thanks in advance.

--
sy

pgsql-general by date:

From: Richard Huxton
Date: 24 June 2005, 06:50:44
Subject: Re: Postgres 8.0 windows processes, field testing, and

From: Richard Huxton
Date: 24 June 2005, 07:25:50
Subject: Re: Non-unique index performance

Re: Non-unique index performance - Mailing list pgsql-general

Previous

Next