Home > mailing lists

Re: indexing for distinct search in timestamp based table - Mailing list pgsql-performance

From	Vladimir Sitnikov
Subject	Re: indexing for distinct search in timestamp based table
Date	September 5, 2008 12:10:47
Msg-id	1d709ecc0809050810m42bfbedfk98ac7d83f13fb7@mail.gmail.com Whole thread Raw
In response to	indexing for distinct search in timestamp based table ("Rainer Mager" <rainer@vanten.com>)
Responses	Re: indexing for distinct search in timestamp based table
List	pgsql-performance

Tree view

You might get great improvement for '%' cases using index on channel_name(<field>, start_time) and a little bit of pl/pgsql

Basically, you need to implement the following algorithm:

1) curr_<field> = ( select min(<field>) from ad_log )

2) record_exists = ( select 1 from ad_log where <field>=cur_<field> and _all_other_conditions limit 1 )

3) if record_exists==1 then add curr_<field> to the results

3) curr_<field> = (select min(<field>) from ad_log where <field> > curr_<field> )

4) if curr_<field> is not null then goto 2

I believe it might make sense implement this approach in the core (I would call it "index distinct scan")

That could dramatically improve "select distinct <column> from <table>" and "select <column> from <table> group by <column>" kind of queries when there exists an index on <column> and a particular column has very small number of distinct values.

For instance: say a table has 10'000'000 rows, while column of interest has only 20 distinct values. In that case, the database will be able to get every of those 20 values in virtually 20 index lookups.

What does the community think about that?

pgsql-performance by date:

From: "Nikolas Everett"
Date: 05 September 2008, 11:24:06
Subject: Re: SAN and full_page_writes

From: Alvaro Herrera
Date: 05 September 2008, 13:25:48
Subject: Re: too many clog files

Re: indexing for distinct search in timestamp based table - Mailing list pgsql-performance

Previous

Next