Home > mailing lists

Re: [Fwd: Index Advisor] - Mailing list pgsql-hackers

From	Gurjeet Singh
Subject	Re: [Fwd: Index Advisor]
Date	November 20, 2006 02:02:39
Msg-id	65937bea0611192135q2f9e8a74he5b84b5b0880f5be@mail.gmail.com Whole thread Raw
In response to	Re: [Fwd: Index Advisor] (Kai-Uwe Sattler <kus@tu-ilmenau.de>)
Responses	Re: [Fwd: Index Advisor] Re: [Fwd: Index Advisor]
List	pgsql-hackers

Tree view

On 11/19/06, Kai-Uwe Sattler <kus@tu-ilmenau.de> wrote:

Hi,
> .) The SELECTs in the pg_advise are returning wrong results, when
> the same index is suggested twice, because of the SUM() aggregates.
I don't think that this is a bug. If the same index is recommended
for two different queries it will appear two times in
pg_indexadvisor. So, if you want to calculate the overall benefit of
this index, then you have to sum up the local benefits for each query.

If this is intended behaviour, then its okay.

> .) I doubt that on a table t(a,b), for a suggestion of idx(b,a),
> pg_advise will
> suggest idx(a,b);
?? Not sure, if I understand you right. idx(b,a) and idx(a,b) are
completely different indexes. Why should pg_advise suggest idx(a,b).

I am referring to the way get_column_names() is coded. First, the SQL for the portal does not guarantee any order of the result; secondly, the 'for' loops that follow, will always output the columns in their increasing order of attribute number. Here's a small way to reproduce the bug, that I cooked up just now:

Change the SQL in read_advisor_output() to:

res = PQexec(conn, "DECLARE myportal CURSOR FOR "
            "SELECT    relname,"
                    "int2vector_to_string(index_attrs) AS colids,"
                    "MAX(index_pages) AS size_in_pages,"
                    "SUM(profit) AS benefit,"
                    "SUM(profit)/MAX(index_pages) AS gain "
            "FROM    pg_indexadvisor,"
                    "pg_class "
            "WHERE    backend_pid = 0 "
            "AND        rel_oid = pg_class.oid "
            "GROUP BY    relname, colids "
            "ORDER BY gain DESC");

Notice the backend_pid = 0. Now insert the following into pg_indexadvisor:

insert into pg_indexadvisor values( 1259, '2 1', 2, 1000, 20,0,0 );

This should prompt the advisor to generate the statement:

create index advidx_1 on pg_class ( relnamespace,relname);

But instead, it will output this:

create index advidx_1 on pg_class ( relname,relnamespace );

Now run the advisor with any workload, and inspect the output.

$ pg_advise.exe -d postgres -h localhost -p 5432 -U gsk -o create_index.sql workload.sql

We should tokenize the list of attribute numbers (column_ids variable) in get_column_names() and query them individually.

But there is another bug: if there are recommendations like idx
(a,b,c), idx(a,b) and idx(a) it would be a good idea to create just
idx(a). I will add this to pg_advise as an optional feature.

I'd say it's a new feature request and not a bug :) But I don't understand why would you want to not build idx(a,b,c) in such a situation? idx(a,b,c) can be useful in places where idx(a,b) or idx(a) is required, but the same can't be said about idx(a) or idx(a,b) being useful where idx(a,b,c) is required!

Best regards,

--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com

pgsql-hackers by date:

From: Alvaro Herrera
Date: 19 November 2006, 18:38:26
Subject: Re: [GENERAL] Allowing SYSDATE to Work

From: Michael Fuhr
Date: 20 November 2006, 02:03:24
Subject: Statistics visibility in SERIALIZABLE transactions

Re: [Fwd: Index Advisor] - Mailing list pgsql-hackers

Previous

Next