Re: [Fwd: Index Advisor] - Mailing list pgsql-hackers
From | Gurjeet Singh |
---|---|
Subject | Re: [Fwd: Index Advisor] |
Date | |
Msg-id | 65937bea0611192135q2f9e8a74he5b84b5b0880f5be@mail.gmail.com Whole thread Raw |
In response to | Re: [Fwd: Index Advisor] (Kai-Uwe Sattler <kus@tu-ilmenau.de>) |
Responses |
Re: [Fwd: Index Advisor]
Re: [Fwd: Index Advisor] |
List | pgsql-hackers |
On 11/19/06, Kai-Uwe Sattler <kus@tu-ilmenau.de> wrote:
If this is intended behaviour, then its okay.
I am referring to the way get_column_names() is coded. First, the SQL for the portal does not guarantee any order of the result; secondly, the 'for' loops that follow, will always output the columns in their increasing order of attribute number. Here's a small way to reproduce the bug, that I cooked up just now:
Change the SQL in read_advisor_output() to:
res = PQexec(conn, "DECLARE myportal CURSOR FOR "
"SELECT relname,"
"int2vector_to_string(index_attrs) AS colids,"
"MAX(index_pages) AS size_in_pages,"
"SUM(profit) AS benefit,"
"SUM(profit)/MAX(index_pages) AS gain "
"FROM pg_indexadvisor,"
"pg_class "
"WHERE backend_pid = 0 "
"AND rel_oid = pg_class.oid "
"GROUP BY relname, colids "
"ORDER BY gain DESC");
Notice the backend_pid = 0. Now insert the following into pg_indexadvisor:
insert into pg_indexadvisor values( 1259, '2 1', 2, 1000, 20,0,0 );
This should prompt the advisor to generate the statement:
create index advidx_1 on pg_class ( relnamespace,relname);
But instead, it will output this:
create index advidx_1 on pg_class ( relname,relnamespace );
Now run the advisor with any workload, and inspect the output.
$ pg_advise.exe -d postgres -h localhost -p 5432 -U gsk -o create_index.sql workload.sql
We should tokenize the list of attribute numbers (column_ids variable) in get_column_names() and query them individually.
Hi,
> .) The SELECTs in the pg_advise are returning wrong results, when
> the same index is suggested twice, because of the SUM() aggregates.
I don't think that this is a bug. If the same index is recommended
for two different queries it will appear two times in
pg_indexadvisor. So, if you want to calculate the overall benefit of
this index, then you have to sum up the local benefits for each query.
If this is intended behaviour, then its okay.
> .) I doubt that on a table t(a,b), for a suggestion of idx(b,a),
> pg_advise will
> suggest idx(a,b);
?? Not sure, if I understand you right. idx(b,a) and idx(a,b) are
completely different indexes. Why should pg_advise suggest idx(a,b).
I am referring to the way get_column_names() is coded. First, the SQL for the portal does not guarantee any order of the result; secondly, the 'for' loops that follow, will always output the columns in their increasing order of attribute number. Here's a small way to reproduce the bug, that I cooked up just now:
Change the SQL in read_advisor_output() to:
res = PQexec(conn, "DECLARE myportal CURSOR FOR "
"SELECT relname,"
"int2vector_to_string(index_attrs) AS colids,"
"MAX(index_pages) AS size_in_pages,"
"SUM(profit) AS benefit,"
"SUM(profit)/MAX(index_pages) AS gain "
"FROM pg_indexadvisor,"
"pg_class "
"WHERE backend_pid = 0 "
"AND rel_oid = pg_class.oid "
"GROUP BY relname, colids "
"ORDER BY gain DESC");
Notice the backend_pid = 0. Now insert the following into pg_indexadvisor:
insert into pg_indexadvisor values( 1259, '2 1', 2, 1000, 20,0,0 );
This should prompt the advisor to generate the statement:
create index advidx_1 on pg_class ( relnamespace,relname);
But instead, it will output this:
Now run the advisor with any workload, and inspect the output.
$ pg_advise.exe -d postgres -h localhost -p 5432 -U gsk -o create_index.sql workload.sql
We should tokenize the list of attribute numbers (column_ids variable) in get_column_names() and query them individually.
But there is another bug: if there are recommendations like idx
(a,b,c), idx(a,b) and idx(a) it would be a good idea to create just
idx(a). I will add this to pg_advise as an optional feature.
I'd say it's a new feature request and not a bug :) But I don't understand why would you want to not build idx(a,b,c) in such a situation? idx(a,b,c) can be useful in places where idx(a,b) or idx(a) is required, but the same can't be said about idx(a) or idx(a,b) being useful where idx(a,b,c) is required!
Best regards,
--
gurjeet[.singh]@EnterpriseDB.com
singh.gurjeet@{ gmail | hotmail | yahoo }.com
pgsql-hackers by date: