Thread: How to get started hacking on pgsql

How to get started hacking on pgsql

From

Greg Stark

Date:

04 December 2003, 13:56:36

I have an idea for what I think may be a very simple optimization for postgres
to make. I would like to try my hand at implementing it, but the last time I
tried I apparently started off in the wrong direction. 

In the following query, the sort step is completely unnecessary. The order is
already guaranteed by the index:


test=# create table test (a integer,b integer);
CREATE TABLE
test=# create index test_i on test(a,b);
CREATE INDEX
test=# explain select * from test where a=1 order by b;                              QUERY PLAN
      
 
-------------------------------------------------------------------------Sort  (cost=5.95..5.96 rows=6 width=8)  Sort
Key:b  ->  Index Scan using test_i on test  (cost=0.00..5.87 rows=6 width=8)        Index Cond: (a = 1)
 
(4 rows)



At what point in the process would it make sense to check for this?
Where should I be looking in the code?

-- 
greg

Re: How to get started hacking on pgsql

From

Hannu Krosing

Date:

04 December 2003, 17:03:06

Greg Stark kirjutas N, 04.12.2003 kell 19:55:
> I have an idea for what I think may be a very simple optimization for postgres
> to make. I would like to try my hand at implementing it, but the last time I
> tried I apparently started off in the wrong direction. 
> 
> In the following query, the sort step is completely unnecessary. The order is
> already guaranteed by the index:
> 
> 
> test=# create table test (a integer,b integer);
> CREATE TABLE
> test=# create index test_i on test(a,b);
> CREATE INDEX
> test=# explain select * from test where a=1 order by b;
>                                QUERY PLAN                                
> -------------------------------------------------------------------------
>  Sort  (cost=5.95..5.96 rows=6 width=8)
>    Sort Key: b
>    ->  Index Scan using test_i on test  (cost=0.00..5.87 rows=6 width=8)
>          Index Cond: (a = 1)
> (4 rows)
>
> At what point in the process would it make sense to check for this?

Why not rewrite it as:

test=# explain select * from test where a=1 order by a,b;
---------------------------------------------------------------------Index Scan using test_i on test  (cost=0.00..17.07
rows=5width=8)  Index Cond: (a = 1)
 
(2 rows)

> Where should I be looking in the code?

Try to find where the modified query is tested for. It's probably be
inside the optimizer, as index scan + no sort is not always faster than
seq scan + sort, as shown by the same query after vacuum analyze (on an
empty table)

hannu=# vacuum analyze test;
VACUUM
hannu=# explain select * from test where a=1 order by a,b;                       QUERY PLAN
-----------------------------------------------------------Sort  (cost=0.01..0.02 rows=1 width=8)  Sort Key: a, b  ->
SeqScan on test  (cost=0.00..0.00 rows=1 width=8)        Filter: (a = 1)
 
(4 rows)

---------------
Hannu

Re: How to get started hacking on pgsql

From

Hannu Krosing

Date:

04 December 2003, 17:11:17

Hannu Krosing kirjutas N, 04.12.2003 kell 23:01:

> 
> > Where should I be looking in the code?
> 
> Try to find where the modified query is tested for. It's probably be
> inside the optimizer, as index scan + no sort is not always faster than
> seq scan + sort, as shown by the same query after vacuum analyze (on an
> empty table)

OTOH, it may be that all combinations of sort and index and where are
not watched in the optimiser proper at all (too compliaced and/or too
costly), but a keyhole optimiser is run over its resulting  "best" plan
to remove redundant sorts (but it misses combinations of sort and where
like the one in your example)

---------------
Hannu

Re: How to get started hacking on pgsql

From

Tom Lane

Date:

05 December 2003, 01:16:46

Greg Stark <gsstark@mit.edu> writes:
> At what point in the process would it make sense to check for this?

You'd need to mess with the code that generates "pathkeys" describing
the sort ordering of index scans --- read about pathkeys in
src/backend/optimizer/README.  As Hannu notes nearby, the existing
code is not broken for cases likeexplain select * from test where a=1 order by a,b;
and it would not be cool to break that case while fixingexplain select * from test where a=1 order by b;
This probably means that you'd need to offer up multiple pathkey
descriptions of an index's sort order, ie, both ((a), (b)) and ((b)).
I'm not sure how painful that would be.  You could quick-hack it by
generating multiple indexscan Paths that are really identical but have
different pathkeys --- but I think that would have unpleasant
consequences for planning time ... it'd be better to attach multiple
pathkeys to a single Path.
        regards, tom lane