Thread: How to get started hacking on pgsql
I have an idea for what I think may be a very simple optimization for postgres to make. I would like to try my hand at implementing it, but the last time I tried I apparently started off in the wrong direction. In the following query, the sort step is completely unnecessary. The order is already guaranteed by the index: test=# create table test (a integer,b integer); CREATE TABLE test=# create index test_i on test(a,b); CREATE INDEX test=# explain select * from test where a=1 order by b; QUERY PLAN -------------------------------------------------------------------------Sort (cost=5.95..5.96 rows=6 width=8) Sort Key:b -> Index Scan using test_i on test (cost=0.00..5.87 rows=6 width=8) Index Cond: (a = 1) (4 rows) At what point in the process would it make sense to check for this? Where should I be looking in the code? -- greg
Greg Stark kirjutas N, 04.12.2003 kell 19:55: > I have an idea for what I think may be a very simple optimization for postgres > to make. I would like to try my hand at implementing it, but the last time I > tried I apparently started off in the wrong direction. > > In the following query, the sort step is completely unnecessary. The order is > already guaranteed by the index: > > > test=# create table test (a integer,b integer); > CREATE TABLE > test=# create index test_i on test(a,b); > CREATE INDEX > test=# explain select * from test where a=1 order by b; > QUERY PLAN > ------------------------------------------------------------------------- > Sort (cost=5.95..5.96 rows=6 width=8) > Sort Key: b > -> Index Scan using test_i on test (cost=0.00..5.87 rows=6 width=8) > Index Cond: (a = 1) > (4 rows) > > At what point in the process would it make sense to check for this? Why not rewrite it as: test=# explain select * from test where a=1 order by a,b; ---------------------------------------------------------------------Index Scan using test_i on test (cost=0.00..17.07 rows=5width=8) Index Cond: (a = 1) (2 rows) > Where should I be looking in the code? Try to find where the modified query is tested for. It's probably be inside the optimizer, as index scan + no sort is not always faster than seq scan + sort, as shown by the same query after vacuum analyze (on an empty table) hannu=# vacuum analyze test; VACUUM hannu=# explain select * from test where a=1 order by a,b; QUERY PLAN -----------------------------------------------------------Sort (cost=0.01..0.02 rows=1 width=8) Sort Key: a, b -> SeqScan on test (cost=0.00..0.00 rows=1 width=8) Filter: (a = 1) (4 rows) --------------- Hannu
Hannu Krosing kirjutas N, 04.12.2003 kell 23:01: > > > Where should I be looking in the code? > > Try to find where the modified query is tested for. It's probably be > inside the optimizer, as index scan + no sort is not always faster than > seq scan + sort, as shown by the same query after vacuum analyze (on an > empty table) OTOH, it may be that all combinations of sort and index and where are not watched in the optimiser proper at all (too compliaced and/or too costly), but a keyhole optimiser is run over its resulting "best" plan to remove redundant sorts (but it misses combinations of sort and where like the one in your example) --------------- Hannu
Greg Stark <gsstark@mit.edu> writes: > At what point in the process would it make sense to check for this? You'd need to mess with the code that generates "pathkeys" describing the sort ordering of index scans --- read about pathkeys in src/backend/optimizer/README. As Hannu notes nearby, the existing code is not broken for cases likeexplain select * from test where a=1 order by a,b; and it would not be cool to break that case while fixingexplain select * from test where a=1 order by b; This probably means that you'd need to offer up multiple pathkey descriptions of an index's sort order, ie, both ((a), (b)) and ((b)). I'm not sure how painful that would be. You could quick-hack it by generating multiple indexscan Paths that are really identical but have different pathkeys --- but I think that would have unpleasant consequences for planning time ... it'd be better to attach multiple pathkeys to a single Path. regards, tom lane