Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers
From | Alena Rybakina |
---|---|
Subject | Re: POC, WIP: OR-clause support for indexes |
Date | |
Msg-id | a30691c4-a773-43bc-8c8f-339e6388c60e@postgrespro.ru Whole thread Raw |
In response to | Re: POC, WIP: OR-clause support for indexes (Alena Rybakina <a.rybakina@postgrespro.ru>) |
List | pgsql-hackers |
Sorry, I forgot to apply my patches. For the first experiment was 0001-OR-to-ANY-in-parser-and-ANY-to-OR-in-index.diff and for the second experiment was 0002-OR-to-ANY-in-index.diff. On 30.11.2023 11:00, Alena Rybakina wrote: > Hi! > >> >>> Honestly, it seems very hard to avoid the conclusion that this >>> transformation is being done at too early a stage. Parse analysis is >>> not the time to try to do query optimization. I can't really believe >>> that there's a way to produce a committable patch along these lines. >>> Ideally, a transformation like this should be done after we know what >>> plan shape we're using (or considering using), so that we can make >>> cost-based decisions about whether to transform or not. But at the >>> very least it should happen somewhere in the planner. There's really >>> no justification for parse analysis rewriting the SQL that the user >>> entered. >> >> Here, we assume that array operation is generally better than many ORs. >> As a result, it should be more effective to make OR->ANY >> transformation in the parser (it is a relatively lightweight >> operation here) and, as a second phase, decompose that in the optimizer. >> We implemented earlier prototypes in different places of the >> optimizer, and I'm convinced that only this approach resolves the >> issues we found. >> Does this approach look weird? Maybe. We can debate it in this thread. > > I think this is incorrect, and the example of A. Korotkov confirms > this. If we perform the conversion at the parsing stage, we will skip > the more important conversion using OR expressions. I'll show you in > the example below. > > First of all, I will describe my idea to combine two approaches to > obtaining plans with OR to ANY transformation and ANY to OR > transformation. I think they are both good, and we can't work with > just one of them, we should consider both the option of OR > expressions, and with ANY. > > I did this by creating a RelOptInfo with which has references from the > original RelOptInfo, for which conversion is possible either from > ANY->OR, or vice versa. After obtaining the necessary transformation, > I started the procedure for obtaining the seq and index paths for both > relations and then calculated their cost. The relation with the lowest > cost is considered the best. > > I'm not sure if this is the best approach, but it's less complicated. > > I noticed that I got a lower cost for not the best plan, but I think > this corresponds to another topic related to the wrong estimate > calculation. > > 1. The first patch is a mixture of the original patch (when we perform > the conversion of OR to ANY at the parsing stage), and when we perform > the conversion at the index creation stage with the conversion to an > OR expression. We can see that the query proposed by A.Korotkov did > not have the best plan with ANY expression at all, and even despite > receiving a query with OR expressions, we cannot get anything better > than SeqScan, due to the lack of effective logical transformations > that would have been performed if we had left the OR expressions. > > So, I got query plans using enable_or_transformation if it is enabled: > > postgres=# create table test as (select (random()*10)::int x, > (random()*1000) y > from generate_series(1,1000000) i); > create index test_x_1_y on test (y) where x = 1; > create index test_x_2_y on test (y) where x = 2; > vacuum analyze test; > SELECT 1000000 > CREATE INDEX > CREATE INDEX > VACUUM > postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; > WARNING: cost with original approach: - 20440.000000 > WARNING: cost with OR to ANY applied transfomation: - 15440.000000 > QUERY PLAN > -------------------------------------------------------------------------- > > Gather (cost=1000.00..12690.10 rows=1 width=12) > Workers Planned: 2 > -> Parallel Seq Scan on test (cost=0.00..11690.00 rows=1 width=12) > Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision)) > (4 rows) > > and if it is off: > > postgres=# set enable_or_transformation =off; > SET > postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; > QUERY PLAN > -------------------------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) > Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = > '100'::double precision) AND (x = 2))) > -> BitmapOr (cost=8.60..8.60 rows=1 width=0) > -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30 rows=1 > width=0) > Index Cond: (y = '100'::double precision) > -> Bitmap Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 > width=0) > Index Cond: (y = '100'::double precision) > (7 rows) > > 2. The second patch is my patch version when I moved the OR > transformation in the s index formation stage: > > So, I got the best query plan despite the possible OR to ANY > transformation: > > postgres=# create table test as (select (random()*10)::int x, > (random()*1000) y > from generate_series(1,1000000) i); > create index test_x_1_y on test (y) where x = 1; > create index test_x_2_y on test (y) where x = 2; > vacuum analyze test; > SELECT 1000000 > CREATE INDEX > CREATE INDEX > VACUUM > postgres=# explain select * from test where (x = 1 or x = 2) and y = 100; > WARNING: cost with original approach: - 12.618000 > WARNING: cost with OR to ANY applied transfomation: - 15440.000000 > QUERY PLAN > -------------------------------------------------------------------------------------------------------------- > > Bitmap Heap Scan on test (cost=8.60..12.62 rows=1 width=12) > Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = > '100'::double precision) AND (x = 2))) > -> BitmapOr (cost=8.60..8.60 rows=1 width=0) > -> Bitmap Index Scan on test_x_1_y (cost=0.00..4.30 rows=1 > width=0) > Index Cond: (y = '100'::double precision) > -> Bitmap Index Scan on test_x_2_y (cost=0.00..4.30 rows=1 > width=0) > Index Cond: (y = '100'::double precision) > (7 rows) > > > -- Regards, Alena Rybakina Postgres Professional
Attachment
pgsql-hackers by date: