Home > mailing lists

Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

From	Alena Rybakina
Subject	Re: POC, WIP: OR-clause support for indexes
Date	November 30, 2023 08:05:31
Msg-id	a30691c4-a773-43bc-8c8f-339e6388c60e@postgrespro.ru Whole thread Raw
In response to	Re: POC, WIP: OR-clause support for indexes (Alena Rybakina <a.rybakina@postgrespro.ru>)
List	pgsql-hackers

Tree view

Sorry, I forgot to apply my patches. For the first experiment was 
0001-OR-to-ANY-in-parser-and-ANY-to-OR-in-index.diff and for the second 
experiment was 0002-OR-to-ANY-in-index.diff.

On 30.11.2023 11:00, Alena Rybakina wrote:
> Hi!
>
>>
>>> Honestly, it seems very hard to avoid the conclusion that this
>>> transformation is being done at too early a stage. Parse analysis is
>>> not the time to try to do query optimization. I can't really believe
>>> that there's a way to produce a committable patch along these lines.
>>> Ideally, a transformation like this should be done after we know what
>>> plan shape we're using (or considering using), so that we can make
>>> cost-based decisions about whether to transform or not. But at the
>>> very least it should happen somewhere in the planner. There's really
>>> no justification for parse analysis rewriting the SQL that the user
>>> entered.
>>
>> Here, we assume that array operation is generally better than many ORs.
>> As a result, it should be more effective to make OR->ANY 
>> transformation in the parser (it is a relatively lightweight 
>> operation here) and, as a second phase, decompose that in the optimizer.
>> We implemented earlier prototypes in different places of the 
>> optimizer, and I'm convinced that only this approach resolves the 
>> issues we found.
>> Does this approach look weird? Maybe. We can debate it in this thread.
>
> I think this is incorrect, and the example of A. Korotkov confirms 
> this. If we perform the conversion at the parsing stage, we will skip 
> the more important conversion using OR expressions. I'll show you in 
> the example below.
>
> First of all, I will describe my idea to combine two approaches to 
> obtaining plans with OR to ANY transformation and ANY to OR 
> transformation. I think they are both good, and we can't work with 
> just one of them, we should consider both the option of OR 
> expressions, and with ANY.
>
> I did this by creating a RelOptInfo with which has references from the 
> original RelOptInfo, for which conversion is possible either from 
> ANY->OR, or vice versa. After obtaining the necessary transformation, 
> I started the procedure for obtaining the seq and index paths for both 
> relations and then calculated their cost. The relation with the lowest 
> cost is considered the best.
>
> I'm not sure if this is the best approach, but it's less complicated.
>
> I noticed that I got a lower cost for not the best plan, but I think 
> this corresponds to another topic related to the wrong estimate 
> calculation.
>
> 1. The first patch is a mixture of the original patch (when we perform 
> the conversion of OR to ANY at the parsing stage), and when we perform 
> the conversion at the index creation stage with the conversion to an 
> OR expression. We can see that the query proposed by A.Korotkov did 
> not have the best plan with ANY expression at all, and even despite 
> receiving a query with OR expressions, we cannot get anything better 
> than SeqScan, due to the lack of effective logical transformations 
> that would have been performed if we had left the OR expressions.
>
> So, I got query plans using enable_or_transformation if it is enabled:
>
> postgres=# create table test as (select (random()*10)::int x, 
> (random()*1000) y
> from generate_series(1,1000000) i);
> create index test_x_1_y on test (y) where x = 1;
> create index test_x_2_y on test (y) where x = 2;
> vacuum analyze test;
> SELECT 1000000
> CREATE INDEX
> CREATE INDEX
> VACUUM
> postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
> WARNING:  cost with original approach: - 20440.000000
> WARNING:  cost with OR to ANY applied transfomation: - 15440.000000
>                                 QUERY PLAN
> -------------------------------------------------------------------------- 
>
>  Gather  (cost=1000.00..12690.10 rows=1 width=12)
>    Workers Planned: 2
>    ->  Parallel Seq Scan on test  (cost=0.00..11690.00 rows=1 width=12)
>          Filter: (((x = 1) OR (x = 2)) AND (y = '100'::double precision))
> (4 rows)
>
> and if it is off:
>
> postgres=# set enable_or_transformation =off;
> SET
> postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
>                                                   QUERY PLAN
> -------------------------------------------------------------------------------------------------------------- 
>
>  Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
>    Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = 
> '100'::double precision) AND (x = 2)))
>    ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
>          ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 
> width=0)
>                Index Cond: (y = '100'::double precision)
>          ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 
> width=0)
>                Index Cond: (y = '100'::double precision)
> (7 rows)
>
> 2. The second patch is my patch version when I moved the OR 
> transformation in the s index formation stage:
>
> So, I got the best query plan despite the possible OR to ANY 
> transformation:
>
> postgres=# create table test as (select (random()*10)::int x, 
> (random()*1000) y
> from generate_series(1,1000000) i);
> create index test_x_1_y on test (y) where x = 1;
> create index test_x_2_y on test (y) where x = 2;
> vacuum analyze test;
> SELECT 1000000
> CREATE INDEX
> CREATE INDEX
> VACUUM
> postgres=# explain select * from test where (x = 1 or x = 2) and y = 100;
> WARNING:  cost with original approach: - 12.618000
> WARNING:  cost with OR to ANY applied transfomation: - 15440.000000
>                                                   QUERY PLAN
> -------------------------------------------------------------------------------------------------------------- 
>
>  Bitmap Heap Scan on test  (cost=8.60..12.62 rows=1 width=12)
>    Recheck Cond: (((y = '100'::double precision) AND (x = 1)) OR ((y = 
> '100'::double precision) AND (x = 2)))
>    ->  BitmapOr  (cost=8.60..8.60 rows=1 width=0)
>          ->  Bitmap Index Scan on test_x_1_y  (cost=0.00..4.30 rows=1 
> width=0)
>                Index Cond: (y = '100'::double precision)
>          ->  Bitmap Index Scan on test_x_2_y  (cost=0.00..4.30 rows=1 
> width=0)
>                Index Cond: (y = '100'::double precision)
> (7 rows)
>
>
>
-- 
Regards,
Alena Rybakina
Postgres Professional

Attachment

pgsql-hackers by date:

From: Alena Rybakina
Date: 30 November 2023, 08:00:28
Subject: Re: POC, WIP: OR-clause support for indexes

From: Amit Kapila
Date: 30 November 2023, 08:05:32
Subject: Re: pg_upgrade and logical replication

Re: POC, WIP: OR-clause support for indexes - Mailing list pgsql-hackers

Attachment

Previous

Next