Home > mailing lists

Re: Removing unneeded self joins - Mailing list pgsql-hackers

From	Alexander Korotkov
Subject	Re: Removing unneeded self joins
Date	October 18, 2023 18:50:37
Msg-id	CAPpHfdvJ3jBQaJQKrf69K++kXi+Tvzn6TXO-DKXj0KVJjvMebg@mail.gmail.com Whole thread Raw
In response to	Re: Removing unneeded self joins (Andrei Lepikhov <a.lepikhov@postgrespro.ru>)
Responses	Re: Removing unneeded self joins
List	pgsql-hackers

Tree view

On Mon, Oct 16, 2023 at 11:28 AM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 12/10/2023 18:32, Alexander Korotkov wrote:
> > On Thu, Oct 5, 2023 at 12:17 PM Andrei Lepikhov
> > <a.lepikhov@postgrespro.ru> wrote:
> >> On 4/10/2023 14:34, Alexander Korotkov wrote:
> >>>   > Relid replacement machinery is the most contradictory code here. We used
> >>>   > a utilitarian approach and implemented a simplistic variant.
> >>>
> >>>   > > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> >>>   > > they are not necessary.  [1] points that infrastructure from [2] might
> >>>   > > be useful.  The patchset from [2] seems committed mow.  However, I
> >>>   > > can't see it is directly helpful in this matter.  Could we just skip
> >>>   > > adding IS NOT NULL clause for the columns, that have
> >>>   > > pg_attribute.attnotnull set?
> >>>   > Thanks for the links, I will look into that case.
> >> To be more precise, in the attachment, you can find a diff to the main
> >> patch, which shows the volume of changes to achieve the desired behaviour.
> >> Some explains in regression tests shifted. So, I've made additional tests:
> >>
> >> DROP TABLE test CASCADE;
> >> CREATE TABLE test (a int, b int not null);
> >> CREATE UNIQUE INDEX abc ON test(b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> CREATE UNIQUE INDEX abc1 ON test(a,b);
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.a=t2.a OR t2.a=t1.a);
> >> DROP INDEX abc1;
> >> explain SELECT * FROM test t1 JOIN test t2 ON (t1.a=t2.a)
> >> WHERE t1.b=t2.b AND (t1.b=t2.b OR t2.b=t1.b);
> >>
> >> We have almost the results we wanted to have. But in the last explain
> >> you can see that nothing happened with the OR clause. We should use the
> >> expression mutator instead of walker to handle such clauses. But It
> >> doesn't process the RestrictInfo node ... I'm inclined to put a solution
> >> of this issue off for a while.
> >
> > OK.  I think it doesn't worth to eliminate IS NULL quals with this
> > complexity (at least at this stage of work).
> >
> > I made improvements over the code.  Mostly new comments, grammar
> > corrections of existing comments and small refactoring.
> >
> > Also, I found that the  suggestion from David Rowley [1] to qsort
> > array of relations to faster find duplicates is still unaddressed.
> > I've implemented it.  That helps to evade quadratic complexity with
> > large number of relations.
> >
> > Also I've incorporated improvements from Alena Rybakina except one for
> > skipping SJ removal when no SJ quals is found.  It's not yet clear for
> > me if this check fix some cases. But at least optimization got skipped
> > in some useful cases (as you can see in regression tests).
>
> I would like to propose one more minor improvement (see in attachment).
> The idea here is that after removing a self-join and changing clauses we
> should re-probe the set of relids with the same Oid, because we can find
> more removable self-joins (see the demo test in join.sql).


Thank you, I've integrated this into the patch.  BTW, the patch
introduces two new GUC variables: enable_self_join_removal,
self_join_search_limit.  enable_self_join_removal variable turns
on/off optimization at all.  self_join_search_limit variable limits
its usage by the number of joins.  AFICS, self_join_search_limit is
intended to protect us from quadratic complexity self-join removal
has.  I tried to reproduce the extreme case.

SELECT count(*) FROM pgbench_accounts a0, pgbench_accounts a1, ...,
pgbench_accounts a100 WHERE a0.aid = 1 AND a1.aid = a0.aid + 1 AND ...
AND a100.aid = a99.aid + 1;

This query took 3778.432 ms with self-join removal disabled, and
3756.009 ms with self-join removal enabled.  So, no measurable
overhead.  Similar to the higher number of joins.  Can you imagine
some extreme case when self-join removal could introduce significant
overhead in comparison with other optimizer parts?  If not, should we
remove self_join_search_limit GUC?

------
Regards,
Alexander Korotkov

Attachment

0001-Remove-useless-self-joins-v45.patch

pgsql-hackers by date:

From: Stephen Frost
Date: 18 October 2023, 18:48:22
Subject: Re: [PoC/RFC] Multiple passwords, interval expirations

From: Stephen Frost
Date: 18 October 2023, 19:00:42
Subject: Re: Add the ability to limit the amount of memory that can be allocated to backends.

Re: Removing unneeded self joins - Mailing list pgsql-hackers

Attachment

Previous

Next