Re: Non-trivial condition is only propagated to one side of JOIN - Mailing list pgsql-hackers

From Tobias Hoffmann
Subject Re: Non-trivial condition is only propagated to one side of JOIN
Date
Msg-id b00b800b-591c-5699-b0e2-c6550056153d@thax.hardliners.org
Whole thread Raw
In response to Re: Non-trivial condition is only propagated to one side of JOIN  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 25/08/2024 19:28, Tom Lane wrote:
> For this particular case, you could probably get somewhere by
> writing
>
> SELECT * FROM view1 WHERE site_id = 1
> UNION ALL
> SELECT * FROM view1 WHERE site_id IS NULL;
>
Thank you for your suggestion, Tom.

Unfortunately, as I now understand, nothing *except*  `var = const` can 
ever be propagated to the second branch of the join.
In particular, even just `WHERE site_id IS NULL` no longer propagates 
like `WHERE site_id = 1` does.
Other cases, that do not propagate, include `WHERE site_id IN (1, 2)`.

I'll probably have to find another workaround to my current problem.

----

More generally, I think that the currently possible set is very 
restrictive and affects not just edge-cases;

my SQL-Engine-implementation-fu is far from good enough for the 
necessary changes, though.

Here are some more thoughts:

> Maybe some machinery could be built that would do something useful
> with an OR clause of this form,
>
> [...]
> An important point here is that "WHERE A = B AND p(A)" does not permit
> us to deduce "p(B)" for arbitrary conditions p(),

IMO the relevant equality should be the `ON tbl2.site_id IS NOT DISTINCT 
FROM tbl1.site_id` (resp. `ON tbl2.site_id = tbl1.site_id`, but nulls 
probably need special care), which should allow any predicate `p` only 
depending on `tbl1.site_id` (i.e.`WHERE p(tbl1.site_id)`, from 
"outside") to be pulled "inside" the INNER JOIN or LEFT JOIN, because no 
row which does not satisfy `p(tbl1.site_id)`, and, via equivalence, 
`p(tbl2.site_id)` could ever be part of the result.

More specifically, given a WHERE clause in CNF (i.e. `p0(...) AND 
p1(...) AND (p2a(...) OR p2b(...)) AND ...`), every top-level term which 
only uses variables which are deemed equivalent, should be allowed to 
propagate.

If this is too difficult, not just single constants (`var = const`), but 
sets of constants (`var = ANY(...)`) and/or especially `var IS NULL`) 
should be considered.

Just my 2ct...

   Tobias





pgsql-hackers by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: type cache cleanup improvements
Next
From: Daniel Gustafsson
Date:
Subject: Re: how to log into commitfest.postgresql.org and begin review patch