Re: BUG #18627: Regression (15 -> 16) - Join removal not performed when join condition spans multiple tables - Mailing list pgsql-bugs

From Richard Guo
Subject Re: BUG #18627: Regression (15 -> 16) - Join removal not performed when join condition spans multiple tables
Date
Msg-id CAMbWs49f1+Dv7afcw4ngFw2cHLFRiqpzu-GypRVk_Ryd-gVRJw@mail.gmail.com
Whole thread Raw
In response to Re: BUG #18627: Regression (15 -> 16) - Join removal not performed when join condition spans multiple tables  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: BUG #18627: Regression (15 -> 16) - Join removal not performed when join condition spans multiple tables
List pgsql-bugs
On Mon, Sep 23, 2024 at 12:56 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What I'm
> thinking about is to test against joinrelids here instead of
> inputrelids.  As the adjacent comment says, that would get the
> wrong answer for "pushed down" conditions, but we could revert
> to the pre-v16 hack of checking relevant join conditions in
> the loop further down, ie put back this kluge:
>
>         if (RINFO_IS_PUSHED_DOWN(restrictinfo, joinrelids))
>         {
>             /*
>              * If such a clause actually references the inner rel then join
>              * removal has to be disallowed.  We have to check this despite
>              * the previous attr_needed checks because of the possibility of
>              * pushed-down clauses referencing the rel.
>              */
>             if (bms_is_member(innerrelid, restrictinfo->clause_relids))
>                 return false;
>             continue;            /* else, ignore; not useful here */
>         }
>
> That's ugly for sure, and it would only revert to the status
> quo ante.  But I'm not seeing a better way right now.

It seems to me that this approach basically reverts the changes in
8538519db.  I'm afraid this would re-introduce the bug fixed by that
commit.  For instance, this approach would incorrectly remove the join
in the query below:

create table t (a int unique, b int);

explain (costs off)
select 1 from t t1 left join t t2 on t1.a = t2.a where t1.b = coalesce(t2.b, 0);
    QUERY PLAN
------------------
 Seq Scan on t t1
(1 row)

... because it does not realize that there might be references to the
innerrel in ECs.  (Pre-v16 this join removal is prevented by the
delay_upper_joins flag.)

Besides, we have the logic that a PHV needn't prevent join removal if
it mentions the innerrel in ph_eval_at but its contained expression
doesn't actually reference the innerrel (see cad569205).  I think the
proposed approach would also break this logic.  For instance, in the
query below, the t2/t3 join should be removed but is prevented by the
PHV.

explain (costs off)
select 1 from t t1 left join
  (select 1 as x from t t2 left join t t3 on t2.a = t3.a) s on true
where s.x = 1;
                QUERY PLAN
------------------------------------------
 Nested Loop Left Join
   Filter: ((1) = 1)
   ->  Seq Scan on t t1
   ->  Materialize
         ->  Hash Left Join
               Hash Cond: (t2.a = t3.a)
               ->  Seq Scan on t t2
               ->  Hash
                     ->  Seq Scan on t t3
(9 rows)

Thanks
Richard



pgsql-bugs by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Reference to - BUG #18349: ERROR: invalid DSA memory alloc request size 1811939328, CONTEXT: parallel worker
Next
From: Никита Калинин
Date:
Subject: Re: BUG #18609: Repeated installcheck failure in test_pg_dump due to existing role