Thread: Support "Right Semi Join" plan shapes
Hi Richard Guo
I see that the test samples are all (exists) subqueries ,I think semi join should also support ( in) and ( any) subqueries. would you do more test on ( in) and ( any) subqueries?
Best whish
Hi Richard Guo
I did a simple test ,Subqueries of type (in) can be supported, There is a test sql that doesn't support it, and I think that's because it can't pull up the subqueries.
```
test=# explain (costs off) SELECT t1.* FROM prt1_adv t1 WHERE EXISTS (SELECT 1 FROM prt2_adv t2 WHERE t1.a = t2.b) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: (t2.b = t1.a)
-> Append
-> Seq Scan on prt2_adv_p1 t2_1
-> Seq Scan on prt2_adv_p2 t2_2
-> Seq Scan on prt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on prt1_adv_p1 t1_1
Filter: (b = 0)
-> Seq Scan on prt1_adv_p2 t1_2
Filter: (b = 0)
-> Seq Scan on prt1_adv_p3 t1_3
Filter: (b = 0)
(16 rows)
test=# explain (costs off) SELECT t1.* FROM prt1_adv t1 WHERE t1.a IN (SELECT t2.b FROM prt2_adv t2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: (t2.b = t1.a)
-> Append
-> Seq Scan on prt2_adv_p1 t2_1
-> Seq Scan on prt2_adv_p2 t2_2
-> Seq Scan on prt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on prt1_adv_p1 t1_1
Filter: (b = 0)
-> Seq Scan on prt1_adv_p2 t1_2
Filter: (b = 0)
-> Seq Scan on prt1_adv_p3 t1_3
Filter: (b = 0)
(16 rows)
test=#
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: (t2.b = t1.a)
-> Append
-> Seq Scan on prt2_adv_p1 t2_1
-> Seq Scan on prt2_adv_p2 t2_2
-> Seq Scan on prt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on prt1_adv_p1 t1_1
Filter: (b = 0)
-> Seq Scan on prt1_adv_p2 t1_2
Filter: (b = 0)
-> Seq Scan on prt1_adv_p3 t1_3
Filter: (b = 0)
(16 rows)
test=# explain (costs off) SELECT t1.* FROM prt1_adv t1 WHERE t1.a IN (SELECT t2.b FROM prt2_adv t2) AND t1.b = 0 ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: (t2.b = t1.a)
-> Append
-> Seq Scan on prt2_adv_p1 t2_1
-> Seq Scan on prt2_adv_p2 t2_2
-> Seq Scan on prt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on prt1_adv_p1 t1_1
Filter: (b = 0)
-> Seq Scan on prt1_adv_p2 t1_2
Filter: (b = 0)
-> Seq Scan on prt1_adv_p3 t1_3
Filter: (b = 0)
(16 rows)
test=#
test=# explain (costs off) SELECT t1.* FROM plt1_adv t1 WHERE EXISTS (SELECT 1 FROM plt2_adv t2 WHERE t1.a = t2.a AND t1.c = t2.c) AND t1.b < 10 ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
-> Seq Scan on plt2_adv_p1 t2_1
-> Seq Scan on plt2_adv_p2 t2_2
-> Seq Scan on plt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on plt1_adv_p1 t1_1
Filter: (b < 10)
-> Seq Scan on plt1_adv_p2 t1_2
Filter: (b < 10)
-> Seq Scan on plt1_adv_p3 t1_3
Filter: (b < 10)
(16 rows)
test=#
test=# explain (costs off) SELECT t1.* FROM plt1_adv t1 WHERE (t1.a, t1.c) IN (SELECT t2.a, t2.c FROM plt2_adv t2) AND t1.b < 10 ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
-> Seq Scan on plt2_adv_p1 t2_1
-> Seq Scan on plt2_adv_p2 t2_2
-> Seq Scan on plt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on plt1_adv_p1 t1_1
Filter: (b < 10)
-> Seq Scan on plt1_adv_p2 t1_2
Filter: (b < 10)
-> Seq Scan on plt1_adv_p3 t1_3
Filter: (b < 10)
(16 rows)
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
-> Seq Scan on plt2_adv_p1 t2_1
-> Seq Scan on plt2_adv_p2 t2_2
-> Seq Scan on plt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on plt1_adv_p1 t1_1
Filter: (b < 10)
-> Seq Scan on plt1_adv_p2 t1_2
Filter: (b < 10)
-> Seq Scan on plt1_adv_p3 t1_3
Filter: (b < 10)
(16 rows)
test=#
test=# explain (costs off) SELECT t1.* FROM plt1_adv t1 WHERE (t1.a, t1.c) IN (SELECT t2.a, t2.c FROM plt2_adv t2) AND t1.b < 10 ORDER BY t1.a;
QUERY PLAN
------------------------------------------------------
Sort
Sort Key: t1.a
-> Hash Right Semi Join
Hash Cond: ((t2.a = t1.a) AND (t2.c = t1.c))
-> Append
-> Seq Scan on plt2_adv_p1 t2_1
-> Seq Scan on plt2_adv_p2 t2_2
-> Seq Scan on plt2_adv_p3 t2_3
-> Hash
-> Append
-> Seq Scan on plt1_adv_p1 t1_1
Filter: (b < 10)
-> Seq Scan on plt1_adv_p2 t1_2
Filter: (b < 10)
-> Seq Scan on plt1_adv_p3 t1_3
Filter: (b < 10)
(16 rows)
```
```
test=# explain (costs off) select * from int4_tbl i4, tenk1 a
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
QUERY PLAN
-------------------------------------------------
Hash Right Semi Join
Hash Cond: (b.twothousand = a.twothousand)
Join Filter: (a.fivethous <> b.fivethous)
-> Seq Scan on tenk1 b
-> Hash
-> Hash Join
Hash Cond: (a.tenthous = i4.f1)
-> Seq Scan on tenk1 a
-> Hash
-> Seq Scan on int4_tbl i4
(10 rows)
test=# explain (costs off ) SELECT *
FROM int4_tbl i4, tenk1 a
WHERE (a.twothousand, a.fivethous) IN (
SELECT b.twothousand, b.fivethous
FROM tenk1 b
WHERE a.twothousand = b.twothousand and a.fivethous <> b.fivethous
)
AND i4.f1 = a.tenthous;
QUERY PLAN
----------------------------------------------------------------------------------------
Nested Loop
Join Filter: (i4.f1 = a.tenthous)
-> Seq Scan on tenk1 a
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on tenk1 b
Filter: ((a.fivethous <> fivethous) AND (a.twothousand = twothousand))
-> Materialize
-> Seq Scan on int4_tbl i4
(9 rows)
where exists(select * from tenk1 b
where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
and i4.f1 = a.tenthous;
QUERY PLAN
-------------------------------------------------
Hash Right Semi Join
Hash Cond: (b.twothousand = a.twothousand)
Join Filter: (a.fivethous <> b.fivethous)
-> Seq Scan on tenk1 b
-> Hash
-> Hash Join
Hash Cond: (a.tenthous = i4.f1)
-> Seq Scan on tenk1 a
-> Hash
-> Seq Scan on int4_tbl i4
(10 rows)
test=# explain (costs off ) SELECT *
FROM int4_tbl i4, tenk1 a
WHERE (a.twothousand, a.fivethous) IN (
SELECT b.twothousand, b.fivethous
FROM tenk1 b
WHERE a.twothousand = b.twothousand and a.fivethous <> b.fivethous
)
AND i4.f1 = a.tenthous;
QUERY PLAN
----------------------------------------------------------------------------------------
Nested Loop
Join Filter: (i4.f1 = a.tenthous)
-> Seq Scan on tenk1 a
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on tenk1 b
Filter: ((a.fivethous <> fivethous) AND (a.twothousand = twothousand))
-> Materialize
-> Seq Scan on int4_tbl i4
(9 rows)
test=# set enable_nestloop =off;
SET
test=# explain (costs off ) SELECT *
FROM int4_tbl i4, tenk1 a
WHERE (a.twothousand, a.fivethous) IN (
SELECT b.twothousand, b.fivethous
FROM tenk1 b
WHERE a.twothousand = b.twothousand and a.fivethous <> b.fivethous
)
AND i4.f1 = a.tenthous;
QUERY PLAN
----------------------------------------------------------------------------------------
Hash Join
Hash Cond: (a.tenthous = i4.f1)
-> Seq Scan on tenk1 a
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on tenk1 b
Filter: ((a.fivethous <> fivethous) AND (a.twothousand = twothousand))
-> Hash
-> Seq Scan on int4_tbl i4
(9 rows)
SET
test=# explain (costs off ) SELECT *
FROM int4_tbl i4, tenk1 a
WHERE (a.twothousand, a.fivethous) IN (
SELECT b.twothousand, b.fivethous
FROM tenk1 b
WHERE a.twothousand = b.twothousand and a.fivethous <> b.fivethous
)
AND i4.f1 = a.tenthous;
QUERY PLAN
----------------------------------------------------------------------------------------
Hash Join
Hash Cond: (a.tenthous = i4.f1)
-> Seq Scan on tenk1 a
Filter: (SubPlan 1)
SubPlan 1
-> Seq Scan on tenk1 b
Filter: ((a.fivethous <> fivethous) AND (a.twothousand = twothousand))
-> Hash
-> Seq Scan on int4_tbl i4
(9 rows)
```
wenhui qiu <qiuwenhuifx@gmail.com> 于2023年12月15日周五 14:40写道:
Hi Richard GuoI see that the test samples are all (exists) subqueries ,I think semi join should also support ( in) and ( any) subqueries. would you do more test on ( in) and ( any) subqueries?
Best whish
On Mon, 24 Jun 2024 at 17:59, Richard Guo <guofenglinux@gmail.com> wrote: > Thank you for reviewing. > > On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote: >> + /* >> + * For now we do not support RIGHT_SEMI join in mergejoin or nestloop >> + * join. >> + */ >> + if (jointype == JOIN_RIGHT_SEMI) >> + return; >> + >> >> How about adding some reasons here? > > I've included a brief explanation in select_mergejoin_clauses. > Thank you for updating the patch. >> + * this is a right-semi join, or this is a right/right-anti/full join and >> + * there are nonmergejoinable join clauses. The executor's mergejoin >> >> Maybe we can put the right-semi join together with the right/right-anti/full >> join. Is there any other significance by putting it separately? > > I don't think so. The logic is different: for right-semi join we will > always set *mergejoin_allowed to false, while for right/right-anti/full > join it is set to false only if there are nonmergejoinable join clauses. > Got it. Thanks for the explanation. >> Maybe the following comments also should be updated. Right? > > Correct. And there are a few more places where we need to mention > JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment > for SpecialJoinInfo. > > > I noticed that this patch changes the plan of a query in join.sql from > a semi join to right semi join, compromising the original purpose of > this query, which was to test the fix for neqjoinsel's behavior for > semijoins (see commit 7ca25b7d). > > -- > -- semijoin selectivity for <> > -- > explain (costs off) > select * from int4_tbl i4, tenk1 a > where exists(select * from tenk1 b > where a.twothousand = b.twothousand and a.fivethous <> b.fivethous) > and i4.f1 = a.tenthous; > > So I've changed this test case a bit so that it is still testing what it > is supposed to test. > > In passing, I've also updated the commit message to clarify that this > patch does not address the support of "Right Semi Join" for merge joins. > Tested and looks good to me! -- Regrads, Japin Li
Hi Japin Li
Thank you for your reviewing ,This way the notes are more accurate and complete. Thanks also to the author for updating the patch ,I also tested the new patch ,It looks good to me
Regrads
Thank you for your reviewing ,This way the notes are more accurate and complete. Thanks also to the author for updating the patch ,I also tested the new patch ,It looks good to me
Regrads
Japin Li <japinli@hotmail.com> 于2024年6月25日周二 08:51写道:
On Mon, 24 Jun 2024 at 17:59, Richard Guo <guofenglinux@gmail.com> wrote:
> Thank you for reviewing.
>
> On Mon, Jun 24, 2024 at 1:27 PM Li Japin <japinli@hotmail.com> wrote:
>> + /*
>> + * For now we do not support RIGHT_SEMI join in mergejoin or nestloop
>> + * join.
>> + */
>> + if (jointype == JOIN_RIGHT_SEMI)
>> + return;
>> +
>>
>> How about adding some reasons here?
>
> I've included a brief explanation in select_mergejoin_clauses.
>
Thank you for updating the patch.
>> + * this is a right-semi join, or this is a right/right-anti/full join and
>> + * there are nonmergejoinable join clauses. The executor's mergejoin
>>
>> Maybe we can put the right-semi join together with the right/right-anti/full
>> join. Is there any other significance by putting it separately?
>
> I don't think so. The logic is different: for right-semi join we will
> always set *mergejoin_allowed to false, while for right/right-anti/full
> join it is set to false only if there are nonmergejoinable join clauses.
>
Got it. Thanks for the explanation.
>> Maybe the following comments also should be updated. Right?
>
> Correct. And there are a few more places where we need to mention
> JOIN_RIGHT_SEMI, such as in reduce_outer_joins_pass2 and in the comment
> for SpecialJoinInfo.
>
>
> I noticed that this patch changes the plan of a query in join.sql from
> a semi join to right semi join, compromising the original purpose of
> this query, which was to test the fix for neqjoinsel's behavior for
> semijoins (see commit 7ca25b7d).
>
> --
> -- semijoin selectivity for <>
> --
> explain (costs off)
> select * from int4_tbl i4, tenk1 a
> where exists(select * from tenk1 b
> where a.twothousand = b.twothousand and a.fivethous <> b.fivethous)
> and i4.f1 = a.tenthous;
>
> So I've changed this test case a bit so that it is still testing what it
> is supposed to test.
>
> In passing, I've also updated the commit message to clarify that this
> patch does not address the support of "Right Semi Join" for merge joins.
>
Tested and looks good to me!
--
Regrads,
Japin Li
On Thu, 04 Jul 2024 at 17:17, Richard Guo <guofenglinux@gmail.com> wrote: > On Fri, Jun 28, 2024 at 3:21 PM Richard Guo <guofenglinux@gmail.com> wrote: >> On Fri, Jun 28, 2024 at 2:54 PM Richard Guo <guofenglinux@gmail.com> wrote: >> > I've refined this test case further to make it more stable by using an >> > additional filter 'a.tenthous < 5000'. Besides, I noticed a surplus >> > blank line in ExecHashJoinImpl(). I've removed it in the v7 patch. >> >> BTW, I've also verified the empty-rel optimization for hash join and >> AFAICT it works correctly for the new right-semi join. > > Here is a new rebase. > > Barring objections, I'm planning to push it soon. > Thanks for updating the patch. It looks good to me, except for a minor nitpick: s/right-semijoin/right-semi join/ -- Regrads, Japin Li
On Thu, Jul 4, 2024 at 11:18 PM Japin Li <japinli@hotmail.com> wrote: > On Thu, 04 Jul 2024 at 17:17, Richard Guo <guofenglinux@gmail.com> wrote: > > Here is a new rebase. > > > > Barring objections, I'm planning to push it soon. Pushed. Thanks for all the reviews. > Thanks for updating the patch. It looks good to me, except for a minor nitpick: > > s/right-semijoin/right-semi join/ I did not take this one. The comment nearby for RIGHT_ANTI uses 'right-antijoin', and I think we'd better adopt a consistent pattern for RIGHT_SEMI. Thanks Richard