Re: pull-up subquery if JOIN-ON contains refs to upper-query - Mailing list pgsql-hackers

From Alena Rybakina
Subject Re: pull-up subquery if JOIN-ON contains refs to upper-query
Date
Msg-id 0c498546-8af0-4aec-98fa-ad4f373cf711@yandex.ru
Whole thread Raw
In response to Re: pull-up subquery if JOIN-ON contains refs to upper-query  (Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>)
List pgsql-hackers

Hi!

On 03.09.2025 00:07, Ilia Evdokimov wrote:

Hi,

I've reviewed this patch, and I have suggestion about the approach.

Currently, the patch extends 'convert_EXISTS_sublick_to_join' with quite complex logic (clause collection, volatile checks, rewriting join quals, etc). While it works, the amount of branching and special cases makes the function harder to follow.

Looking at the logic, it seems that a large part of the complexity comes from trying to directly adapt 'convert_EXISTS_sublink_to_join' instead of factoring out a dedicated path. An alternative would be to introduce a separate function 'convert_EXISTS_sublink_to_lateral_join' - with a similar API to 'convert_ANY_sublink_to_join'. Such a function can focus only on the EXISTS-to-join case, while keeping the existing function shorter and easier to reason about.

I even made some first rough sketches of this approach (not a finished patch, just an outline). Of course, it would still need proper adaptation, but I think it demonstrates that the overall structure can be kept simpler.

What do you think about refactoring in this direction?


I think this approach isn’t fully correct. By forming a join between a subquery and the outer relation, you effectively force the optimizer to choose a Nested Loop join. Furthermore, it prevents the planner from exploring all join orders between the subquery’s tables and the outer relation, so we may miss a more optimal plan.

With your patch, I consistently get the following plan. I even disabled nested loops to see whether the planner could switch to a Hash Join or Merge Join, but those aren’t applicable with lateral parameters in this pattern.

CREATE TABLE ta (id int PRIMARY KEY, val int);
INSERT INTO ta VALUES (1,1), (2,2);

CREATE TABLE tb (id int PRIMARY KEY, aval int);
INSERT INTO tb VALUES (1,1), (2,1);

CREATE TABLE tc (id int PRIMARY KEY, aid int);
INSERT INTO tc VALUES (3,5), (1,5);

CREATE TABLE td (id int PRIMARY KEY, aid int);
INSERT INTO td VALUES (1,6), (2,7), (3,8), (4,9);

CREATE TABLE te (id int PRIMARY KEY, aid int);
INSERT INTO te VALUES (5,6), (6,7), (7,8), (4,9), (1,1);

SET enable_nestloop = OFF;

EXPLAIN ANALYZE
SELECT ta.id
FROM ta
WHERE EXISTS (
  SELECT 1
  FROM tb
  WHERE tb.id = ta.id
    AND EXISTS (SELECT 1 FROM tc WHERE tc.id = tb.id)
);

                                                           QUERY PLAN                                                           
--------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Semi Join  (cost=0.31..37017.50 rows=2260 width=4) (actual time=0.116..0.142 rows=2.00 loops=1)
   Disabled: true
   Buffers: local hit=9
   ->  Seq Scan on ta  (cost=0.00..32.60 rows=2260 width=4) (actual time=0.037..0.039 rows=2.00 loops=1)
         Buffers: local hit=1

   ->  Nested Loop Semi Join  (cost=0.31..16.36 rows=1 width=0) (actual time=0.045..0.046 rows=1.00 loops=2)
         Disabled: true
         Buffers: local hit=8
         ->  Index Only Scan using tb_pkey on tb  (cost=0.15..8.17 rows=1 width=4) (actual time=0.030..0.030 rows=1.00 loops=2)
               Index Cond: (id = ta.id)
               Heap Fetches: 2
               Index Searches: 2
               Buffers: local hit=4
         ->  Index Only Scan using tc_pkey on tc  (cost=0.15..8.17 rows=1 width=4) (actual time=0.010..0.010 rows=1.00 loops=2)
               Index Cond: (id = ta.id)
               Heap Fetches: 2
               Index Searches: 2
               Buffers: local hit=4
 Planning Time: 0.539 ms
 Execution Time: 0.252 ms
(20 rows)

Anyway, thank you for the work and attention here - your feedback was useful!

I’ve also rebased the patch on current master.


Attachment

pgsql-hackers by date:

Previous
From: Holger Hoffstätte
Date:
Subject: Re: [PATCH] jit: fix build with LLVM-21
Next
From: jian he
Date:
Subject: Re: CAST(... ON DEFAULT) - WIP build on top of Error-Safe User Functions