Thread: BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan

The following bug has been logged on the website:

Bug reference:      18904
Logged by:          jinhui lai
Email address:      jh.lai@qq.com
PostgreSQL version: 17.4
Operating system:   ubuntu
Description:

Dear Postgres Developer,
If you have a query of the form:
Q1 INTERSECT Q2 ... INTERSECT Qn,
In such cases,  you know that query Qn always returns an empty set(e.g., a
query with WHERE 1=2), then the entire intersection will always be empty.
I think that such queries should be eliminated during optimization, as they
will always return an empty set and should never consume execution time.
Best regards,
1. How to repeat?
docker pull postgres:latest
docker run --name postgres -e POSTGRES_PASSWORD=1213 -d -p 5432:5432
postgres
docker exec -it postgres psql -U postgres -c "CREATE DATABASE testdb;"
docker exec -it postgres pgbench -U postgres -i -s 10 testdb
docker exec -it  postgres psql -U postgres
psql (17.4 (Debian 17.4-1.pgdg120+2))
Type "help" for help.
\timing on
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
 aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 0.499 ms
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
 aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 16058.769 ms (00:16.059)
EXPLAIN SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
QUERY PLAN
---------------------------------------------------------------------------------------------------
 HashSetOp Intersect  (cost=0.00..4776396.28 rows=1 width=708)
   ->  Append  (cost=0.00..2776396.25 rows=100000001 width=708)
         ->  Subquery Scan on "*SELECT* 2"  (cost=0.00..0.00 rows=1
width=453)
               ->  Result  (cost=0.00..0.00 rows=0 width=449)
                     One-Time Filter: false
         ->  Subquery Scan on "*SELECT* 1"  (cost=0.00..2276396.25
rows=100000000 width=453)
               ->  Nested Loop  (cost=0.00..1276396.25 rows=100000000
width=449)
                     ->  Seq Scan on pgbench_accounts  (cost=0.00..26394.00
rows=1000000 width=97)
                     ->  Materialize  (cost=0.00..2.50 rows=100 width=352)
                           ->  Seq Scan on pgbench_tellers  (cost=0.00..2.00
rows=100 width=352)
 JIT:
   Functions: 9
   Options: Inlining true, Optimization true, Expressions true, Deforming
true
(13 rows)
2. What do i expect to see?
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
 aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 0.499 ms (few time)
3. What do i see instead
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers
INTERSECT
SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2;
 aid | bid | abalance | filler | tid | bid | tbalance | filler
-----+-----+----------+--------+-----+-----+----------+--------
(0 rows)
Time: 16058.769 ms (00:16.059)


On Sunday, April 27, 2025, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:

Bug reference:      18904
Logged by:          jinhui lai
Email address:      jh.lai@qq.com
PostgreSQL version: 17.4
Operating system:   ubuntu
Description:       

Dear Postgres Developer,
If you have a query of the form:
Q1 INTERSECT Q2 ... INTERSECT Qn,
In such cases,  you know that query Qn always returns an empty set(e.g., a
query with WHERE 1=2), then the entire intersection will always be empty.
I think that such queries should be eliminated during optimization, as they
will always return an empty set and should never consume execution time.

These failure to optimize requests are not bugs and are better discussed on the -general list where some sense of demand can be ascertained.

There is little desire to evaluate where clause expressions in the manner you propose, and while executing the subcomponents in most-restrictive to least-restrictive would be nice - stopping should any of them return no rows - it’s seems like quite a niche situation to spend time on.

David J.

"David G. Johnston" <david.g.johnston@gmail.com> writes:
> On Sunday, April 27, 2025, PG Bug reporting form <noreply@postgresql.org>
> wrote:
>> If you have a query of the form:
>> Q1 INTERSECT Q2 ... INTERSECT Qn,
>> In such cases,  you know that query Qn always returns an empty set(e.g., a
>> query with WHERE 1=2), then the entire intersection will always be empty.
>> I think that such queries should be eliminated during optimization, as they
>> will always return an empty set and should never consume execution time.

> These failure to optimize requests are not bugs and are better discussed on
> the -general list where some sense of demand can be ascertained.

Indeed.  To get something like this in, you have to demonstrate that
the required developer effort and planner runtime will be repaid by
successfully optimizing a nontrivial fraction of real-world cases.
I'm not really convinced about that in most of these cases.

In this specific case, it seems like it would not take very many extra
cycles for plan_set_operations and its subroutines to notice that an
input relation is "dummy" (proven empty) and then simplify the set
operation accordingly.  But it might still not be worth doing, if it
complicates that already-complicated code a lot.  You'd have to be
careful about ALL vs. DISTINCT for instance.

Anyway, as David said, this is not a bug.  If it's something you
really want to see happen, try writing a patch yourself.

            regards, tom lane



On Mon, 28 Apr 2025 at 06:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, as David said, this is not a bug.  If it's something you
> really want to see happen, try writing a patch yourself.

This is already somewhat better in v18 thanks to [1].
generate_nonunion_paths() already has some code to put the smallest
child on the left, and the recent changes in nodeSetOp.c means very
little work will be done when the left side is empty.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295



Dear PostgreSQL Developer,

My point is that this kind of query, which is always logically empty, should not be executed at all. Executing it not only wastes time, but more concerningly, it can lead to incorrect results. I have discovered bugs in MySQL, MariaDB, and TiDB where such queries, which should have returned empty result sets, instead produced incorrect non-empty results due to being actually executed. Since I’ve only recently started working with PostgreSQL in the past two days, I haven’t yet tested whether it exhibits similar bugs. If I do find any such cases, I will provide concrete examples. Therefore, my additional point is that avoiding the execution of such queries can fundamentally eliminate these types of logical bugs at their source.

Best regrerds,
原始邮件

发件人:David Rowley <dgrowleyml@gmail.com>
发件时间:2025年4月29日 19:57
收件人:Tom Lane <tgl@sss.pgh.pa.us>
抄送:David G. Johnston <david.g.johnston@gmail.com>, jh.lai@qq.com <jh.lai@qq.com>, pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org>
主题:Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan

On Mon, 28 Apr 2025 at 06:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Anyway, as David said, this is not a bug.  If it's something you
> really want to see happen, try writing a patch yourself.

This is already somewhat better in v18 thanks to [1].
generate_nonunion_paths() already has some code to put the smallest
child on the left, and the recent changes in nodeSetOp.c means very
little work will be done when the left side is empty.

David

[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295

On Wed, 30 Apr 2025 at 00:13, Jinhui Lai <jh.lai@qq.com> wrote:
My point is that this kind of query, which is always logically empty, should not be executed at all. Executing it not only wastes time, but more concerningly, it can lead to incorrect results. I have discovered bugs in MySQL, MariaDB, and TiDB where such queries, which should have returned empty result sets, instead produced incorrect non-empty results due to being actually executed. Since I’ve only recently started working with PostgreSQL in the past two days, I haven’t yet tested whether it exhibits similar bugs. If I do find any such cases, I will provide concrete examples. Therefore, my additional point is that avoiding the execution of such queries can fundamentally eliminate these types of logical bugs at their source.

I appreciate you pointing these things out. There are certainly other places in our codebase where we are better at optimising things when finding a dummy relation. e.g INNER JOINs.  Your part about "it can lead to incorrect results", you've shown us no evidence of that. If you have some, please demonstrate the incorrect results.

All we're telling you here is that you've shown us no evidence that there's an actual bug here.  Per Tom's email in [1], he doesn't seem particularly against making improvements in this area and I'm not either, but not part of a bug fix. Any changes for this would be for PostgreSQL 19 at the earliest.

If you're keen to work on the patch, have a look at is_dummy_rel() in joinrels.c. Likely that'll need to be modified to handle SubqueryScanPaths. The bulk of the code changes after that will go into generate_nonunion_paths(). Take note about Tom's warning about INTERSECT ALL vs INTERSECT. If you do come up with a patch, post it to the pgsql-hackers mailing list and explain what you'd like to do. See the guidelines in [2].

David

On Tue, 29 Apr 2025 at 18:13, Jinhui Lai <jh.lai@qq.com> wrote:
My point is that this kind of query, which is always logically empty, should not be executed at all. Executing it not only wastes time, but more concerningly, it can lead to incorrect results. I have discovered bugs in MySQL, MariaDB, and TiDB where such queries, which should have returned empty result sets, instead produced incorrect non-empty results due to being actually executed.

So, you fear unfound bugs.

Since I’ve only recently started working with PostgreSQL in the past two days, I haven’t yet tested whether it exhibits similar bugs. If I do find any such cases, I will provide concrete examples. Therefore, my additional point is that avoiding the execution of such queries can fundamentally eliminate these types of logical bugs at their source.

If you find bugs the team is normally very good in finding those, and any bug you find can affect not only "logically empty" but some other queries, so it is better to fix the real bug. In fact you stop a testing tool, sending some logically empty queries and asserting the empty result.

OTOH adding a "logically empty prover" adds complexity, maintenance effort and bug surface to the system. You could end up with incorrectly empty queries due to a bug in the "prover" classifying a query as empty where it was not. And it could, and I believe it would, result in overall decreased performance for most of the user base.

Francisco Olarte.