Thread: BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan
BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18904 Logged by: jinhui lai Email address: jh.lai@qq.com PostgreSQL version: 17.4 Operating system: ubuntu Description: Dear Postgres Developer, If you have a query of the form: Q1 INTERSECT Q2 ... INTERSECT Qn, In such cases, you know that query Qn always returns an empty set(e.g., a query with WHERE 1=2), then the entire intersection will always be empty. I think that such queries should be eliminated during optimization, as they will always return an empty set and should never consume execution time. Best regards, 1. How to repeat? docker pull postgres:latest docker run --name postgres -e POSTGRES_PASSWORD=1213 -d -p 5432:5432 postgres docker exec -it postgres psql -U postgres -c "CREATE DATABASE testdb;" docker exec -it postgres pgbench -U postgres -i -s 10 testdb docker exec -it postgres psql -U postgres psql (17.4 (Debian 17.4-1.pgdg120+2)) Type "help" for help. \timing on SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2; aid | bid | abalance | filler | tid | bid | tbalance | filler -----+-----+----------+--------+-----+-----+----------+-------- (0 rows) Time: 0.499 ms SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers INTERSECT SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2; aid | bid | abalance | filler | tid | bid | tbalance | filler -----+-----+----------+--------+-----+-----+----------+-------- (0 rows) Time: 16058.769 ms (00:16.059) EXPLAIN SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers INTERSECT SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2; QUERY PLAN --------------------------------------------------------------------------------------------------- HashSetOp Intersect (cost=0.00..4776396.28 rows=1 width=708) -> Append (cost=0.00..2776396.25 rows=100000001 width=708) -> Subquery Scan on "*SELECT* 2" (cost=0.00..0.00 rows=1 width=453) -> Result (cost=0.00..0.00 rows=0 width=449) One-Time Filter: false -> Subquery Scan on "*SELECT* 1" (cost=0.00..2276396.25 rows=100000000 width=453) -> Nested Loop (cost=0.00..1276396.25 rows=100000000 width=449) -> Seq Scan on pgbench_accounts (cost=0.00..26394.00 rows=1000000 width=97) -> Materialize (cost=0.00..2.50 rows=100 width=352) -> Seq Scan on pgbench_tellers (cost=0.00..2.00 rows=100 width=352) JIT: Functions: 9 Options: Inlining true, Optimization true, Expressions true, Deforming true (13 rows) 2. What do i expect to see? SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers INTERSECT SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2; aid | bid | abalance | filler | tid | bid | tbalance | filler -----+-----+----------+--------+-----+-----+----------+-------- (0 rows) Time: 0.499 ms (few time) 3. What do i see instead SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers INTERSECT SELECT * FROM pgbench_accounts CROSS JOIN pgbench_tellers WHERE 1=2; aid | bid | abalance | filler | tid | bid | tbalance | filler -----+-----+----------+--------+-----+-----+----------+-------- (0 rows) Time: 16058.769 ms (00:16.059)
BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan
From
"David G. Johnston"
Date:
On Sunday, April 27, 2025, PG Bug reporting form <noreply@postgresql.org> wrote:
The following bug has been logged on the website:
Bug reference: 18904
Logged by: jinhui lai
Email address: jh.lai@qq.com
PostgreSQL version: 17.4
Operating system: ubuntu
Description:
Dear Postgres Developer,
If you have a query of the form:
Q1 INTERSECT Q2 ... INTERSECT Qn,
In such cases, you know that query Qn always returns an empty set(e.g., a
query with WHERE 1=2), then the entire intersection will always be empty.
I think that such queries should be eliminated during optimization, as they
will always return an empty set and should never consume execution time.
These failure to optimize requests are not bugs and are better discussed on the -general list where some sense of demand can be ascertained.
There is little desire to evaluate where clause expressions in the manner you propose, and while executing the subcomponents in most-restrictive to least-restrictive would be nice - stopping should any of them return no rows - it’s seems like quite a niche situation to spend time on.
David J.
Re: BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan
From
Tom Lane
Date:
"David G. Johnston" <david.g.johnston@gmail.com> writes: > On Sunday, April 27, 2025, PG Bug reporting form <noreply@postgresql.org> > wrote: >> If you have a query of the form: >> Q1 INTERSECT Q2 ... INTERSECT Qn, >> In such cases, you know that query Qn always returns an empty set(e.g., a >> query with WHERE 1=2), then the entire intersection will always be empty. >> I think that such queries should be eliminated during optimization, as they >> will always return an empty set and should never consume execution time. > These failure to optimize requests are not bugs and are better discussed on > the -general list where some sense of demand can be ascertained. Indeed. To get something like this in, you have to demonstrate that the required developer effort and planner runtime will be repaid by successfully optimizing a nontrivial fraction of real-world cases. I'm not really convinced about that in most of these cases. In this specific case, it seems like it would not take very many extra cycles for plan_set_operations and its subroutines to notice that an input relation is "dummy" (proven empty) and then simplify the set operation accordingly. But it might still not be worth doing, if it complicates that already-complicated code a lot. You'd have to be careful about ALL vs. DISTINCT for instance. Anyway, as David said, this is not a bug. If it's something you really want to see happen, try writing a patch yourself. regards, tom lane
Re: BUG #18904: INTERSECT with an impossible where should eliminate both from the query plan
From
David Rowley
Date:
On Mon, 28 Apr 2025 at 06:02, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Anyway, as David said, this is not a bug. If it's something you > really want to see happen, try writing a patch yourself. This is already somewhat better in v18 thanks to [1]. generate_nonunion_paths() already has some code to put the smallest child on the left, and the recent changes in nodeSetOp.c means very little work will be done when the left side is empty. David [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295
Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan
From
"Jinhui Lai"
Date:
Dear PostgreSQL Developer,
My point is that this kind of query, which is always logically empty, should not be executed at all. Executing it not only wastes time, but more concerningly, it can lead to incorrect results. I have discovered bugs in MySQL, MariaDB, and TiDB where such queries, which should have returned empty result sets, instead produced incorrect non-empty results due to being actually executed. Since I’ve only recently started working with PostgreSQL in the past two days, I haven’t yet tested whether it exhibits similar bugs. If I do find any such cases, I will provide concrete examples. Therefore, my additional point is that avoiding the execution of such queries can fundamentally eliminate these types of logical bugs at their source.
Best regrerds,
Best regrerds,
原始邮件
发件人:David Rowley <dgrowleyml@gmail.com> 发件时间:2025年4月29日 19:57 收件人:Tom Lane <tgl@sss.pgh.pa.us> 抄送:David G. Johnston <david.g.johnston@gmail.com>, jh.lai@qq.com <jh.lai@qq.com>, pgsql-bugs@lists.postgresql.org <pgsql-bugs@lists.postgresql.org> 主题:Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan |
> Anyway, as David said, this is not a bug. If it's something you
> really want to see happen, try writing a patch yourself.
This is already somewhat better in v18 thanks to [1].
generate_nonunion_paths() already has some code to put the smallest
child on the left, and the recent changes in nodeSetOp.c means very
little work will be done when the left side is empty.
David
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=276279295
Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan
From
David Rowley
Date:
On Wed, 30 Apr 2025 at 00:13, Jinhui Lai <jh.lai@qq.com> wrote:
My point is that this kind of query, which is always logically empty, should not be executed at all. Executing it not only wastes time, but more concerningly, it can lead to incorrect results. I have discovered bugs in MySQL, MariaDB, and TiDB where such queries, which should have returned empty result sets, instead produced incorrect non-empty results due to being actually executed. Since I’ve only recently started working with PostgreSQL in the past two days, I haven’t yet tested whether it exhibits similar bugs. If I do find any such cases, I will provide concrete examples. Therefore, my additional point is that avoiding the execution of such queries can fundamentally eliminate these types of logical bugs at their source.
I appreciate you pointing these things out. There are certainly other places in our codebase where we are better at optimising things when finding a dummy relation. e.g INNER JOINs. Your part about "it can lead to incorrect results", you've shown us no evidence of that. If you have some, please demonstrate the incorrect results.
All we're telling you here is that you've shown us no evidence that there's an actual bug here. Per Tom's email in [1], he doesn't seem particularly against making improvements in this area and I'm not either, but not part of a bug fix. Any changes for this would be for PostgreSQL 19 at the earliest.
If you're keen to work on the patch, have a look at is_dummy_rel() in joinrels.c. Likely that'll need to be modified to handle SubqueryScanPaths. The bulk of the code changes after that will go into generate_nonunion_paths(). Take note about Tom's warning about INTERSECT ALL vs INTERSECT. If you do come up with a patch, post it to the pgsql-hackers mailing list and explain what you'd like to do. See the guidelines in [2].
David
Re: BUG #18904: INTERSECT with an impossible where should eliminateboth from the query plan
From
Francisco Olarte
Date:
On Tue, 29 Apr 2025 at 18:13, Jinhui Lai <jh.lai@qq.com> wrote:
My point is that this kind of query, which is always logically empty, should not be executed at all. Executing it not only wastes time, but more concerningly, it can lead to incorrect results. I have discovered bugs in MySQL, MariaDB, and TiDB where such queries, which should have returned empty result sets, instead produced incorrect non-empty results due to being actually executed.
So, you fear unfound bugs.
Since I’ve only recently started working with PostgreSQL in the past two days, I haven’t yet tested whether it exhibits similar bugs. If I do find any such cases, I will provide concrete examples. Therefore, my additional point is that avoiding the execution of such queries can fundamentally eliminate these types of logical bugs at their source.
If you find bugs the team is normally very good in finding those, and any bug you find can affect not only "logically empty" but some other queries, so it is better to fix the real bug. In fact you stop a testing tool, sending some logically empty queries and asserting the empty result.
OTOH adding a "logically empty prover" adds complexity, maintenance effort and bug surface to the system. You could end up with incorrectly empty queries due to a bug in the "prover" classifying a query as empty where it was not. And it could, and I believe it would, result in overall decreased performance for most of the user base.
Francisco Olarte.