On 1/10/2024 18:43, Tom Lane wrote: > In any case, in this toy example that lacks an ANALYZE step, > the selectivity estimates are mostly going to be garbage.
Thanks for the replies. I'm just checking if a bug is present here is a bug. Even with ANALYZE, the first EXPLAIN estimates more rows compared to the second, even though the second WHERE clause is
less restrictive.
ANALYZE;
ANALYZE
EXPLAIN SELECT t2.c0 FROM t2 WHERE t2.c0 IN (t2.c0); QUERY PLAN -------------------------------------------------- Seq Scan on t2 (cost=0.00..1.02 rows=2 width=4) Filter: (c0 IS NOT NULL) (2 rows)
EXPLAIN SELECT t2.c0 FROM t2 WHERE (t2.c0 IN (t2.c0)) OR (t2.c0 > 4); QUERY PLAN -------------------------------------------------- Seq Scan on t2 (cost=0.00..1.03 rows=1 width=4) Filter: ((c0 = c0) OR (c0 > 4)) (2 rows)
DROP DATABASE IF EXISTS database4; CREATE DATABASE database4 WITH ENCODING 'UTF8' TEMPLATE template0; \c database4; CREATE TABLE t2(c0 int); INSERT INTO t2(c0) VALUES(1); INSERT INTO t2(c0) VALUES(2); ANALYZE; EXPLAIN SELECT t2.c0 FROM t2 WHERE t2.c0 IN (t2.c0); EXPLAIN SELECT t2.c0 FROM t2 WHERE (t2.c0 IN (t2.c0)) OR (t2.c0 > 4);
Regards, Ming Wei
On Wed, 2 Oct 2024 at 08:35, Andrei Lepikhov <lepihov@gmail.com> wrote:
On 1/10/2024 18:43, Tom Lane wrote: > PG Bug reporting form <noreply@postgresql.org> writes: >> Given predicate A and B, it is expected that size (SELECT X where A) <= >> size (SELECT X WHERE A or B) >> However, `EXPLAIN SELECT t2.c0 FROM t2 WHERE t2.c0 IN (t2.c0)` returns >> rows=2537 > > I don't see any particular bug here. If you look closely at the > EXPLAIN output, you'll see that "t2.c0 IN (t2.c0)" is transformed > to "c0 IS NOT NULL" --- but only if it's at top level. So we're > estimating selectivities for two quite different conditions in > this example. > > The NOT NULL bit happens because a top-level equality clause > is transformed into an "EquivalenceClass", and then when we > notice the class has only one member, we prefer to spit out > "x IS NOT NULL" rather than "x = x". That has the same effect > (at top level of WHERE, anyway) and tends to be estimated > more accurately. I think their question was about why 'x IN (x)' transforms differently at the top and inside the OR clause. It is pretty typical question.