Thread: BUG #2334: WHERE IN (SUBSELECT) fails when column is null

BUG #2334: WHERE IN (SUBSELECT) fails when column is null

From

"Patrick Narkinsky"

Date:

18 March 2006, 19:54:06

The following bug has been logged online:

Bug reference:      2334
Logged by:          Patrick Narkinsky
Email address:      patrick@narkinsky.com
PostgreSQL version: 8.1.3
Operating system:   Mac OS X
Description:        WHERE IN (SUBSELECT) fails when column is null
Details:

This may be expected behavior, but it certainly doesn't seem right to me,
and it works as expected in sqlite.

The database is as follows:

BEGIN TRANSACTION;
create table a (
id integer,
text varchar(20)
);
INSERT INTO a VALUES(0,'test');
INSERT INTO a VALUES(1,'test2');
create table b (
id integer,
a_id integer);
INSERT INTO b VALUES(0,NULL);
INSERT INTO b VALUES(1,NULL);
INSERT INTO b VALUES(2,NULL);
COMMIT;

The following query returns everything in a in sqlite, but returns nothing
in postgresql:

select * from a where a.id not in (select a_id from b);

On postgresql, it works as expected when a_id has a non-null value.  I'm not
expert enough on SQL to say which is wrong, but it appears to me that the
SQLite behavior makes a lot more sense.

Re: BUG #2334: WHERE IN (SUBSELECT) fails when column is null

From

Stephan Szabo

Date:

18 March 2006, 20:13:49

On Fri, 17 Mar 2006, Patrick Narkinsky wrote:

> This may be expected behavior, but it certainly doesn't seem right to me,
> and it works as expected in sqlite.
>
> The database is as follows:
>
> BEGIN TRANSACTION;
> create table a (
> id integer,
> text varchar(20)
> );
> INSERT INTO a VALUES(0,'test');
> INSERT INTO a VALUES(1,'test2');
> create table b (
> id integer,
> a_id integer);
> INSERT INTO b VALUES(0,NULL);
> INSERT INTO b VALUES(1,NULL);
> INSERT INTO b VALUES(2,NULL);
> COMMIT;
>
> The following query returns everything in a in sqlite, but returns nothing
> in postgresql:
>
> select * from a where a.id not in (select a_id from b);

AFAICS, our behavior follows SQL.

a NOT IN b is NOT(a IN b)
IN is defined in terms of = ANY.
a =ANY (b) is basically (by my reading of 8.8 anyway):
 True if a = bi for some bi in b
 False if b is empty or a <> bi for all bi in b
 Unknown otherwise
Since a <> NULL returns unknown, the second one won't come up, so the
whole expression won't ever be true after the negation.  It might be false
or it might be unknown.

Re: BUG #2334: WHERE IN (SUBSELECT) fails when column is null

From

Marinos Yannikos

Date:

22 March 2006, 00:57:22

Stephan Szabo schrieb:
> AFAICS, our behavior follows SQL.
>
> a NOT IN b is NOT(a IN b)
> IN is defined in terms of = ANY.
> a =ANY (b) is basically (by my reading of 8.8 anyway):
>  True if a = bi for some bi in b
>  False if b is empty or a <> bi for all bi in b
>  Unknown otherwise
> Since a <> NULL returns unknown, the second one won't come up, so the
> whole expression won't ever be true after the negation.  It might be false
> or it might be unknown.
>

Not having read 8.8, I encountered this today and found it odd as well.
It would mean that the old popular optimization, back when "A IN B" was
much slower, was not correct:

select * from foo where a not in (select b from bar)

used to be written as:

select * from foo where not exists (select 1 from bar where a=b)

These queries have different results now when b is NULL for some rows.
It doesn't look right to me (but if the Standard requires it, what can
we do...).

Regards,
  Marinos

Re: BUG #2334: WHERE IN (SUBSELECT) fails when column is null

From

Stephan Szabo

Date:

22 March 2006, 10:06:37

On Wed, 22 Mar 2006, Marinos Yannikos wrote:

> Stephan Szabo schrieb:
> > AFAICS, our behavior follows SQL.
> >
> > a NOT IN b is NOT(a IN b)
> > IN is defined in terms of = ANY.
> > a =ANY (b) is basically (by my reading of 8.8 anyway):
> >  True if a = bi for some bi in b
> >  False if b is empty or a <> bi for all bi in b
> >  Unknown otherwise
> > Since a <> NULL returns unknown, the second one won't come up, so the
> > whole expression won't ever be true after the negation.  It might be false
> > or it might be unknown.
> >
>
> Not having read 8.8, I encountered this today and found it odd as well.
> It would mean that the old popular optimization, back when "A IN B" was
> much slower, was not correct:
>
> select * from foo where a not in (select b from bar)
>
> used to be written as:
>
> select * from foo where not exists (select 1 from bar where a=b)

Yep, in->exists I believe is the same, but not in->not exists is
different. Exists and subqueries should probably have been done
differently in SQL, but alas.

> These queries have different results now when b is NULL for some rows.
> It doesn't look right to me (but if the Standard requires it, what can
> we do...).

It actually makes some sense if you think about null as an unknown value.

If you ask is 1 in the set (1, 2, unknown), you can definately say yes.
If you ask is 3 in the set (1, 2, unknown), you can't be sure, because
that unknown might be 3.
For any x that's of the correct type for the set, you'll never be able to
say no due to that unknown.

If you ask is 1 not in the set (1, 2, unknown) you can definately say no.
If you ask is 3 not in the set (1, 2, unknown) you again can't be sure.
For any x that's of the correct type for the set, you'll never be able to
say yes due to that unknown.