Thread: BUG #2334: WHERE IN (SUBSELECT) fails when column is null
The following bug has been logged online: Bug reference: 2334 Logged by: Patrick Narkinsky Email address: patrick@narkinsky.com PostgreSQL version: 8.1.3 Operating system: Mac OS X Description: WHERE IN (SUBSELECT) fails when column is null Details: This may be expected behavior, but it certainly doesn't seem right to me, and it works as expected in sqlite. The database is as follows: BEGIN TRANSACTION; create table a ( id integer, text varchar(20) ); INSERT INTO a VALUES(0,'test'); INSERT INTO a VALUES(1,'test2'); create table b ( id integer, a_id integer); INSERT INTO b VALUES(0,NULL); INSERT INTO b VALUES(1,NULL); INSERT INTO b VALUES(2,NULL); COMMIT; The following query returns everything in a in sqlite, but returns nothing in postgresql: select * from a where a.id not in (select a_id from b); On postgresql, it works as expected when a_id has a non-null value. I'm not expert enough on SQL to say which is wrong, but it appears to me that the SQLite behavior makes a lot more sense.
On Fri, 17 Mar 2006, Patrick Narkinsky wrote: > This may be expected behavior, but it certainly doesn't seem right to me, > and it works as expected in sqlite. > > The database is as follows: > > BEGIN TRANSACTION; > create table a ( > id integer, > text varchar(20) > ); > INSERT INTO a VALUES(0,'test'); > INSERT INTO a VALUES(1,'test2'); > create table b ( > id integer, > a_id integer); > INSERT INTO b VALUES(0,NULL); > INSERT INTO b VALUES(1,NULL); > INSERT INTO b VALUES(2,NULL); > COMMIT; > > The following query returns everything in a in sqlite, but returns nothing > in postgresql: > > select * from a where a.id not in (select a_id from b); AFAICS, our behavior follows SQL. a NOT IN b is NOT(a IN b) IN is defined in terms of = ANY. a =ANY (b) is basically (by my reading of 8.8 anyway): True if a = bi for some bi in b False if b is empty or a <> bi for all bi in b Unknown otherwise Since a <> NULL returns unknown, the second one won't come up, so the whole expression won't ever be true after the negation. It might be false or it might be unknown.
Stephan Szabo schrieb: > AFAICS, our behavior follows SQL. > > a NOT IN b is NOT(a IN b) > IN is defined in terms of = ANY. > a =ANY (b) is basically (by my reading of 8.8 anyway): > True if a = bi for some bi in b > False if b is empty or a <> bi for all bi in b > Unknown otherwise > Since a <> NULL returns unknown, the second one won't come up, so the > whole expression won't ever be true after the negation. It might be false > or it might be unknown. > Not having read 8.8, I encountered this today and found it odd as well. It would mean that the old popular optimization, back when "A IN B" was much slower, was not correct: select * from foo where a not in (select b from bar) used to be written as: select * from foo where not exists (select 1 from bar where a=b) These queries have different results now when b is NULL for some rows. It doesn't look right to me (but if the Standard requires it, what can we do...). Regards, Marinos
On Wed, 22 Mar 2006, Marinos Yannikos wrote: > Stephan Szabo schrieb: > > AFAICS, our behavior follows SQL. > > > > a NOT IN b is NOT(a IN b) > > IN is defined in terms of = ANY. > > a =ANY (b) is basically (by my reading of 8.8 anyway): > > True if a = bi for some bi in b > > False if b is empty or a <> bi for all bi in b > > Unknown otherwise > > Since a <> NULL returns unknown, the second one won't come up, so the > > whole expression won't ever be true after the negation. It might be false > > or it might be unknown. > > > > Not having read 8.8, I encountered this today and found it odd as well. > It would mean that the old popular optimization, back when "A IN B" was > much slower, was not correct: > > select * from foo where a not in (select b from bar) > > used to be written as: > > select * from foo where not exists (select 1 from bar where a=b) Yep, in->exists I believe is the same, but not in->not exists is different. Exists and subqueries should probably have been done differently in SQL, but alas. > These queries have different results now when b is NULL for some rows. > It doesn't look right to me (but if the Standard requires it, what can > we do...). It actually makes some sense if you think about null as an unknown value. If you ask is 1 in the set (1, 2, unknown), you can definately say yes. If you ask is 3 in the set (1, 2, unknown), you can't be sure, because that unknown might be 3. For any x that's of the correct type for the set, you'll never be able to say no due to that unknown. If you ask is 1 not in the set (1, 2, unknown) you can definately say no. If you ask is 3 not in the set (1, 2, unknown) you again can't be sure. For any x that's of the correct type for the set, you'll never be able to say yes due to that unknown.