Thread: BUG #9007: List comparison
The following bug has been logged on the website: Bug reference: 9007 Logged by: Stefan Kirchev Email address: stefan.kirchev@gmail.com PostgreSQL version: 9.1.0 Operating system: Linux Ubuntu Server Description: Using two tables to extract differences fails to show any result. Table `table1` has two column of type integer, table `table2` has the same structure. Using the following query fails to show the expected result: select * from table1 where (c1, c2) not in (select c1, c2 from table2); Adding a dummy condition in the internal query helps in getting results: select * from table1 where (c1, c2) not in (select c1, c2 from table2 where c1 <> 0); Consider c1 does not nave values equal to 0: pnp=# select c1, c2 from table2 where c1 = 0; c1 | c2 ----+---- (0 rows) Tested on versions 8.4 and 9.1. Here are the query plans on v8.4: pnp=# explain select * from table1 where (c1, c2) not in (select c1, c2 from table2 where c1 <> 0); QUERY PLAN -------------------------------------------------------------------- Seq Scan on table1 (cost=290.31..681.84 rows=9951 width=8) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on table2 (cost=0.00..253.58 rows=14692 width=8) Filter: (c1 <> 0) (5 rows) pnp=# explain select * from table1 where (c1, c2) not in (select c1, c2 from table2); QUERY PLAN -------------------------------------------------------------------- Seq Scan on table1 (cost=253.57..645.11 rows=9951 width=8) Filter: (NOT (hashed subplan)) SubPlan -> Seq Scan on table2 (cost=0.00..216.66 rows=14766 width=8) (4 rows) pnp=# In the query plan the rows shown are half of the real table records: pnp=# select count(*) from table1; count ------- 20880 (1 row) pnp=# select count(*) from table2; count ------- 15557 (1 row) pnp=# Is that a buffers size issue? The server is pretty powerful and handles a much bigger tables of about 100GB without loosing breath.
On Tue, Jan 28, 2014 at 10:56 AM, <stefan.kirchev@gmail.com> wrote: > select * from table1 where (c1, c2) not in (select c1, c2 from table2); Note that if there are any (NULL, NULL) values in table2 then NOT IN *always* returns zero results. This counter-intuitive behavior is mandated by the SQL specification and also makes the NOT IN clause hard to optimize. It's not a bug, however. It's almost always faster and more foolproof to convert them to NOT EXISTS clauses like this: select * from table1 where not exists (select * from table2 where (table1.c1, table1.c2) = (table2.c1, table2.c2)); db=# create table table1 (c1, c2) as values (1, 1); db=# create table table2 (c1, c2) as values (2, 2); db=# select * from table1 where (c1, c2) not in (select c1, c2 from table2); c1 | c2 ----+---- 1 | 1 (1 row) db=# insert into table2 values(null, null); db=# select * from table1 where (c1, c2) not in (select c1, c2 from table2); c1 | c2 ----+---- (0 rows) db=# select * from table1 where not exists (select * from table2 where (table1.c1, table1.c2) = (table2.c1, table2.c2)); c1 | c2 ----+---- 1 | 1 (1 row) > Is that a buffers size issue? No, tuning parameters should never affect the correctness of returned results, only time. Regards, Marti