Thread: BUG #9007: List comparison

BUG #9007: List comparison

From

stefan.kirchev@gmail.com

Date:

28 January 2014, 13:21:58

The following bug has been logged on the website:

Bug reference:      9007
Logged by:          Stefan Kirchev
Email address:      stefan.kirchev@gmail.com
PostgreSQL version: 9.1.0
Operating system:   Linux Ubuntu Server
Description:

Using two tables to extract differences fails to show any result.
Table `table1` has two column of type integer, table `table2` has the same
structure. Using the following query fails to show the expected result:

select * from table1 where (c1, c2) not in (select c1, c2 from table2);

Adding a dummy condition in the internal query helps in getting results:

select * from table1 where (c1, c2) not in (select c1, c2 from table2 where
c1 <> 0);

Consider c1 does not nave values equal to 0:

pnp=# select c1, c2 from table2 where c1 = 0;
 c1 | c2
----+----
(0 rows)

Tested on versions 8.4 and 9.1. Here are the query plans on v8.4:
pnp=# explain select * from table1 where (c1, c2) not in (select c1, c2 from
table2 where c1 <> 0);
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on table1  (cost=290.31..681.84 rows=9951 width=8)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on table2  (cost=0.00..253.58 rows=14692 width=8)
           Filter: (c1 <> 0)
(5 rows)

pnp=# explain select * from table1 where (c1, c2) not in (select c1, c2 from
table2);
                             QUERY PLAN
--------------------------------------------------------------------
 Seq Scan on table1  (cost=253.57..645.11 rows=9951 width=8)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on table2  (cost=0.00..216.66 rows=14766 width=8)
(4 rows)

pnp=#

In the query plan the rows shown are half of the real table records:

pnp=# select count(*) from table1;
 count
-------
 20880
(1 row)

pnp=# select count(*) from table2;
 count
-------
 15557
(1 row)

pnp=#

Is that a buffers size issue? The server is pretty powerful and handles a
much bigger tables of about 100GB without loosing breath.

Re: BUG #9007: List comparison

From

Marti Raudsepp

Date:

28 January 2014, 13:59:51

On Tue, Jan 28, 2014 at 10:56 AM,  <stefan.kirchev@gmail.com> wrote:
> select * from table1 where (c1, c2) not in (select c1, c2 from table2);

Note that if there are any (NULL, NULL) values in table2 then NOT IN
*always* returns zero results. This counter-intuitive behavior is
mandated by the SQL specification and also makes the NOT IN clause
hard to optimize. It's not a bug, however.

It's almost always faster and more foolproof to convert them to NOT
EXISTS clauses like this:
  select * from table1 where not exists
    (select * from table2 where (table1.c1, table1.c2) = (table2.c1,
table2.c2));

db=# create table table1 (c1, c2) as values (1, 1);
db=# create table table2 (c1, c2) as values (2, 2);
db=# select * from table1 where (c1, c2) not in (select c1, c2 from table2);
 c1 | c2
----+----
  1 |  1
(1 row)

db=# insert into table2 values(null, null);
db=# select * from table1 where (c1, c2) not in (select c1, c2 from table2);
 c1 | c2
----+----
(0 rows)

db=# select * from table1 where not exists (select * from table2 where
(table1.c1, table1.c2) = (table2.c1, table2.c2));
 c1 | c2
----+----
  1 |  1
(1 row)

> Is that a buffers size issue?

No, tuning parameters should never affect the correctness of returned
results, only time.

Regards,
Marti