Re: optimizing queries using IN and EXISTS - Mailing list pgsql-performance

From Tom Lane
Subject Re: optimizing queries using IN and EXISTS
Date
Msg-id 3691.1342650974@sss.pgh.pa.us
Whole thread Raw
In response to optimizing queries using IN and EXISTS  (Nick Hofstede <Nick.Hofstede@inventivegroup.com>)
Responses Re: optimizing queries using IN and EXISTS
List pgsql-performance
Nick Hofstede <Nick.Hofstede@inventivegroup.com> writes:
> I'm surprised at the difference in speed/execution plan between two logically equivalent queries, one using IN, the
otherusing EXISTS. (At least I think they are logically equivalent) 

> SELECT *
> FROM   foo
> WHERE  'text6' IN (SELECT value
>                    FROM   bar
>                           JOIN foo AS foo2
>                             ON bar.foo_ref = foo2.id
>                    WHERE  foo2.id = foo.id)

Hm.  convert_ANY_sublink_to_join() rejects subqueries that contain any
Vars of the parent query level, so the reference to foo.id prevents this
from being converted to a semijoin.  However, it seems like that's
overly restrictive.  I'm not sure that we could remove the test
altogether, but at least outer vars used in WHERE seem safe.

In the meantime, you can recast like this:

SELECT *
FROM   foo
WHERE  ('text6', id) IN (SELECT value, foo2.id
                   FROM   bar
                          JOIN foo AS foo2
                            ON bar.foo_ref = foo2.id)

and still get a semijoin plan from an IN-style query.

            regards, tom lane

pgsql-performance by date:

Previous
From: Nick Hofstede
Date:
Subject: Re: optimizing queries using IN and EXISTS
Next
From: David Kerr
Date:
Subject: Re: Process 11812 still waiting for ExclusiveLock on extension of relation