Thread: possible bug in 8.4

possible bug in 8.4

From

Grzegorz Jaskiewicz

Date:

18 December 2008, 13:15:41

create table a(a int not null);
insert into a(a) select generate_series(1,6000000);
create table b as select * from a where a%10 <> 0;
create index fooa on a(a);
alter table b alter column a set not null;
create index foob on b(a);

vacuum analyze verbose;


gj=# explain select a.a from a where a not in (select a from b);                               QUERY PLAN
------------------------------------------------------------------------- Seq Scan on a
(cost=99035.00..257874197565.00rows=3000000 width=4)   Filter: (NOT (subplan))   SubPlan     ->  Materialize
(cost=99035.00..171493.00rows=5400000 width=4)           ->  Seq Scan on b  (cost=0.00..75177.00 rows=5400000  
 
width=4)
(5 rows)


that's absolutely humongous cost, and it really does take ages before  
this thing finishes (had to kill it after an hour).

For change, same type of query (should return same thing)

gj=# explain analyze select a.a from a left join b on a.a=b.a where  
b.a is null;                                                            QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------
MergeAnti Join  (cost=0.00..350302.50 rows=600000 width=4) (actual  
 
time=0.534..10851.995 rows=600000 loops=1)   Merge Cond: (a.a = b.a)   ->  Index Scan using fooa on a
(cost=0.00..166209.78rows=6000000  
 
width=4) (actual time=0.232..3128.438 rows=6000000 loops=1)   ->  Index Scan using foob on b  (cost=0.00..149592.72
rows=5400000 
 
width=4) (actual time=0.161..2778.569 rows=5400000 loops=1) Total runtime: 10938.592 ms
(5 rows)

Time: 10939,414 ms

Re: possible bug in 8.4

From

Gregory Stark

Date:

18 December 2008, 13:46:01

Grzegorz Jaskiewicz <gj@pointblue.com.pl> writes:

> gj=# explain select a.a from a where a not in (select a from b);
>                                QUERY PLAN
> -------------------------------------------------------------------------
>  Seq Scan on a  (cost=99035.00..257874197565.00 rows=3000000 width=4)
>    Filter: (NOT (subplan))
>    SubPlan
>      ->  Materialize  (cost=99035.00..171493.00 rows=5400000 width=4)
>            ->  Seq Scan on b  (cost=0.00..75177.00 rows=5400000  width=4)
> (5 rows)
>
>
> that's absolutely humongous cost, and it really does take ages before  this
> thing finishes (had to kill it after an hour).

I think Postgres can't do better because there could be a NULL in the
subquery. If there's a NULL in the subquery then no record would match.

Now your column is NOT NULL so Postgres could do better but AFAIK we don't
look at column constraints like NOT NULL when planning. Historically we
couldn't because we didn't have plan invalidation -- and the plan you posted
below with the Anti-Join is brand new in 8.4 -- so there is room for
improvement but it's not exactly a bug.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production
Tuning

Re: possible bug in 8.4

From

Tom Lane

Date:

18 December 2008, 15:14:23

Gregory Stark <stark@enterprisedb.com> writes:
> I think Postgres can't do better because there could be a NULL in the
> subquery. If there's a NULL in the subquery then no record would match.

Yeah.  NOT IN does not have the right semantics to become an antijoin.

FWIW, you can use a NOT EXISTS:

select a.a from a where not exists (select 1 from b where a.a = b.a);

In HEAD this should give the identical plan to the leftjoin/is null
hack.
        regards, tom lane

Re: possible bug in 8.4

From

Gregory Stark

Date:

18 December 2008, 15:32:19

Tom Lane <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> I think Postgres can't do better because there could be a NULL in the
>> subquery. If there's a NULL in the subquery then no record would match.
>
> Yeah.  NOT IN does not have the right semantics to become an antijoin.

If we noticed that the columns in the subquery are all guaranteed to be not
null could we do it then?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services!

Re: possible bug in 8.4

From

Tom Lane

Date:

18 December 2008, 15:46:43

Gregory Stark <stark@enterprisedb.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Yeah.  NOT IN does not have the right semantics to become an antijoin.

> If we noticed that the columns in the subquery are all guaranteed to be not
> null could we do it then?

I think you'd also have to know that the outer-query value isn't null,
plus assume that the comparison operator can't return null for two
non-nulls (but we already assume that for btree/hash equality I think).

As you said, this would never have been safe before plan invalidation,
but it might be doable now.
        regards, tom lane

Re: possible bug in 8.4

From

Grzegorz Jaskiewicz

Date:

18 December 2008, 18:17:55

all I know, is that the same query will work on 8.3 in reasonably  
acceptable time frame.

Re: possible bug in 8.4

From

Gregory Stark

Date:

18 December 2008, 18:28:57

Grzegorz Jaskiewicz <gj@pointblue.com.pl> writes:

> all I know, is that the same query will work on 8.3 in reasonably  acceptable
> time frame.

What plan do you get in 8.3?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!

Re: possible bug in 8.4

From

Gregory Stark

Date:

18 December 2008, 18:36:29

Grzegorz Jaskiewicz <gj@pointblue.com.pl> writes:

> all I know, is that the same query will work on 8.3 in reasonably  acceptable
> time frame.
>

Because I see the exact same plan -- in fact with the exact same cost:

Welcome to psql 8.3.3, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms      \h for help with SQL commands      \? for help with psql commands      \g
orterminate with semicolon to execute query      \q to quit

postgres=# explain select a.a from a where a not in (select a from b);                              QUERY PLAN

-------------------------------------------------------------------------Seq Scan on a  (cost=99035.00..257874197565.00
rows=3000000width=4)  Filter: (NOT (subplan))  SubPlan    ->  Materialize  (cost=99035.00..171493.00 rows=5400000
width=4)         ->  Seq Scan on b  (cost=0.00..75177.00 rows=5400000 width=4)

(5 rows)

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!

Re: possible bug in 8.4

From

"Kevin Grittner"

Date:

18 December 2008, 19:11:14

>>> Gregory Stark <stark@enterprisedb.com> wrote: 
> Grzegorz Jaskiewicz <gj@pointblue.com.pl> writes:
> 
>> all I know, is that the same query will work on 8.3 in reasonably  
> acceptable
>> time frame.
>>
> 
> Because I see the exact same plan -- in fact with the exact same
cost:
> 
> 
> Welcome to psql 8.3.3, the PostgreSQL interactive terminal.
I'm seeing the exact same plan on 8.3.5, too.
-Kevin

Re: possible bug in 8.4

From

Grzegorz Jaskiewicz

Date:

19 December 2008, 05:59:19

true, but as soon as I drop indices on both tables:

root=# explain analyze select a.a from a where a not in (select a from  
b);                                                     QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Seq Scan on a  (cost=88677.00..187207.00 rows=3000000 width=4)
(actual time=22803.470..26473.039 rows=600000 loops=1)  Filter: (NOT (hashed subplan))  SubPlan    ->  Seq Scan on b
(cost=0.00..75177.00rows=5400000 width=4)
 
(actual time=9.232..15828.904 rows=5400000 loops=1)
Total runtime: 29357.267 ms
(5 rows)

root=# select version();                                               version
--------------------------------------------------------------------------------------------------------
PostgreSQL 8.3.5 on i686-redhat-linux-gnu, compiled by GCC gcc (GCC)
3.4.6 20060404 (Red Hat 3.4.6-10)
(1 row)


I thought planner should be able to choose plan that costs less.

Re: possible bug in 8.4

From

Grzegorz Jaskiewicz

Date:

19 December 2008, 06:50:58

and the plan on 8.4:
gj=# explain select a.a from a where a not in (select a from b);                               QUERY PLAN
------------------------------------------------------------------------- Seq Scan on a
(cost=99035.00..257874197565.00rows=3000000 width=4)   Filter: (NOT (subplan))   SubPlan     ->  Materialize
(cost=99035.00..171493.00rows=5400000 width=4)           ->  Seq Scan on b  (cost=0.00..75177.00 rows=5400000  
 
width=4)
(5 rows)

This looks like a regression to me, or am I wrong ?

Re: possible bug in 8.4

From

Tom Lane

Date:

19 December 2008, 09:08:41

Grzegorz Jaskiewicz <gj@pointblue.com.pl> writes:
>    Filter: (NOT (hashed subplan))                  ^^^^^^^^^^^^^^

If 8.3 does that, and 8.4 doesn't, it's most likely because you are
using different work_mem settings.
        regards, tom lane

Re: possible bug in 8.4

From

Grzegorz Jaskiewicz

Date:

19 December 2008, 10:51:51

On 2008-12-19, at 13:07, Tom Lane wrote:

> Grzegorz Jaskiewicz <gj@pointblue.com.pl> writes:
>> Filter: (NOT (hashed subplan))
>           ^^^^^^^^^^^^^^
>
> If 8.3 does that, and 8.4 doesn't, it's most likely because you are
> using different work_mem settings.

you're right, as always :)

My point is, why is planner choosing plan on 8.4 that's obviously more  
expensive ? even without sort_mem (work_mem) set to higher value ?
Obviously the plan is quite expensive, so probably sorting it on disc  
- would be still cheaper.
The example is quite unrealistic, but I am trying different simple  
things to test planner differences between these two versions, and see  
if there's any improvement/regression.
This time I will set both work_mem to same value :)