Home > mailing lists

Performance problem with joined aggregate query - Mailing list pgsql-performance

From	Anssi Kääriäinen
Subject	Performance problem with joined aggregate query
Date	September 15, 2010 15:48:42
Msg-id	4C906706.40108@thl.fi Whole thread Raw
Responses	Re: Performance problem with joined aggregate query
List	pgsql-performance

Tree view

Hello all,

I am trying to use aggregate queries in views, and when joining these
views to other
tables, I get seq scan in the view, even if index scan would be clearly
better. The views
I am using in my Db are actually long pivot queries, but the following
simple test case is enough
to show the problem.

I will first show the table definitions, then the performance problem I
am having.

create table test1 (
    id serial primary key not null,
    other_id integer unique not null
);

create table test2 (
     id integer not null references test1(id),
     type integer,
     value text
);

create index test2_idx on test2(id);

insert into test1 select g, g+10000 from (select generate_series(1,
10000) as g) t;
insert into test2 select g, g%3, 'testval'||g from (select
generate_series(1, 10000) as g) t;
insert into test2 select g, (g+1)%3, 'testval'||g from (select
generate_series(1, 10000) as g) t;
insert into test2 select g, (g+2)%3, 'testval'||g from (select
generate_series(1, 10000) as g) t;

Now, the following query is fast:

select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.id = 1;
(0.6ms)

But the following query is slow (seqscan on test2):

select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;
(45ms)

The same problem can be seen when running:

select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.id in (1, 2);
(40ms runtime)

Fetching directly from test2 with id is fast:

select array_agg(value), id
from test2 where test2.id in (1, 2) group by id;

If I set enable_seqscan to off, then I get fast results:

select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id in
(10001, 10002);
(0.6ms)

Or slow results, if the fetched rows happen to be in the end of the index:

select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id = 20000;
(40ms)

Explain analyzes of the problematic query:

With enable_seqscan:

explain analyze select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;

Hash Join  (cost=627.48..890.48 rows=50 width=44) (actual
time=91.575..108.085 rows=1 loops=1)
    Hash Cond: (test2.id = test1.id)
    ->  HashAggregate  (cost=627.00..752.00 rows=10000 width=15) (actual
time=82.663..98.281 rows=10000 loops=1)
          ->  Seq Scan on test2  (cost=0.00..477.00 rows=30000 width=15)
(actual time=0.009..30.650 rows=30000 loops=1)
    ->  Hash  (cost=0.47..0.47 rows=1 width=8) (actual time=0.026..0.026
rows=1 loops=1)
          ->  Index Scan using test1_other_id_key on test1
(cost=0.00..0.47 rows=1 width=8) (actual time=0.018..0.021 rows=1 loops=1)
                Index Cond: (other_id = 10001)
Total runtime: 109.686 ms

Without enable_seqscan:

explain analyze select * from test1 inner join (select array_agg(value), id
from test2 group by id) t on test1.id = t.id where test1.other_id = 10001;

Merge Join  (cost=0.48..895.91 rows=50 width=44) (actual
time=0.066..0.085 rows=1 loops=1)
    Merge Cond: (test2.id = test1.id)
    ->  GroupAggregate  (cost=0.00..769.56 rows=10000 width=15) (actual
time=0.040..0.054 rows=2 loops=1)
          ->  Index Scan using test2_idx on test2  (cost=0.00..494.56
rows=30000 width=15) (actual time=0.017..0.030 rows=7 loops=1)
    ->  Sort  (cost=0.48..0.48 rows=1 width=8) (actual time=0.020..0.022
rows=1 loops=1)
          Sort Key: test1.id
          Sort Method:  quicksort  Memory: 17kB
          ->  Index Scan using test1_other_id_key on test1
(cost=0.00..0.47 rows=1 width=8) (actual time=0.010..0.012 rows=1 loops=1)
                Index Cond: (other_id = 10001)


  - Anssi Kääriäinen

pgsql-performance by date:

From: Mason Harding
Date: 15 September 2010, 15:48:33
Subject: Re: Slow SQL lookup due to every field being listed in SORT KEY

From: Tobias Brox
Date: 15 September 2010, 16:09:24
Subject: Re: locking issue on simple selects?

Performance problem with joined aggregate query - Mailing list pgsql-performance

Previous

Next