Home > mailing lists

8.4.7, incorrect estimate - Mailing list pgsql-performance

From	Wayne Conrad
Subject	8.4.7, incorrect estimate
Date	April 29, 2011 15:29:33
Msg-id	4DBB024B.5050500@databill.com Whole thread Raw
Responses	Re: 8.4.7, incorrect estimate Re: 8.4.7, incorrect estimate
List	pgsql-performance

Tree view

Howdy.  We've got a query that takes less than a second unless we add a
"order by" to it, after which it takes 40 seconds.  Here's the query:

select page_number, ps_id, ps_page_id from ps_page where ps_page_id in
(select ps_page_id from documents_ps_page where document_id in (select
document_id from temp_doc_ids)) order by ps_page_id;

The parts of the schema used in this query:

                                 Table "public.ps_page"
    Column    |  Type   |                          Modifiers

-------------+---------+--------------------------------------------------------------
  ps_page_id  | integer | not null default
nextval('ps_page_ps_page_id_seq'::regclass)
  ps_id       | integer | not null
  page_number | integer | not null
Indexes:
     "ps_page_pkey" PRIMARY KEY, btree (ps_page_id)
     "ps_page_ps_id_key" UNIQUE, btree (ps_id, page_number)

  Table "public.documents_ps_page"
    Column    |  Type   | Modifiers
-------------+---------+-----------
  document_id | text    | not null
  ps_page_id  | integer | not null
Indexes:
     "documents_ps_page_pkey" PRIMARY KEY, btree (document_id, ps_page_id)
     "documents_ps_page_ps_page_id_idx" btree (ps_page_id)

temp_doc_ids (temporary table):
   document_id text not null

The query with the "order by" (slow):

explain analyze select page_number, ps_id, ps_page_id from ps_page where
ps_page_id in (select ps_page_id from documents_ps_page where
document_id in (select document_id from temp_document_ids)) order by
ps_page_id
     Merge Semi Join  (cost=212570.02..3164648.31 rows=34398932
width=12) (actual time=54749.281..54749.295 rows=5 loops=1)
       Merge Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
       ->  Index Scan using ps_page_pkey on ps_page
(cost=0.00..2999686.03 rows=86083592 width=12) (actual
time=0.029..36659.393 rows=85591467 loops=1)
       ->  Sort  (cost=18139.39..18152.52 rows=6255 width=4) (actual
time=0.080..0.083 rows=5 loops=1)
             Sort Key: documents_ps_page.ps_page_id
             Sort Method:  quicksort  Memory: 25kB
             ->  Nested Loop  (cost=26.23..17808.09 rows=6255 width=4)
(actual time=0.044..0.073 rows=5 loops=1)
                   ->  HashAggregate  (cost=26.23..27.83 rows=200
width=32) (actual time=0.015..0.017 rows=5 loops=1)
                         ->  Seq Scan on temp_document_ids
(cost=0.00..23.48 rows=1310 width=32) (actual time=0.004..0.007 rows=5
loops=1)
                   ->  Index Scan using documents_ps_page_pkey on
documents_ps_page  (cost=0.00..88.59 rows=31 width=42) (actual
time=0.009..0.010 rows=1 loops=5)
                         Index Cond: (documents_ps_page.document_id =
(temp_document_ids.document_id)::text)
     Total runtime: 54753.028 ms

The query without the "order by" (fast):

production=> explain analyze select page_number, ps_id, ps_page_id from
ps_page where ps_page_id in (select ps_page_id from documents_ps_page
where document_id in (select document_id from temp_doc_ids));

    QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------------
  Nested Loop  (cost=17821.42..87598.71 rows=34398932 width=12) (actual
time=0.099..0.136 rows=5 loops=1)
    ->  HashAggregate  (cost=17821.42..17871.46 rows=6255 width=4)
(actual time=0.083..0.096 rows=5 loops=1)
          ->  Nested Loop  (cost=26.23..17808.28 rows=6255 width=4)
(actual time=0.047..0.076 rows=5 loops=1)
                ->  HashAggregate  (cost=26.23..27.83 rows=200 width=32)
(actual time=0.014..0.015 rows=5 loops=1)
                      ->  Seq Scan on temp_doc_ids  (cost=0.00..23.48
rows=1310 width=32) (actual time=0.005..0.005 rows=5 loops=1)
                ->  Index Scan using documents_ps_page_pkey on
documents_ps_page  (cost=0.00..88.59 rows=31 width=42) (actual
time=0.010..0.010 rows=1 loops=5)
                      Index Cond: (documents_ps_page.document_id =
temp_doc_ids.document_id)
    ->  Index Scan using ps_page_pkey on ps_page  (cost=0.00..11.14
rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=5)
          Index Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id)
  Total runtime: 0.213 ms
(10 rows)

We notice that in all cases, the plans contain some estimated row counts
that differ quite a bit from the actual row counts.  We tried increasing
(from 100 to 1,000 and 10,000) the statistics targets for each of the
indexed columns, one at a time, and analyzing the table/column with each
change.  This had no effect.

Postgres version 8.4.7 on AMD64, Debian Linux "wheezy" (aka "testing").

Where should we look next?

pgsql-performance by date:

From:
Date: 29 April 2011, 13:52:27
Subject: Re: FUSION-IO io cards

From: Greg Smith
Date: 29 April 2011, 15:56:18
Subject: Re: Performance

8.4.7, incorrect estimate - Mailing list pgsql-performance

Previous

Next