8.4.7, incorrect estimate - Mailing list pgsql-performance
From | Wayne Conrad |
---|---|
Subject | 8.4.7, incorrect estimate |
Date | |
Msg-id | 4DBB024B.5050500@databill.com Whole thread Raw |
Responses |
Re: 8.4.7, incorrect estimate
Re: 8.4.7, incorrect estimate |
List | pgsql-performance |
Howdy. We've got a query that takes less than a second unless we add a "order by" to it, after which it takes 40 seconds. Here's the query: select page_number, ps_id, ps_page_id from ps_page where ps_page_id in (select ps_page_id from documents_ps_page where document_id in (select document_id from temp_doc_ids)) order by ps_page_id; The parts of the schema used in this query: Table "public.ps_page" Column | Type | Modifiers -------------+---------+-------------------------------------------------------------- ps_page_id | integer | not null default nextval('ps_page_ps_page_id_seq'::regclass) ps_id | integer | not null page_number | integer | not null Indexes: "ps_page_pkey" PRIMARY KEY, btree (ps_page_id) "ps_page_ps_id_key" UNIQUE, btree (ps_id, page_number) Table "public.documents_ps_page" Column | Type | Modifiers -------------+---------+----------- document_id | text | not null ps_page_id | integer | not null Indexes: "documents_ps_page_pkey" PRIMARY KEY, btree (document_id, ps_page_id) "documents_ps_page_ps_page_id_idx" btree (ps_page_id) temp_doc_ids (temporary table): document_id text not null The query with the "order by" (slow): explain analyze select page_number, ps_id, ps_page_id from ps_page where ps_page_id in (select ps_page_id from documents_ps_page where document_id in (select document_id from temp_document_ids)) order by ps_page_id Merge Semi Join (cost=212570.02..3164648.31 rows=34398932 width=12) (actual time=54749.281..54749.295 rows=5 loops=1) Merge Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id) -> Index Scan using ps_page_pkey on ps_page (cost=0.00..2999686.03 rows=86083592 width=12) (actual time=0.029..36659.393 rows=85591467 loops=1) -> Sort (cost=18139.39..18152.52 rows=6255 width=4) (actual time=0.080..0.083 rows=5 loops=1) Sort Key: documents_ps_page.ps_page_id Sort Method: quicksort Memory: 25kB -> Nested Loop (cost=26.23..17808.09 rows=6255 width=4) (actual time=0.044..0.073 rows=5 loops=1) -> HashAggregate (cost=26.23..27.83 rows=200 width=32) (actual time=0.015..0.017 rows=5 loops=1) -> Seq Scan on temp_document_ids (cost=0.00..23.48 rows=1310 width=32) (actual time=0.004..0.007 rows=5 loops=1) -> Index Scan using documents_ps_page_pkey on documents_ps_page (cost=0.00..88.59 rows=31 width=42) (actual time=0.009..0.010 rows=1 loops=5) Index Cond: (documents_ps_page.document_id = (temp_document_ids.document_id)::text) Total runtime: 54753.028 ms The query without the "order by" (fast): production=> explain analyze select page_number, ps_id, ps_page_id from ps_page where ps_page_id in (select ps_page_id from documents_ps_page where document_id in (select document_id from temp_doc_ids)); QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=17821.42..87598.71 rows=34398932 width=12) (actual time=0.099..0.136 rows=5 loops=1) -> HashAggregate (cost=17821.42..17871.46 rows=6255 width=4) (actual time=0.083..0.096 rows=5 loops=1) -> Nested Loop (cost=26.23..17808.28 rows=6255 width=4) (actual time=0.047..0.076 rows=5 loops=1) -> HashAggregate (cost=26.23..27.83 rows=200 width=32) (actual time=0.014..0.015 rows=5 loops=1) -> Seq Scan on temp_doc_ids (cost=0.00..23.48 rows=1310 width=32) (actual time=0.005..0.005 rows=5 loops=1) -> Index Scan using documents_ps_page_pkey on documents_ps_page (cost=0.00..88.59 rows=31 width=42) (actual time=0.010..0.010 rows=1 loops=5) Index Cond: (documents_ps_page.document_id = temp_doc_ids.document_id) -> Index Scan using ps_page_pkey on ps_page (cost=0.00..11.14 rows=1 width=12) (actual time=0.007..0.007 rows=1 loops=5) Index Cond: (ps_page.ps_page_id = documents_ps_page.ps_page_id) Total runtime: 0.213 ms (10 rows) We notice that in all cases, the plans contain some estimated row counts that differ quite a bit from the actual row counts. We tried increasing (from 100 to 1,000 and 10,000) the statistics targets for each of the indexed columns, one at a time, and analyzing the table/column with each change. This had no effect. Postgres version 8.4.7 on AMD64, Debian Linux "wheezy" (aka "testing"). Where should we look next?
pgsql-performance by date: