Planner having way wrong estimate for group aggregate - Mailing list pgsql-performance
From | Steinar H. Gunderson |
---|---|
Subject | Planner having way wrong estimate for group aggregate |
Date | |
Msg-id | 20040918170117.GA3354@uio.no Whole thread Raw |
Responses |
Re: Planner having way wrong estimate for group aggregate
|
List | pgsql-performance |
Hi, I'm using PostgreSQL 7.4 on a table with ~700.000 rows looking like this: Table "public.enkeltsalg" Column | Type | Modifiers ------------+--------------------------+------------------------------------------------------- id | integer | not null default nextval('"enkeltsalg_id_seq"'::text) kommentar | text | not null default ''::text antall | numeric(14,4) | not null belop | numeric(10,0) | not null type | character(1) | not null tid | timestamp with time zone | default now() eksternid | integer | kasseid | integer | baraapning | integer | salgspris | integer | firma | integer | bongid | integer | Indexes: "enkeltsalg_pkey" primary key, btree (id) "enkeltsalg_aapn" btree (baraapning) "enkeltsalg_aapn_pris" btree (baraapning, salgspris) "enkeltsalg_aapn_type" btree (baraapning, "type") "enkeltsalg_pris" btree (salgspris) Check constraints: "enkeltsalg_type_valid" CHECK ("type" = 'K'::bpchar OR "type" = 'B'::bpchar OR "type" = 'M'::bpchar OR "type" = 'T'::bpchar) And I'm doing the query (after VACUUM ANALYZE) smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=108062.34..114477.98 rows=172735 width=17) (actual time=20977.544..23890.020 rows=361 loops=1) -> Sort (cost=108062.34..109912.99 rows=740263 width=17) (actual time=20947.372..21627.107 rows=710720 loops=1) Sort Key: date_trunc('day'::text, tid) -> Seq Scan on enkeltsalg (cost=0.00..18010.29 rows=740263 width=17) (actual time=0.091..7180.528 rows=710720loops=1) Total runtime: 23908.538 ms (5 rows) Now, as you can see, the GroupAggregate here is _way_ off, so the planner makes the wrong choice (it should do a hash aggregate). If I set sort_mem to 131072 instead of 16384, it does a hash aggregate (which is 10 seconds instead of 24), but I can't have sort_mem that high generally. Now, my first notion was creating a functional index to help the planner: smt=# create index enkeltsalg_dag on enkeltsalg ( date_trunc('day',tid) ); CREATE INDEX smt=# vacuum analyze; VACUUM However, this obviously didn't help the planner (this came as a surprise to me, but probably won't come as a surprise to the more seasoned users here :-) ): smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=103809.15..110017.11 rows=175512 width=17) (actual time=21061.357..23917.370 rows=361 loops=1) -> Sort (cost=103809.15..105585.95 rows=710720 width=17) (actual time=21032.239..21695.674 rows=710720 loops=1) Sort Key: date_trunc('day'::text, tid) -> Seq Scan on enkeltsalg (cost=0.00..17641.00 rows=710720 width=17) (actual time=0.091..7231.387 rows=710720loops=1) Total runtime: 23937.791 ms (5 rows) I also tried to increase the statistics on the "tid" column: smt=# alter table enkeltsalg alter column tid set statistics 500; ALTER TABLE smt=# analyze enkeltsalg; ANALYZE However, this made the planner only do a _worse_ estimate: smt=# explain analyze select sum(belop) as omsetning,date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid); QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------- GroupAggregate (cost=107906.59..114449.09 rows=199715 width=17) (actual time=20947.197..23794.389 rows=361 loops=1) -> Sort (cost=107906.59..109754.56 rows=739190 width=17) (actual time=20918.001..21588.735 rows=710720 loops=1) Sort Key: date_trunc('day'::text, tid) -> Seq Scan on enkeltsalg (cost=0.00..17996.88 rows=739190 width=17) (actual time=0.092..7166.488 rows=710720loops=1) Total runtime: 23814.624 ms (5 rows) Actually, it seems that the higher I set statistics on "tid", the worse the estimate becomes. Also, I was told (on #postgresql :-) ) to include the following information: smt=# select n_distinct from pg_stats where attname='tid'; n_distinct ------------ -0.270181 (1 row) Any ideas for speeding this up? /* Steinar */ -- Homepage: http://www.sesse.net/
pgsql-performance by date: