Re: Optimizing suggestions - Mailing list pgsql-general
| From | David Ford |
|---|---|
| Subject | Re: Optimizing suggestions |
| Date | |
| Msg-id | 3D15FE17.7040103@blue-labs.org |
| In response to | Optimizing suggestions (David Ford <david+cert@blue-labs.org>) |
| Responses | Re: Optimizing suggestions |
| List | pgsql-general |
Tom Lane wrote:

> David Ford <david+cert@blue-labs.org> writes:
>> As stated above, I'd like to fetch a count of rows per day for the last N days.
>
> GROUP BY seems like the obvious answer...

So it does,

```
bmilter=> explain analyze select count(*) from history group by timestamp::date;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1784.68..1878.46 rows=1876 width=8) (actual time=515.60..661.22 rows=8 loops=1)
  ->  Group  (cost=1784.68..1831.57 rows=18755 width=8) (actual time=510.75..623.29 rows=19310 loops=1)
        ->  Sort  (cost=1784.68..1784.68 rows=18755 width=8) (actual time=510.73..535.28 rows=19310 loops=1)
              ->  Seq Scan on history  (cost=0.00..453.55 rows=18755 width=8) (actual time=0.05..354.04 rows=19310 loops=1)
Total runtime: 667.70 msec

EXPLAIN
```

and..

```
bmilter=> explain analyze select count(*) from history
          where timestamp::date > (now()::date - '7 days'::interval)
          group by timestamp::date;
NOTICE:  QUERY PLAN:

Aggregate  (cost=1175.93..1207.19 rows=625 width=8) (actual time=1233.34..1349.87 rows=7 loops=1)
  ->  Group  (cost=1175.93..1191.56 rows=6252 width=8) (actual time=1204.19..1313.24 rows=18693 loops=1)
        ->  Sort  (cost=1175.93..1175.93 rows=6252 width=8) (actual time=1204.17..1227.90 rows=18693 loops=1)
              ->  Seq Scan on history  (cost=0.00..781.76 rows=6252 width=8) (actual time=8.52..1049.05 rows=18693 loops=1)
Total runtime: 1356.16 msec

EXPLAIN
```

```
bmilter=> select count(*) from history
          where timestamp::date > (now()::date - '7 days'::interval)
          group by timestamp::date;
 count
-------
  3743
  3414
  3513
  3077
  2642
  1553
   751
(7 rows)
```

Now.. :) How can I further improve on this? This would be faster if I used two columns of 'date' and 'time', yes?

>> In the near future I'm going to split this table up into many sub tables
>
> Why?  Aren't you just going to have a need for lots of joins, if you
> do that?
>
> 			regards, tom lane

No, actually most of this data will be drawn into focused reporting for graphing metrics.
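[Editorial sketch, not part of the thread: both plans above fall back to a sequential scan because the WHERE clause casts the indexed-able column. One common approach is a plain index on the timestamp column plus a predicate that leaves the column uncast; the table and column names are from the post, the index names are made up.]

```sql
-- A plain index on the raw column (index name is hypothetical):
CREATE INDEX history_timestamp_idx ON history (timestamp);

-- Comparing the uncast column against a computed constant keeps the
-- predicate index-usable; the GROUP BY cast is then applied only to
-- the rows that survive the filter:
SELECT timestamp::date AS day, count(*)
FROM   history
WHERE  timestamp >= now() - interval '7 days'
GROUP  BY timestamp::date
ORDER  BY day;

-- On PostgreSQL versions that support expression indexes, indexing the
-- cast itself is another option (note the extra parentheses required
-- around the expression):
CREATE INDEX history_ts_date_idx ON history (((timestamp)::date));
```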
Only when I drill down to a specific (small) set of message serial numbers will I be doing a join of all this data. In my naivety, which is more performance-effective? Most of the time I will be searching for a count of statistics on a single column.

Thank you for your assistance, knowledge is appreciated,

David

p.s. I am assuming count(*) vs. count(column) is always faster because it's doing a generic row count vs. accumulating stats on a column inside a row?
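[Editorial sketch, not part of the thread: beyond any speed difference, count(*) and count(column) differ in meaning, because count(column) skips NULLs. A minimal illustration with a made-up table:]

```sql
CREATE TABLE t (col integer);
INSERT INTO t VALUES (1), (NULL), (3);

SELECT count(*)   FROM t;  -- 3: counts every row
SELECT count(col) FROM t;  -- 2: counts only rows where col IS NOT NULL
```

So the two forms are interchangeable only when the counted column is NOT NULL.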