Re: Huge Data sets, simple queries - Mailing list pgsql-performance

From Tom Lane
Subject Re: Huge Data sets, simple queries
Msg-id 18925.1138474508@sss.pgh.pa.us
In response to Re: Huge Data sets, simple queries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I wrote:
> (We might need to tweak the planner to discourage selecting
> HashAggregate in the presence of DISTINCT aggregates --- I don't
> remember whether it accounts for the sortmem usage in deciding
> whether the hash will fit in memory or not ...)

Ah, I take that all back after checking the code: we don't use
HashAggregate at all when there are DISTINCT aggregates, precisely
because of this memory-blow-out problem.
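The blow-out is easy to see in a toy sketch (illustrative Python, not PostgreSQL internals): to hash-aggregate count(DISTINCT v), the executor would have to keep a set of every value seen so far for every group simultaneously, so memory grows with the number of distinct (group, value) pairs rather than just the number of groups. All names here are made up for illustration.

```python
def hash_distinct_count(rows):
    """rows: iterable of (group_key, value) pairs.
    One-pass, hash-style count(DISTINCT value) GROUP BY group_key."""
    seen = {}  # group_key -> set of distinct values seen so far
    for key, value in rows:
        # Every group's set stays resident until the input is exhausted,
        # which is exactly the memory-blow-out risk described above.
        seen.setdefault(key, set()).add(value)
    return {key: len(values) for key, values in seen.items()}

print(hash_distinct_count(
    [("jan", 1), ("feb", 2), ("jan", 1), ("jan", 3), ("feb", 2)]))
# {'jan': 2, 'feb': 1}
```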

For both your group-by-date query and the original group-by-month query,
the plan of attack is going to be to read the original input in grouping
order (either via sort or indexscan, with sorting probably preferred
unless the table is pretty well correlated with the index) and then
sort/uniq on the DISTINCT value within each group.  The OP is probably
losing on that step compared to your test because he's working with much
larger groups than yours, forcing some spill to disk.  And most likely
he hasn't got an index on month, so the first sort is in fact a sort and
not an indexscan.
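That plan shape can be sketched in a few lines of illustrative Python (again, not PostgreSQL code; the function and variable names are made up): bring the input into grouping order, then sort/uniq the DISTINCT value one group at a time, so memory is bounded by the largest single group rather than the whole input.

```python
from itertools import groupby
from operator import itemgetter

def sorted_distinct_count(rows):
    """rows: iterable of (group_key, value) pairs.
    Sort-based count(DISTINCT value) GROUP BY group_key."""
    # Step 1: read the input in grouping order (a sort here; an
    # indexscan could serve the same purpose if an index existed).
    ordered = sorted(rows, key=itemgetter(0))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Step 2: sort/uniq the DISTINCT value within this one group.
        # Only one group's values are held at a time; a very large
        # group is what would force a spill to disk in the real thing.
        values = sorted(v for _, v in group)
        result[key] = sum(1 for i, v in enumerate(values)
                          if i == 0 or v != values[i - 1])
    return result

print(sorted_distinct_count(
    [("jan", 1), ("feb", 2), ("jan", 1), ("jan", 3), ("feb", 2)]))
# {'feb': 1, 'jan': 2}
```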

Bottom line is that he's probably doing a ton of on-disk sorting
where you're not doing any.  This makes me think Luke's theory about
inadequate disk horsepower may be on the money.

            regards, tom lane
