Re: Huge Data sets, simple queries - Mailing list pgsql-performance

From Tom Lane
Subject Re: Huge Data sets, simple queries
Msg-id 18925.1138474508@sss.pgh.pa.us
In response to Re: Huge Data sets, simple queries  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-performance
I wrote:
> (We might need to tweak the planner to discourage selecting
> HashAggregate in the presence of DISTINCT aggregates --- I don't
> remember whether it accounts for the sortmem usage in deciding
> whether the hash will fit in memory or not ...)

Ah, I take that all back after checking the code: we don't use
HashAggregate at all when there are DISTINCT aggregates, precisely
because of this memory-blow-out problem.
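The blow-out is easy to see in a toy sketch (illustrative Python, not PostgreSQL internals): to hash-aggregate count(DISTINCT v), the executor would have to keep a set of every value seen so far for every group simultaneously, so memory grows with the number of distinct (group, value) pairs rather than just the number of groups. All names here are made up for illustration.

```python
def hash_distinct_count(rows):
    """rows: iterable of (group_key, value) pairs.
    One-pass, hash-style count(DISTINCT value) GROUP BY group_key."""
    seen = {}  # group_key -> set of distinct values seen so far
    for key, value in rows:
        # Every group's set stays resident until the input is exhausted,
        # which is exactly the memory-blow-out risk described above.
        seen.setdefault(key, set()).add(value)
    return {key: len(values) for key, values in seen.items()}

print(hash_distinct_count(
    [("jan", 1), ("feb", 2), ("jan", 1), ("jan", 3), ("feb", 2)]))
# {'jan': 2, 'feb': 1}
```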

For both your group-by-date query and the original group-by-month query,
the plan of attack is going to be to read the original input in grouping
order (either via sort or indexscan, with sorting probably preferred
unless the table is pretty well correlated with the index) and then
sort/uniq on the DISTINCT value within each group.  The OP is probably
losing on that step compared to your test because he's working with much
larger groups than yours, forcing some spill to disk.  And most likely
he hasn't got an index on month, so the first sort is in fact a sort and
not an indexscan.
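That plan shape can be sketched in a few lines of illustrative Python (again, not PostgreSQL code; the function and variable names are made up): bring the input into grouping order, then sort/uniq the DISTINCT value one group at a time, so memory is bounded by the largest single group rather than the whole input.

```python
from itertools import groupby
from operator import itemgetter

def sorted_distinct_count(rows):
    """rows: iterable of (group_key, value) pairs.
    Sort-based count(DISTINCT value) GROUP BY group_key."""
    # Step 1: read the input in grouping order (a sort here; an
    # indexscan could serve the same purpose if an index existed).
    ordered = sorted(rows, key=itemgetter(0))
    result = {}
    for key, group in groupby(ordered, key=itemgetter(0)):
        # Step 2: sort/uniq the DISTINCT value within this one group.
        # Only one group's values are held at a time; a very large
        # group is what would force a spill to disk in the real thing.
        values = sorted(v for _, v in group)
        result[key] = sum(1 for i, v in enumerate(values)
                          if i == 0 or v != values[i - 1])
    return result

print(sorted_distinct_count(
    [("jan", 1), ("feb", 2), ("jan", 1), ("jan", 3), ("feb", 2)]))
# {'feb': 1, 'jan': 2}
```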

Bottom line is that he's probably doing a ton of on-disk sorting
where you're not doing any.  This makes me think Luke's theory about
inadequate disk horsepower may be on the money.

            regards, tom lane
