Re: sequential scan on select distinct - Mailing list pgsql-performance

From Greg Stark
Subject Re: sequential scan on select distinct
Msg-id 87pt3u78xw.fsf@stark.xeocode.com
In response to sequential scan on select distinct  (Ole Langbehn <ole@freiheit.com>)
Responses Re: sequential scan on select distinct
List pgsql-performance

Pierre-Frédéric Caillaud <lists@boutiquenumerique.com> writes:

>     I see this as a minor annoyance only because I can write GROUP BY
> instead of DISTINCT and get the speed boost. It probably annoys people
> trying to port applications to postgres though, forcing them to rewrite
> their queries.

Yeah, DISTINCT and DISTINCT ON are really just special cases of GROUP BY. It
seems to make more sense to put the effort into GROUP BY and have DISTINCT and
DISTINCT ON go through the same code path, effectively rewriting them
internally as a GROUP BY.
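
To make the equivalence concrete, here is a minimal sketch showing that a plain
DISTINCT and a GROUP BY with no aggregates return the same set of rows. It uses
Python's built-in sqlite3 module as a stand-in for PostgreSQL, and the table
and column names (items, category, price) are made up for illustration. It only
demonstrates that the two queries mean the same thing; it says nothing about
which plan either database picks.

```python
import sqlite3

# SQLite is only a stand-in here; "items", "category" and "price" are
# hypothetical names chosen for this illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (category TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [("a", 3), ("b", 1), ("a", 2), ("c", 5), ("b", 4)],
)

# The DISTINCT form.
distinct_rows = conn.execute(
    "SELECT DISTINCT category FROM items"
).fetchall()

# The same query written as a GROUP BY with no aggregates.
group_rows = conn.execute(
    "SELECT category FROM items GROUP BY category"
).fetchall()

# Row order is unspecified without ORDER BY, so compare the sorted results.
assert sorted(distinct_rows) == sorted(group_rows)
```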

The really tricky part is that a DISTINCT ON needs to know about a first()
aggregate. And to make optimal use of indexes, a last() aggregate as well. And
ideally the planner/executor needs to know something is magic about
first()/last() (and potentially min()/max() at some point) and that they don't
need the complete set of tuples to calculate their results.
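
A rough Python sketch of that idea, under stated assumptions: DISTINCT ON is
treated as a GROUP BY whose only aggregate is a hypothetical first(), and
last() is modelled as first() over the reverse ordering. The function name
distinct_on, its pick_last flag, and the sample rows are all invented for
illustration and are not PostgreSQL internals. The point is that once the input
arrives sorted, the aggregate takes one tuple per group and never has to
accumulate anything from the rest.

```python
from itertools import groupby
from operator import itemgetter


def distinct_on(rows, key, order_by, pick_last=False):
    """Emulate SELECT DISTINCT ON (key) ... ORDER BY key, order_by.

    With pick_last=True the ordering on order_by is descending, which
    models a last() aggregate as first() over the reversed order.
    """
    # Two stable sorts: first by the ordering column, then by the group key,
    # so each group's rows end up contiguous and internally ordered.
    ordered = sorted(rows, key=itemgetter(order_by), reverse=pick_last)
    ordered.sort(key=itemgetter(key))

    result = []
    for _, group in groupby(ordered, key=itemgetter(key)):
        # first(): one tuple decides the result for the whole group;
        # the remaining tuples of the group contribute nothing.
        result.append(next(group))
    return result


rows = [
    {"category": "a", "price": 3},
    {"category": "b", "price": 1},
    {"category": "a", "price": 2},
    {"category": "c", "price": 5},
    {"category": "b", "price": 4},
]

cheapest = distinct_on(rows, "category", "price")
priciest = distinct_on(rows, "category", "price", pick_last=True)
```

In a real executor the sorted input would come from an index scan rather than
an explicit sort, which is where knowing that first()/last() need only one
tuple per group would pay off.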

--
greg
