Thread: DISTINCT -> GROUP BY

DISTINCT -> GROUP BY

From

David Fetter

Date:

03 June 2008, 10:03:53

Folks,

I've noticed that queries of the form

SELECT DISTNCT foo, bar, baz
FROM quux
WHERE ...

perform significantly worse than the equivalent using GROUP BY.

SELECT foo, bar, baz
FROM quux
WHERE ...
GROUP BY foo, bar, baz

Where would I start looking in order to make them actually equivalent
from the planner's point of view?  Also, would back-patching this make
sense?  It doesn't change any APIs, but it does make some queries go
faster.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: DISTINCT -> GROUP BY

From

"Pavel Stehule"

Date:

03 June 2008, 10:36:55

Hello David

http://www.postgresql.org/docs/faqs.TODO.html

Consider using hash buckets to do DISTINCT, rather than sorting
This would be beneficial when there are few distinct values. This is
already used by GROUP BY.

Regards
Pavel Stehule

2008/6/3 David Fetter <david@fetter.org>:
> Folks,
>
> I've noticed that queries of the form
>
> SELECT DISTNCT foo, bar, baz
> FROM quux
> WHERE ...
>
> perform significantly worse than the equivalent using GROUP BY.
>
> SELECT foo, bar, baz
> FROM quux
> WHERE ...
> GROUP BY foo, bar, baz
>
> Where would I start looking in order to make them actually equivalent
> from the planner's point of view?  Also, would back-patching this make
> sense?  It doesn't change any APIs, but it does make some queries go
> faster.
>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: DISTINCT -> GROUP BY

From

David Fetter

Date:

03 June 2008, 11:03:16

On Tue, Jun 03, 2008 at 03:36:44PM +0200, Pavel Stehule wrote:
> Hello David
> 
> http://www.postgresql.org/docs/faqs.TODO.html
> 
> Consider using hash buckets to do DISTINCT, rather than sorting This
> would be beneficial when there are few distinct values. This is
> already used by GROUP BY.

It's nice to see that this is kinda on the TODO, but it doesn't
address the question I asked, which is, "how would I get the planner
to rewrite DISTINCTs as the equivalent GROUP BYs?" :)

Any hints?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: DISTINCT -> GROUP BY

From

"Pavel Stehule"

Date:

03 June 2008, 11:10:44

2008/6/3 David Fetter <david@fetter.org>:
> On Tue, Jun 03, 2008 at 03:36:44PM +0200, Pavel Stehule wrote:
>> Hello David
>>
>> http://www.postgresql.org/docs/faqs.TODO.html
>>
>> Consider using hash buckets to do DISTINCT, rather than sorting This
>> would be beneficial when there are few distinct values. This is
>> already used by GROUP BY.
>
> It's nice to see that this is kinda on the TODO, but it doesn't
> address the question I asked, which is, "how would I get the planner
> to rewrite DISTINCTs as the equivalent GROUP BYs?" :)

you can't to do it :(

Pavel

>
> Any hints?
>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>

Re: DISTINCT -> GROUP BY

From

Decibel!

Date:

04 June 2008, 09:36:59

On Jun 3, 2008, at 9:03 AM, David Fetter wrote:
> On Tue, Jun 03, 2008 at 03:36:44PM +0200, Pavel Stehule wrote:
>> Hello David
>>
>> http://www.postgresql.org/docs/faqs.TODO.html
>>
>> Consider using hash buckets to do DISTINCT, rather than sorting This
>> would be beneficial when there are few distinct values. This is
>> already used by GROUP BY.
>
> It's nice to see that this is kinda on the TODO, but it doesn't
> address the question I asked, which is, "how would I get the planner
> to rewrite DISTINCTs as the equivalent GROUP BYs?" :)

IIRC, the patch to allow GROUP BY to do hashing is fairly recent...  
you could probably find it in archives.
-- 
Decibel!, aka Jim C. Nasby, Database Architect  decibel@decibel.org
Give your computer some brain candy! www.distributed.net Team #1828