Re: select count(distinct ...) is slower than select distinct in about 5x - Mailing list pgsql-performance

From jacket41142
Subject Re: select count(distinct ...) is slower than select distinct in about 5x
Date
Msg-id CAONnt+5Atj_XvLtzAGGnoUN7_hAqoV-5+cahqkcuXn7VhtYTqg@mail.gmail.com
Whole thread Raw
Responses Re: select count(distinct ...) is slower than select distinct in about 5x
Re: select count(distinct ...) is slower than select distinct in about 5x
List pgsql-performance
Thanks very much.

I think another problem is that the cost estimation isn't good enough to reflex real cost. Since we can see, from "explain analyze ...", count(distinct ...) has smallest cost between the others, but since it uses sorts, the time complexity should be higher especially for large amount of rows.

Also I think even if we can have multiple count() expressions, the optimizer should also be able to choose between use sort, HashAggregate or maybe something like linear aggregate if sorts are not needed or other methods if exist. Also this may be done as just one job for entire table of interested columns, or for each column separately.

regards,
jacket41142


2013/12/11 Jeff Janes <jeff.janes@gmail.com>
On Tue, Dec 10, 2013 at 9:28 AM, jacket41142 <jacket41142@gmail.com> wrote:
 

test=> select distinct col_int from t1 group by col_int;
Time: 1177.936 ms

So the performance difference is not very large.
But when I do that:

test=> select count(distinct col_int) from t1;
 count
-------
  1025
(1 row)

Time: 7367.476 ms


count(distinct ...) always sorts, rather than using a hash, to do its work.  I don't think that there is any fundamental reason that it could not be changed to allow it to use hashing, it just hasn't been done yet.  It is complicated by the fact that you can have multiple count() expressions in the same query which demand sorting/grouping on different columns.

Cheers,

Jeff

pgsql-performance by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: Explain analyze time overhead
Next
From: jacket41142
Date:
Subject: Re: select count(distinct ...) is slower than select distinct in about 5x