Home > mailing lists

Thread: difference when using 'distinct on'

difference when using 'distinct on'

From

"Johnson, Shaunn"

Date:

12 September 2003, 10:30:15

Howdy:

Can someone tell what the difference (and why
you would use it) is between the following:

[snip]
select distinct on (col_1, col_2),
col_1,
col_2,
col_3
from t_table

select distinct
col_1,
col_2,
col_3
from t_table
[/snip]

In the first example, is it just getting
the unique rows for the first two columns?

Thanks!

-X

Re: difference when using 'distinct on'

From

Bruno Wolff III

Date:

12 September 2003, 13:01:00

On Fri, Sep 12, 2003 at 09:29:39 -0400,
  "Johnson, Shaunn" <SJohnson6@bcbsm.com> wrote:
>
> In the first example, is it just getting
> the unique rows for the first two columns?

Yes. Normally you would use distinct on in combination with an order by
so that you get the particular unique row you are interested in.
For example if col1 and col2 were an event type and col3 was a timestamp
for the event, you could use distinct on to get the earliest or latest
timestamp for each event type.

Re: difference when using 'distinct on'

From

Stephan Szabo

Date:

12 September 2003, 23:14:40

On Fri, 12 Sep 2003, Johnson, Shaunn wrote:

> Howdy:
>
> Can someone tell what the difference (and why
> you would use it) is between the following:
>
> [snip]
> select distinct on (col_1, col_2),
> col_1,
> col_2,
> col_3
> from t_table
>
> --
>
> select distinct
> col_1,
> col_2,
> col_3
> from t_table
> [/snip]
>
> In the first example, is it just getting
> the unique rows for the first two columns?

In the first, for each set of rows that have a distinct col1,col2
value it's taking one of those rows and using its col3 value.
It's like group by, but less restrictive since you don't need
to use a set function on col_3.

In general distinct on in that fashion is most usable when
combined with an order by so that you can get a particular row
from each set. For example, you might say do something like:
 select distinct on (col1, col2) col1, col2, col3 from t_table
 order by col1, col2, col4;
In this case you should get the col3 value for each col1,col2
distinct group that corresponds to the row having the lowest col4
value.