Thread: Aggregate Function to return most common value for a column
Hi Experts,

I want to use an aggregate function that will return the most commonly occurring value in a column. The column consists of VARCHAR(32) values.

Is it possible to construct such an aggregate using PL/pgSQL?

If I were doing something like this in Perl I would use a hash table to store each value and the number of times it was seen as the table was iterated, but PL/pgSQL does not seem to have an appropriate data type for that. I'd rather not use PL/Perl, to avoid the overhead of starting a Perl interpreter just for this.

Do I have to write the function in C, maybe?

Thanks in advance,
Regards
Mike Harris
Hello

I think a function written in C is the best solution, and you can use PostgreSQL's own functionality inside it: the backend already provides hash tables and hashing functions you can reuse.

regards
Pavel Stehule

2009/5/22 Michael Harris <michael.harris@ericsson.com>:
> I want to use an aggregate function that will return the most commonly
> occurring value in a column.
> [...]
> Do I have to write the function in C, maybe?
On Fri, 22 May 2009 17:48:44 +1000 "Michael Harris" <michael.harris@ericsson.com> wrote:

> I want to use an aggregate function that will return the most
> commonly occurring value in a column.
> [...]
> Do I have to write the function in C, maybe?

Isn't it a job for group by?

select count(*), myvalue
  from table
 group by myvalue
 order by count(*) desc
 limit 1;

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
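One caveat with the plain GROUP BY approach: it returns the most common value for the whole table in one shot, but if you want the most common value per group of some other column you have to restructure it, for example with DISTINCT ON. A rough sketch of that variant (the table and column names here are invented purely for illustration):

-- Most common "status" per "customer_id", still with no custom aggregate:
-- group by both columns, order each customer's rows by count, and let
-- DISTINCT ON keep only the top-counted row per customer.
SELECT DISTINCT ON (customer_id)
       customer_id,
       status   AS most_common_status,
       count(*) AS occurrences
FROM   orders
GROUP  BY customer_id, status
ORDER  BY customer_id, count(*) DESC;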
> I want to use an aggregate function that will return the most commonly
> occurring value in a column.

It's actually dead simple in Postgres. No C either. You just need to create an aggregate function. I wrote a most() aggregate a while back that does exactly what you are asking for. Here, I'll add it to my blog...

http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Scott Bailey
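The blog post has the complete code; the general shape of such an aggregate is an array-accumulating state function plus a final function that picks the most frequent element. A minimal sketch along those lines (not the exact functions from the post, and the names are illustrative; unnest() needs PostgreSQL 8.4 or later, older versions would loop over the array in PL/pgSQL instead):

-- State: append every input value to an array.
-- Final: count the most frequent element with a plain GROUP BY.

CREATE OR REPLACE FUNCTION most_final(anyarray)
RETURNS anyelement AS $$
    SELECT v
    FROM   unnest($1) AS v
    GROUP  BY v
    ORDER  BY count(*) DESC
    LIMIT  1;
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE most(anyelement) (
    SFUNC     = array_append,
    STYPE     = anyarray,
    INITCOND  = '{}',
    FINALFUNC = most_final
);

-- Usage:
-- SELECT most(some_varchar_column) FROM some_table;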
On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:

> > I want to use an aggregate function that will return the most
> > commonly occurring value in a column.
>
> It's actually dead simple in Postgres. No C either. You just need to
> create an aggregate function. I wrote a most() aggregate a while
> back that does exactly what you are asking for. Here, I'll add it to
> my blog...
>
> http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter wrote:

> Such an aggregate should probably be called, "mode," that being the
> probability/statistics name for the concept.
Excellent observation Dave. Sometimes I can't see outside of the box I'm in, and at the time I was focusing on text, so statistics was in another box. I've updated the post with final functions for mode(), median() and range().
Scott
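Side note for anyone reading this in the archives: the naming suggestion eventually matched core PostgreSQL. Since version 9.4 there is a built-in ordered-set aggregate called mode(), so on a recent server no custom function is needed for this at all (the table and column names below are only placeholders):

-- mode() is an ordered-set aggregate (PostgreSQL 9.4+), so it uses the
-- WITHIN GROUP syntax rather than a plain mode(column) call.
SELECT mode() WITHIN GROUP (ORDER BY some_varchar_column) AS most_common
FROM   some_table;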
On Fri, May 22, 2009 at 05:23:47PM +0000, artacus@comcast.net wrote:

> Excellent observation Dave. Sometimes I can't see outside of the box
> I'm in, and at the time I was focusing on text, so statistics was in
> another box. I've updated the post with final functions for mode(),
> median() and range().

Thanks! :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
artacus@comcast.net wrote:

> I've updated the post with final functions for mode(), median() and
> range().

Hey, if you want to add your functions to
http://wiki.postgresql.org/wiki/Snippets , that would be great.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Friday 22. May 2009, Alvaro Herrera wrote:

> Hey, if you want to add your functions to
> http://wiki.postgresql.org/wiki/Snippets , that would be great.

+1

--
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/
Hi Scott,

Brilliant, that's exactly what I wanted.

I guess the only thing that worries me is memory use if the table being aggregated is very large, since this solution creates an array containing all of the values in the target expression - but I suspect in my application that won't be a problem.

Thanks again,
Regards // Mike
I've used this same concept in subqueries for a very long time. Doing this allows me to "dive in" and get other values from the joined table, rather than just the thing that we're getting the most of.
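For illustration, that subquery pattern might look something like this (all table and column names are invented):

-- For each customer, pull the most common shipping city straight out of
-- the detail table with a correlated subquery; joining orders back on
-- (customer_id, city) would then expose the other columns of those rows.
SELECT c.customer_id,
       c.name,
       (SELECT o.city
        FROM   orders o
        WHERE  o.customer_id = c.customer_id
        GROUP  BY o.city
        ORDER  BY count(*) DESC
        LIMIT  1) AS most_common_city
FROM   customers c;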
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"I kept looking for somebody to solve the problem.
Then I realized... I am somebody!"
-- Author Unknown