Thread: Aggregate Function to return most common value for a column
Hi Experts,

I want to use an aggregate function that will return the most commonly occurring value in a column. The column consists of VARCHAR(32) values.

Is it possible to construct such an aggregate using PL/pgSQL?

If I were doing something like this in Perl I would use a hash table to store each value and the number of times it was seen as the table was iterated, but PL/pgSQL does not seem to have an appropriate data type for that. I'd rather not use PL/Perl, to avoid the overhead of starting a Perl interpreter just for this.

Do I have to write the function in C, maybe?

Thanks in advance,
Regards
Mike Harris
Hello

I think a function written in C is the best solution, and you can use PostgreSQL's own functionality inside it: the backend already provides hash tables and hashing functions you can reuse.

regards
Pavel Stehule

2009/5/22 Michael Harris <michael.harris@ericsson.com>:
> I want to use an aggregate function that will return the most commonly
> occurring value in a column.
> [...]
> Do I have to write the function in C, maybe?
On Fri, 22 May 2009 17:48:44 +1000 "Michael Harris" <michael.harris@ericsson.com> wrote:

> I want to use an aggregate function that will return the most
> commonly occurring value in a column.
> [...]
> Do I have to write the function in C, maybe?

Isn't it a job for group by?

select count(*), myvalue
  from table
 group by myvalue
 order by count(*) desc
 limit 1;

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
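One caveat with the plain GROUP BY approach: it returns the most common value for the whole table in one shot, but if you want the most common value per group of some other column you have to restructure it, for example with DISTINCT ON. A rough sketch of that variant (the table and column names here are invented purely for illustration):

-- Most common "status" per "customer_id", still with no custom aggregate:
-- group by both columns, order each customer's rows by count, and let
-- DISTINCT ON keep only the top-counted row per customer.
SELECT DISTINCT ON (customer_id)
       customer_id,
       status   AS most_common_status,
       count(*) AS occurrences
FROM   orders
GROUP  BY customer_id, status
ORDER  BY customer_id, count(*) DESC;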
> I want to use an aggregate function that will return the most commonly
> occurring value in a column.

It's actually dead simple in Postgres. No C either. You just need to create an aggregate function. I wrote a most() aggregate a while back that does exactly what you are asking for. Here, I'll add it to my blog...

http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Scott Bailey
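The blog post has the complete code; the general shape of such an aggregate is an array-accumulating state function plus a final function that picks the most frequent element. A minimal sketch along those lines (not the exact functions from the post, and the names are illustrative; unnest() needs PostgreSQL 8.4 or later, older versions would loop over the array in PL/pgSQL instead):

-- State: append every input value to an array.
-- Final: count the most frequent element with a plain GROUP BY.

CREATE OR REPLACE FUNCTION most_final(anyarray)
RETURNS anyelement AS $$
    SELECT v
    FROM   unnest($1) AS v
    GROUP  BY v
    ORDER  BY count(*) DESC
    LIMIT  1;
$$ LANGUAGE sql IMMUTABLE;

CREATE AGGREGATE most(anyelement) (
    SFUNC     = array_append,
    STYPE     = anyarray,
    INITCOND  = '{}',
    FINALFUNC = most_final
);

-- Usage:
-- SELECT most(some_varchar_column) FROM some_table;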
On Fri, May 22, 2009 at 03:23:07PM +0000, artacus@comcast.net wrote:

> > I want to use an aggregate function that will return the most
> > commonly occurring value in a column.
>
> It's actually dead simple in Postgres. No C either. You just need to
> create an aggregate function. I wrote a most() aggregate a while
> back that does exactly what you are asking for. Here, I'll add it to
> my blog...
>
> http://scottrbailey.wordpress.com/2009/05/22/postgres-adding-custom-aggregates-most/

Such an aggregate should probably be called, "mode," that being the
probability/statistics name for the concept.

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fetter@gmail.com
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
David Fetter wrote:

> Such an aggregate should probably be called, "mode," that being the
> probability/statistics name for the concept.
Excellent observation Dave. Sometimes I can't see outside of the box I'm in, and at the time I was focusing on text, so statistics was in another box. I've updated the post with final functions for mode(), median() and range().
Scott
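Side note for anyone reading this in the archives: the naming suggestion eventually matched core PostgreSQL. Since version 9.4 there is a built-in ordered-set aggregate called mode(), so on a recent server no custom function is needed for this at all (the table and column names below are only placeholders):

-- mode() is an ordered-set aggregate (PostgreSQL 9.4+), so it uses the
-- WITHIN GROUP syntax rather than a plain mode(column) call.
SELECT mode() WITHIN GROUP (ORDER BY some_varchar_column) AS most_common
FROM   some_table;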
On Fri, May 22, 2009 at 05:23:47PM +0000, artacus@comcast.net wrote:

> Excellent observation Dave. Sometimes I can't see outside of the box
> I'm in, and at the time I was focusing on text, so statistics was in
> another box. I've updated the post with final functions for mode(),
> median() and range().

Thanks! :)

Cheers,
David.
--
David Fetter <david@fetter.org> http://fetter.org/
artacus@comcast.net wrote:

> I've updated the post with final functions for mode(), median() and
> range().

Hey, if you want to add your functions to
http://wiki.postgresql.org/wiki/Snippets , that would be great.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
On Friday 22. May 2009, Alvaro Herrera wrote:

> Hey, if you want to add your functions to
> http://wiki.postgresql.org/wiki/Snippets , that would be great.

+1

--
Leif Biberg Kristensen | Registered Linux User #338009
Me And My Database: http://solumslekt.org/blog/
Hi Scott,

Brilliant, that's exactly what I wanted.

I guess the only thing that worries me is memory use if the table being aggregated is very large, since this solution creates an array containing all of the values in the target expression - but I suspect in my application that won't be a problem.

Thanks again,
Regards // Mike
I've used this same concept in subqueries for a very long time. Doing this allows me to "dive in" and get other values from the joined table, rather than just the thing that we're getting the most of.
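For illustration, that subquery pattern might look something like this (all table and column names are invented):

-- For each customer, pull the most common shipping city straight out of
-- the detail table with a correlated subquery; joining orders back on
-- (customer_id, city) would then expose the other columns of those rows.
SELECT c.customer_id,
       c.name,
       (SELECT o.city
        FROM   orders o
        WHERE  o.customer_id = c.customer_id
        GROUP  BY o.city
        ORDER  BY count(*) DESC
        LIMIT  1) AS most_common_city
FROM   customers c;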
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"I kept looking for somebody to solve the problem.
Then I realized... I am somebody!"
-- Author Unknown