Home > mailing lists

Re: Need some help with a query (uniq -c) - Mailing list pgsql-general

From	Kenichiro Tanaka
Subject	Re: Need some help with a query (uniq -c)
Date	April 13, 2010 08:13:02
Msg-id	4BC451A8.3010900@ashisuto.co.jp Whole thread Raw
In response to	Re: Need some help with a query (uniq -c) (Steve Atkins <steve@blighty.com>)
Responses	Re: Need some help with a query (uniq -c)
List	pgsql-general

Tree view

Hello.

I try with "With Query".
http://www.postgresql.org/docs/8.4/static/queries-with.html

#We can use "With Queries" >  v8.4
#That'll only work if the time values are contiguous, but there's probably a
#similar trick for non-contiguous ,too.

--create data
drop table foo;
create table foo( time int,message text);
insert into foo values(1,'a');
insert into foo values(2,'b');
insert into foo values(3,'b');
insert into foo values(4,'c');
insert into foo values(5,'a');
insert into foo values(6,'c');
insert into foo values(7,'c');
insert into foo values(8,'a');
insert into foo values(9,'a');
insert into foo values(10,'a');

--begin Answer
with recursive r as (
select foo.time,foo.message,1  as dummy from foo
union all
select foo.time,foo.message,r.dummy+1 from foo , r
where foo.time=r.time-1 and foo.message=r.message
)
,rr as (
select foo.time,foo.message,'OLID' as flag  from foo
union all
select foo.time,foo.message,'DUP' as flag from foo , rr
where foo.time-1=rr.time-2 and foo.message=rr.message
)
select time min,time+max(dummy)-1 max,message,max(dummy) counts
from r where time not in (select distinct (time+1) times from rr
where flag='DUP') group by time,message order by time;

--result
postgres(# where flag='DUP') group by time,message order by time;
  min | max | message | counts
-----+-----+---------+--------
    1 |   1 | a       |      1
    2 |   3 | b       |      2
    4 |   4 | c       |      1
    5 |   5 | a       |      1
    6 |   7 | c       |      2
    8 |  10 | a       |      3
(6 rows)

--end

But I think some one can provide more simple SQL.

Thank you.
> On Apr 12, 2010, at 11:31 AM, Scott Marlowe wrote:
>
>
>> On Mon, Apr 12, 2010 at 12:22 PM, A B<gentosaker@gmail.com>  wrote:
>>
>>> Hello!
>>>
>>> I have a table (think of it as a table of log messages)
>>>
>>> time |  message
>>> -----------------------
>>> 1      |   a
>>> 2      |   b
>>> 3      |   b
>>> 4      |  b
>>> 5      |  a
>>>
>>> the three 'b' are the same message, so I would like to write a query
>>> that would give me a result that is similar to what the unix command
>>> "uniq -c" would give:
>>>
>>> first |  message | last | count
>>> --------------------------------------
>>> 1     |     a              |   1   |     1
>>> 2     |     b              |   4   |     3<--- here it squeezes
>>> similar consecutive messages into a single row
>>> 5     |     a              |   5   |     1
>>>
>>> How do I write such a command?
>>>
>> Pretty straight ahead:
>>
>> select min(t), message, max(t), count(*) from table group by message.
>>
> That was my first though too, but it combines everything not just adjacent messages.
>
> Something like this, maybe
>
> select t1.message, t1.time as first, t2.time as last, t2.time-t1.time+1 as count
>          from foo as t1, foo as t2
>      where t1.time<= t2.time and t1.message = t2.message
>          and not exists
>              (select * from foo as t3
>               where (t3.time between t1.time and t2.time and t3.message<>  t1.message)
>               or (t3.time = t2.time + 1 and t3.message = t1.message)
>               or (t3.time = t1.time - 1 and t3.message = t1.message));
>
>   message | first | last | count
> ---------+-------+------+-------
>   a       |     1 |    1 |     1
>   b       |     2 |    4 |     3
>   a       |     5 |    5 |     1
>
> That'll only work if the time values are contiguous, but there's probably a
> similar trick for non-contiguous.
>
> Cheers,
>    Steve
>
>
>


--
================================================
Kenichiro Tanaka
K.K.Ashisuto
http://www.ashisuto.co.jp/english/index.html
================================================

pgsql-general by date:

From: tv@fuzzy.cz
Date: 13 April 2010, 08:05:55
Subject: Re: Dynamic Catalog Views

From: "Andrus"
Date: 13 April 2010, 08:50:17
Subject: Email address column verification for address list

Re: Need some help with a query (uniq -c) - Mailing list pgsql-general

Previous

Next