Thread: need help with some aggregation magic

need help with some aggregation magic

From

Andreas

Date:

09 June 2011, 10:50:17

hi,
I have a log-table that stores events of users and projects like this
( user_id integer, project_id integer, ts timestamp, event_type integer )

I need an aggregated list of worktime per user, per project, per day.

The users can switch projects during the day so I can't work this out 
with min(ts) and max(ts).

Is there a clever way to get this with SQL ?

Re: need help with some aggregation magic

From

"Oliveiros d'Azevedo Cristina"

Date:

09 June 2011, 11:32:13

The ts means the time the user started on a project ?
Or the time he finished?
Or can mean both? If so, how do you can tell one from the other? Different 
event_type s ?
Is it correct to assume from your words that an user cannot be in more than 
one project at the time? If so, can't be overlapping, right?

Best,
Oliveiros

----- Original Message ----- 
From: "Andreas" <maps.on@gmx.net>
To: <pgsql-sql@postgresql.org>
Sent: Thursday, June 09, 2011 2:43 PM
Subject: [SQL] need help with some aggregation magic


> hi,
> I have a log-table that stores events of users and projects like this
> ( user_id integer, project_id integer, ts timestamp, event_type integer )
>
> I need an aggregated list of worktime per user, per project, per day.
>
> The users can switch projects during the day so I can't work this out with 
> min(ts) and max(ts).
>
> Is there a clever way to get this with SQL ?
>
> -- 
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql

Re: need help with some aggregation magic

From

Andreas

Date:

09 June 2011, 13:13:49

The log holds events and the ts is just the timestamp when the event 
occured.
The events are kind of "opened form xxx with id xxx", "clicked button 
xxx", "switched to record xxx", ... They were primarily meant for 
helping me to find possible bugs when the user complains that it doesn't 
work but can't say what he did or where the error came up.

The projects don't overlap per user.

So I have time intervals with events for a project and I need to find 
the first and last event for every interval to add up the time 
difference and calculate the sum per day.



Am 09.06.2011 16:16, schrieb Oliveiros d'Azevedo Cristina:
> The ts means the time the user started on a project ?
> Or the time he finished?
> Or can mean both? If so, how do you can tell one from the other? 
> Different event_type s ?
> Is it correct to assume from your words that an user cannot be in more 
> than one project at the time? If so, can't be overlapping, right?
>
> Best,
> Oliveiros
>
> ----- Original Message ----- From: "Andreas" <maps.on@gmx.net>
> To: <pgsql-sql@postgresql.org>
> Sent: Thursday, June 09, 2011 2:43 PM
> Subject: [SQL] need help with some aggregation magic
>
>
>> hi,
>> I have a log-table that stores events of users and projects like this
>> ( user_id integer, project_id integer, ts timestamp, event_type 
>> integer )
>>
>> I need an aggregated list of worktime per user, per project, per day.
>>
>> The users can switch projects during the day so I can't work this out 
>> with min(ts) and max(ts).
>>
>> Is there a clever way to get this with SQL ?
>>
>> -- 
>> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-sql 
>
>

Re: need help with some aggregation magic

From

Richard Broersma

Date:

09 June 2011, 13:20:23

On Thu, Jun 9, 2011 at 6:43 AM, Andreas <maps.on@gmx.net> wrote:

> I have a log-table that stores events of users and projects like this
> ( user_id integer, project_id integer, ts timestamp, event_type integer )
>
> I need an aggregated list of worktime per user, per project, per day.
>
> The users can switch projects during the day so I can't work this out with
> min(ts) and max(ts).

SELECT user_id, project_id, date_trunc(  'day', ts ) as event_day,             MIN( ts ) AS event_start, MAX( ts ) AS
event_end,            MAX( ts ) - MIN( ts ) AS duration FROM Loggingtable
 
GROUP BY user_id, project_id, date_trunc( 'day', ts )
ORDER BY date_trunc( 'day', ts ), user_id, project_id;


-- 
Regards,
Richard Broersma Jr.

Re: need help with some aggregation magic

From

Andreas

Date:

09 June 2011, 13:55:46

Am 09.06.2011 18:20, schrieb Richard Broersma:
> On Thu, Jun 9, 2011 at 6:43 AM, Andreas<maps.on@gmx.net>  wrote:
>
>> I have a log-table that stores events of users and projects like this
>> ( user_id integer, project_id integer, ts timestamp, event_type integer )
>>
>> I need an aggregated list of worktime per user, per project, per day.
>>
>> The users can switch projects during the day so I can't work this out with
>> min(ts) and max(ts).
> SELECT user_id, project_id, date_trunc(  'day', ts ) as event_day,
>                MIN( ts ) AS event_start, MAX( ts ) AS event_end,
>                MAX( ts ) - MIN( ts ) AS duration
>    FROM Loggingtable
> GROUP BY user_id, project_id, date_trunc( 'day', ts )
> ORDER BY date_trunc( 'day', ts ), user_id, project_id;
>
As far as I understand you calculate the duration as the difference 
between the first and last event of a project per day.
There is a problem because a user can work from 08.00 to 10.00 on 
project 1 and then from 10.00 to 12.00 on project 2 and then from 12.00 
to 16.00 on project 1 again.
Then I get   project 1   8 hours  plus  project 2   2 hours   though the 
user actually was just 8 hours there.

Re: need help with some aggregation magic

From

Kevin Crain

Date:

09 June 2011, 18:39:34

Try this:

select user_id, project_id, date_trunc, sum(sum) FROM (select user_id,
project_id, date_trunc('day', ts), SUM(duration) FROM (select user_id,
project_id, a.ts, ((SELECT MIN(b.ts) FROM log b WHERE b.ts>a.ts AND
(date_trunc('day',a.ts)=date_trunc('day',b.ts)))-a.ts) AS duration
from log a order by user_id, project_id, ts) AS foo group by user_id,
project_id, ts) AS day_set group by user_id, project_id, date_trunc
order by user_id, project_id, date_trunc;


-Kevin Crain

On Thu, Jun 9, 2011 at 6:43 AM, Andreas <maps.on@gmx.net> wrote:
> hi,
> I have a log-table that stores events of users and projects like this
> ( user_id integer, project_id integer, ts timestamp, event_type integer )
>
> I need an aggregated list of worktime per user, per project, per day.
>
> The users can switch projects during the day so I can't work this out with
> min(ts) and max(ts).
>
> Is there a clever way to get this with SQL ?
>
> --
> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-sql
>

Re: need help with some aggregation magic

From

Edgardo Portal

Date:

10 June 2011, 02:09:36

On 2011-06-09, Andreas <maps.on@gmx.net> wrote:
> Am 09.06.2011 18:20, schrieb Richard Broersma:
>> On Thu, Jun 9, 2011 at 6:43 AM, Andreas<maps.on@gmx.net>  wrote:
>>
>>> I have a log-table that stores events of users and projects like this
>>> ( user_id integer, project_id integer, ts timestamp, event_type integer )
>>>
>>> I need an aggregated list of worktime per user, per project, per day.
>>>
>>> The users can switch projects during the day so I can't work this out with
>>> min(ts) and max(ts).
>> SELECT user_id, project_id, date_trunc(  'day', ts ) as event_day,
>>                MIN( ts ) AS event_start, MAX( ts ) AS event_end,
>>                MAX( ts ) - MIN( ts ) AS duration
>>    FROM Loggingtable
>> GROUP BY user_id, project_id, date_trunc( 'day', ts )
>> ORDER BY date_trunc( 'day', ts ), user_id, project_id;
>>
> As far as I understand you calculate the duration as the difference 
> between the first and last event of a project per day.
> There is a problem because a user can work from 08.00 to 10.00 on 
> project 1 and then from 10.00 to 12.00 on project 2 and then from 12.00 
> to 16.00 on project 1 again.
> Then I get   project 1   8 hours  plus  project 2   2 hours   though the 
> user actually was just 8 hours there.

Unclear to me what to do with the last event of the day (i.e. is there
a implicit end-of-work time to use, or does your usage pattern
guarantee a "closing event" to always be present?), but what about
something like:

BEGIN ;

CREATE TABLE upes ( uid     integer,pid     integer,ts      timestamp without time zone,evtype  integer
) ;

-- ...user 100 works on 3 projects Monday
INSERT INTO upes VALUES(100, 11, '2011-06-06 13:00', 1) ;
INSERT INTO upes VALUES(100, 11, '2011-06-06 13:30', 2) ;
INSERT INTO upes VALUES(100, 22, '2011-06-06 13:45', 1) ;
INSERT INTO upes VALUES(100, 33, '2011-06-06 18:00', 1) ;

-- ...user 100 works on 2 projects Tuesday
INSERT INTO upes VALUES(100, 11, '2011-06-07 13:00', 1) ;
INSERT INTO upes VALUES(100, 33, '2011-06-07 13:30', 2) ;
INSERT INTO upes VALUES(100, 33, '2011-06-07 17:45', 1) ;

-- ...user 200 works also works on Tuesday
INSERT INTO upes VALUES(200, 11, '2011-06-07 13:00', 1) ;
INSERT INTO upes VALUES(200, 33, '2011-06-07 13:30', 2) ;
INSERT INTO upes VALUES(200, 33, '2011-06-07 13:45', 4) ;
INSERT INTO upes VALUES(200, 33, '2011-06-07 19:45', 8) ;

-- ...attempt to summarize
WITH ius AS (
SELECT upes.ts AS uts, upes.uid, upes.pid, upes.evtype, upes_next.ts AS nts FROM upes      LEFT JOIN upes upes_next
       ON upes_next.ts = (                SELECT min(utmp.ts)                  FROM upes utmp                 WHERE
utmp.uid=upes.uid                  AND utmp.ts > upes.ts                   AND utmp.ts::date=upes.ts::date
 )
 
)
SELECT ius.uts::date, ius.uid, ius.pid, SUM(nts-uts) FROM iusGROUP BY 1,2,3ORDER BY 1,2,3
;

ROLLBACK ;

with results of:   uts     | uid | pid |   sum    
------------+-----+-----+----------2011-06-06 | 100 |  11 | 00:45:002011-06-06 | 100 |  22 | 04:15:002011-06-06 | 100 |
33 | 2011-06-07 | 100 |  11 | 01:00:002011-06-07 | 100 |  33 | 04:15:002011-06-07 | 200 |  11 | 01:00:002011-06-07 |
200|  33 | 06:15:00
 
(7 rows)