Thread: need help with some aggregation magic
hi, I have a log-table that stores events of users and projects like this ( user_id integer, project_id integer, ts timestamp, event_type integer ) I need an aggregated list of worktime per user, per project, per day. The users can switch projects during the day so I can't work this out with min(ts) and max(ts). Is there a clever way to get this with SQL ?
The ts means the time the user started on a project ? Or the time he finished? Or can mean both? If so, how do you can tell one from the other? Different event_type s ? Is it correct to assume from your words that an user cannot be in more than one project at the time? If so, can't be overlapping, right? Best, Oliveiros ----- Original Message ----- From: "Andreas" <maps.on@gmx.net> To: <pgsql-sql@postgresql.org> Sent: Thursday, June 09, 2011 2:43 PM Subject: [SQL] need help with some aggregation magic > hi, > I have a log-table that stores events of users and projects like this > ( user_id integer, project_id integer, ts timestamp, event_type integer ) > > I need an aggregated list of worktime per user, per project, per day. > > The users can switch projects during the day so I can't work this out with > min(ts) and max(ts). > > Is there a clever way to get this with SQL ? > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql
The log holds events and the ts is just the timestamp when the event occured. The events are kind of "opened form xxx with id xxx", "clicked button xxx", "switched to record xxx", ... They were primarily meant for helping me to find possible bugs when the user complains that it doesn't work but can't say what he did or where the error came up. The projects don't overlap per user. So I have time intervals with events for a project and I need to find the first and last event for every interval to add up the time difference and calculate the sum per day. Am 09.06.2011 16:16, schrieb Oliveiros d'Azevedo Cristina: > The ts means the time the user started on a project ? > Or the time he finished? > Or can mean both? If so, how do you can tell one from the other? > Different event_type s ? > Is it correct to assume from your words that an user cannot be in more > than one project at the time? If so, can't be overlapping, right? > > Best, > Oliveiros > > ----- Original Message ----- From: "Andreas" <maps.on@gmx.net> > To: <pgsql-sql@postgresql.org> > Sent: Thursday, June 09, 2011 2:43 PM > Subject: [SQL] need help with some aggregation magic > > >> hi, >> I have a log-table that stores events of users and projects like this >> ( user_id integer, project_id integer, ts timestamp, event_type >> integer ) >> >> I need an aggregated list of worktime per user, per project, per day. >> >> The users can switch projects during the day so I can't work this out >> with min(ts) and max(ts). >> >> Is there a clever way to get this with SQL ? >> >> -- >> Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) >> To make changes to your subscription: >> http://www.postgresql.org/mailpref/pgsql-sql > >
On Thu, Jun 9, 2011 at 6:43 AM, Andreas <maps.on@gmx.net> wrote: > I have a log-table that stores events of users and projects like this > ( user_id integer, project_id integer, ts timestamp, event_type integer ) > > I need an aggregated list of worktime per user, per project, per day. > > The users can switch projects during the day so I can't work this out with > min(ts) and max(ts). SELECT user_id, project_id, date_trunc( 'day', ts ) as event_day, MIN( ts ) AS event_start, MAX( ts ) AS event_end, MAX( ts ) - MIN( ts ) AS duration FROM Loggingtable GROUP BY user_id, project_id, date_trunc( 'day', ts ) ORDER BY date_trunc( 'day', ts ), user_id, project_id; -- Regards, Richard Broersma Jr.
Am 09.06.2011 18:20, schrieb Richard Broersma: > On Thu, Jun 9, 2011 at 6:43 AM, Andreas<maps.on@gmx.net> wrote: > >> I have a log-table that stores events of users and projects like this >> ( user_id integer, project_id integer, ts timestamp, event_type integer ) >> >> I need an aggregated list of worktime per user, per project, per day. >> >> The users can switch projects during the day so I can't work this out with >> min(ts) and max(ts). > SELECT user_id, project_id, date_trunc( 'day', ts ) as event_day, > MIN( ts ) AS event_start, MAX( ts ) AS event_end, > MAX( ts ) - MIN( ts ) AS duration > FROM Loggingtable > GROUP BY user_id, project_id, date_trunc( 'day', ts ) > ORDER BY date_trunc( 'day', ts ), user_id, project_id; > As far as I understand you calculate the duration as the difference between the first and last event of a project per day. There is a problem because a user can work from 08.00 to 10.00 on project 1 and then from 10.00 to 12.00 on project 2 and then from 12.00 to 16.00 on project 1 again. Then I get project 1 8 hours plus project 2 2 hours though the user actually was just 8 hours there.
Try this: select user_id, project_id, date_trunc, sum(sum) FROM (select user_id, project_id, date_trunc('day', ts), SUM(duration) FROM (select user_id, project_id, a.ts, ((SELECT MIN(b.ts) FROM log b WHERE b.ts>a.ts AND (date_trunc('day',a.ts)=date_trunc('day',b.ts)))-a.ts) AS duration from log a order by user_id, project_id, ts) AS foo group by user_id, project_id, ts) AS day_set group by user_id, project_id, date_trunc order by user_id, project_id, date_trunc; -Kevin Crain On Thu, Jun 9, 2011 at 6:43 AM, Andreas <maps.on@gmx.net> wrote: > hi, > I have a log-table that stores events of users and projects like this > ( user_id integer, project_id integer, ts timestamp, event_type integer ) > > I need an aggregated list of worktime per user, per project, per day. > > The users can switch projects during the day so I can't work this out with > min(ts) and max(ts). > > Is there a clever way to get this with SQL ? > > -- > Sent via pgsql-sql mailing list (pgsql-sql@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-sql >
On 2011-06-09, Andreas <maps.on@gmx.net> wrote: > Am 09.06.2011 18:20, schrieb Richard Broersma: >> On Thu, Jun 9, 2011 at 6:43 AM, Andreas<maps.on@gmx.net> wrote: >> >>> I have a log-table that stores events of users and projects like this >>> ( user_id integer, project_id integer, ts timestamp, event_type integer ) >>> >>> I need an aggregated list of worktime per user, per project, per day. >>> >>> The users can switch projects during the day so I can't work this out with >>> min(ts) and max(ts). >> SELECT user_id, project_id, date_trunc( 'day', ts ) as event_day, >> MIN( ts ) AS event_start, MAX( ts ) AS event_end, >> MAX( ts ) - MIN( ts ) AS duration >> FROM Loggingtable >> GROUP BY user_id, project_id, date_trunc( 'day', ts ) >> ORDER BY date_trunc( 'day', ts ), user_id, project_id; >> > As far as I understand you calculate the duration as the difference > between the first and last event of a project per day. > There is a problem because a user can work from 08.00 to 10.00 on > project 1 and then from 10.00 to 12.00 on project 2 and then from 12.00 > to 16.00 on project 1 again. > Then I get project 1 8 hours plus project 2 2 hours though the > user actually was just 8 hours there. Unclear to me what to do with the last event of the day (i.e. is there a implicit end-of-work time to use, or does your usage pattern guarantee a "closing event" to always be present?), but what about something like: BEGIN ; CREATE TABLE upes ( uid integer,pid integer,ts timestamp without time zone,evtype integer ) ; -- ...user 100 works on 3 projects Monday INSERT INTO upes VALUES(100, 11, '2011-06-06 13:00', 1) ; INSERT INTO upes VALUES(100, 11, '2011-06-06 13:30', 2) ; INSERT INTO upes VALUES(100, 22, '2011-06-06 13:45', 1) ; INSERT INTO upes VALUES(100, 33, '2011-06-06 18:00', 1) ; -- ...user 100 works on 2 projects Tuesday INSERT INTO upes VALUES(100, 11, '2011-06-07 13:00', 1) ; INSERT INTO upes VALUES(100, 33, '2011-06-07 13:30', 2) ; INSERT INTO upes VALUES(100, 33, '2011-06-07 17:45', 1) ; -- ...user 200 works also works on Tuesday INSERT INTO upes VALUES(200, 11, '2011-06-07 13:00', 1) ; INSERT INTO upes VALUES(200, 33, '2011-06-07 13:30', 2) ; INSERT INTO upes VALUES(200, 33, '2011-06-07 13:45', 4) ; INSERT INTO upes VALUES(200, 33, '2011-06-07 19:45', 8) ; -- ...attempt to summarize WITH ius AS ( SELECT upes.ts AS uts, upes.uid, upes.pid, upes.evtype, upes_next.ts AS nts FROM upes LEFT JOIN upes upes_next ON upes_next.ts = ( SELECT min(utmp.ts) FROM upes utmp WHERE utmp.uid=upes.uid AND utmp.ts > upes.ts AND utmp.ts::date=upes.ts::date ) ) SELECT ius.uts::date, ius.uid, ius.pid, SUM(nts-uts) FROM iusGROUP BY 1,2,3ORDER BY 1,2,3 ; ROLLBACK ; with results of: uts | uid | pid | sum ------------+-----+-----+----------2011-06-06 | 100 | 11 | 00:45:002011-06-06 | 100 | 22 | 04:15:002011-06-06 | 100 | 33 | 2011-06-07 | 100 | 11 | 01:00:002011-06-07 | 100 | 33 | 04:15:002011-06-07 | 200 | 11 | 01:00:002011-06-07 | 200| 33 | 06:15:00 (7 rows)