Thread: Grouping logs by ip and time
Hello all, I have a table which stores action logs from users. It looks something like this: log_type text, date date, "time" time without time zone, ip inet The log type can be action1, action2, action3, action4, or action5. I know that each user session will have a max of one of each log and it will always start with action1. It may not have every action though. I also know that each session will take no longer than one minute. What I'd like to do is be able to group these logs by sessions based on the IP and the time range so I can figure out the time taken between each action. I know how to script it, although it's very slow. I was hoping there was some way to do this in SQL. I'm running Postgresql 8.3.7 on this machine right now, but if there's something that would make this easier and doesn't exist there yet, I would be willing to upgrade. Thanks, Alex
On 08/11/2011 17:59, Alex Thurlow wrote: > Hello all, > I have a table which stores action logs from users. It looks > something like this: > log_type text, > date date, > "time" time without time zone, > ip inet [snip] > What I'd like to do is be able to group these logs by sessions based on > the IP and the time range so I can figure out the time taken between > each action. Would something like this work? - select ip, max("time") - min("time") as session_duration from log_table group by ip; This doesn't take the date into account - what happens if the session spans midnight? You can get around this by using a timestamp column instead of separate date and time. Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
On 08/11/2011 18:48, Raymond O'Donnell wrote: > On 08/11/2011 17:59, Alex Thurlow wrote: >> Hello all, >> I have a table which stores action logs from users. It looks >> something like this: >> log_type text, >> date date, >> "time" time without time zone, >> ip inet > > [snip] > >> What I'd like to do is be able to group these logs by sessions based on >> the IP and the time range so I can figure out the time taken between >> each action. > > Would something like this work? - > > select ip, max("time") - min("time") as session_duration > from log_table > group by ip; > > This doesn't take the date into account - what happens if the session > spans midnight? You can get around this by using a timestamp column > instead of separate date and time. Whoops - just re-read your email, and realised that you're looking for the time between actions; the above just gives you the total duration of the session from each IP address. Ray. -- Raymond O'Donnell :: Galway :: Ireland rod@iol.ie
On 11/08/11 10:48 AM, Raymond O'Donnell wrote: > Would something like this work? - > > select ip, max("time") - min("time") as session_duration > from log_table > group by ip; > > This doesn't take the date into account - what happens if the session > spans midnight? You can get around this by using a timestamp column > instead of separate date and time. he said a session always starts with 'action1', and presumably there can be more than one session per day, so this won't work. the 'end' of a session is presumably the previous action2|3|4|5 thats prior to the next action1. I have no idea how you'd code this as a SQL query. -- john r pierce N 37, W 122 santa cruz ca mid-left coast
> Would something like this work? - > > select ip, max("time") - min("time") as session_duration > from log_table > group by ip; I don't think this is the right way to do. This is based on ip address, so if - client connect diffrent times with same ip - client has sime ip but he made another action on other day. you will have a wrong results. You should save also the session id and group by sesion id not ip. Ex. : Table ------------------ log_type text, date date, "time" time without time zone, ip inet session_id text -- you can use maybe foreign tables ? SQL ( Same as Raynold's but groups session ids) ------------------- select ip, max("time") - min("time") as session_duration from log_table group by session_id;
On 11/8/2011 1:00 PM, Ascarabina wrote: >> Would something like this work? - >> >> select ip, max("time") - min("time") as session_duration >> from log_table >> group by ip; > > I don't think this is the right way to do. This is based on ip > address, so if > - client connect diffrent times with same ip > - client has sime ip but he made another action on other day. > you will have a wrong results. > > > You should save also the session id and group by sesion id not ip. > Ex. : > Table > ------------------ > log_type text, > date date, > "time" time without time zone, > ip inet session_id text -- you can use maybe foreign tables ? > > SQL ( Same as Raynold's but groups session ids) > ------------------- > select ip, max("time") - min("time") as session_duration > from log_table > group by session_id; > Thanks for the responses guys. I guess I'll need to figure out how to add a session ID if I'm going to do this. -Alex
Hi Alex, in PG9 you can use a query like this: with a as ( select a.*, rank() over (partition by a.ip,a.date order by a.log_type, a.time) from log_table a ) select a.*, b.*, b.time-a.time from a join a b on a.ip=b.ip and a.date=b.date and a.rank+1=b.rank this orders entry by time grouped by ip and date and selects entries with there successors. In older versions this is not so easy. It should work like this: create temp sequence s; create temp table a as select a.*, nextval('s') as rank from ( select a.* from log_table a order by a.ip, a.date, a.time) a; select a.*, b.*, b.time-a.time from a a join a b on a.ip=b.ip and a.date=b.date and a.rank+1=b.rank; Thomas Am 08.11.2011 18:59, schrieb Alex Thurlow: > Hello all, > I have a table which stores action logs from users. It looks > something like this: > log_type text, > date date, > "time" time without time zone, > ip inet > > The log type can be action1, action2, action3, action4, or action5. I > know that each user session will have a max of one of each log and it > will always start with action1. It may not have every action though. > I also know that each session will take no longer than one minute. > > What I'd like to do is be able to group these logs by sessions based > on the IP and the time range so I can figure out the time taken > between each action. > > I know how to script it, although it's very slow. I was hoping there > was some way to do this in SQL. I'm running Postgresql 8.3.7 on this > machine right now, but if there's something that would make this > easier and doesn't exist there yet, I would be willing to upgrade. > > Thanks, > Alex >