Re: Group by range in hour of day - Mailing list pgsql-general
From | Adrian Klaver |
---|---|
Subject | Re: Group by range in hour of day |
Date | |
Msg-id | 55076B59.2000600@aklaver.com Whole thread Raw |
In response to | Re: Group by range in hour of day (Israel Brewster <israel@ravnalaska.net>) |
Responses |
Re: Group by range in hour of day
|
List | pgsql-general |
On 03/16/2015 04:16 PM, Israel Brewster wrote: > On Mar 16, 2015, at 2:22 PM, David G. Johnston > <david.g.johnston@gmail.com <mailto:david.g.johnston@gmail.com>> wrote: >> >> On Mon, Mar 16, 2015 at 3:13 PM, Adrian Klaver >> <adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com>>wrote: >> >> On 03/16/2015 02:57 PM, Israel Brewster wrote: >> >> I have a table with two timestamp columns for the start time >> and end >> time of each record (call them start and end).I'm trying to >> figure out >> if there is a way to group these records by "hour of day", >> that is the >> record should be included in the group if the hour of the day >> for the >> group falls anywhere in the range [start,end]. Obviously each >> record may >> well fall into multiple groups under this scenario. >> >> The goal here is to figure out, for each hour of the day, a) >> what is the >> total number of "active" records for that hour, and b) what is >> the total >> "active" time for those records during the hour, with an >> ultimate goal >> of figuring out the average active time per record per hour. >> >> So, for simplified example, if the table contained three records: >> >> start | end >> ------------------------------__----------------------- >> 2015-03-15 08:15 | 2015-03-15 10:45 >> 2015-03-15 09:30 | 2015-03-15 10:15 >> 2015-03-15 10:30 | 2015-03-15 11:30 >> >> >> Then the results should break out something like this: >> >> hour | count | sum >> ----------------------------- >> 8 | 1 | 0.75 >> 9 | 2 | 1.5 >> 10 | 3 | 1.5 >> 11 | 1 | 0.5 >> >> I can then easily manipulate these values to get my ultimate >> goal of the >> average, which would of course always be less than or equal to >> 1. Is >> this doable in postgress? Or would it be a better idea to >> simply pull >> the raw data and post-process in code? Thanks. >> >> >> Do not have an answer for you, but a question: >> >> What version of Postgres are you on? >> >> This will help determine what tools are available to work with. >> >> >> The following will give you endpoints for your bounds. Version is >> important since "range types" could be very useful in this situation - >> but you'd still need to generate the bounds info regardless. >> >> >> SELECT * >> FROM >> (SELECT * FROM generate_series('2015-03-15'::timestamptz, >> '2015-03-16'::timestamptz, '1 hour'::interval) start (start_ts)) AS s >> CROSS JOIN >> (SELECT end_ts + '1 hour'::interval AS end_ts FROM >> generate_series('2015-03-15'::timestamptz, '2015-03-16'::timestamptz, >> '1 hour'::interval) e (end_ts)) AS e >> >> You would join this using an ON condition with an OR (start BETWEEN >> [...] OR end BETWEEN [...]) - range logic will be better and you may >> want to adjust the upper bound by negative 1 (nano-second?) to allow >> for easier "<=" logic if using BETWEEN. >> > > Thanks, that is very helpful, but are you sure CROSS JOIN is what you > wanted here? using that, I get a 625 row result set where each row from > the first SELECT is paired up with EVERY row from the second select. I > would think I would want the first row of the first SELECT paired up > with only the first row of the second, second row of the first paired > with the second row of the second, etc - i.e. 24 start and end bounds. > Or am I missing something? Given this: test=> select * from start_end ; id | start_time | end_time ----+------------------------+------------------------ 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 using Pauls hints I got: test=> select * from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) ; id | start_time | end_time | h ----+------------------------+------------------------+---- 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 8 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 8 1 | 2015-03-16 08:15:00-07 | 2015-03-16 09:35:00-07 | 9 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 9 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 9 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 10 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 10 2 | 2015-03-16 09:15:00-07 | 2015-03-16 11:05:00-07 | 11 3 | 2015-03-16 08:00:00-07 | 2015-03-16 11:45:00-07 | 11 test=> select h, count(*) from start_end, generate_series(0, 23) as s(h) where h between extract(hour from start_time) and extract(hour from end_time) group by h order by h; h | count ----+------- 8 | 2 9 | 3 10 | 2 11 | 2 > > > > ----------------------------------------------- > Israel Brewster > Systems Analyst II > Ravn Alaska > 5245 Airport Industrial Rd > Fairbanks, AK 99709 > (907) 450-7293 > ----------------------------------------------- > >> David J. >> > -- Adrian Klaver adrian.klaver@aklaver.com
pgsql-general by date: