Thread: an difficult SQL
Hi Everybody, I was wondering if anybody here could help me cook up a query: 1. against a list of events (like an activity log in the database). The list is a single table: create table events (tm timestamp, user int, description text). 2. of which the output would be sorted in such a way, that: 2.1 most recent event would "select" most recent events of that same user, and displayed in a group (of say 10) of them (in "tm" order). 2.2 going through the events back in time, first event of ANOTHER user selects next group, where (say 10) most recent events of that OTHER user is presented. 2.3 next most recent event of yet another user selects yet another group to display and this selection process goes on, up to a maximum of (say 20) users/groups-of-their-events. 2.4 after that, all other events are selected in tm order. This is to present most recent telephone activities grouped by most recent subscribers so that the dashboard doesn't get cluttered with information but allows for an overview of other activity of most recent users. I tend to think, that it's a problem for a window function ... but I've stumbled on the problem how to limit the window "frame" to just a few (say 10) events within the "window" and have all the rest returned as "tail" of the query. BTW: the eventlog table is big. (and partitioned). Any help appreciated. -R
Hi Rafal You first could select the three users with the most recent entries with a windowing function (https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS) putting it into a with query (https://www.postgresql.org/docs/15/sql-select.html), in following with queries I would select 2.1 to 2.3 with each a constant column with each a different value you later sort by. In a next with query you can select all the rest (except all https://www.postgresql.org/docs/15/sql-select.html#SQL-EXCEPT) the results of 2.1 to 2.3 for 2.4 also with the notorious sort column. In a last with query you can put together the partial results for 2.1 to 2.4 with a union all (https://www.postgresql.org/docs/15/sql-select.html#SQL-UNION) and selecting sort by the sort column and the timestamp in the final select. I do not know your background, however, sql is about data sets end it is not always easy to get ones head around thinking in sets. I hope you could follow my suggestions. It might not be the most efficient way but should work. Kind regards Thiemo Am 05.11.22 um 16:10 schrieb Rafal Pietrak: > Hi Everybody, > > I was wondering if anybody here could help me cook up a query: > > 1. against a list of events (like an activity log in the database). > The list is a single table: create table events (tm timestamp, user > int, description text). > > 2. of which the output would be sorted in such a way, that: > 2.1 most recent event would "select" most recent events of that same > user, and displayed in a group (of say 10) of them (in "tm" order). > > 2.2 going through the events back in time, first event of ANOTHER user > selects next group, where (say 10) most recent events of that OTHER > user is presented. > > 2.3 next most recent event of yet another user selects yet another > group to display and this selection process goes on, up to a maximum > of (say 20) users/groups-of-their-events. > > 2.4 after that, all other events are selected in tm order. > > This is to present most recent telephone activities grouped by most > recent subscribers so that the dashboard doesn't get cluttered with > information but allows for an overview of other activity of most > recent users. > > I tend to think, that it's a problem for a window function ... but > I've stumbled on the problem how to limit the window "frame" to just a > few (say 10) events within the "window" and have all the rest returned > as "tail" of the query. > > BTW: the eventlog table is big. (and partitioned). > > Any help appreciated. > > -R > > -- Signal (Safer than WhatsApp): +49 1578 7723737 Threema (Safer than WhatsApp): A76MKH3J Handy: +49 1578 772 37 37
Hi Thiemo, Thank you for suggestions. W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze: > Hi Rafal > > You first could select the three users with the most recent entries with > a windowing function > (https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS) surely I'm missing something crucial here: select row_number() over w,* from eventlog where row_number() over w < 5 window w as (partition by user); ERROR: window functions are not allowed in WHERE So I'm unable to pick a limited number of rows within the user "group-window" ranges. Without that, I cannot proceed. Any suggestions? -R PS: regarding "my sets background", yes I do "think in sets" ... as opposed to thinking "in functions" (like iterating procedures). I do prefer solutions based on set definitions. putting it into a with query (https://www.postgresql.org/docs/15/sql-select.html), in following with queries I would select 2.1 to 2.3 with each a constant column with each a different value you later sort by. In a next with query you can select all the rest (except all https://www.postgresql.org/docs/15/sql-select.html#SQL-EXCEPT) the results of 2.1 to 2.3 for 2.4 also with the notorious sort column. In a last with query you can put together the partial results for 2.1 to 2.4 with a union all (https://www.postgresql.org/docs/15/sql-select.html#SQL-UNION) and selecting sort by the sort column and the timestamp in the final select. > > I do not know your background, however, sql is about data sets end it is > not always easy to get ones head around thinking in sets. I hope you > could follow my suggestions. It might not be the most efficient way but > should work. > > Kind regards > > Thiemo > > > Am 05.11.22 um 16:10 schrieb Rafal Pietrak: >> Hi Everybody, >> >> I was wondering if anybody here could help me cook up a query: >> >> 1. against a list of events (like an activity log in the database). >> The list is a single table: create table events (tm timestamp, user >> int, description text). >> >> 2. of which the output would be sorted in such a way, that: >> 2.1 most recent event would "select" most recent events of that same >> user, and displayed in a group (of say 10) of them (in "tm" order). >> >> 2.2 going through the events back in time, first event of ANOTHER user >> selects next group, where (say 10) most recent events of that OTHER >> user is presented. >> >> 2.3 next most recent event of yet another user selects yet another >> group to display and this selection process goes on, up to a maximum >> of (say 20) users/groups-of-their-events. >> >> 2.4 after that, all other events are selected in tm order. >> >> This is to present most recent telephone activities grouped by most >> recent subscribers so that the dashboard doesn't get cluttered with >> information but allows for an overview of other activity of most >> recent users. >> >> I tend to think, that it's a problem for a window function ... but >> I've stumbled on the problem how to limit the window "frame" to just a >> few (say 10) events within the "window" and have all the rest returned >> as "tail" of the query. >> >> BTW: the eventlog table is big. (and partitioned). >> >> Any help appreciated. >> >> -R >> >>
> On 06/11/2022 13:48 CET Rafal Pietrak <rafal@ztk-rp.eu> wrote: > > W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze: > > > > You first could select the three users with the most recent entries with > > a windowing function > > (https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS) > > surely I'm missing something crucial here: > select row_number() over w,* from eventlog where row_number() over w < 5 > window w as (partition by user); > ERROR: window functions are not allowed in WHERE > > So I'm unable to pick a limited number of rows within the user > "group-window" ranges. > > Without that, I cannot proceed. > > Any suggestions? Windows functions are only permitted in SELECT and ORDER BY because they are executed after WHERE, GROUP BY, and HAVING[1]. You need a derived table to filter on row_number: with ranked as ( select *, row_number() over w from eventlog window w as (partition by user) ) select * from ranked where row_number < 5; [1] https://www.postgresql.org/docs/15/tutorial-window.html -- Erik
Great, with a little tweaking (to get the remaining rows ordered correctly), this did the job. Thank you Erik. BR -R W dniu 6.11.2022 o 15:23, Erik Wienhold pisze: >> On 06/11/2022 13:48 CET Rafal Pietrak <rafal@ztk-rp.eu> wrote: >> >> W dniu 5.11.2022 o 19:05, Thiemo Kellner pisze: >>> >>> You first could select the three users with the most recent entries with >>> a windowing function >>> (https://www.postgresql.org/docs/15/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS) >> >> surely I'm missing something crucial here: >> select row_number() over w,* from eventlog where row_number() over w < 5 >> window w as (partition by user); >> ERROR: window functions are not allowed in WHERE >> >> So I'm unable to pick a limited number of rows within the user >> "group-window" ranges. >> >> Without that, I cannot proceed. >> >> Any suggestions? > > Windows functions are only permitted in SELECT and ORDER BY because they are > executed after WHERE, GROUP BY, and HAVING[1]. > > You need a derived table to filter on row_number: > > with > ranked as ( > select *, row_number() over w > from eventlog > window w as (partition by user) > ) > select * > from ranked > where row_number < 5; > > [1] https://www.postgresql.org/docs/15/tutorial-window.html > > -- > Erik > >