Understanding aggregation and window functions. - Mailing list pgsql-novice
From | Pól Ua Laoínecháin |
---|---|
Subject | Understanding aggregation and window functions. |
Date | |
Msg-id | CAF4RT5Q15EJ0mc7cMv8FZK+Vdq23GvQo03R_xo3U=M-QsXmKVw@mail.gmail.com Whole thread Raw |
Responses |
Re: Understanding aggregation and window functions.
|
List | pgsql-novice |
Hi all, I have three tables: theatre, show and ticket - all DDL and DML available at the bottom of this post and on a fiddle here: https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529 I've kept things very simple: theatre (theatre_id) is an FK in show (show_theatre_id) show (show_id) is an FK in ticket (ticket_show_id) So, I start off: SELECT th.theatre_id, th.theatre_name AS th_name, s.show_id AS s_id, s.show_theatre_id AS s_th, s.show_name AS s_name, t.ticket_id AS t_id, t.ticket_show_id AS t_show, t.price AS t_p FROM theatre th JOIN show s ON th.theatre_id = s.show_theatre_id JOIN ticket t ON s.show_id = t.ticket_show_id; And I get 29 rows returned - (not all shown) - 1 for every ticket. theatre_id th_name s_id s_th s_name t_id t_show t_p 1 Theatre_1 10 1 show_10 1 10 10 1 Theatre_1 10 1 show_10 2 10 10 That's all fine. So, now I run: SELECT th.theatre_name AS "Theatre name", s.show_name AS "Show name", SUM(t.price) AS "Total sales" FROM theatre th JOIN show s ON th.theatre_id = s.show_theatre_id JOIN ticket t ON s.show_id = t.ticket_show_id GROUP BY th.theatre_name, s.show_name ORDER BY th.theatre_name, s.show_name; And obtain: Theatre name Show name Sales/show Theatre_1 show_10 2050 Theatre_1 show_11 77 Theatre_2 show_20 200 Theatre_2 show_21 63 Again, fine! But what I want is (theatre_1 2127) and (theatre_2, 263). I would also appreciate *_explanations_*, references, URLs... a bit more than just the correct SQL would be great! But, now I want to get the total sales/per theatre in the same query, so I try adding the line (see fiddle): SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.theatre_id), and I receive the error: ERROR: column "t.price" must appear in the GROUP BY clause or be used in an aggregate function LINE 8: SUM(t.price) OVER (PARTITION BY th.theatre_id ORDER BY th.... Now, I don't understand why I'm getting the error - I *_already_* have t.price in an aggregate function - i.e. the simple SUM without OVER ()? So, I ran another query to experiment: SELECT th.theatre_name AS "T. name", s.show_id AS "Show name", t.price AS "Tkt price", SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T.", SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id, s.show_id) AS "Sales/show", COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id) "No. of difft tkts/show", SUM(t.price) AS "Tot sales/ticket" FROM theatre th JOIN show s ON th.theatre_id = s.show_theatre_id JOIN ticket t ON s.show_id = t.ticket_show_id GROUP BY th.theatre_id, th.theatre_name, s.show_id, t.price ORDER BY th.theatre_name, s.show_id, t.price DESC; So, the result of this is: T. name Show name Tkt price Sales/T. Sales/show tkts/show Tot sales/ticket Theatre_1 10 500 521 510 2 2000 Theatre_1 10 10 521 510 2 50 Theatre_1 11 11 521 11 1 77 Theatre_2 20 20 41 20 1 200 Theatre_2 21 21 41 21 1 63 I hope this "chart" comes out in the formatting - if not, check out the fiddle - it's far more readable there! So, - t.price AS "Tkt price" gives me 5 prices - i.e. for show 10, there are two types available, 10 and 500 € (or whatever...) and 5 overall (1 each for the others) - OK! - SUM(t.price) OVER (PARTITION BY th.theatre_id) AS "Sales/T." gives me the total of the prices for the individual different tickets by theatre - i.e. the sum of 500 + 10 + 11 for theatre 1 - that's OK So, I'm aggregating over the theatres... but for individual tickets and not the total? - SUM(t.price) OVER (PARTITION BY s.show_id ORDER BY th.theatre_id, s.show_id) AS "Sales/show" give me the sum of the prices of the individual tickets per show - 500 + 10 for theatre 1, show 1 - OK - COUNT(t.price) OVER (PARTITION BY th.theatre_id, show_id) "Tkts/show", give 2 for shows 10 and 1 for the others - which is OK - We've already seen the final result above What I want is Theatre_1 2127 Theatre_2 263 i.e. the total sales overall for all shows by theatre - and I just can't seem to understand why my various partitioning and ordering attempts aren't working out! ** BONUS QUESTION ** I would also like to include the number of tickets sold by price point and the number of tickets sold per theatre and per show - this might involve a subselect in the SELECT clause? I tried (see end of fiddle): (https://dbfiddle.uk/?rdbms=postgres_12&fiddle=94d04a7a08fd09022774f21c327dd529) SELECT COUNT(t.ticket_show_id) FROM show s JOIN ticket t ON s.show_id = t.ticket_show_id GROUP BY s.show_id Result count 9 7 3 10 But when I tried to integrate it into another query as a SELECT in the SELECT statement, I received the error: ERROR: more than one row returned by a subquery used as an expression Which I can understand, but can't figure out how to solve. For each row of my main query, the query above should only return 1 row - but... Any advice appreciated - Pól... ================= DDL and DML ================= CREATE TABLE theatre ( theatre_id INTEGER NOT NULL PRIMARY KEY, theatre_name VARCHAR (25) NOT NULL ); INSERT INTO theatre VALUES (1, 'Theatre_1'), (2, 'Theatre_2'); CREATE TABLE show ( show_id INTEGER NOT NULL PRIMARY KEY, show_theatre_id INTEGER NOT NULL, show_name VARCHAR (25) NOT NULL, CONSTRAINT show_theatre_fk FOREIGN KEY (show_theatre_id) REFERENCES theatre (theatre_id) ); INSERT INTO show (show_id, show_theatre_id, show_name) VALUES (10, 1, 'show_10'), (11, 1, 'show_11'), (20, 2, 'show_20'), (21, 2, 'show_21'); CREATE TABLE ticket ( ticket_id SERIAL PRIMARY KEY, ticket_show_id INTEGER NOT NULL, CONSTRAINT ticket_show_fk FOREIGN KEY (ticket_show_id) REFERENCES show (show_id), price INT NOT NULL -- -- Various other fields - seat_no, is_sold, discount... whatever ); INSERT INTO ticket (ticket_show_id, price) VALUES (10, 10), (10, 10), (10, 10), (10, 10), (10, 10), -- 5 tickets for show 10 (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), (11, 11), -- 7/show 11 &c (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), (20, 20), -- 10 (21, 21), (21, 21), (21, 21), -- 3 (10, 500), (10, 500), (10, 500), (10, 500); -- 4 EXPENSIVE tickets for show_10;
pgsql-novice by date: