proposal - GROUPING SETS - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | proposal - GROUPING SETS |
Date | |
Msg-id | 162867790809160327n23dd2f0eyd0f6ff6365afa25@mail.gmail.com Whole thread Raw |
Responses |
Re: proposal - GROUPING SETS
|
List | pgsql-hackers |
Hello, == Proposal - GROUPING SETS == a grouping set feature allows multiple grouping clauses in one query. Result of grouping sets is union of results each groupby clause's result. create table t(a int, b int); insert into t values(10,20); insert into t values(30,40); select a, b from t group by grouping sets(a, b); is same as: select a, NULL from t group by a union all select NULL, b from t group by b; Note: all ungrouped vars are transformed to NULL Groupby clause should contains cube and rollup lists. These are transformed to grouping sets via transformed rules: create table t1(a int, b int, c int) group by rollup(a, b, c) -> group by grouping sets((a,b,c), (a,b), (a), ()) group by cube(a,b,c) -> group by grouping sets ((a,b,c),(a,b),(a,c), (a), (b,c), (b), (c), ()) Groupby clause or grouping sets should contains more sets. Result is multiplication of these sets: group by grouping sets(a), grouping sets(b, (b,c), ()) -> group by grouping sets((a,b), (a,b,c), (a)) When grouping sets are used, then we should to use grouping and grouping_id functions. Function grouping returns 1 when parameter is in current group set, else returns 0. Function grouping_id returns value as grouping_id(a,b,c) = to_dec(to_bin(grouping(a) || grouping(b) || grouping(c))) postgres=# select * from t;a | b | c ----+----+----10 | 20 | 3010 | 20 | 30 (2 rows) postgres=# select a,b,c from t group by grouping sets(a,b,c);a | b | c ----+----+----10 | | | 20 | | | 30 (3 rows) postgres=# select a,b,c, grouping(a), grouping(b), grouping(c), grouping_id(a,b,c) from t group by grouping sets(a,b,c);a | b | c | grouping | grouping | grouping | grouping_id ----+----+----+----------+----------+----------+-------------10 | | | 1 | 0 | 0 | 4 | 20 | | 0 | 1 | 0 | 2 | | 30 | 0 | 0 | 1 | 1 (3 rows) some real sample: create table report( inserted date, locality varchar, name varchar, c int); postgres=# copy report to stdout; 2008-10-10 Prague Milk 10 2008-10-11 Prague Milk 12 2008-10-10 Prague Rum 2 2008-10-11 Prague Rum 6 2008-10-10 Berlin Milk 8 2008-10-11 Berlin Milk 14 2008-10-10 Berlin Beer 20 2008-10-11 Berlin Beer 25 postgres=# select * from report; inserted | locality | name | c ------------+----------+------+----2008-10-10 | Prague | Milk | 102008-10-11 | Prague | Milk | 122008-10-10 | Prague | Rum | 22008-10-11 | Prague | Rum | 62008-10-10 | Berlin | Milk | 82008-10-11 | Berlin | Milk | 142008-10-10| Berlin | Beer | 202008-10-11 | Berlin | Beer | 25 (8 rows) postgres=# select inserted, locality, name, sum(c) from report group by grouping sets(inserted, locality,name); inserted | locality | name | sum ------------+----------+------+-----2008-10-10 | | | 402008-10-11 | | | 57 | Berlin | | 67 | Prague | | 30 | | Milk | 44 | | Rum | 8 | | Beer | 45 (7 rows) postgres=# select inserted, locality, name, sum(c) from report group by grouping sets(inserted, (locality,name)); inserted | locality | name | sum ------------+----------+------+-----2008-10-10 | | | 402008-10-11 | | | 57 | Prague | Milk | 22 | Berlin | Milk | 22 | Berlin | Beer | 45 | Prague | Rum | 8 (6 rows) postgres=# select inserted, locality, name, sum(c) from report group by name, grouping sets(inserted,locality); inserted | locality | name | sum ------------+----------+------+-----2008-10-11 | | Rum | 62008-10-10 | | Rum | 22008-10-10 | | Beer | 202008-10-11 | | Beer | 252008-10-10 | | Milk | 182008-10-11 | | Milk | 26 | Prague | Rum | 8 | Berlin | Beer | 45 | Berlin | Milk | 22 | Prague | Milk | 22 (10 rows) == Implementation == Grouping sets introduce a new concept into SQL. One readed tuple is multiple used. It's similar with WITH clause. It's little bit dificult implement it for current PostgreSQL's executor. In my prototype I used aux node Feeder. This node should to hold only one tuple. I add new method for Agg node, that process only one input tuple: for (;;){ tuple = feeder->execute(grouping_sets->lefttree); foreach(l, grouping_sets->subplans) { lfirst(l)->process(tuple); } if (is_null(tuple)) break;} It supports only HASH Agg nodes. For non hash agg should be used other method (currently it isn't supported). postgres=# explain verbose select inserted, locality, name, sum(c) from report group by name,grouping sets(inserted, locality); QUERY PLAN -----------------------------------------------------------------Grouping Sets (cost=12.00..35.00 rows=400 width=68) Output:NULL::date, locality, name, sum(c) -> Seq Scan on report (cost=0.00..18.00 rows=800 width=68) Output: inserted,locality, name, c -> HashAggregate (cost=6.00..8.50 rows=200 width=68) Output: inserted, NULL::charactervarying, name, sum(c) -> Feeder (cost=0.00..0.00 rows=800 width=68) Output: inserted,locality, name, c -> HashAggregate (cost=6.00..8.50 rows=200 width=68) Output: NULL::date, locality, name,sum(c) -> Feeder (cost=0.00..0.00 rows=800 width=68) Output: inserted, locality, name, c (12 rows) After work on prototype I don't see any problems in executor or parser. I expect some dificulties in planner. With grouping sets grouping_planner procedure will be much more complex: 1. targetlist and groupclause will be list of list, 2. we should to repeat estimation of NumGroups for each group set (it's should be shared with CTE feature??). == Parser problems == 1. The identifier cube is used in contrib cube. Solution: CUBE '(' ... ')' generates funcCall and it is transformed togrouping sets only in groupby clause later. I invite any ideas, notes and help with documentation. Regards Pavel Stehule
pgsql-hackers by date: