Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise aggregation/grouping |
Date | |
Msg-id | CAFjFpReYBgToHrYwMxbJMb=MoWgD=iVAALnXZ3p0JafBkx61OA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise aggregation/grouping (Jeevan Chalke <jeevan.chalke@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise aggregation/grouping
Re: [HACKERS] Partition-wise aggregation/grouping Re: [HACKERS] Partition-wise aggregation/grouping |
List | pgsql-hackers |
On Wed, Dec 13, 2017 at 6:37 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > > > On Tue, Dec 12, 2017 at 3:43 PM, Ashutosh Bapat > <ashutosh.bapat@enterprisedb.com> wrote: >> >> Here are review comments for 0009 > > > Thank you, Ashutosh for the detailed review so far. > > I am working on your reviews but since parallel Append is now committed, > I need to re-base my changes over it and need to resolve the conflicts too. > > Once done, I will submit the new patch-set fixing these and earlier review > comments. > Sure no problem. Take your time. Here's set of comments for 0008. That ends the first read of all the patches (2nd reading for the core changes) +-- Also, disable parallel paths. +SET max_parallel_workers_per_gather TO 0; If you enable parallel aggregation for smaller data partition-wise aggregation paths won't be chosen. I think this is the reason why you are disabling parallel query. But we should probably explain that in a comment. Better if we could come up testcases without disabling parallel query. Since parallel append is now committed, may be it can help. + +-- Check with multiple columns in GROUP BY, order in target-list is reversed +EXPLAIN (COSTS OFF) +SELECT c, a, count(*) FROM pagg_tab GROUP BY a, c; + QUERY PLAN +------------------------------------------------- + Append + -> HashAggregate + Group Key: pagg_tab_p1.a, pagg_tab_p1.c + -> Seq Scan on pagg_tab_p1 [ ... clipped ... ] +(10 rows) Why do we need this testcase? + +-- Test when input relation for grouping is dummy +EXPLAIN (COSTS OFF) +SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c; + QUERY PLAN +-------------------------------- + HashAggregate + Group Key: pagg_tab.c + -> Result + One-Time Filter: false +(4 rows) Not part of your patch, I am wondering if we can further optimize this plan by converting HashAggregate to Result (One-time Filter: false) and the aggregate target. Just an idea. + +SELECT c, sum(a) FROM pagg_tab WHERE 1 = 2 GROUP BY c; + c | sum +---+----- +(0 rows) I think we also need a case when the child input relations are marked dummy and then the parent is marked dummy. Just use a condition with partkey = <none of list bounds>. + +-- Check with SORTED paths. Disable hashagg to get group aggregate Suggest: "Test GroupAggregate paths by disabling hash aggregates." +-- When GROUP BY clause matches with PARTITION KEY. I don't think we need "with", and just extend the same sentence with "complete aggregation is performed for each partition" +-- Should choose full partition-wise aggregation path suggest: "Should choose full partition-wise GroupAggregate plan", but I guess with the above suggestion, this sentence is not needed. + +-- When GROUP BY clause not matches with PARTITION KEY. +-- Should choose partial partition-wise aggregation path Similar suggestions as above. +-- No aggregates, but still able to perform partition-wise aggregates That's a funny construction. May be "Test partition-wise grouping without any aggregates". We should try some output for this query. + +EXPLAIN (COSTS OFF) +SELECT a FROM pagg_tab GROUP BY a ORDER BY 1; + QUERY PLAN +------------------------------------------------- + Group + Group Key: pagg_tab_p1.a + -> Merge Append + Sort Key: pagg_tab_p1.a + -> Group + Group Key: pagg_tab_p1.a + -> Sort + Sort Key: pagg_tab_p1.a + -> Seq Scan on pagg_tab_p1 [ ... clipped ... ] +(19 rows) It's strange that we do not annotate partial grouping as Partial. Does not look like a bug in your patch. Do we get similar output with parallel grouping? + +-- ORDERED SET within aggregate +EXPLAIN (COSTS OFF) +SELECT a, sum(b order by a) FROM pagg_tab GROUP BY a ORDER BY 1, 2; + QUERY PLAN +------------------------------------------------------------------------ + Sort + Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b ORDER BY pagg_tab_p1.a)) + -> GroupAggregate + Group Key: pagg_tab_p1.a + -> Sort + Sort Key: pagg_tab_p1.a + -> Append + -> Seq Scan on pagg_tab_p1 + -> Seq Scan on pagg_tab_p2 + -> Seq Scan on pagg_tab_p3 +(10 rows) pagg_tab is partitioned by column c. So, not having it in GROUP BY itself might produce this plan if Partial parallel aggregation is expensive. When testing negative tests like this GROUP BY should always have the partition key. In case of full aggregation, since all the rows that belong to the same group come from the same partition, having an ORDER BY doesn't make any difference. We should support such a case. +INSERT INTO pagg_tab1 SELECT i%30, i%20 FROM generate_series(0, 299, 2) i; +INSERT INTO pagg_tab2 SELECT i%20, i%30 FROM generate_series(0, 299, 3) i; spaces around % operator? +-- When GROUP BY clause matches with PARTITION KEY. +-- Should choose full partition-wise aggregation path. Probably we should just club single table and join cases under one set of comments rather than repeating those? Create the tables once at the beginning of the test file and group together the queries under one comment head. +-- Disable mergejoin to get hash aggregate. +SET enable_mergejoin TO false; Why? We have tested that once. + +-- When GROUP BY clause not matches with PARTITION KEY. +-- Should choose partial partition-wise aggregation path. +-- Also check with SORTED paths. Disable hashagg to get group aggregate. +SET enable_hashagg TO false; Same as above. Two of those clubbed together they will produce one hash and one group plan. That will cover it. +-- Check with LEFT/RIGHT/FULL OUTER JOINs which produces NULL values for +-- aggregation +-- LEFT JOIN, should produce partial partition-wise aggregation plan as +-- GROUP BY is on nullable column +EXPLAIN (COSTS OFF) +SELECT b.y, sum(a.y) FROM pagg_tab1 a LEFT JOIN pagg_tab2 b ON a.x = b.y GROUP BY 1 ORDER BY 1 NULLS LAST; May be you should explicitly use GROUP BY b.y in all of these queries. +-- FULL JOIN, should produce partial partition-wise aggregation plan as +-- GROUP BY is on nullable column In case of a FULL JOIN partition keys from the joining relations land on nullable side; there is no key on non-nulllable side, so an aggregation on top of FULL JOIN will always be partial partition-wise aggregation. + +-- Empty relations on LEFT side, no partition-wise agg plan. Suggest: Empty join relation because of empty outer side. I don't think we are writing a negative test to check whether partition-wise agg plan is not chosen. We are testing the case when the join relation is empty. + +EXPLAIN (COSTS OFF) +SELECT a, c, sum(b), avg(c), count(*) FROM pagg_tab GROUP BY c, a, (a+b)/2 HAVING sum(b) = 50 AND avg(c) > 25 ORDER BY 1, 2, 3; Keep this or the previous one, both is overkill. I will vote for this one, but it's upto you. May be add a testcase with the partition keys themselves switched; output just the plan. +-- Test with multi-level partitioning scheme +-- Partition-wise aggregation is tried only on first level. [ ... clipped ... ] +-- Full aggregation as GROUP BY clause matches with PARTITION KEY This seems to contradict with the previous comment. May be club them together and say "Partition-wise aggregation with full aggregation only at the first leve" and move that whole comment down. + +-- Partial aggregation as GROUP BY clause does not match with PARTITION KEY +EXPLAIN (COSTS OFF) +SELECT b, sum(a), count(*) FROM pagg_tab GROUP BY b ORDER BY 1, 2, 3; + QUERY PLAN +---------------------------------------------------------------- + Sort + Sort Key: pagg_tab_p1.b, (sum(pagg_tab_p1.a)), (count(*)) + -> Finalize GroupAggregate + Group Key: pagg_tab_p1.b + -> Sort + Sort Key: pagg_tab_p1.b + -> Append + -> Partial HashAggregate + Group Key: pagg_tab_p1.b + -> Seq Scan on pagg_tab_p1 + -> Partial HashAggregate + Group Key: pagg_tab_p2_s1.b + -> Append + -> Seq Scan on pagg_tab_p2_s1 + -> Seq Scan on pagg_tab_p2_s2 + -> Partial HashAggregate + Group Key: pagg_tab_p3_s1.b + -> Append + -> Seq Scan on pagg_tab_p3_s1 + -> Seq Scan on pagg_tab_p3_s2 +(20 rows) Why aren't we seeing partial aggregation paths for level two and below partitions? + +-- Test on middle level partitioned table which is further partitioned on b. +-- Full aggregation as GROUP BY clause matches with PARTITION KEY +EXPLAIN (COSTS OFF) +SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3; + QUERY PLAN +------------------------------------------------------------------- + Sort + Sort Key: pagg_tab_p3_s1.b, (sum(pagg_tab_p3_s1.a)), (count(*)) + -> Append + -> HashAggregate + Group Key: pagg_tab_p3_s1.b + -> Seq Scan on pagg_tab_p3_s1 + -> HashAggregate + Group Key: pagg_tab_p3_s2.b + -> Seq Scan on pagg_tab_p3_s2 +(9 rows) + +SELECT b, sum(a), count(*) FROM pagg_tab_p3 GROUP BY b ORDER BY 1, 2, 3; + b | sum | count +---+------+------- + 0 | 2000 | 100 + 1 | 2100 | 100 + 2 | 2200 | 100 + 3 | 2300 | 100 + 4 | 2400 | 100 + 5 | 2500 | 100 + 6 | 2600 | 100 + 7 | 2700 | 100 + 8 | 2800 | 100 + 9 | 2900 | 100 +(10 rows) We should just remove this case, it's same as testing top-level partitioned tables. + +-- Full aggregation as GROUP BY clause matches with PARTITION KEY +EXPLAIN (COSTS OFF) +SELECT a, sum(b), array_agg(distinct c), count(*) FROM pagg_tab GROUP BY a, b HAVING avg(b) < 3 ORDER BY 1, 2, 3; + QUERY PLAN +-------------------------------------------------------------------------------------- + Sort + Sort Key: pagg_tab_p1.a, (sum(pagg_tab_p1.b)), (array_agg(DISTINCT pagg_tab_p1.c)) + -> Append + -> GroupAggregate + Group Key: pagg_tab_p1.a, pagg_tab_p1.b + Filter: (avg(pagg_tab_p1.b) < '3'::numeric) + -> Sort + Sort Key: pagg_tab_p1.a, pagg_tab_p1.b + -> Seq Scan on pagg_tab_p1 + -> GroupAggregate + Group Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b + Filter: (avg(pagg_tab_p2_s1.b) < '3'::numeric) + -> Sort + Sort Key: pagg_tab_p2_s1.a, pagg_tab_p2_s1.b + -> Append + -> Seq Scan on pagg_tab_p2_s1 + -> Seq Scan on pagg_tab_p2_s2 + -> GroupAggregate + Group Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b + Filter: (avg(pagg_tab_p3_s1.b) < '3'::numeric) + -> Sort + Sort Key: pagg_tab_p3_s1.a, pagg_tab_p3_s1.b + -> Append + -> Seq Scan on pagg_tab_p3_s1 + -> Seq Scan on pagg_tab_p3_s2 +(25 rows) Instead of an Append node appearing under GroupAggregate, I think we should flatten all the partition scans for the subpartitions whose partition keys are part of group keys and add GroupAggregate on top of each of such partition scans. +-- Parallelism within partition-wise aggregates +RESET max_parallel_workers_per_gather; +SET min_parallel_table_scan_size TO '8kB'; +SET parallel_setup_cost TO 0; +INSERT INTO pagg_tab_para SELECT i%30, i%20 FROM generate_series(0, 29999) i; spaces around % operator? +SHOW max_parallel_workers_per_gather; + max_parallel_workers_per_gather +--------------------------------- + 2 Why do we need this? + +-- When GROUP BY clause matches with PARTITION KEY. +EXPLAIN (COSTS OFF) +SELECT x, sum(y), avg(y), count(*) FROM pagg_tab_para GROUP BY x HAVING avg(y) < 7 ORDER BY 1, 2, 3; + QUERY PLAN +-------------------------------------------------------------------------------------- + Sort + Sort Key: pagg_tab_para_p1.x, (sum(pagg_tab_para_p1.y)), (avg(pagg_tab_para_p1.y)) + -> Append [ ... clipped ...] + -> Finalize GroupAggregate + Group Key: pagg_tab_para_p3.x + Filter: (avg(pagg_tab_para_p3.y) < '7'::numeric) + -> Sort + Sort Key: pagg_tab_para_p3.x + -> Gather + Workers Planned: 2 + -> Partial HashAggregate + Group Key: pagg_tab_para_p3.x + -> Parallel Seq Scan on pagg_tab_para_p3 [ ... clipped ... ] +-- When GROUP BY clause not matches with PARTITION KEY. +EXPLAIN (COSTS OFF) +SELECT y, sum(x), avg(x), count(*) FROM pagg_tab_para GROUP BY y HAVING avg(x) < 12 ORDER BY 1, 2, 3; + QUERY PLAN +-------------------------------------------------------------------------------------- + Sort + Sort Key: pagg_tab_para_p1.y, (sum(pagg_tab_para_p1.x)), (avg(pagg_tab_para_p1.x)) + -> Finalize HashAggregate + Group Key: pagg_tab_para_p1.y [ ... clipped ... ] + -> Gather + Workers Planned: 2 + -> Partial HashAggregate + Group Key: pagg_tab_para_p3.y + -> Parallel Seq Scan on pagg_tab_para_p3 Per a prior discussion on this thread, we shouldn't produce such plans; Parallel Append instead? +SET enable_partition_wise_agg to true; May be just enable it at the beginning instead of enabling and disabling twice? -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
pgsql-hackers by date: