Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers
From | Ashutosh Bapat |
---|---|
Subject | Re: [HACKERS] Partition-wise aggregation/grouping |
Date | |
Msg-id | CAFjFpRf9mJLovBhsioSjUcDwg5BgSGid-DSxVhn0MF6AXreBTA@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise aggregation/grouping (Jeevan Chalke <jeevan.chalke@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise aggregation/grouping
|
List | pgsql-hackers |
On Wed, Sep 27, 2017 at 3:42 PM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > Thanks Ashutosh for reviewing. > > Attached new patch-set with following changes: > > 1. Removed earlier 0007 and 0008 patches which were PoC for supporting > partial aggregation over fdw. I removed them as it will be a different > issue altogether and hence I will tackle them separately once this is > done. > > This patch-set now includes support for parallel plans within partitions. > > Notes: > HEAD: 59597e6 > Partition-wise Join Version: 34 > > (First six patches 0001 - 0006, remains the same functionality-wise) > 0007 - Refactors partial grouping paths creation into the separate function. > 0008 - Enables parallelism within the partition-wise aggregation. > > This patch also includes a fix for the crash reported by Rajkumar. > While forcibly applying scan/join target to all the Paths for the scan/join > rel, earlier I was using apply_projection_to_path() which modifies the path > in-place which causing this crash as the path finally chosen has been > updated by partition-wise agg path creation. Now I have used > create_projection_path() like we do in partial aggregation paths. > > Also, fixed issues reported by Ashutosh. Thanks. Here are comments on 0004 from last patch set. But most the comments still apply. Patch 0001 adds functions create_hash_agg_path() and create_sort_agg_path(). Patch 0004 adds a new argument to those functions for conditions in HAVING clause. We should move those changes to 0001 and pass parse->havingQual to these functions in 0001 itself. That will keep all changes to those functions together and also make 0003 small. The prologue of try_partition_wise_grouping() mentions a restriction of partition keys being leading group by clauses. This restriction is not specified in the prologue of have_grouping_by_partkey(), which actually checks for this restriction. The requirement per prologue of that function is just to have partition keys in group clause. I think have_grouping_by_partkey() is correct, and we should change prologue of try_partition_wise_grouping() to be in sync with have_grouping_by_partkey(). The prologue explains why partition-wise aggregation/grouping would be efficient with this restriction, but it doesn't explain why partial aggregation/grouping per partition would be efficient. May be it should do that as well. Similar is the case with partial aggregation/grouping discussion in README. + /* Do not handle grouping sets for now. */ Is this a real restriction or just restriction for first cut of this feature? Can you please add more explanation? May be update README as well? + grouped_rel->part_scheme = input_rel->part_scheme; Is this true even for partial aggregates? I think not. Since group by clause does not contain partition keys, the rows from multiple partitions participate in one group and thus the partition keys of input relation do not apply to the grouped relation. In this case, it seems that the grouped rel will have part_rels but will not be partitioned. + /* + * If there is no path for the child relation, then we cannot add + * aggregate path too. + */ + if (input_child_rel->pathlist == NIL) + return; When can this happen? I think, similar to partition-wise join it happens when there is a dummy parent relation. See [1]. If that's the case, you may want to do things similar to what partition-wise join is doing. If there's some other reason for doing this, returing from here half-way is actually waste of memory and planning time. Instead, we may want to loop over the part_rels to find if any of those have empty pathlist and return from there before doing any actual work. + extra.pathTarget = child_target; + extra.inputRows = input_child_rel->cheapest_startup_path->rows; + extra.havingQual = (Node *) adjust_appendrel_attrs(root, + (Node *) query->havingQual, + nappinfos, + appinfos); These lines are updating some fields of "extra" structure in every loop. The structure is passed to create_child_grouping_paths() in the loop and to add_paths_to_append_rel() outside the loop. Thus add_paths_to_append_rel() only gets some member values for the last child. Is that right? Should we split extra into two structures one to be used within the loop and one outside? Or may be send the members being updated within the loop separately? + /* + * Forcibly apply scan/join target to all the Paths for the scan/join + * rel. + * [ lines clipped ] + if (subpath == input_child_rel->cheapest_total_path) + input_child_rel->cheapest_total_path = path; + } + } This code seems to be copied from grouping_planner() almost verbatim. Is there a way we can refactor it into a function and use it in both the places. have_grouping_by_partkey() may use match_expr_to_partition_keys() to find whether a given clause expression matches any of the partition keys. Or you could use list_intersection() instead of following loop + foreach(lc, partexprs) + { + Expr *partexpr = lfirst(lc); + + if (list_member(groupexprs, partexpr)) + { + found = true; + break; + } + } + /* + * If none of the partition key matches with any of the GROUP BY + * expression, return false. + */ + if (!found) + return false; create_child_grouping_paths() and create_grouping_paths() has almost similar code. Is there a way we could refactor the code to extract common code into a function called by these two functions or reuse create_grouping_paths() for children as well? I don't think we will be able to do the later. + /* Nothing to do if there is an empty pathlist */ + if (grouped_rel->pathlist == NIL) + return false; When would that happen? Similar condition in case of parent grouped rel throws an error, so when this code is called, we know for sure that parent had non-empty pathlist. So, we would expect child to have non-empty pathlist as well. + grouped_rel = fetch_upper_rel(root, UPPERREL_GROUP_AGG, + input_rel->relids); + + /* Mark this rel as "other upper rel" */ + grouped_rel->reloptkind = RELOPT_OTHER_UPPER_REL; I think we need to pass relopkind as an argument to fetch_upper_rel(), now that we have "upper" relations and "other upper" relations. relids will still be a "key" to find an upper relation but its reloptkind should match the given reloptkind. fetch_upper_rel() is used to create the upper relation if it doesn't exist. So, with the above code, if some other function calls fetch_upper_rel() with given relids, it would get an upper rel with RELOPT_UPPER_REL and then this code would change it to RELOPT_OTHER_UPPER_REL. That looks odd. May be we should have written build_upper_rel() and find_upper_rel() similar to build_join_rel() and find_join_rel() instead of combining both the functionalities in one function. + /* + * create_append_path() sets the path target from the given relation. + * However, in this case grouped_rel doesn't have a target set. So we + * must set the pathtarget to the passed in target. + */ + apath->pathtarget = target; I think, we should change create_append_path() functions to accept target as an additional argument. For append rels other than aggregate and grouping, target will be same as relation's target. For agg/group append rels, we will pass different targets for partial and non-partial grouping paths. + /* + * Since Append's output is always unsorted, we'll need to sort, + * unless there's no GROUP BY clause or a degenerate (constant) one, + * in which case there will only be a single group. + */ append path here can be output of either merge append or append. If it's output of merge append, we don't need to sort it again, do we? create_partition_agg_paths() creates append paths and then adds finalization path if necessary. The code to add finalization path seems to be similar to the code that adds finalization path for parallel query. May be we could take out common code into a function and call that function in two places. I see this function as accepting a partial aggregation/grouping path and returning a path that finalizes partial aggregates/groups. [1] https://www.postgresql.org/message-id/CAFjFpRd5+zroxY7UMGTR2M=rjBV4aBOCxQg3+1rBmTPLK5mpDg@mail.gmail.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
pgsql-hackers by date: