Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Partition-wise aggregation/grouping |
Date | |
Msg-id | CA+TgmoZSSLx0aYokv8sJdB7ExfrjE7s+Ee3aVgnvVk9me=_KEg@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise aggregation/grouping (Jeevan Chalke <jeevan.chalke@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise aggregation/grouping
|
List | pgsql-hackers |
On Mon, Mar 5, 2018 at 3:56 AM, Jeevan Chalke <jeevan.chalke@enterprisedb.com> wrote: > However, to perform Gather or Gather Merge once we have all partial paths > ready, and to avoid too many existing code rearrangement, I am calling > try_partitionwise_grouping() before we do any aggregation/grouping on whole > relation. By doing this, we will be having all partial paths in > partially_grouped_rel and then existing code will do required finalization > along with any Gather or Gather Merge, if required. > > Please have a look over attached patch-set and let me know if it needs > further changes. This does look better. + <term><varname>enable_partitionwise_agg</varname> (<type>boolean</type>) Please don't abbreviate "aggregate" to "agg". - /* Build final grouping paths */ - add_paths_to_grouping_rel(root, input_rel, grouped_rel, target, - partially_grouped_rel, agg_costs, - &agg_final_costs, gd, can_sort, can_hash, - dNumGroups, (List *) parse->havingQual); + if (isPartialAgg) + { + Assert(agg_partial_costs && agg_final_costs); + add_paths_to_partial_grouping_rel(root, input_rel, + partially_grouped_rel, + agg_partial_costs, + gd, can_sort, can_hash, + false, true); + } + else + { + double dNumGroups; + + /* Estimate number of groups. */ + dNumGroups = get_number_of_groups(root, + cheapest_path->rows, + gd, + child_data ? make_tlist_from_pathtarget(target) : parse->targetList); + + /* Build final grouping paths */ + add_paths_to_grouping_rel(root, input_rel, grouped_rel, target, + partially_grouped_rel, agg_costs, + agg_final_costs, gd, can_sort, can_hash, + dNumGroups, (List *) havingQual); + } This looks strange. Why do we go down two completely different code paths here? It seems to me that the set of paths we add to the partial_pathlist shouldn't depend at all on isPartialAgg. I might be confused, but it seems to me that any aggregate path we construct that is going to run in parallel must necessarily be partial, because even if each group will occur only in one partition, it might still occur in multiple workers for that partition, so finalization would be needed. On the other hand, for non-partial paths, we can add then to partially_grouped_rel when isPartialAgg = true and to grouped_rel when isPartialAgg = false, with the only difference being AGGSPLIT_SIMPLE vs. AGGSPLIT_INITIAL_SERIAL. But that doesn't appear to be what this is doing. + /* + * If there are any fully aggregated partial paths present, may be because + * of parallel Append over partitionwise aggregates, we must stick a + * Gather or Gather Merge path atop the cheapest partial path. + */ + if (grouped_rel->partial_pathlist) This comment is copied from someplace where the code does what the comment says, but here it doesn't do any such thing. More tomorrow... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: