Re: Parallel Aggregate - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Re: Parallel Aggregate |
Date | |
Msg-id | CAA4eK1+zm7iog8RHSrFC-AZ0qxKPJiTekJAOSwHmW5AG9_nTRQ@mail.gmail.com Whole thread Raw |
In response to | Re: Parallel Aggregate (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Parallel Aggregate
|
List | pgsql-hackers |
On Wed, Mar 16, 2016 at 4:19 PM, David Rowley <david.rowley@2ndquadrant.com> wrote:
>
> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
> > I don't think I'd be objecting if you made PartialAggref a real
> > alternative to Aggref. But that's not what you've got here. A
> > PartialAggref is just a wrapper around an underlying Aggref that
> > changes the interpretation of it - and I think that's not a good idea.
> > If you want to have Aggref and PartialAggref as truly parallel node
> > types, that seems cool, and possibly better than what you've got here
> > now. Alternatively, Aggref can do everything. But I don't think we
> > should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.
>
>
> On 16 March 2016 at 15:04, Robert Haas <robertmhaas@gmail.com> wrote:
> > I don't think I'd be objecting if you made PartialAggref a real
> > alternative to Aggref. But that's not what you've got here. A
> > PartialAggref is just a wrapper around an underlying Aggref that
> > changes the interpretation of it - and I think that's not a good idea.
> > If you want to have Aggref and PartialAggref as truly parallel node
> > types, that seems cool, and possibly better than what you've got here
> > now. Alternatively, Aggref can do everything. But I don't think we
> > should go with this wrapper concept.
>
> Ok, I've now gotten rid of the PartialAggref node, and I'm actually
> quite happy with how it turned out. I made
> search_indexed_tlist_for_partial_aggref() to follow-on the series of
> other search_indexed_tlist_for_* functions and have made it behave the
> same way, by returning the newly created Var instead of doing that in
> fix_combine_agg_expr_mutator(), as the last version did.
>
> Thanks for the suggestion.
>
> New patch attached.
>
Few assorted comments:
1.
/*
+ * Determine if it's possible to perform aggregation in parallel using
+ * multiple worker processes. We can permit this when there's at least one
+ * partial_path in input_rel, but not if the query has grouping sets,
+ * (although this likely just requires a bit more thought). We must also
+ * ensure that any aggregate functions which are present in either the
+ * target list, or in the HAVING clause all support parallel mode.
+ */
+ can_parallel = false;
+
+ if ((parse->hasAggs || parse->groupClause != NIL) &&
+ input_rel->partial_pathlist != NIL &&
+ parse->groupingSets == NIL &&
+ root->glob->parallelModeOK)
I think here you need to use has_parallel_hazard() with second parameter as false to ensure expressions are parallel safe. glob->parallelModeOK flag indicates that there is no parallel unsafe expression, but it can still contain parallel restricted expression.
2.
AggPath *
create_agg_path(PlannerInfo *root,
@@ -2397,9 +2399,11 @@ create_agg_path(PlannerInfo *root,
List *groupClause,
List *qual,
const AggClauseCosts
*aggcosts,
- double numGroups)
+ double numGroups,
+
bool combineStates,
+ bool finalizeAggs)
Don't you need to set parallel_aware flag in this function as we do for create_seqscan_path()?
3.
postgres=# explain select count(*) from t1;
QUERY PLAN
--------------------------------------------------------------------------------
------
Finalize Aggregate (cost=45420.57..45420.58 rows=1 width=8)
-> Gather (cost=45420.35..45420.56 rows=2 width=8)
Number of Workers: 2
-> Partial Aggregate (cost=44420.35..44420.36 rows=1 width=8)
-> Parallel Seq Scan on t1 (cost=0.00..44107.88 rows=124988 wid
th=0)
(5 rows)
Isn't it better to call it as Parallel Aggregate instead of Partial Aggregate. Initialy, we have kept Partial for seqscan, but later on we changed to Parallel Seq Scan, so I am not able to think why it is better to call Partial incase of Aggregates.
4.
/*
+ * Likewise for any partial paths, although this case is more simple as
+
* we don't track the cheapest path.
+ */
+ foreach(lc, current_rel->partial_pathlist)
+
{
+ Path *subpath = (Path *) lfirst(lc);
+
+ Assert(subpath->param_info ==
NULL);
+ lfirst(lc) = apply_projection_to_path(root, current_rel,
+
subpath, scanjoin_target);
+ }
+
Can't we do this by teaching apply_projection_to_path() as done in the latest patch posted by me to push down the target list beneath workers [1].
5.
+ /*
+ * If we find any aggs with an internal transtype then we must ensure
+ * that
pointers to aggregate states are not passed to other processes,
+ * therefore we set the maximum degree
to PAT_INTERNAL_ONLY.
+ */
+ if (aggform->aggtranstype == INTERNALOID)
+
context->allowedtype = PAT_INTERNAL_ONLY;
In the above comment, you have refered maximum degree which is not making much sense to me. If it is not a typo, then can you clarify the same.
6.
+ * fix_combine_agg_expr
+ * Like fix_upper_expr() but additionally adjusts the Aggref->args of
+ * Aggrefs so
that they references the corresponding Aggref in the subplan.
+ */
+static Node *
+fix_combine_agg_expr(PlannerInfo
*root,
+ Node *node,
+ indexed_tlist *subplan_itlist,
+
Index newvarno,
+ int rtoffset)
+{
+ fix_upper_expr_context context;
+
+ context.root =
root;
+ context.subplan_itlist = subplan_itlist;
+ context.newvarno = newvarno;
+ context.rtoffset = rtoffset;
+
return fix_combine_agg_expr_mutator(node, &context);
+}
+
+static Node *
+fix_combine_agg_expr_mutator(Node *node,
fix_upper_expr_context *context)
Don't we want to handle the case of context->subplan_itlist->has_non_vars as it is handled in fix_upper_expr_mutator()? If not then, I think adding the reason for same in comments above function would be better.
7.
tlist.c
+}
\ No newline at end of file
There should be a new line at end of file.
pgsql-hackers by date: