Re: Deduplicate aggregates and transition functions in planner - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: Deduplicate aggregates and transition functions in planner |
Date | |
Msg-id | 16a1e564-1fa5-b4ea-c512-9ca133ea4bc0@iki.fi Whole thread Raw |
In response to | Re: Deduplicate aggregates and transition functions in planner (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Deduplicate aggregates and transition functions in planner
|
List | pgsql-hackers |
On 28/10/2020 21:59, Andres Freund wrote: > On 2020-10-28 21:10:41 +0200, Heikki Linnakangas wrote: >> Currently, ExecInitAgg() performs quite a lot of work, to deduplicate >> identical Aggrefs, as well as Aggrefs that can share the same transition >> state. That doesn't really belong in the executor, we should perform that >> work in the planner. It doesn't change from one invocation of the plan to >> another, and it would be nice to reflect the state-sharing in the plan >> costs. > > Woo! Very glad to see this tackled. > > It wouldn't surprise me to see a small execution time speedup here - > I've seen the load of the aggno show up in profiles. I think you'd be hard-pressed to find a real-life query where it matters. But if you don't care about real life: regression=# do $$ begin for i in 1..100000 loop perform sum(g), sum(g+0), sum(g+1), sum(g+2), sum(g+3), sum(g+4), sum(g+5), sum(g+6), sum(g+7), sum(g+8), sum(g+9), sum(g+10) from generate_series(1,1) g; end loop; end; $$; DO Time: 1282.701 ms (00:01.283) vs. Time: 860.323 ms with the patch. >> @@ -783,14 +783,13 @@ ExecInitExprRec(Expr *node, ExprState *state, >> >> scratch.opcode = EEOP_AGGREF; >> scratch.d.aggref.astate = astate; >> - astate->aggref = aggref; >> + astate->aggno = aggref->aggno; >> >> if (state->parent && IsA(state->parent, AggState)) >> { >> AggState *aggstate = (AggState *) state->parent; >> >> - aggstate->aggs = lappend(aggstate->aggs, astate); >> - aggstate->numaggs++; >> + aggstate->aggs = lappend(aggstate->aggs, aggref); > > Hm. Why is aggstate->aggs still built during expression initialization? > Imo that's a pretty huge wart that also introduces more > order-of-operation brittleness to executor startup. The Agg node itself doesn't include any information about the aggregates and transition functions. Because of that, ExecInitAgg needs a "representive" Aggref for each transition state and agg, to initialize the per-trans and per-agg structs. The expression initialization makes those Aggrefs available for ExecInitAgg. Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could set each representative Aggref directly in the right per-trans and per-agg struct, based on the 'aggno' and 'aggtransno' fields. That requires initializing the per-trans and per-agg arrays earlier, and for that, we would need to store the # of aggs and transition states in the Agg node, like you also suggested. Certainly doable, but on the whole, it didn't really seem better to me. Attached is a patch, to demonstrate what that looks like, on top of the main patch. It's not complete, there's at least one case with hash-DISTINCT for queries like "SELECT DISTINCT aggregate(x) ..." where the planner creates an Agg for the DISTINCT without aggregates, but the code currently passes numAggs=1 to the executor. Some further changes would be needed in the planner, to mark the AggPath generated for deduplication differently from the AggPaths created for aggregation. Again that's doable, but on the whole I prefer the approach to scan the Aggrefs in executor startup, for now. I'd like to get rid of the "representative Aggrefs" altogether, and pass information about the transition and final functions from planner to executor in some other form. But that's exactly what got me into the refactoring that was ballooning out of hand that I mentioned. - Heikki
Attachment
pgsql-hackers by date: