Home > mailing lists

Re: Deduplicate aggregates and transition functions in planner - Mailing list pgsql-hackers

From	Heikki Linnakangas
Subject	Re: Deduplicate aggregates and transition functions in planner
Date	October 29, 2020 08:17:20
Msg-id	16a1e564-1fa5-b4ea-c512-9ca133ea4bc0@iki.fi Whole thread Raw
In response to	Re: Deduplicate aggregates and transition functions in planner (Andres Freund <andres@anarazel.de>)
Responses	Re: Deduplicate aggregates and transition functions in planner
List	pgsql-hackers

Tree view

On 28/10/2020 21:59, Andres Freund wrote:
> On 2020-10-28 21:10:41 +0200, Heikki Linnakangas wrote:
>> Currently, ExecInitAgg() performs quite a lot of work, to deduplicate
>> identical Aggrefs, as well as Aggrefs that can share the same transition
>> state. That doesn't really belong in the executor, we should perform that
>> work in the planner. It doesn't change from one invocation of the plan to
>> another, and it would be nice to reflect the state-sharing in the plan
>> costs.
> 
> Woo! Very glad to see this tackled.
> 
> It wouldn't surprise me to see a small execution time speedup here -
> I've seen the load of the aggno show up in profiles.

I think you'd be hard-pressed to find a real-life query where it 
matters. But if you don't care about real life:

regression=# do $$
begin
   for i in 1..100000 loop
     perform sum(g), sum(g+0), sum(g+1), sum(g+2), sum(g+3), sum(g+4), 
sum(g+5), sum(g+6), sum(g+7), sum(g+8), sum(g+9), sum(g+10) from 
generate_series(1,1) g;
   end loop;
end;
$$;
DO
Time: 1282.701 ms (00:01.283)

vs.

Time: 860.323 ms

with the patch.

>> @@ -783,14 +783,13 @@ ExecInitExprRec(Expr *node, ExprState *state,
>>   
>>                   scratch.opcode = EEOP_AGGREF;
>>                   scratch.d.aggref.astate = astate;
>> -                astate->aggref = aggref;
>> +                astate->aggno = aggref->aggno;
>>   
>>                   if (state->parent && IsA(state->parent, AggState))
>>                   {
>>                       AggState   *aggstate = (AggState *) state->parent;
>>   
>> -                    aggstate->aggs = lappend(aggstate->aggs, astate);
>> -                    aggstate->numaggs++;
>> +                    aggstate->aggs = lappend(aggstate->aggs, aggref);
> 
> Hm. Why is aggstate->aggs still built during expression initialization?
> Imo that's a pretty huge wart that also introduces more
> order-of-operation brittleness to executor startup.

The Agg node itself doesn't include any information about the aggregates 
and transition functions. Because of that, ExecInitAgg needs a 
"representive" Aggref for each transition state and agg, to initialize 
the per-trans and per-agg structs. The expression initialization makes 
those Aggrefs available for ExecInitAgg.

Instead of collecting all the Aggrefs in a list, ExecInitExprRec() could 
set each representative Aggref directly in the right per-trans and 
per-agg struct, based on the 'aggno' and 'aggtransno' fields. That 
requires initializing the per-trans and per-agg arrays earlier, and for 
that, we would need to store the # of aggs and transition states in the 
Agg node, like you also suggested. Certainly doable, but on the whole, 
it didn't really seem better to me. Attached is a patch, to demonstrate 
what that looks like, on top of the main patch. It's not complete, 
there's at least one case with hash-DISTINCT for queries like "SELECT 
DISTINCT aggregate(x) ..." where the planner creates an Agg for the 
DISTINCT without aggregates, but the code currently passes numAggs=1 to 
the executor. Some further changes would be needed in the planner, to 
mark the AggPath generated for deduplication differently from the 
AggPaths created for aggregation. Again that's doable, but on the whole 
I prefer the approach to scan the Aggrefs in executor startup, for now.

I'd like to get rid of the "representative Aggrefs" altogether, and pass 
information about the transition and final functions from planner to 
executor in some other form. But that's exactly what got me into the 
refactoring that was ballooning out of hand that I mentioned.

- Heikki

Attachment

pass-numaggs-from-planner.patch

pgsql-hackers by date:

From: Masahiro Ikeda
Date: 29 October 2020, 08:03:56
Subject: Re: Add statistics to pg_stat_wal view for wal related parameter tuning

From: Heikki Linnakangas
Date: 29 October 2020, 08:50:44
Subject: Re: Parallel copy

Re: Deduplicate aggregates and transition functions in planner - Mailing list pgsql-hackers

Attachment

Previous

Next