Re: Rethinking representation of partial-aggregate steps - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Rethinking representation of partial-aggregate steps |
Date | |
Msg-id | CA+TgmoYNZiCbbrmy15VWGMGH8qFhwixTM78WeuEWm8gP2x89rQ@mail.gmail.com Whole thread Raw |
In response to | Rethinking representation of partial-aggregate steps (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Rethinking representation of partial-aggregate steps
Re: Rethinking representation of partial-aggregate steps |
List | pgsql-hackers |
On Thu, Jun 23, 2016 at 12:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I really dislike this aspect of the partial-aggregate patch: > > typedef struct Agg > { > ... > bool combineStates; /* input tuples contain transition states */ > bool finalizeAggs; /* should we call the finalfn on agg states? */ > bool serialStates; /* should agg states be (de)serialized? */ > ... > } Agg; > > Representing the various partial-aggregate behaviors as three bools seems > like a bad idea to me. In the first place, this makes it look like there > are as many as eight possible behaviors, whereas only a subset of those > flag combinations are or ever will be supported (not that the comments > anywhere tell you that, much less which ones). In the second place, > it leads to unreadable code like this: > > /* partial phase */ > count_agg_clauses(root, (Node *) partial_grouping_target->exprs, > &agg_partial_costs, false, false, true); > > /* final phase */ > count_agg_clauses(root, (Node *) target->exprs, &agg_final_costs, > true, true, true); > count_agg_clauses(root, parse->havingQual, &agg_final_costs, true, > true, true); > > Good luck telling whether those falses and trues actually do what they > should. > > I'm inclined to think that we should aggressively simplify here, perhaps > replacing all three of these flags with a three-way enum, say > > PARTIALAGG_NORMAL /* simple aggregation */ > PARTIALAGG_PARTIAL /* lower phase of partial aggregation */ > PARTIALAGG_FINAL /* upper phase of partial aggregation */ > > This embodies a couple of judgments that maybe someone would quibble with: > * we don't have any use for intermediate partial aggregation steps; > * we don't have any use for partial aggregation in which data transferred > up needn't be serialized. > > But I can't really see how either of those would make sense for any > practical use-case, and I don't think we need to have a confusing and > bug-prone API in order to hold the door open to support them. Hmm, well I guess I would have to disagree with the idea that those other modes aren't useful. I seem to recall that David had some performance results showing that pushing partial aggregate steps below an Append node resulted in a noticeable speed-up even in the absence of any parallelism, presumably because it avoids whatever projection the Append might do, and also improves icache and dcache locality. So it seems quite possible that you might want to have something like this: FinalizeAggregate -> Gather -> IntermediateAggregate -> Append -> PartialAggregate -> Parallel SeqScan -> PartialAggregate -> Parallel SeqScan -> PartialAggregate -> Parallel SeqScan -> PartialAggregate -> Parallel SeqScan The partial aggregate plans need not serialize, because they are passing data to the same process, and the intermediate aggregate is there, too. I do agree, however, that the three Boolean flags don't make the code entirely easy to read. What I might suggest is that we replace the three Boolean flags with integer flags, something like this: #define AGGOP_COMBINESTATES 0x1 #define AGGOP_SERIALIZESTATES 0x2 #define AGGOP_FINALIZEAGGS 0x4 #define AGGTYPE_SIMPLE (AGGOP_FINALIZEAGGS) #define AGGTYPE_PARTIAL_SERIAL (AGGOP_SERIALIZESTATES) #define AGGTYPE_FINALIZE_SERIAL (AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS|AGG_SERIALIZESTATES) Code that requests behavior can do so with the AGGTYPE_* constants, but code that implements behavior can test AGGOP_* constants. Then if we decide we need new combinations in the future, we can just add those --- e.g. AGGTYPE_PARTIAL = 0, AGGTYPE_FINALIZE = AGGOP_COMBINESTATES|AGGOP_FINALIZEAGGS, AGGTYPE_INTERMEDIATE = AGGOP_COMBINESTATES --- and have some hope that it will mostly work. I actually think nearly all of the combinations are sensible, here, although I think it may have been a mistake to overload the "serialize states" flag to mean both "does this combining aggregate expect to receive the serial type rather than the transition type?" and also "does this partial aggregate output the serial type rather than the transition type?". In the example above, you'd want the IntermediateAggregate to expect the transition type but output the serial type. Oops. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: