Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: Final Patch for GROUPING SETS |
Date | |
Msg-id | 20150514055031.GE9584@alap3.anarazel.de Whole thread Raw |
In response to | Re: Final Patch for GROUPING SETS (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Final Patch for GROUPING SETS
Re: Final Patch for GROUPING SETS |
List | pgsql-hackers |
On 2015-05-13 22:51:15 +0200, Andres Freund wrote: > I'm pretty sure by now that I dislike the introduction of GroupedVar, > and not just tentatively. While I can see why you found its > introduction to be nicer than fiddling with the result tuple, for me the > disadvantages seem to outweigh the advantage. For one it's rather wierd > to have Var nodes be changed into GroupedVar in setrefs.c. The number > of places that need to be touched even when it's a 'planned stmt only' > type of node is still pretty large. > > Andrew: I'll work on changing this in a couple hours unless you're > speaking up about doing it yourself. I did a stab at removing it, and it imo definitely ends up looking better. The code for the GroupedVar replacement isn't perfect yet, but I think it'd be possible to clean that up until Friday. Unfortunately, after prolonged staring out of the window, I came to the conclusion that I don't think the current tree structure isn't right. I still believe that the general approach of chaining vs. a union or CTE is correct due to the efficiency arguments upthread. My problem is that, unless I very much misunderstand something, the current implementation can end up requiring roughly #sets * #input of additional space for the "sidechannel tuplestore" in some bad cases. That happens if you group by a couple clauses that each lead to a high number of groups. That happens because the aggregated rows produced in the chain nodes can't be returned up-tree, because the the next chain (or final group aggregate) node will expect unaggregated tuples. The current solution for that is to move the aggregated rows produced in chain nodes into a tuplestore that's then drained when the top level aggregate node has done it's own job. While that's probably not too bad in many cases because most of the use cases aggregation will be relatively effective, it does seem to be further evidence that the sidechannel tuplestore isn't the perfect idea. What I think we should/need to do instead is to the chaining locally inside one aggregation node. That way the aggregated tuples can be returned directly, without the sidechannel. While that will require inlining part of the code from nodeSort.c it doesn't seem too bad. Besides the advantage of getting rid of that tuplestore, it'll also fix the explain performance problems (as there's no deep tree to traverse via ruleutils.c), get rid of the the preemtive ExecReScan() to control memory usage. I think it might also make combined hashing/sorting easier. A rough sketch of what I'm thinking of is: ExecAgg() { ... while (!aggstate->consumed_input) { outerslot = ExecProcNode(outerPlanState(aggstate)); if (TupIsNull(outerslot)) { consumed_input = true; break; } if (aggstate->doing_hashing) { entry = lookup_hash_entry(aggstate, outerslot); /* Advance the aggregates */ advance_aggregates(aggstate, entry->pergroup); } if (aggstate->presorted_input || AGG_PLAIN) { /* handle aggregation, return if done with group */ } if (aggstate->doing_chaining) { tuplesort_puttupleslot(tuplesortstate, slot); } } if (aggstate->doing_hashing && !done) agg_retrieve_hashed(); /* * Feed data from one sort to the next, to compute grouping sets that * need differing sort orders. */ last_sort= tuplesortstate[0]; current_sort = numGroupingSets > 0 ? tuplesortstate[1] : NULL; while (aggstate->doing_chaining && !done_sorting) { tuplesort_gettupleslot(last_sort, tmpslot); /* exhausted all tuple from a particular sort order, move on */ if (TupIsNull(tmpslot)) { finalize_aggregates(...); tuplesort_end(last_sort); /* maybe save stats somewhere? */ last_sort = current_sort; current_sort = tuplesortstate[...]; if (all_sorts_done) done_sorting = true; return aggregated; } if (current_sort != NULL) tuplesort_puttupleslot(current_sort, slot); /* check if we crossed a boundary */ if (!execTuplesMatch(...)) { finalize_aggregates(...); aggstate->grp_firstTuple = ... return aggregated; } advance_aggregates(); tuplesort_puttupleslot(current_sort, slot); } } I think this is quite doable and seems likely to actually end up with easier to understand code. But unfortunately it seems to be big enough of a change to make it unlikely to be done in sufficient quality until the freeze. I'll nonetheless work a couple hours on it tomorrow. Andrew, is that a structure you could live with, or not? Others, what do you think? Greetings, Andres Freund
pgsql-hackers by date: