Re: Final Patch for GROUPING SETS - Mailing list pgsql-hackers
From | Andrew Gierth |
---|---|
Subject | Re: Final Patch for GROUPING SETS |
Date | |
Msg-id | 87fvcks1og.fsf@news-spur.riddles.org.uk Whole thread Raw |
In response to | Re: Final Patch for GROUPING SETS (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Final Patch for GROUPING SETS
|
List | pgsql-hackers |
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> I'd already explained in more detail way back when we posted the>> patch. But to reiterate: the ChainAggregate nodes passthrough>> their input data unchanged, but on group boundaries they write>> aggregated result rows to a tuplestore sharedby the whole>> chain. The top node returns the data from the tuplestore after its>> own output is completed. Tom> That seems pretty grotty from a performance+memory consumptionTom> standpoint. At peak memory usage, each one of theSort nodesTom> will contain every input row, Has this objection ever been raised for WindowAgg, which has the same issue? Tom> and the shared tuplestore will contain every output row. Every output row except those produced by the top node, and since this is after grouping, that's expected to be smaller than the input. Tom> That will lead to either a lot of memory eaten, or a lot ofTom> temp-file I/O, depending on how big work_mem is. Yes. Though note that this code only kicks in when dealing with grouping sets more complex than a simple rollup. A CUBE of two dimensions uses only one Sort node above whatever is needed to produce sorted input, and a CUBE of three dimensions uses only two. (It does increase quite a lot for large cubes though.) Tom> In principle, with the CTE+UNION approach I was suggesting, theTom> peak memory consumption would be one copy of theinput rows inTom> the CTE's tuplestore plus one copy in the active branch's SortTom> node. I think a bit of effort wouldbe needed to get there (ie,Tom> shut down one branch's Sort node before starting the next,Tom> something I'm prettysure doesn't happen today). Correct, it doesn't. However, I notice that having ChainAggregate shut down its input would also have the effect of bounding the memory usage (to two copies, which is as good as the append+sorts+CTE case). Is shutting down and reinitializing parts of the plan really feasible here? Or would it be a case of forcing a rescan? >> Second was to generate a CTE for the input data. This didn't get>> very far because everything that already exists tohandle CTE>> nodes assumes that they are explicit in the planner's input (that>> they have their own Query node, etc.)and I was not able to>> determine a good solution. Tom> Seems like restructuring that wouldn't be *that* hard. WeTom> probably don't want it to be completely like a CTE forplanningTom> purposes anyway --- that would foreclose passing down anyTom> knowledge of desired sort order, which we don'twant. But itTom> seems like we could stick a variant of CtePath atop the chosenTom> result path of the scan/join planningphase. If you like I canTom> poke into this a bit. Please do. That seems to cover the high-priority issues from our point of view. We will continue working on the other issues, on the assumption that when we have some idea how to do it your way, we will rip out the ChainAggregate stuff in favour of an Append-based solution. -- Andrew (irc:RhodiumToad)
pgsql-hackers by date: