Re: Combining Aggregates - Mailing list pgsql-hackers

From:           David Rowley
Subject:        Re: Combining Aggregates
Date:
Msg-id:         CAKJS1f-RiKMH3FTABnrXWtU+uMrEA3a+at47GdUNu+u7-ySR8g@mail.gmail.com
In response to: Re: Combining Aggregates (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: Combining Aggregates
List:           pgsql-hackers
On 22 January 2016 at 06:56, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 20, 2016 at 8:32 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> The other two usages which I have thought of are;
>>
>> 1) Aggregating before UNION ALL, which might be fairly simple after the
>> grouping planner changes, as it may just be a matter of considering another
>> "grouping path" which partially aggregates before the UNION ALL, and
>> performs the final grouping stage after UNION ALL. At this stage it's hard
>> to say how that will work as I'm not sure how far changes to the grouping
>> planner will go. Perhaps Tom can comment?
>
> I hope he will, but in the meantime let me ask how this does us any
> good.  UNION ALL turns into an Append node.  Pushing aggregation
> through an Append doesn't make anything faster AFAICS *unless* you can
> further optimize beginning at that point.  For example, if one or both
> sides of the Append node have a Gather under them, and you can push
> the partial aggregation down into the Gather; or if you hit a Foreign
> Scan and can have it do partial aggregation on the remote side, you
> win.  But if you just have:
>
> Finalize Aggregate
> -> Append
>    -> Partial Aggregate
>       -> Thing One
>    -> Partial Aggregate
>       -> Thing Two
>
> Instead of:
>
> Aggregate
> -> Append
>    -> Thing One
>    -> Thing Two
>
> ...then I don't see the point, at least not for a single-group
> Aggregate or HashAggregate.  For a GroupAggregate, Thing One and Thing
> Two might need to be sorted, and sorting two or more smaller data sets
> might be faster than sorting one larger data set, but I can't see us
> winning anything very big here.
>
> To be clear, I'm not saying we shouldn't do this.  I just don't think
> it does much *by itself*.  If you combine it with other optimizations
> that let the aggregation be pushed further down the plan tree, it
> could win big.

Yes, I agree, it's not a big win, at least not in the case of a serial plan.
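As a side note for readers following along, the equivalence the two plan shapes above rely on can be illustrated outside the planner. This is a minimal sketch in plain Python (not PostgreSQL internals): the single Aggregate over an Append gives the same answer as a Finalize step over per-branch Partial Aggregates, because the combine step for count is simply a sum of the partial counts.

```python
def partial_count(rows):
    # Partial Aggregate: each Append branch counts only its own rows.
    return len(rows)

def finalize_count(partial_states):
    # Finalize Aggregate: combine the per-branch states by summing them.
    return sum(partial_states)

thing_one = list(range(5))   # stand-in for "Thing One"
thing_two = list(range(7))   # stand-in for "Thing Two"

# Plain plan: Aggregate over Append (count all rows at the top).
plain = len(thing_one + thing_two)

# Rewritten plan: Finalize over per-branch Partial Aggregates.
pushed = finalize_count([partial_count(thing_one),
                         partial_count(thing_two)])

assert plain == pushed == 12
```

The rewrite is semantics-preserving on its own; as Robert notes, the performance win only appears once the partial step can be pushed further down (into a Gather, a Foreign Scan, or a parallel branch).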
If each branch of the UNION ALL could be processed in parallel, then
that's different.

It's quite simple to test how much of a win it'll be in the serial case
today, and yes, it's not much, but it's a bit.

create table t1 as select x from generate_series(1,1000000) x(x);
vacuum analyze t1;

select count(*) from (select * from t1 union all select * from t1) t;
  count
---------
 2000000
(1 row)

Time: 185.793 ms

-- Mock up pushed down aggregation by using sum() as a combine function for count(*)
select sum(c) from (select count(*) c from t1 union all select count(*) from t1) t;
   sum
---------
 2000000
(1 row)

Time: 162.076 ms

Not particularly incredible, but we don't normally turn our noses up at a
14% improvement, so let's just see how complex it will be to implement,
once the upper planner changes are done.

But as you mention about lack of ability to make use of a pre-sorted Path
for each branch of the UNION ALL; I was really hoping Tom's patch will
improve that part by allowing the planner to choose a pre-sorted Path and
perform a MergeAppend instead of an Append, which would allow pre-sorted
input into a GroupAggregate node.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
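The MergeAppend idea mentioned at the end can also be sketched outside the planner. This is a hypothetical illustration in plain Python (not planner code): if each UNION ALL branch is already sorted on the grouping key, merging the sorted streams yields globally sorted input for a GroupAggregate with no full re-sort, which is exactly what a MergeAppend node provides over an Append.

```python
import heapq
from itertools import groupby

# Two UNION ALL branches, each pre-sorted on the grouping key (first field).
branch_a = [(1, 10), (2, 20), (4, 40)]
branch_b = [(1, 5), (3, 30), (4, 2)]

# MergeAppend: merge the pre-sorted branches on the grouping key,
# producing one globally sorted stream without re-sorting everything.
merged = heapq.merge(branch_a, branch_b, key=lambda t: t[0])

# GroupAggregate: consume the sorted stream one group at a time.
sums = {key: sum(v for _, v in grp)
        for key, grp in groupby(merged, key=lambda t: t[0])}

print(sums)  # {1: 15, 2: 20, 3: 30, 4: 42}
```

The key property is that GroupAggregate only works on sorted input; MergeAppend preserves the branches' sort order where a plain Append would discard it.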