Re: Combining Aggregates - Mailing list pgsql-hackers

From:           David Rowley
Subject:        Re: Combining Aggregates
Date:
Msg-id:         CAKJS1f-RiKMH3FTABnrXWtU+uMrEA3a+at47GdUNu+u7-ySR8g@mail.gmail.com
In response to: Re: Combining Aggregates (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: Combining Aggregates
List:           pgsql-hackers
On 22 January 2016 at 06:56, Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jan 20, 2016 at 8:32 PM, David Rowley
> <david.rowley@2ndquadrant.com> wrote:
>> The other two usages which I have thought of are;
>>
>> 1) Aggregating before UNION ALL, which might be fairly simple after the
>> grouping planner changes, as it may just be a matter of considering another
>> "grouping path" which partially aggregates before the UNION ALL, and
>> performs the final grouping stage after UNION ALL. At this stage it's hard
>> to say how that will work as I'm not sure how far changes to the grouping
>> planner will go. Perhaps Tom can comment?
>
> I hope he will, but in the meantime let me ask how this does us any
> good.  UNION ALL turns into an Append node.  Pushing aggregation
> through an Append doesn't make anything faster AFAICS *unless* you can
> further optimize beginning at that point.  For example, if one or both
> sides of the Append node have a Gather under them, and you can push
> the partial aggregation down into the Gather; or if you hit a Foreign
> Scan and can have it do partial aggregation on the remote side, you
> win.  But if you just have:
>
> Finalize Aggregate
> -> Append
>    -> Partial Aggregate
>       -> Thing One
>    -> Partial Aggregate
>       -> Thing Two
>
> Instead of:
>
> Aggregate
> -> Append
>    -> Thing One
>    -> Thing Two
>
> ...then I don't see the point, at least not for a single-group
> Aggregate or HashAggregate.  For a GroupAggregate, Thing One and Thing
> Two might need to be sorted, and sorting two or more smaller data sets
> might be faster than sorting one larger data set, but I can't see us
> winning anything very big here.
>
> To be clear, I'm not saying we shouldn't do this.  I just don't think
> it does much *by itself*.  If you combine it with other optimizations
> that let the aggregation be pushed further down the plan tree, it
> could win big.

Yes, I agree, it's not a big win, at least not in the case of a serial plan.
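As a side note for readers following along, the equivalence the two plan shapes above rely on can be illustrated outside the planner. This is a minimal sketch in plain Python (not PostgreSQL internals): the single Aggregate over an Append gives the same answer as a Finalize step over per-branch Partial Aggregates, because the combine step for count is simply a sum of the partial counts.

```python
def partial_count(rows):
    # Partial Aggregate: each Append branch counts only its own rows.
    return len(rows)

def finalize_count(partial_states):
    # Finalize Aggregate: combine the per-branch states by summing them.
    return sum(partial_states)

thing_one = list(range(5))   # stand-in for "Thing One"
thing_two = list(range(7))   # stand-in for "Thing Two"

# Plain plan: Aggregate over Append (count all rows at the top).
plain = len(thing_one + thing_two)

# Rewritten plan: Finalize over per-branch Partial Aggregates.
pushed = finalize_count([partial_count(thing_one),
                         partial_count(thing_two)])

assert plain == pushed == 12
```

The rewrite is semantics-preserving on its own; as Robert notes, the performance win only appears once the partial step can be pushed further down (into a Gather, a Foreign Scan, or a parallel branch).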
If each branch of the UNION ALL could be processed in parallel, then
that's different.

It's quite simple to test how much of a win it'll be in the serial case
today, and yes, it's not much, but it's a bit.

create table t1 as select x from generate_series(1,1000000) x(x);
vacuum analyze t1;

select count(*) from (select * from t1 union all select * from t1) t;
  count
---------
 2000000
(1 row)

Time: 185.793 ms

-- Mock up pushed down aggregation by using sum() as a combine function for count(*)
select sum(c) from (select count(*) c from t1 union all select count(*) from t1) t;
   sum
---------
 2000000
(1 row)

Time: 162.076 ms

Not particularly incredible, but we don't normally turn our noses up at a
14% improvement, so let's just see how complex it will be to implement,
once the upper planner changes are done.

But as you mention about lack of ability to make use of a pre-sorted Path
for each branch of the UNION ALL; I was really hoping Tom's patch will
improve that part by allowing the planner to choose a pre-sorted Path and
perform a MergeAppend instead of an Append, which would allow pre-sorted
input into a GroupAggregate node.

--
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
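The MergeAppend idea mentioned at the end can also be sketched outside the planner. This is a hypothetical illustration in plain Python (not planner code): if each UNION ALL branch is already sorted on the grouping key, merging the sorted streams yields globally sorted input for a GroupAggregate with no full re-sort, which is exactly what a MergeAppend node provides over an Append.

```python
import heapq
from itertools import groupby

# Two UNION ALL branches, each pre-sorted on the grouping key (first field).
branch_a = [(1, 10), (2, 20), (4, 40)]
branch_b = [(1, 5), (3, 30), (4, 2)]

# MergeAppend: merge the pre-sorted branches on the grouping key,
# producing one globally sorted stream without re-sorting everything.
merged = heapq.merge(branch_a, branch_b, key=lambda t: t[0])

# GroupAggregate: consume the sorted stream one group at a time.
sums = {key: sum(v for _, v in grp)
        for key, grp in groupby(merged, key=lambda t: t[0])}

print(sums)  # {1: 15, 2: 20, 3: 30, 4: 42}
```

The key property is that GroupAggregate only works on sorted input; MergeAppend preserves the branches' sort order where a plain Append would discard it.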