Thread: User-defined Aggregate function and performance.

User-defined Aggregate function and performance.

From

Ronan Dunklau

Date:

02 April 2012, 11:19:42

Hello.

I've tried asking this on the irc channel, without much success.

I'm trying to define a "weighted mean" aggregate using postgresql create
aggregate feature.

I've been able to quickly write the required pgsql code to get it
working, but after testing it on a sample 10000 rows table, it seems to
be approximately 6 to 10 times slower than pure sql.

My initial implementation was in pl/pgsql, and did not mark the
functions as immutable. I did so after a suggestion from an irc user,
but it did not change anything performance wise.

Any idea on how to make it faster ?

Here is the code:

create type _weighted_avg_type as (
  running_sum numeric,
  running_count numeric
);

create or replace function mul_sum (a _weighted_avg_type, amount
numeric, weight numeric) returns _weighted_avg_type as $$
    select ((($1.running_sum + ($2 * $3)) ,  ($1.running_count +  $3)
))::_weighted_avg_type;
$$ language sql immutable;

create or replace function final_sum (a _weighted_avg_type) returns
numeric as $$
  SELECT CASE
      WHEN $1.running_count = 0 THEN 0
      ELSE $1.running_sum / $1.running_count
  END;
$$ language sql immutable;

create aggregate weighted_avg (numeric, numeric)(
  sfunc = mul_sum,
  finalfunc = final_sum,
  stype = _weighted_avg_type,
  initcond = '(0,0)'
);

create temp table test as (select a::numeric, b::numeric from
generate_series(1, 100) as t1(a), generate_series(1, 100) as t2(b));

-- Custom aggregate
select weighted_avg(a, b) from test;


-- pure sql version
select case when sum(b::numeric) = 0 then 0 else sum(a::numeric *
b::numeric) / sum(b::numeric) end from test;


--
Ronan Dunklau

Re: User-defined Aggregate function and performance.

From

Tom Lane

Date:

02 April 2012, 13:06:43

Ronan Dunklau <rdunklau@gmail.com> writes:
> I'm trying to define a "weighted mean" aggregate using postgresql create
> aggregate feature.

> I've been able to quickly write the required pgsql code to get it
> working, but after testing it on a sample 10000 rows table, it seems to
> be approximately 6 to 10 times slower than pure sql.

It might help to use a two-element array for the transition state,
instead of a custom composite type.

> My initial implementation was in pl/pgsql, and did not mark the
> functions as immutable. I did so after a suggestion from an irc user,
> but it did not change anything performance wise.

Those suggestions would possibly help for a function that's meant to be
inlined into larger SQL expressions, but they won't do much for an
aggregate support function.  I'm not real sure, but I think plpgsql
might be faster in this context.

Another thing to think about is whether you really need type numeric
here.  float8 would be a lot faster ... though you might have roundoff
issues.

            regards, tom lane

Re: User-defined Aggregate function and performance.

From

Ronan Dunklau

Date:

03 April 2012, 06:51:27

On 02/04/2012 18:06, Tom Lane wrote:
> Ronan Dunklau <rdunklau@gmail.com> writes:
>> I'm trying to define a "weighted mean" aggregate using postgresql create
>> aggregate feature.
>
>> I've been able to quickly write the required pgsql code to get it
>> working, but after testing it on a sample 10000 rows table, it seems to
>> be approximately 6 to 10 times slower than pure sql.
>
> It might help to use a two-element array for the transition state,
> instead of a custom composite type.

It does not change much.
It seems that altering the transition state instead of building a new
one does help, though. When altering the state, the composite type
version seems to be faster. But it still much slower than the
hand-written sql version.

>
>> My initial implementation was in pl/pgsql, and did not mark the
>> functions as immutable. I did so after a suggestion from an irc user,
>> but it did not change anything performance wise.
>
> Those suggestions would possibly help for a function that's meant to be
> inlined into larger SQL expressions, but they won't do much for an
> aggregate support function.  I'm not real sure, but I think plpgsql
> might be faster in this context.
>
> Another thing to think about is whether you really need type numeric
> here.  float8 would be a lot faster ... though you might have roundoff
> issues.

The "hand-written" sql using only built-in functions performs really
well with numerics. Why do you suggest that it could be the bottleneck ?

I solved the problem by writing a C extension for it:

http://pgxn.org/dist/weighted_mean/1.0.0/

Regards,

--
Ronan Dunklau