Thread: User-defined Aggregate function and performance.
Hello. I've tried asking this on the irc channel, without much success. I'm trying to define a "weighted mean" aggregate using postgresql create aggregate feature. I've been able to quickly write the required pgsql code to get it working, but after testing it on a sample 10000 rows table, it seems to be approximately 6 to 10 times slower than pure sql. My initial implementation was in pl/pgsql, and did not mark the functions as immutable. I did so after a suggestion from an irc user, but it did not change anything performance wise. Any idea on how to make it faster ? Here is the code: create type _weighted_avg_type as ( running_sum numeric, running_count numeric ); create or replace function mul_sum (a _weighted_avg_type, amount numeric, weight numeric) returns _weighted_avg_type as $$ select ((($1.running_sum + ($2 * $3)) , ($1.running_count + $3) ))::_weighted_avg_type; $$ language sql immutable; create or replace function final_sum (a _weighted_avg_type) returns numeric as $$ SELECT CASE WHEN $1.running_count = 0 THEN 0 ELSE $1.running_sum / $1.running_count END; $$ language sql immutable; create aggregate weighted_avg (numeric, numeric)( sfunc = mul_sum, finalfunc = final_sum, stype = _weighted_avg_type, initcond = '(0,0)' ); create temp table test as (select a::numeric, b::numeric from generate_series(1, 100) as t1(a), generate_series(1, 100) as t2(b)); -- Custom aggregate select weighted_avg(a, b) from test; -- pure sql version select case when sum(b::numeric) = 0 then 0 else sum(a::numeric * b::numeric) / sum(b::numeric) end from test; -- Ronan Dunklau
Ronan Dunklau <rdunklau@gmail.com> writes: > I'm trying to define a "weighted mean" aggregate using postgresql create > aggregate feature. > I've been able to quickly write the required pgsql code to get it > working, but after testing it on a sample 10000 rows table, it seems to > be approximately 6 to 10 times slower than pure sql. It might help to use a two-element array for the transition state, instead of a custom composite type. > My initial implementation was in pl/pgsql, and did not mark the > functions as immutable. I did so after a suggestion from an irc user, > but it did not change anything performance wise. Those suggestions would possibly help for a function that's meant to be inlined into larger SQL expressions, but they won't do much for an aggregate support function. I'm not real sure, but I think plpgsql might be faster in this context. Another thing to think about is whether you really need type numeric here. float8 would be a lot faster ... though you might have roundoff issues. regards, tom lane
On 02/04/2012 18:06, Tom Lane wrote: > Ronan Dunklau <rdunklau@gmail.com> writes: >> I'm trying to define a "weighted mean" aggregate using postgresql create >> aggregate feature. > >> I've been able to quickly write the required pgsql code to get it >> working, but after testing it on a sample 10000 rows table, it seems to >> be approximately 6 to 10 times slower than pure sql. > > It might help to use a two-element array for the transition state, > instead of a custom composite type. It does not change much. It seems that altering the transition state instead of building a new one does help, though. When altering the state, the composite type version seems to be faster. But it still much slower than the hand-written sql version. > >> My initial implementation was in pl/pgsql, and did not mark the >> functions as immutable. I did so after a suggestion from an irc user, >> but it did not change anything performance wise. > > Those suggestions would possibly help for a function that's meant to be > inlined into larger SQL expressions, but they won't do much for an > aggregate support function. I'm not real sure, but I think plpgsql > might be faster in this context. > > Another thing to think about is whether you really need type numeric > here. float8 would be a lot faster ... though you might have roundoff > issues. The "hand-written" sql using only built-in functions performs really well with numerics. Why do you suggest that it could be the bottleneck ? I solved the problem by writing a C extension for it: http://pgxn.org/dist/weighted_mean/1.0.0/ Regards, -- Ronan Dunklau