Re: [PATCH] Negative Transition Aggregate Functions (WIP) - Mailing list pgsql-hackers
From | Florian Pflug |
---|---|
Subject | Re: [PATCH] Negative Transition Aggregate Functions (WIP) |
Date | |
Msg-id | 1885DD71-07BC-4646-9309-7EB47E117ADE@phlo.org Whole thread Raw |
In response to | Re: [PATCH] Negative Transition Aggregate Functions (WIP) (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: [PATCH] Negative Transition Aggregate Functions (WIP)
|
List | pgsql-hackers |
On Jan23, 2014, at 01:17 , David Rowley <dgrowleyml@gmail.com> wrote: > On Wed, Jan 22, 2014 at 12:46 AM, Florian Pflug <fgp@phlo.org> wrote: >> If you want to play with >> this, I think the first step has to be to find a set of guarantees that >> SUM(float) is supposed to meet. Currently, SUM(float) guarantees that if the >> summands all have the same sign, the error is bound by C * N, where C is some >> (machine-specific?) constant (about 1e-15 or so), and N is the number of input >> rows. Or at least so I think from looking at SUMs over floats in descending >> order, which I guess is the worst case. You could, for example, try to see if >> you can find a invertibility conditions which guarantees the same, but allows >> C to be larger. That would put a bound on the number of digits lost by the new >> SUM(float) compared to the old one. > > I guess if the case is that normal (non-window) sum(float) results are undefined > unless you add an order by to the aggregate then I guess there is no possible > logic to put in for inverse transitions that will make them behave the same as > an undefined behaviour. Actually, if sum(float) was really undefined, it'd be *very* easy to provide an inverse transition function - just make it a no-op. Heck, you could then even make the forward transition function a no-op, since the very definition of "undefined behaviour" is "result can be anything, including setting your house on fire". The point is, it's *not* undefined - it's just imprecise. And the imprecision can even be quantified, it just depends on the number of input rows (the equal-sign requirement is mostly there to make "imprecision" equivalent to "relative error"). >> > If it seems sound enough, then I may implement it in C to see how much >> > overhead it adds to forward aggregation for floating point types, but even >> > if it did add too much overhead to forward aggregation it might be worth >> > allowing aggregates to have 2 forward transition functions and if the 2nd >> > one exists then it could be used in windowing functions where the frame >> > does not have "unbounded following". >> >> I don't think adding yet another type of aggregation function is the >> solution here. > > Why not? > > I thought, if in the cases where we need to burden the forward transition > functions with extra code to make inverse transitions possible, then why > not have an extra transition function so that it does not slow down normal > aggregation? > > My testing of sum(numeric) last night does not show any slow down by that > extra dscale tracking code that was added, but if it did then I'd be thinking > seriously about 2 forward transition functions to get around the problem. > I don't understand what would be wrong with that. The core code could just > make use of the 2nd function if it existed and inverse transitions were > thought to be required. i.e in a window context and does not > have UNBOUNDED PRECEDING. First, every additional function increases the maintenance burden, and at some point the expected benefit simply isn't worth it. IMHO at least. Secondly, you'd really need a second aggregate definition - usually, the non-invertible aggregate will get away with a much smaller state representation. Aggregates with state type "internal" could hack their way around that by simply not using the additional parts of their state structure, but e.g. plpgsql aggregates would still incur overhead due to repeated copying of a unnecessary large state value. If it's possible to represent the forward-only but not the invertible state state with a copy-by-value type, that overhead could easily be a large part of the total overhead you paid for having an inverse transition function. And lastly, we could achieve much of the benefit of a second transition function by simply telling the forward transition function whether to expect inverse transition function calls. We already expose things like the aggregation memory context to C-language transition functions, so the basic mechanism is already there. In fact, I pondered whether to do this - but then figured I'd leave it to however needs that facility first. Though it wouldn't be much code - basically, setting a flag in the WindowAggState, and exporting a function to be used by aggregates to read it, similar to what AggCheckCallContext does. best regards, Florian Pflug
pgsql-hackers by date: