Re: introduction of WIP window function patch - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: introduction of WIP window function patch |
Date | |
Msg-id | 1215278473.4051.343.camel@ebony.site Whole thread Raw |
In response to | Re: introduction of WIP window function patch (Martijn van Oosterhout <kleptog@svana.org>) |
Responses |
Re: introduction of WIP window function patch
|
List | pgsql-hackers |
On Sat, 2008-07-05 at 16:20 +0200, Martijn van Oosterhout wrote: > On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote: > > http://umitanuki.net/pgsql/wfv01/design.html > > > > The problem is, as written in the "Things to discussed" section of the > > document, how you define window functions (e.g. RANK()). My idea is to > > treat them as specialized functions such as SET OF functions and mark > > it in pg_proc. But this doesn't resolve RANK() boundary problem. > > Actually, I would make RANK() and ROW_NUMBER() act more like > aggregates. ISTM you have two kinds of window functions: > > - aggregation: a result is calculated over a set and the result copied > across all the rows. > - order depenadant: same as above, but the result is different for each > row. > > I think you could make the latter work using the current aggregation > setup, just by calling the final_func for each row rather than just > once at the end. AFAICS there's no overlap between windowed aggregates and normal aggregates, so we can different infrastructure for each. I like the suggestion of doing it very similarly to current aggregates, but I would introduce a new function hook for windowed aggregates, wfunc. i.e. to create a windowed aggregate you would do CREATE AGGREGATE window_func() (sfunc = ...stype = ...wfunc = ...initcond = ) For each row we would execute the transition function (sfunc) then, if there is a window function (wfunc) then we call that to return a value for this tuple (so in that case we execute two functions per tuple in the window). If wfunc is not set then we return the transition datatype itself. Doing it this way * it will be clear which aggregates are windowed and which non-windowed, so we can avoid errors running a windowed aggregate in a non-windowed context * it also allows us to avoid executing two functions when the windowed function is very simple - denserank() for example just returns the number of rows seen so far in the window. Denserank is fairly simple CREATE AGGREGATE denserank() (sfunc = incrementstype = bigintinitcond = 0 ) rank() is fairly complex because the state data must track 3 things: * the number of tuples seen so far (bigint) * the value of the last tuple seen (anyelement) * the rank of the last tuple seen (bigint) sfunc would compare the new value with the last value, if they match then we return the rank of the last tuple. If they don't match then we set the stored value and rank, then return the number of tuples seen so far as the rank. > That would make RANK() a normal aggrgate which returns the number of > distinct values seen so far (assuming input is ordered) and > ROW_NUMBER() is just an alias for COUNT(). > I hope this is clear, let me know if it doesn't make sense. -- Simon Riggs www.2ndQuadrant.comPostgreSQL Training, Services and Support
pgsql-hackers by date: