Re: introduction of WIP window function patch - Mailing list pgsql-hackers
From | H.Harada |
---|---|
Subject | Re: introduction of WIP window function patch |
Date | |
Msg-id | e08cc0400807060139m5fedd73buea68ed6ba508a95f@mail.gmail.com Whole thread Raw |
In response to | Re: introduction of WIP window function patch (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: introduction of WIP window function patch
|
List | pgsql-hackers |
2008/7/6 Simon Riggs <simon@2ndquadrant.com>: >> I think there are two types of functions for windowed mode. >> - windowed aggregate >> this type of function is exactly same as normal aggregate. So we use >> functions that have been in pgsql already. Actually in my patch above, >> I didn't introduce any new function. This type of function includes >> simply sum(), avg(), etc. which returns same values on a partition or >> a window frame. >> >> - windowed function >> this is the NEW type of function. I guess we should add a new function >> type to pgsql. This type of function includes rank(), rank_dense(), >> row_number(), etc. Windowed functions returns different values per >> tuple. >> >> The difference between two types is if the function returns the same >> value during a partition or different values. >> >> So, windowed aggregate and normal aggregate overlap each other. How >> you know which one is that you see OVER clause in SQL just after the >> function call. When you see OVER after func(), and pg_proc says it's >> an aggregate, it's a windowed aggregate. Otherwise, it's a windowed >> function. >> >> If I misunderstood about those definitions please correct me. > > Yes, I understand that and I think Martijn does also. > > I've done some thinking and rooting around on this and I think I have a > different proposal for you, different to what we just discussed. > > SQL2008 specifies window functions as > > * rank functions > * distribution functions: percent_rank() and cume_dist() > * rownumber() > * ntile() > * lead() and lag() > * first, last and n-th value functions > * inverse distribution functions (similar to n-th value, based upon > distribution function results) > > plus window aggregate functions (the normal aggregates COUNT, SUM etc) > > Now looking through all of those, I don't see *any* window functions > that need access to different datatypes, or actually need to see the > values of the attributes. > > The normal aggregates work with windows identically to the way they do > without windows, so no change needed there. > > AFAICS we could define all of the non-aggregate window functions on the > above list *without* defining them as functions in pg_proc. That would > be a benefit because the window functions are very powerful and we'd > need to give them access to any/all tuples in the window. > > So that would mean we don't provide a mechanism for user-defined > windowed aggregate functions at all. Which solves the discussion about > how to pass generic info through to them (at least long enough to get > the first implementation done). > > We do already have such functions in code, e.g. greatest(). Sure they > need to be defined in code, but we don't need to come up with a generic > API for them. > > If you disagree, think about how we'd implement lag() or ntile() and > what info we'd need to pass them. Well, your idea is one of considerable choices. But I like pgsql's extensibility that enables pgsql more powerful DBMS. So, I design it as you propsed though trying to unify the function form somehow. Just idea, how about pass window object to a function? We'll provide window operation API then in the function you take window object through fcinfo: Datum func(PG_FUNCTION_ARGS) { Datum v; WindowObject w = get_window(fcinfo); HeapTuple htup_current = window_current_row(w); HeapTuple htup_prev = window_preceding(w,1); /* do something */ PG_RETURN_DATUM(v); } so that a function access whole the window. APIs include - current row - preceding row - following row - current key - preceding key - following key - iterate for the window where "key" means ORDER BY values in OVER clause. Fortunately, my patch uses tuplestore/tuplesort to create window, which allows random access operation such above. Is there security/performance issue about this? -- Hitoshi Harada
pgsql-hackers by date: