Thread: introduction of WIP window function patch

introduction of WIP window function patch

From

H.Harada

Date:

05 July 2008, 07:04:38

Hi,

As I proposed a month before, I am working on window function.
Although this work is at quite early step, I would like to introduce
it since a part of it have been finished. If you can afford and are
interested in it, please review the document and patch, or compile the
applied source to execute an attached sample SQL.

http://umitanuki.net/pgsql/wfv01/design.html

Currently, only aggregation over window does work. I am planning to
work for the combination of window and normal aggregation from now on,
which I guess I can manage to do.

The problem is, as written in the "Things to discussed" section of the
document, how you define window functions (e.g. RANK()). My idea is to
treat them as specialized functions such as SET OF functions and mark
it in pg_proc. But this doesn't resolve RANK() boundary problem.

I am so happy with any kind of comments, reviews or critiques.

Regards,

-- 
Hitoshi Harada

Re: introduction of WIP window function patch

From

Martijn van Oosterhout

Date:

05 July 2008, 11:20:36

On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:
> Hi,
>
> As I proposed a month before, I am working on window function.

Very nice!

> http://umitanuki.net/pgsql/wfv01/design.html
>
> The problem is, as written in the "Things to discussed" section of the
> document, how you define window functions (e.g. RANK()). My idea is to
> treat them as specialized functions such as SET OF functions and mark
> it in pg_proc. But this doesn't resolve RANK() boundary problem.

Actually, I would make RANK() and ROW_NUMBER() act more like
aggregates. ISTM you have two kinds of window functions:

- aggregation: a result is calculated over a set and the result copied across all the rows.
- order depenadant: same as above, but the result is different for each row.

I think you could make the latter work using the current aggregation
setup, just by calling the final_func for each row rather than just
once at the end.

That would make RANK() a normal aggrgate which returns the number of
distinct values seen so far (assuming input is ordered) and
ROW_NUMBER() is just an alias for COUNT().

I hope this is clear, let me know if it doesn't make sense.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: introduction of WIP window function patch

From

Simon Riggs

Date:

05 July 2008, 14:16:11

On Sat, 2008-07-05 at 16:20 +0200, Martijn van Oosterhout wrote:
> On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:

> > http://umitanuki.net/pgsql/wfv01/design.html
> > 
> > The problem is, as written in the "Things to discussed" section of the
> > document, how you define window functions (e.g. RANK()). My idea is to
> > treat them as specialized functions such as SET OF functions and mark
> > it in pg_proc. But this doesn't resolve RANK() boundary problem.
> 
> Actually, I would make RANK() and ROW_NUMBER() act more like
> aggregates. ISTM you have two kinds of window functions:
> 
> - aggregation: a result is calculated over a set and the result copied
>   across all the rows.
> - order depenadant: same as above, but the result is different for each
>   row.
> 
> I think you could make the latter work using the current aggregation
> setup, just by calling the final_func for each row rather than just
> once at the end.

AFAICS there's no overlap between windowed aggregates and normal
aggregates, so we can different infrastructure for each. I like the
suggestion of doing it very similarly to current aggregates, but I would
introduce a new function hook for windowed aggregates, wfunc.

i.e. to create a windowed aggregate you would do

CREATE AGGREGATE window_func() 
(sfunc = ...stype = ...wfunc = ...initcond = 
)

For each row we would execute the transition function (sfunc) then, if
there is a window function (wfunc) then we call that to return a value
for this tuple (so in that case we execute two functions per tuple in
the window). If wfunc is not set then we return the transition datatype
itself.

Doing it this way

* it will be clear which aggregates are windowed and which non-windowed,
so we can avoid errors running a windowed aggregate in a non-windowed
context

* it also allows us to avoid executing two functions when the windowed
function is very simple - denserank() for example just returns the
number of rows seen so far in the window.

Denserank is fairly simple

CREATE AGGREGATE denserank()
(sfunc = incrementstype = bigintinitcond = 0
)

rank() is fairly complex because the state data must track 3 things:
* the number of tuples seen so far (bigint)
* the value of the last tuple seen (anyelement)
* the rank of the last tuple seen (bigint)

sfunc would compare the new value with the last value, if they match
then we return the rank of the last tuple. If they don't match then we
set the stored value and rank, then return the number of tuples seen so
far as the rank.

> That would make RANK() a normal aggrgate which returns the number of
> distinct values seen so far (assuming input is ordered) and
> ROW_NUMBER() is just an alias for COUNT().

> I hope this is clear, let me know if it doesn't make sense.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: introduction of WIP window function patch

From

H.Harada

Date:

05 July 2008, 15:40:57

Hi,

2008/7/6 Simon Riggs <simon@2ndquadrant.com>:
>
> On Sat, 2008-07-05 at 16:20 +0200, Martijn van Oosterhout wrote:
>> On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:
>
>> > http://umitanuki.net/pgsql/wfv01/design.html
>> >
>> > The problem is, as written in the "Things to discussed" section of the
>> > document, how you define window functions (e.g. RANK()). My idea is to
>> > treat them as specialized functions such as SET OF functions and mark
>> > it in pg_proc. But this doesn't resolve RANK() boundary problem.
>>
>> Actually, I would make RANK() and ROW_NUMBER() act more like
>> aggregates. ISTM you have two kinds of window functions:
>>
>> - aggregation: a result is calculated over a set and the result copied
>>   across all the rows.
>> - order depenadant: same as above, but the result is different for each
>>   row.
>>
>> I think you could make the latter work using the current aggregation
>> setup, just by calling the final_func for each row rather than just
>> once at the end.
>
> AFAICS there's no overlap between windowed aggregates and normal
> aggregates, so we can different infrastructure for each. I like the
> suggestion of doing it very similarly to current aggregates, but I would
> introduce a new function hook for windowed aggregates, wfunc.

I think there are two types of functions for windowed mode.
- windowed aggregate
this type of function is exactly same as normal aggregate. So we use
functions that have been in pgsql already. Actually in my patch above,
I didn't introduce any new function. This type of function includes
simply sum(), avg(), etc. which returns same values on a partition or
a window frame.

- windowed function
this is the NEW type of function. I guess we should add a new function
type to pgsql. This type of function includes rank(), rank_dense(),
row_number(), etc. Windowed functions returns different values per
tuple.

The difference between two types is if the function returns the same
value during a partition or different values.

So, windowed aggregate and normal aggregate overlap each other. How
you know which one is that you see OVER clause in SQL just after the
function call. When you see OVER after func(), and pg_proc says it's
an aggregate, it's a windowed aggregate. Otherwise, it's a windowed
function.

If I misunderstood about those definitions please correct me.

> Denserank is fairly simple
>
> CREATE AGGREGATE denserank()
> (
>        sfunc = increment
>        stype = bigint
>        initcond = 0
> )

I think this is ROW_NUMBER(), not DENSE_RANK(), isn't it?

> rank() is fairly complex because the state data must track 3 things:
> * the number of tuples seen so far (bigint)
> * the value of the last tuple seen (anyelement)
> * the rank of the last tuple seen (bigint)
>
> sfunc would compare the new value with the last value, if they match
> then we return the rank of the last tuple. If they don't match then we
> set the stored value and rank, then return the number of tuples seen so
> far as the rank.

Yeah, I think RANK() needs to know some or all of tuple of current
row. But it seems not to take any argument, which is the "boundary
problem" I said. Definitely RANK() or windowed function should know
about current tuple so that when to increment rank. But how? As
explicit argument? or specialized method?


-- 
Hitoshi Harada

Re: introduction of WIP window function patch

From

H.Harada

Date:

05 July 2008, 15:47:03

2008/7/5 Martijn van Oosterhout <kleptog@svana.org>:
> On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:
>> Hi,
>>
>> As I proposed a month before, I am working on window function.
>
> Very nice!
>
>> http://umitanuki.net/pgsql/wfv01/design.html
>>
>> The problem is, as written in the "Things to discussed" section of the
>> document, how you define window functions (e.g. RANK()). My idea is to
>> treat them as specialized functions such as SET OF functions and mark
>> it in pg_proc. But this doesn't resolve RANK() boundary problem.
>
> Actually, I would make RANK() and ROW_NUMBER() act more like
> aggregates. ISTM you have two kinds of window functions:
>
> - aggregation: a result is calculated over a set and the result copied
>  across all the rows.
> - order depenadant: same as above, but the result is different for each
>  row.

So I agree the definition of these two types.

> I think you could make the latter work using the current aggregation
> setup, just by calling the final_func for each row rather than just
> once at the end.

How do you know which type of two above is used in the same SQL syntax?


-- 
Hitoshi Harada

Re: introduction of WIP window function patch

From

Simon Riggs

Date:

05 July 2008, 17:07:32

On Sun, 2008-07-06 at 03:40 +0900, H.Harada wrote:
> Hi,
> 
> 2008/7/6 Simon Riggs <simon@2ndquadrant.com>:
> >
> > On Sat, 2008-07-05 at 16:20 +0200, Martijn van Oosterhout wrote:
> >> On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:
> >
> >> > http://umitanuki.net/pgsql/wfv01/design.html
> >> >
> >> > The problem is, as written in the "Things to discussed" section of the
> >> > document, how you define window functions (e.g. RANK()). My idea is to
> >> > treat them as specialized functions such as SET OF functions and mark
> >> > it in pg_proc. But this doesn't resolve RANK() boundary problem.
> >>
> >> Actually, I would make RANK() and ROW_NUMBER() act more like
> >> aggregates. ISTM you have two kinds of window functions:
> >>
> >> - aggregation: a result is calculated over a set and the result copied
> >>   across all the rows.
> >> - order depenadant: same as above, but the result is different for each
> >>   row.
> >>
> >> I think you could make the latter work using the current aggregation
> >> setup, just by calling the final_func for each row rather than just
> >> once at the end.
> >
> > AFAICS there's no overlap between windowed aggregates and normal
> > aggregates, so we can different infrastructure for each. I like the
> > suggestion of doing it very similarly to current aggregates, but I would
> > introduce a new function hook for windowed aggregates, wfunc.
> 
> I think there are two types of functions for windowed mode.
> - windowed aggregate
> this type of function is exactly same as normal aggregate. So we use
> functions that have been in pgsql already. Actually in my patch above,
> I didn't introduce any new function. This type of function includes
> simply sum(), avg(), etc. which returns same values on a partition or
> a window frame.
> 
> - windowed function
> this is the NEW type of function. I guess we should add a new function
> type to pgsql. This type of function includes rank(), rank_dense(),
> row_number(), etc. Windowed functions returns different values per
> tuple.
> 
> The difference between two types is if the function returns the same
> value during a partition or different values.
> 
> So, windowed aggregate and normal aggregate overlap each other. How
> you know which one is that you see OVER clause in SQL just after the
> function call. When you see OVER after func(), and pg_proc says it's
> an aggregate, it's a windowed aggregate. Otherwise, it's a windowed
> function.
> 
> If I misunderstood about those definitions please correct me.

Yes, I understand that and I think Martijn does also.

I've done some thinking and rooting around on this and I think I have a
different proposal for you, different to what we just discussed.

SQL2008 specifies window functions as

* rank functions
* distribution functions: percent_rank() and cume_dist()
* rownumber()
* ntile()
* lead() and lag()
* first, last and n-th value functions
* inverse distribution functions (similar to n-th value, based upon
distribution function results)

plus window aggregate functions (the normal aggregates COUNT, SUM etc)

Now looking through all of those, I don't see *any* window functions
that need access to different datatypes, or actually need to see the
values of the attributes. 

The normal aggregates work with windows identically to the way they do
without windows, so no change needed there.

AFAICS we could define all of the non-aggregate window functions on the
above list *without* defining them as functions in pg_proc. That would
be a benefit because the window functions are very powerful and we'd
need to give them access to any/all tuples in the window.

So that would mean we don't provide a mechanism for user-defined
windowed aggregate functions at all. Which solves the discussion about
how to pass generic info through to them (at least long enough to get
the first implementation done).

We do already have such functions in code, e.g. greatest(). Sure they
need to be defined in code, but we don't need to come up with a generic
API for them.

If you disagree, think about how we'd implement lag() or ntile() and
what info we'd need to pass them.

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: introduction of WIP window function patch

From

H.Harada

Date:

06 July 2008, 05:39:18

2008/7/6 Simon Riggs <simon@2ndquadrant.com>:
>> I think there are two types of functions for windowed mode.
>> - windowed aggregate
>> this type of function is exactly same as normal aggregate. So we use
>> functions that have been in pgsql already. Actually in my patch above,
>> I didn't introduce any new function. This type of function includes
>> simply sum(), avg(), etc. which returns same values on a partition or
>> a window frame.
>>
>> - windowed function
>> this is the NEW type of function. I guess we should add a new function
>> type to pgsql. This type of function includes rank(), rank_dense(),
>> row_number(), etc. Windowed functions returns different values per
>> tuple.
>>
>> The difference between two types is if the function returns the same
>> value during a partition or different values.
>>
>> So, windowed aggregate and normal aggregate overlap each other. How
>> you know which one is that you see OVER clause in SQL just after the
>> function call. When you see OVER after func(), and pg_proc says it's
>> an aggregate, it's a windowed aggregate. Otherwise, it's a windowed
>> function.
>>
>> If I misunderstood about those definitions please correct me.
>
> Yes, I understand that and I think Martijn does also.
>
> I've done some thinking and rooting around on this and I think I have a
> different proposal for you, different to what we just discussed.
>
> SQL2008 specifies window functions as
>
> * rank functions
> * distribution functions: percent_rank() and cume_dist()
> * rownumber()
> * ntile()
> * lead() and lag()
> * first, last and n-th value functions
> * inverse distribution functions (similar to n-th value, based upon
> distribution function results)
>
> plus window aggregate functions (the normal aggregates COUNT, SUM etc)
>
> Now looking through all of those, I don't see *any* window functions
> that need access to different datatypes, or actually need to see the
> values of the attributes.
>
> The normal aggregates work with windows identically to the way they do
> without windows, so no change needed there.
>
> AFAICS we could define all of the non-aggregate window functions on the
> above list *without* defining them as functions in pg_proc. That would
> be a benefit because the window functions are very powerful and we'd
> need to give them access to any/all tuples in the window.
>
> So that would mean we don't provide a mechanism for user-defined
> windowed aggregate functions at all. Which solves the discussion about
> how to pass generic info through to them (at least long enough to get
> the first implementation done).
>
> We do already have such functions in code, e.g. greatest(). Sure they
> need to be defined in code, but we don't need to come up with a generic
> API for them.
>
> If you disagree, think about how we'd implement lag() or ntile() and
> what info we'd need to pass them.

Well, your idea is one of considerable choices. But I like pgsql's
extensibility that enables pgsql more powerful DBMS. So, I design it
as you propsed though trying to unify the function form somehow.

Just idea, how about pass window object to a function? We'll provide
window operation API then in the function you take window object
through fcinfo:

Datum func(PG_FUNCTION_ARGS)
{ Datum v; WindowObject w = get_window(fcinfo); HeapTuple htup_current = window_current_row(w); HeapTuple htup_prev =
window_preceding(w,1); /* do something */ PG_RETURN_DATUM(v);
 
}

so that a function access whole the window. APIs include
- current row
- preceding row
- following row
- current key
- preceding key
- following key
- iterate for the window
where "key" means ORDER BY values in OVER clause. Fortunately, my
patch uses tuplestore/tuplesort to create window, which allows random
access operation such above. Is there security/performance issue about
this?

-- 
Hitoshi Harada

Re: introduction of WIP window function patch

From

Simon Riggs

Date:

06 July 2008, 06:17:09

On Sun, 2008-07-06 at 17:39 +0900, H.Harada wrote:

> Is there security/performance issue about this?

Performance, yes. 

If we need access to more rows than will fit within work_mem we have a
problem and will need to restart sort. Giving random access to all
tuples in the current window, whatever its size would be very costly,
which is why we have optimized that access for merge joins. So we need
to know how far back access is required, if any - think of that as an
"access window" definition.

For example, 
rownumber() doesn't need access to prior tuples at all.
lag(col, 1) requires access only to the prior row of the current window
ntile() needs to know the size of the window before we begin processing 

In some cases the window itself is redefined for each tuple, e.g. 
avg() over (order by ... range between 5 preceeding and current row) 

In that case, we want the tuples no longer in the window to scroll out
of memory. We already have the mechanism for this: a dynamic tuplestore
(materialize node) in front of the tuplesort (sort node).

Most of that tuning can be done after the initial implementation, but my
point here is this: there needs to be a mechanism by which the window
access requirements can be specified for a function so the executor can
understand how to optimise access. So if you go the route of defining an
extensible API then you must include this also.

I know I rattle on about performance, but with window functions it will
be critical to their usability to have them perform well. We can already
do the types of analysis that window functions allow, it just requires
hand written procedures to do it. So the window functions must perform
acceptably well against very large tables (i.e. much bigger than
memory).

-- Simon Riggs           www.2ndQuadrant.comPostgreSQL Training, Services and Support

Re: introduction of WIP window function patch

From

H.Harada

Date:

06 July 2008, 07:15:31

2008/7/6 Simon Riggs <simon@2ndquadrant.com>:
>
> On Sun, 2008-07-06 at 17:39 +0900, H.Harada wrote:
>
>> Is there security/performance issue about this?
>
> Performance, yes.
>
> If we need access to more rows than will fit within work_mem we have a
> problem and will need to restart sort. Giving random access to all
> tuples in the current window, whatever its size would be very costly,
> which is why we have optimized that access for merge joins. So we need
> to know how far back access is required, if any - think of that as an
> "access window" definition.

Is this about tuplesort, not tuplestore? At a glance, tuplestore seems
to be able to access tuples randomly without any performance problem,
just fitting its pointer. So I thought planner should always insert
Sort node before Window node to let Window not to sort, as I explained
in the document. But anyways, I understand some kind of optimization
mechanism to scroll in/out window is needed.

> I know I rattle on about performance, but with window functions it will
> be critical to their usability to have them perform well. We can already
> do the types of analysis that window functions allow, it just requires
> hand written procedures to do it. So the window functions must perform
> acceptably well against very large tables (i.e. much bigger than
> memory).

I know, the probable use case is against large data set. There's no
reason to add this feature if it is slower than self joins or other
kludge methods.

-- 
Hitoshi Harada

Re: introduction of WIP window function patch

From

David Fetter

Date:

17 July 2008, 04:29:06

On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:
> Hi,
> 
> As I proposed a month before, I am working on window function.
> Although this work is at quite early step, I would like to introduce
> it since a part of it have been finished. If you can afford and are
> interested in it, please review the document and patch, or compile the
> applied source to execute an attached sample SQL.
> 
> http://umitanuki.net/pgsql/wfv01/design.html
> 
> Currently, only aggregation over window does work. I am planning to
> work for the combination of window and normal aggregation from now on,
> which I guess I can manage to do.
> 
> The problem is, as written in the "Things to discussed" section of the
> document, how you define window functions (e.g. RANK()). My idea is to
> treat them as specialized functions such as SET OF functions and mark
> it in pg_proc. But this doesn't resolve RANK() boundary problem.
> 
> I am so happy with any kind of comments, reviews or critiques.
> 
> Regards,

Many people have been waiting for years for this functionality.
Thanks so much for doing this.

When will you have another patch that applies against CVS HEAD?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: introduction of WIP window function patch

From

H.Harada

Date:

17 July 2008, 06:21:07

2008/7/17 David Fetter <david@fetter.org>:
> On Sat, Jul 05, 2008 at 07:04:29PM +0900, H.Harada wrote:
>> Hi,
>>
>> As I proposed a month before, I am working on window function.
>> Although this work is at quite early step, I would like to introduce
>> it since a part of it have been finished. If you can afford and are
>> interested in it, please review the document and patch, or compile the
>> applied source to execute an attached sample SQL.
>>
>> http://umitanuki.net/pgsql/wfv01/design.html
>>
>> Currently, only aggregation over window does work. I am planning to
>> work for the combination of window and normal aggregation from now on,
>> which I guess I can manage to do.
>>
>> The problem is, as written in the "Things to discussed" section of the
>> document, how you define window functions (e.g. RANK()). My idea is to
>> treat them as specialized functions such as SET OF functions and mark
>> it in pg_proc. But this doesn't resolve RANK() boundary problem.
>>
>> I am so happy with any kind of comments, reviews or critiques.
>>
>> Regards,
>
> Many people have been waiting for years for this functionality.
> Thanks so much for doing this.
>
> When will you have another patch that applies against CVS HEAD?
>
> Cheers,
> David.
> --
> David Fetter <david@fetter.org> http://fetter.org/
> Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
> Skype: davidfetter      XMPP: david.fetter@gmail.com
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>

As you see the patch is not completed yet, lacking RANK() etc. I am
now struggling for this design which I assume will finish by the end
of this month. Since then, I will apply it against HEAD. So another
patch to -hackers will be around first half of August, or by the next
Commit Fest at worst.

There must be more discussion about the RANK() design after the post.
But I hope it will be in 8.4.

Regards,


-- 
Hitoshi Harada