Re: wip: functions median and percentile - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: wip: functions median and percentile |
Date | |
Msg-id | AANLkTikGt8Z1dEOShTmwCCCOCJvmO94UCnYn9ZLCrbi2@mail.gmail.com Whole thread Raw |
In response to | Re: wip: functions median and percentile (David Fetter <david@fetter.org>) |
Responses |
Re: wip: functions median and percentile
|
List | pgsql-hackers |
2010/9/23 David Fetter <david@fetter.org>: > On Thu, Sep 23, 2010 at 08:27:38PM +0200, Pavel Stehule wrote: >> 2010/9/23 Hitoshi Harada <umi.tanuki@gmail.com>: >> > 2010/9/23 Pavel Stehule <pavel.stehule@gmail.com>: >> >> Hello >> >> >> >> 2010/9/22 Hitoshi Harada <umi.tanuki@gmail.com>: >> >>> 2010/9/22 Pavel Stehule <pavel.stehule@gmail.com>: >> >>>> Hello >> >>>> >> >>>> I found probably hard problem in cooperation with window functions :( >> >> >> >> maybe I was confused. I found a other possible problems. >> >> >> >> The problem with median function is probably inside a final function >> >> implementation. Actually we request possibility of repetitive call of >> >> final function. But final function call tuplesort_end function and >> >> tuplesort_performsort. These function changes a state of tuplesort. >> >> The most basic question is "who has to call tuplesort_end function and >> >> when? >> > >> > Reading the comment in array_userfuncs.c, array_agg_finalfn() doesn't >> > clean up its internal state at all and tells it's the executor's >> > responsibility to clear memory. It is allowed since ArrayBuildState is >> > only in-memory state. In the other hand, TupleSort should be cleared >> > by calling tuplesort_end() if it has tapeset member (on file based >> > sort) to close physical files. >> > >> > So 2 or 3 ways to go in my mind: >> >> it is little bit worse - we cannot to call tuplesort_performsort repetitive. >> >> > >> > 1. call tuplesort_begin_datum with INT_MAX workMem rather than the >> > global work_mem, to avoid it spills out sort state to files. It may >> > sounds dangerous, but actually memory exhausting can happen in >> > array_agg() as well. >> > >> > 2. add TupleSort an argument that tells not to use file at all. This >> > results in the same as #1 but more generic approach. >> > >> > 3. don't use tuplesort in median() but implement its original sort >> > management. This looks quite redundant and like maintenance problem. >> > >> > #2 sounds like the best in generic and consistent way. The only point >> > is whether the change is worth for implementing median() as it's very >> > system-wide common fundamentals. >> > >> > Other options? >> >> #4 block median under window clause >> >> #5 use a C array instead tuplesort under window clause. It is very >> unpractical to use a windows clauses with large datasets, so it should >> not be a problem. More, this can be very quick, because for C array we >> can use a qsort function. >> >> Now I prefer #5 - it can be fast for using inside windows clause and >> safe when window clause will not be used. > > If there's some way to do this using the same code in the windowing > and non-windowing case, that would be much, much better from an > architectural point of view. Single Point of Truth and all that. We can have a median with support a window clause, but limited to work_mem, or we can have a unlimited median, but without window clause. I think, I am able to minimalize a code duplicity - just to define some envelope over tuplesort. The unique code isn't possible there - minimally now we have a two variants - one for numeric result and second for double. But it is usual - try to look how much AVG functions are in core. Regards Pavel > > Cheers, > David. > -- > David Fetter <david@fetter.org> http://fetter.org/ > Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter > Skype: davidfetter XMPP: david.fetter@gmail.com > iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics > > Remember to vote! > Consider donating to Postgres: http://www.postgresql.org/about/donate >
pgsql-hackers by date: