Re: wip: functions median and percentile - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: wip: functions median and percentile |
Date | |
Msg-id | AANLkTi==JtfXUMBmhrmHyKe-25ieD5etJfeDpWAL8vnk@mail.gmail.com Whole thread Raw |
In response to | Re: wip: functions median and percentile (Hitoshi Harada <umi.tanuki@gmail.com>) |
Responses |
Re: wip: functions median and percentile
Re: wip: functions median and percentile |
List | pgsql-hackers |
Hello I found probably hard problem in cooperation with window functions :( tuplesort_begin_datum creates child context inside aggcontext. This context is used for tuplestore. But when this function is called from WindowAgg_Aggregates context - someone drops all child context every iteration, and then tuplestore state is invalid. For this moment we can block using a median function as window function, but it should be solved better - if it is possible - I don't see inside window function implementation. 2010/9/20 Hitoshi Harada <umi.tanuki@gmail.com>: > 2010/8/19 Pavel Stehule <pavel.stehule@gmail.com>: >> Hello >> >> I am sending a prototype implementation of functions median and >> percentile. This implementation is very simple and I moved it to >> contrib for this moment - it is more easy maintainable. Later I'll >> move it to core. > > I've reviewed this patch. > > * The patch can apply cleanly and make doesn't print any errors nor > warnings. But it doesn't touch contrib/Makefile so I had to make by > changing dir to contrib/median. > fixed > * Cosmetic coding style should be more appropriate, including trailing > white spaces and indentation positions. > can you specify where, please? > * Since these two aggregates use tuplesort inside it, there're > possible risk to cause out of memory in normal use case. I don't think > this fact is critical, but at least some notation should be referred > in docs. > it is same as windows function, no? So the risk is same. > * It doesn't contain any document nor regression tests. > yes, it is my fault, some median regress test appended, documentation I am will sending later - resp. if you agree with current form of patch, I'll finalize this patch and remove "median" to core. I put these functions to contrib just for simple testing of proof concept. > * They should be callable in window function context; for example > > contrib_regression=# select median(a) over (order by a) from t1; > ERROR: invalid tuplesort state > > or at least user-friend message should be printed. > the problem is in windows function implementation - partially fixed - blocked from windows context > * The returned type is controversy. median(int) returns float8 as the > patch intended, but avg(int) returns numeric. AFAIK only avg(float8) > returns float8. > fixed > * percentile() is more problematic; first, the second argument for the > aggregate takes N of "N%ile" as int, like 50 if you want 50%ile, but > it doesn't care about negative values or more than 100. In addition, > the second argument is taken at the first non-NULL value of the first > argument, but the second argument is semantically constant. For > example, for (1.. 10) value of a in table t1, > yes, I understand - I don't have a problem to modify it. Just this has same behave as string_agg. This is again deeper problem - we cannot ensure so some parameter of aggregation function will be a constant. - so it cannot be well solved now :( Have you some idea? There isn't any tool for checking if some parameter is constant or not. I removed percentile now. > contrib_regression=# select percentile(a, a * 10 order by a) from t1; > percentile > ------------ > 1 > (1 row) > > contrib_regression=# select percentile(a, a * 10 order by a desc) from t1; > percentile > ------------ > 10 > (1 row) > > and if the argument comes from the subquery which doesn't contain > ORDER BY clause, you cannot predict the result. > > That's all of my quick review up to now. > > Regards, > > -- > Hitoshi Harada >
Attachment
pgsql-hackers by date: