Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support |
Date | |
Msg-id | 162867790912202148j3381f9a2k52a3b697ab2201d8@mail.gmail.com Whole thread Raw |
In response to | Re: Proposal: Pre ordered aggregates, default ORDER BY clause for aggregates - median support (Andrew Gierth <andrew@tao11.riddles.org.uk>) |
Responses |
Re: Proposal: Pre ordered aggregates, default ORDER BY clause for
aggregates - median support
|
List | pgsql-hackers |
2009/12/20 Andrew Gierth <andrew@tao11.riddles.org.uk>: >>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes: > > > 2009/12/20 Tom Lane <tgl@sss.pgh.pa.us>: > >> I think that we've already expanded the capabilities of aggregates > >> a great deal for 8.5, and we should let it sit as-is for a release > >> or two and see what the real user demand is for additional > >> features. > >> > >> I'm particularly concerned by the fact that the feature set is > >> already far out in front of what the planner can optimize > >> effectively (e.g., there's no ability to combine the work when > >> multiple aggregates need the same sorted data). The more features > >> we add on speculation, the harder it's going to be to close that > >> gap. > > I absolutely agree with Tom here and for some quite specific reasons. > > An optimal (or at least more optimal than is currently possible) > implementation of median() on top of the ordered-agg code as it stands > requires additions to the aggregate function interface: the median agg > implementation would have to, as a minimum, know how many rows of > sorted input are available. In addition, it would be desirable for it > to have direct (and possibly bidirectional) access to the tuplesort. I was thinking about some new kind of aggregates based on tuple-store. > > Now, if we look at how ordered aggs ought to be optimized, it's clear > that the planner should take the ordering costs into account and > consider plans that order the input instead. Once you do this, then > there's no longer any pre-computed count or tuplesort object > available, so if you'd implemented a better median() before the > optimizations, you'd end up either having to forgo the optimization or > break the median() agg again; clearly not something we want. > Information about count of rows are not detected in planner time. Thishave to by available in any executor's sort node. Sothis value will be available every time - index scan is problem. Interesting is Greg's info about O(N) algorithms. Then we can implement median as classic aggregate. > Plus, a feature that I intentionally omitted from the ordered-aggs > patch is the ability to do ordered-aggs as window functions. This is > in the spec, but (a) there were conflicting patches for window > functions on the table and (b) in my opinion, much of the work needed > to implement ordered-agg-as-window-func in an effective manner is > dependent on doing more work on optimization first (or at least will > potentially become easier as a result of that work). > I fully agree, so agg(x order by x) needs some work more - so general design is premature. On second hand - any internal implementation of median will be faster, then currently used techniques. Pavel > So I think both the optimization issue and the window functions issue > would be best addressed before trying to build any additional features > on top of what we have so far. > > -- > Andrew (irc:RhodiumToad) >
pgsql-hackers by date: