Re: Combining Aggregates - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: Combining Aggregates |
Date | |
Msg-id | 00ab52f0-5cd9-a667-5c3d-b0917aef7677@2ndquadrant.com Whole thread Raw |
In response to | Re: Combining Aggregates (David Rowley <david.rowley@2ndquadrant.com>) |
Responses |
Re: Combining Aggregates
|
List | pgsql-hackers |
Hi, On 03/16/2016 12:03 PM, David Rowley wrote: > On 16 March 2016 at 10:34, David Rowley <david.rowley@2ndquadrant.com> wrote: >> If Haribabu's patch does all that's required for the numerical >> aggregates for floating point types then the status of covered >> aggregates is (in order of pg_aggregate.h): >> >> * AVG() complete coverage >> * SUM() complete coverage >> * MAX() complete coverage >> * MIN() complete coverage >> * COUNT() complete coverage >> * STDDEV + friends complete coverage >> * regr_*,covar_pop,covar_samp,corr not touched these. >> * bool*() complete coverage >> * bitwise aggs. complete coverage >> * Remaining are not touched. I see diminishing returns with making >> these parallel for now. I think I might not be worth pushing myself >> any harder to make these ones work. >> >> Does what I have done + floating point aggs from Haribabu seem >> reasonable for 9.6? > > I've attached a series of patches. Thanks. Haven't really done much testing of this, aside from reading through the patches and "it compiles". > > Patch 1: > This is the parallel aggregate patch, not intended for review here. > However, all further patches are based on this, and this adds the > required planner changes to make it possible to test patches 2 and 3. > > Patch 2: > This adds the serial/deserial aggregate infrastructure, pg_dump > support, CREATE AGGREGATE changes, and nodeAgg.c changes to have it > serialise and deserialise aggregate states when instructed to do so. > > Patch 3: > This adds a boat load of serial/deserial functions, and combine > functions for most of the built-in numerical aggregate functions. It > also contains some regression tests which should really be in patch 2, > but I with patch 2 there's no suitable serialisation or > de-serialisation functions to test CREATE AGGREGATE with. I think > having them here is ok, as patch 2 is quite useless without patch 3 > anyway. I don't see how you could move the tests into #2 when the functions are defined in #3? IMHO this is the right place for the regression tests. > > Another thing to note about this patch is that I've gone and created > serial/de-serial functions for when PolyNumAggState both require > sumX2, and don't require sumX2. I had thought about perhaps putting an > extra byte in the serial format to indicate if a sumX2 is included, > but I ended up not doing it this way. I don't really want these serial > formats getting too complex as we might like to do fun things like > pass them along to sharded servers one day, so it might be nice to > keep them simple. Hmmm, I've noticed that while eyeballing the diffs, and I'm not sure if it's worth the additional complexity at this point. I mean, one byte is hardly going to make a measurable difference - we're probably wasting more than that due to alignment, for example. As you've mentioned sharded servers - how stable should the serialized format be? I've assumed it to be transient, i.e. in the extreme case it might change after restarting a database - for the parallel aggregate that's clearly sufficient. But if we expect this to eventually work in a sharded environment, that's going to be way more complicated. I guess in these cases we could rely on implicit format versioning, via the major the version (doubt we'd tweak the format in a minor version anyway). I'm not sure the sharding is particularly useful argument at this point, considering we don't really know if the current format is actually a good match for that. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: