Re: Combining Aggregates - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Combining Aggregates |
Date | |
Msg-id | CA+TgmoaDELi70g1o7w8PLoL-hKD=i6UK22RgNOGewvr6Sn7S-g@mail.gmail.com |
In response to | Re: Combining Aggregates (David Rowley <david.rowley@2ndquadrant.com>) |
Responses | Re: Combining Aggregates; Re: Combining Aggregates |
List | pgsql-hackers |
[ rewinding to here from the detour I led us on ]

On Mon, Dec 21, 2015 at 4:02 AM, David Rowley <david.rowley@2ndquadrant.com> wrote:
> Now, there has been talk of this previously, on various threads, but I don't
> believe any final decisions were made on how exactly it should be done. At
> the moment I plan to make changes as follows:
>
> Add 3 new columns to pg_aggregate, aggserialfn, aggdeserialfn and
> aggserialtype. These will only be required when aggtranstype is INTERNAL.

Check.

> Perhaps we should disallow CREATE AGGREGATE from accepting them for any
> other type...

Well, we should definitely not accept them and have them be
meaningless. We should either reject them or accept them and then use
them. I can't immediately think of a reason to serialize one
non-internal type as another, but maybe there is one.

> The return type of aggserialfn should be aggserialtype, and it
> should take a single parameter of aggtranstype. aggdeserialfn will be the
> reverse of that.

Check.

> Add a new bool field to nodeAgg's state named serialStates. If this field
> is set to true then when we're in finalizeAgg = false mode, we'll call the
> serialfn on the agg state instead of the finalfn. This will allow the
> serialized state to be stored in the tuple and sent off to the main backend.
> The combine agg node should also be set to serialStates = true, so that it
> knows to deserialize instead of just assuming that the agg state is of type
> aggtranstype.

I'm not quite sure, but it sounds like you might be overloading
serialStates with two different meanings here.

> I believe this should allow us to not cause any performance regressions by
> moving away from INTERNAL agg states. It should also be very efficient for
> internal states such as Int8TransTypeData, as this struct merely has 2 int64
> fields which should be very simple to stuff into a bytea varlena type. We
> don't need to mess around converting the ->count and ->sum into strings or
> anything.

I think it would be more user-friendly to emit these as 2-element
integer arrays rather than bytea. Sure, bytea is fine when PostgreSQL
is talking to itself, but if you are communicating with an external
system, it's a different situation. If the remote system is PostgreSQL
then you are again OK, except for the data going over the wire being
incomprehensible and maybe byte-order-dependent, but what if you want
some other database system to do partial aggregation and then further
aggregate the result? You don't want the intermediate state to be some
kooky thing that only another PostgreSQL database has a chance of
generating correctly.

> Then in order for the planner to allow parallel aggregation all aggregates
> must:
>
> 1. Not have a DISTINCT or ORDER BY clause
> 2. Have a combinefn
> 3. If aggtranstype = INTERNAL, must have an aggserialfn and aggdeserialfn.
>
> We can relax the requirement on 3 if we're using 2-stage aggregation, but
> not parallel aggregation.

When would we do that?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
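
For illustration only, and not code from this thread or from PostgreSQL: the serialfn / deserialfn / combinefn roles discussed above can be sketched in plain C for a two-field state like the count and sum of Int8TransTypeData. Everything named here (AvgState, AVG_STATE_WIRE_SIZE, put_int64_be, get_int64_be, avg_state_serialize, avg_state_deserialize, avg_state_combine) is hypothetical. The sketch assumes a fixed 16-byte layout written most-significant byte first, which is one way to sidestep the byte-order concern raised above; it does not address readability for non-PostgreSQL systems, which is the argument for an integer array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a two-field transition state such as
 * Int8TransTypeData (a count and a sum). Not the PostgreSQL definition.
 */
typedef struct AvgState
{
    int64_t count;
    int64_t sum;
} AvgState;

/* Assumed wire format: two 64-bit fields, 8 bytes each. */
#define AVG_STATE_WIRE_SIZE 16

/* Write a 64-bit value most-significant byte first, regardless of host order. */
static void
put_int64_be(unsigned char *buf, int64_t value)
{
    uint64_t u = (uint64_t) value;

    for (int i = 0; i < 8; i++)
        buf[i] = (unsigned char) (u >> (56 - 8 * i));
}

/* Read back a 64-bit value written by put_int64_be. */
static int64_t
get_int64_be(const unsigned char *buf)
{
    uint64_t u = 0;

    for (int i = 0; i < 8; i++)
        u = (u << 8) | buf[i];
    return (int64_t) u;
}

/* Plays the "serialfn" role: in-memory state -> flat byte string. */
static void
avg_state_serialize(const AvgState *state,
                    unsigned char out[AVG_STATE_WIRE_SIZE])
{
    assert(state != NULL);
    put_int64_be(out, state->count);
    put_int64_be(out + 8, state->sum);
}

/* Plays the "deserialfn" role: flat byte string -> in-memory state. */
static AvgState
avg_state_deserialize(const unsigned char in[AVG_STATE_WIRE_SIZE])
{
    AvgState state;

    state.count = get_int64_be(in);
    state.sum = get_int64_be(in + 8);
    return state;
}

/* Plays the "combinefn" role: fold one partial state into another. */
static void
avg_state_combine(AvgState *dst, const AvgState *src)
{
    assert(dst != NULL && src != NULL);
    dst->count += src->count;
    dst->sum += src->sum;
}
```

In this sketch a worker would call avg_state_serialize on its partial state and ship the 16 bytes; the receiving side would call avg_state_deserialize and then avg_state_combine to merge the result into its own running state.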