Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [HACKERS] Partition-wise aggregation/grouping |
Date | |
Msg-id | CA+TgmoaR+AGqs=RLzp92_qWaRpHdJRbyqTvQN9SAL2qRrc0Mrw@mail.gmail.com Whole thread Raw |
In response to | Re: [HACKERS] Partition-wise aggregation/grouping (Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>) |
Responses |
Re: [HACKERS] Partition-wise aggregation/grouping
|
List | pgsql-hackers |
On Fri, Mar 16, 2018 at 1:50 PM, Ashutosh Bapat <ashutosh.bapat@enterprisedb.com> wrote: >> This patch also renames can_parallel_agg to >> can_partial_agg and removes the parallelism-specific bits from it. > > I think we need to update the comments in this function to use phrase > "partial aggregation" instead of "parallel aggregation". And I think > we need to change the conditions as well. For example if > parse->groupClause == NIL, why can't we do partial aggregation? This > is the classical case when we will need patial aggregation. Probably > we should test this with Jeevan's patches for partition-wise aggregate > to see if it considers partition-wise aggregate or not. I think the case where we have neither any aggregates nor a grouping clause is where we are doing SELECT DISTINCT. Something like SELECT COUNT(*) FROM ... is not this case because that has an aggregate. I'm sort of on the fence as to whether and how to update the comments. I agree that it's a little strange to leave the comments here referencing parallel aggregation when the function has been renamed to is_partial_agg(), but a simple search-and-replace doesn't necessarily improve the situation very much. Most notably, hasNonSerial is irrelevant for partial but non-parallel aggregation, but we still have the test because we haven't done the work to really do the right thing here, which is to separately track whether we can do parallel partial aggregation (either hasNonPartial or hasNonSerial is a problem) and non-parallel partial aggregation (only hasNonPartial is a problem). This needs a deeper reassessment, but I don't think that can or should be something we try to do right now. > OR When parse->groupingSets is true, I can see why we can't use > parallel query, but we can still compute partial aggregates. This > condition doesn't hurt since partition-wise aggregation bails out when > there are grouping sets, so it's not that harmful here. I haven't thought deeply about what will break when GROUPING SETS are in use, but it's not the purpose of this patch set to make them work where they didn't before. The point of hoisting the first two tests out of this function was just to avoid doing repeated work when partition-wise aggregate is in use. Those two tests could conceivably produce different results for different children, whereas the remaining tests won't give different answers. Let's not get distracted by the prospect of improving the tests. I suspect that's not anyway so simple to achieve as what you seem to be speculating here... >> I am sort of unclear whether we need/want GroupPathExtraData at all. >> What determines whether something gets passed via GroupPathExtraData >> or just as a separate argument? If we have a rule that stuff that is >> common to all child grouped rels goes in there and other stuff >> doesn't, or stuff postgres_fdw needs goes in there and other stuff >> doesn't, then that might be OK. But I'm not sure that there is such a >> rule in the v20 patches. > > We have a single FDW hook for all the upper relations and that hook > can not accept grouping specific arguments. Either we need a separate > FDW hook for grouping OR we need some way of passing upper relation > specific information down to an FDW. I think some FDWs and extensions > will be happy if we provide them readymade decisions for can_sort, > can_hash, can_partial_agg etc. It will be good if they don't have to > translate the grouping target and havingQual for every child twice, > once for core and second time in the FDW. In all it looks like we need > some structure to hold that information so that we can pass it down > the hook. I am fine with two structures one variable and other > invariable. An upper operation can have one of them or both. I'm fine with using a structure to bundle details that need to be sent to the FDW, but why should the FDW need to know about can_sort/can_hash? I suppose if it needs to know about can_partial_agg then it doesn't really cost anything to pass through all the flags, but I doubt whether the FDW has any use for the others. Anyway, if that's the goal, let's just make sure that the set of things we're passing that way are exactly the set of things that we think the FDW might need. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: