Home > mailing lists

Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

From	Jeevan Chalke
Subject	Re: [HACKERS] Partition-wise aggregation/grouping
Date	February 26, 2018 14:38:30
Msg-id	CAM2+6=U-B5AOGmU0s8sRVYFqoHq+fXF2FsmUVexCMF0VA3D5gw@mail.gmail.com Whole thread Raw
In response to	Re: [HACKERS] Partition-wise aggregation/grouping (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: [HACKERS] Partition-wise aggregation/grouping
List	pgsql-hackers

Tree view

Hi Robert,

On Fri, Feb 23, 2018 at 2:53 AM, Robert Haas <robertmhaas@gmail.com> wrote:

On Thu, Feb 8, 2018 at 8:05 AM, Jeevan Chalke
<jeevan.chalke@enterprisedb.com> wrote:
> In this attached version, I have rebased my changes over new design of
> partially_grouped_rel. The preparatory changes of adding
> partially_grouped_rel are in 0001.

I spent today hacking in 0001; results attached. The big change from
your version is that this now uses generate_gather_paths() to add
Gather/Gather Merge nodes (except in the case where we sort by group
pathkeys and then Gather Merge) rather than keeping all of the bespoke
code. That turned up to be a bit less elegant than I would have liked
-- I had to an override_rows argument to generate_gather_paths to make
it work. But overall I think this is still a big improvement, since
it lets us share code instead of duplicating it. Also, it potentially
lets us add partially-aggregated but non-parallel paths into
partially_grouped_rel->pathlist and that should Just Work; they will
get the Finalize Aggregate step but not the Gather. With your
arrangement that wouldn't work.

Please review/test.

I have reviewed and tested the patch and here are my couple of points:

     /*
-    * If the input rel belongs to a single FDW, so does the grouped rel.
+    * If the input rel belongs to a single FDW, so does the grouped rel. Same
+    * for the partially_grouped_rel.
     */
     grouped_rel->serverid = input_rel->serverid;
     grouped_rel->userid = input_rel->userid;
     grouped_rel->useridiscurrent = input_rel->useridiscurrent;
     grouped_rel->fdwroutine = input_rel->fdwroutine;
+    partially_grouped_rel->serverid = input_rel->serverid;
+    partially_grouped_rel->userid = input_rel->userid;
+    partially_grouped_rel->useridiscurrent = input_rel->useridiscurrent;
+    partially_grouped_rel->fdwroutine = input_rel->fdwroutine;

In my earlier mail where I have posted a patch for this partially grouped rel changes, I forgot to put my question on this.
I was unclear about above changes and thus passed grouped_rel whenever we wanted to work on partially_grouped_rel to fetch relevant details.

One idea I thought about is to memcpy the struct once we have set all required fields for grouped_rel so that we don't have to do similar stuff for partially_grouped_rel.

---

+            * Insert a Sort node, if required. But there's no point in
+            * sorting anything but the cheapest path.
             */
-            if (root->group_pathkeys)
+            if (!pathkeys_contained_in(root->group_pathkeys, path->pathkeys))
+            {
+                if (path != linitial(partially_grouped_rel->pathlist))
+                    continue;

Paths in pathlist are added by add_path(). Though we have paths is pathlist is sorted with the cheapest total path, we generally use RelOptInfo->cheapest_total_path instead of using first entry, unlike partial paths. But here you use the first entry like partial paths case. Will it better to use cheapest total path from partially_grouped_rel? This will require calling set_cheapest on partially_grouped_rel before we call this function.

Attached top-up patch doing this along with few indentation fixes.

Rest of the changes look good to me.

Once this gets in, I will re-base my other patches accordingly.

And, thanks for committing 0006.

Thanks

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

Attachment

use_cheapest_total_path.patch

pgsql-hackers by date:

From: Fabien COELHO
Date: 26 February 2018, 13:14:48
Subject: Re: csv format for psql

From: Ashutosh Bapat
Date: 26 February 2018, 14:50:05
Subject: Optimizing nested ConvertRowtypeExpr execution

Re: [HACKERS] Partition-wise aggregation/grouping - Mailing list pgsql-hackers

Attachment

Previous

Next