Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Zhihong Yu |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | CALNJ-vQp3WNV-kKuGsjSOhjRfdECA1Dbxa-vjAgOmQ4f4Qaw-Q@mail.gmail.com Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Tomas Vondra <tomas.vondra@enterprisedb.com>) |
Responses |
Re: POC: GROUP BY optimization
|
List | pgsql-hackers |
On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra <tomas.vondra@enterprisedb.com> wrote:
Hi,
Here's a rebased/improved version of the patch, with smaller parts
addressing various issues. There are seven parts:
0001 - main part, just rebased
0002 - replace the debug GUC options with a single GUC to disable the
optimization if needed
0003 - minor code cleanup, removal of unnecessary variable
0004 - various comment fixes (rewordings, typos, ...)
0005 - a minor code simplification, addressing FIXMEs from 0004
0006 - adds the new GUC to the docs
0007 - demonstrates plan changes with a disabled optimization
The first 6 parts should be squashed and committed at one, I only kept
them separate for clarity. The 0007 is merely a demonstration of the new
GUC and that it disables the optimization.
> Agree. Because it is a kind of automation we should allow user to switch
> it off in the case of problems or manual tuning.
> > Also, I looked through this patch. It has some minor problems:
> 1. Multiple typos in the patch comment.
I went through the comments and checked all of them for grammar mistakes
and typos using a word processor, so hopefully that should be OK. But
maybe there's still something wrong.
> 2. The term 'cardinality of a key' - may be replace with 'number of
> duplicates'?
No, cardinality means "number of distinct values", so "duplicates" would
be wrong. And I think "cardinality" is well established term, so I think
it's fine.
BTW I named the GUC enable_group_by_reordering, I wonder if it should be
named differently, e.g. enable_groupby_reordering? Opinions?
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,
For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch:
multiple parametes need to be
parametes -> parameters
leave more expensive comparions
comparions -> comparisons
+ if (has_fake_var == false)
The above can be written as:
if (!has_fake_var)
+ nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) / list_length(pathkeys));
Looks like the value of tuples doesn't change inside the loop.
You can precompute sqrt(tuples) outside the loop and store the value in a variable.
+ return -1;
+ else if (a->cost == b->cost)
+ return 0;
+ return 1;
+ else if (a->cost == b->cost)
+ return 0;
+ return 1;
the keyword 'else' is not needed.
+ * Returns newly allocated lists. If no reordering is possible (or needed),
+ * the lists are set to NIL.
+ */
+static bool
+ * the lists are set to NIL.
+ */
+static bool
+get_cheapest_group_keys_order(PlannerInfo *root, double nrows,
It seems the comment for return value doesn't match the bool return type.
+ /* If this optimization is disabled, we're done. */
+ if (!debug_cheapest_group_by)
+ if (!debug_cheapest_group_by)
It seems enable_cheapest_group_by would be better name for the flag.
Cheers
pgsql-hackers by date: