Re: POC: GROUP BY optimization - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: POC: GROUP BY optimization |
Date | |
Msg-id | 7a80a207-5e8e-b80c-476c-2d290b0526e9@enterprisedb.com Whole thread Raw |
In response to | Re: POC: GROUP BY optimization (Zhihong Yu <zyu@yugabyte.com>) |
Responses |
Re: POC: GROUP BY optimization
|
List | pgsql-hackers |
On 3/29/22 04:02, Zhihong Yu wrote: > > > On Mon, Mar 28, 2022 at 5:49 PM Tomas Vondra > <tomas.vondra@enterprisedb.com <mailto:tomas.vondra@enterprisedb.com>> > wrote: > > Hi, > > Here's a rebased/improved version of the patch, with smaller parts > addressing various issues. There are seven parts: > > 0001 - main part, just rebased > > 0002 - replace the debug GUC options with a single GUC to disable the > optimization if needed > > 0003 - minor code cleanup, removal of unnecessary variable > > 0004 - various comment fixes (rewordings, typos, ...) > > 0005 - a minor code simplification, addressing FIXMEs from 0004 > > 0006 - adds the new GUC to the docs > > 0007 - demonstrates plan changes with a disabled optimization > > The first 6 parts should be squashed and committed at one, I only kept > them separate for clarity. The 0007 is merely a demonstration of the new > GUC and that it disables the optimization. > > > Agree. Because it is a kind of automation we should allow user to > switch > > it off in the case of problems or manual tuning. > > > Also, I looked through this patch. It has some minor problems: > > 1. Multiple typos in the patch comment. > > I went through the comments and checked all of them for grammar mistakes > and typos using a word processor, so hopefully that should be OK. But > maybe there's still something wrong. > > > 2. The term 'cardinality of a key' - may be replace with 'number of > > duplicates'? > > No, cardinality means "number of distinct values", so "duplicates" would > be wrong. And I think "cardinality" is well established term, so I think > it's fine. > > BTW I named the GUC enable_group_by_reordering, I wonder if it should be > named differently, e.g. enable_groupby_reordering? Opinions? > > > regards > > -- > Tomas Vondra > EnterpriseDB: http://www.enterprisedb.com <http://www.enterprisedb.com> > The Enterprise PostgreSQL Company > > Hi, > > For 0001-Optimize-order-of-GROUP-BY-keys-20220328.patch: > > multiple parametes need to be > > parametes -> parameters > > leave more expensive comparions > > comparions -> comparisons > > + if (has_fake_var == false) > > The above can be written as: > > if (!has_fake_var) > All of this was already fixed in one of the subsequent "fixup" patches. Attached is a patch merging 0001-0006, which is what I proposed to get committed. > + nGroups = ceil(2.0 + sqrt(tuples) * (i + 1) / > list_length(pathkeys)); > > Looks like the value of tuples doesn't change inside the loop. > You can precompute sqrt(tuples) outside the loop and store the value in > a variable. > IMHO it makes the formula harder to read, and the effect is not going to be measurable - we're processing only a couple elements. If the loop was ~100 iterations or more, maybe it'd have impact. > + return -1; > + else if (a->cost == b->cost) > + return 0; > + return 1; > > the keyword 'else' is not needed. > True, but this is how comparators are usually implemented. > + * Returns newly allocated lists. If no reordering is possible (or needed), > + * the lists are set to NIL. > + */ > +static bool > +get_cheapest_group_keys_order(PlannerInfo *root, double nrows, > > It seems the comment for return value doesn't match the bool return type. > Yup, this is a valid point. I've fixed/reworded the comment a bit. > + /* If this optimization is disabled, we're done. */ > + if (!debug_cheapest_group_by) > > It seems enable_cheapest_group_by would be better name for the flag. > This was renamed to enable_order_by_reordering in one of the fixup patches. Attached is a patch merging 0001 and all the fixup patches, i.e. the patch I plan to commit. There was a minor test failure - the new GUC was not added to the sample config file, so 003_check_guc.pl was failing. I'm not including the 0007 part, because that's meant to demonstrate plan changes with disabled optimization, which would confuse cfbot. regards regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Attachment
pgsql-hackers by date: