Home > mailing lists

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey - Mailing list pgsql-hackers

From	David Rowley
Subject	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
Date	March 31, 2020 01:43:52
Msg-id	CAApHDvpW5P8LfGNe5TJvs+nVx2Fua4uYnKk7wzYTfA=xn1o_FQ@mail.gmail.com Whole thread Raw
In response to	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey (Andy Fan <zhihui.fan1213@gmail.com>)
Responses	Re: [PATCH] Keeps tracking the uniqueness with UniqueKey Re: [PATCH] Keeps tracking the uniqueness with UniqueKey Re: [PATCH] Keeps tracking the uniqueness with UniqueKey
List	pgsql-hackers

Tree view

On Sun, 29 Mar 2020 at 20:50, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Some other changes made in the new patch:
> 1.   Fixed bug for UniqueKey calculation for OUTER join.
> 2.   Fixed some typo error in comments.
> 3.   Renamed the field "grantee" as "guarantee".

I've had a look over this patch. Thank for you doing further work on it.

I've noted down the following during my read of the code:

1. There seem to be some cases where joins are no longer being
detected as unique. This is evident in postgres_fdw.out. We shouldn't
be regressing any of these cases.

2. The following change does not seem like it should be part of this
patch.  I understand you perhaps have done as you think it will
improve the performance of checking if an expression is in a list of
expressions.

- COMPARE_SCALAR_FIELD(varno);
+ /* Compare varattno first since it has higher selectivity than varno */
  COMPARE_SCALAR_FIELD(varattno);
+ COMPARE_SCALAR_FIELD(varno);

If you think that is true, then please do it as a separate effort and
provide benchmarks with your findings.

3. list_all_members_in. I think this would be better named as
list_is_subset. Please follow the lead of bms_is_subset().
Additionally, you should Assert that IsPointerList is true as there's
nothing else to indicate that it can't be used for an int or Oid list.

4. guarantee is not a very good name for the field in UniqueKey.
Maybe something like is_not_null?

5. I think you should be performing a bms_del_member during join
removal rather than removing this Assert()

- Assert(bms_equal(rel->relids, root->all_baserels));

FWIW, it's far from perfect that you've needed to delay the left join
removal, but I do understand why you've done it. It's also far from
perfect that you're including removed relations in the
total_table_pages calculation. c6e4133fae1 took some measures to
improve this calculation and this is making it worse again.

6. Can you explain why you moved the populate_baserel_uniquekeys()
call out of set_plain_rel_size()?

7. I don't think the removal of rel_supports_distinctness() is
warranted.  Is it not ok to check if the relation has any uniquekeys?
It's possible, particularly in join_is_removable that this can save
quite a large amount of effort.

8. Your spelling of unique is incorrect in many places:

src/backend/nodes/makefuncs.c: * makeUnqiueKey
src/backend/optimizer/path/uniquekeys.c:static List
*initililze_unqiuecontext_for_joinrel(RelOptInfo *joinrel,
src/backend/optimizer/path/uniquekeys.c: * check if combination of
unqiuekeys from both side is still useful for us,
src/backend/optimizer/path/uniquekeys.c:        outerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, outerrel);
src/backend/optimizer/path/uniquekeys.c:        innerrel_uniquekey_ctx
= initililze_unqiuecontext_for_joinrel(joinrel, innerrel);
src/backend/optimizer/path/uniquekeys.c: * we need to convert the
UnqiueKey from sub_final_rel to currel via the positions info in
src/backend/optimizer/path/uniquekeys.c:                ctx->pos =
pos; /* the position in current targetlist,  will be used to set
UnqiueKey */
src/backend/optimizer/path/uniquekeys.c: * Check if Unqiue key of the
innerrel is valid after join. innerrel's UniqueKey
src/backend/optimizer/path/uniquekeys.c: * initililze_unqiuecontext_for_joinrel
src/backend/optimizer/path/uniquekeys.c: * all the unqiuekeys which
are not possible to use later
src/backend/optimizer/path/uniquekeys.c:initililze_unqiuecontext_for_joinrel(RelOptInfo
*joinrel,  RelOptInfo *inputrel)
src/backend/optimizer/plan/analyzejoins.c:                      /*
This UnqiueKey is what we want */
src/backend/optimizer/plan/planner.c:   /* If we the result if unqiue
already, we just return the input_rel directly */
src/include/nodes/pathnodes.h: * exprs is a list of exprs which is
unqiue on current RelOptInfo.
src/test/regress/expected/join.out:-- XXXX: since b.id is unqiue now
so the group by cluase is erased, so
src/test/regress/expected/select_distinct.out:-- create unqiue index on dist_p
src/test/regress/expected/select_distinct.out:-- we also support
create unqiue index on each child tables
src/test/regress/sql/join.sql:-- XXXX: since b.id is unqiue now so the
group by cluase is erased, so
src/test/regress/sql/select_distinct.sql:-- create unqiue index on dist_p
src/test/regress/sql/select_distinct.sql:-- we also support create
unqiue index on each child tables

9. A few things wrong with the following fragment:

/* set the not null info now */
ListCell *lc;
foreach(lc, find_nonnullable_vars(qual))
{
Var *var = lfirst_node(Var, lc);
RelOptInfo *rel = root->simple_rel_array[var->varno];
if (var->varattno > InvalidAttrNumber)
rel->not_null_cols = bms_add_member(rel->not_null_cols, var->varattno);
}

a. including a function call in the foreach macro is not a practise
that we really follow. It's true that the macro now assigns the 2nd
param to a variable. Previous to 1cff1b95ab6 this was not the case and
it's likely best not to leave any bad examples around that code which
might get backported might follow.
b. We generally subtract InvalidAttrNumber from varattno when
including in a Bitmapset.
c. not_null_cols is not well named. I think notnullattrs
d. not_null_cols should not be a Relids type, it should be Bitmapset.

10. add_uniquekey_for_onerow() seems pretty wasteful.  Is there really
a need to add each item in the rel's targetlist to the uniquekey list?
What if we just add an empty list to the unique keys, that way if we
need to test if some expr is a superset of any uniquekey, then we'll
see it is as any set is a superset of an empty set.  Likely the empty
set of uniquekeys should be the only one in the rel's uniquekey list.

11. In create_distinct_paths() the code is now calling
get_sortgrouplist_exprs() multiple times with the same input. I think
it would be better to just call it once and set the result in a local
variable.

12. The comment in the code below is not true. The List contains
Lists, of which contain UniqueKeys

List    *uniquekeys; /* List of UniqueKey */

13. I'm having trouble parsing the final sentence in:

+ * can only guarantee the uniqueness without considering the null values. This
+ * field is necessary for remove_useless_join & reduce_unique_semijions since
+ * these cases don't care about the null values.

Why is the field which stores the nullability of the key required for
code that does not care about the nullability of the key?

Also please check your spelling of the word "join"

14. In the following fragment, instead of using i+1, please assign the
FormData_pg_attribute to a variable named attr and use attr->attnum.
Also, please see what I mentioned above about subtracting
InvalidAttrNumber

+ rel->not_null_cols = bms_add_member(rel->not_null_cols, i+1);

15. The tests you've changed the expected outcome of in join.out
should be updated so that the GROUP BY and DISTINCT clause is not
removed. This will allow the test to continue testing what it was
intended to test. You can do this by changing the columns in the GROUP
BY clause so that the new code does not find uniquekeys for those
columns.

16. The tests in aggregates.out are in a similar situation. There are
various tests trying to ensure that remove_useless_groupby_columns()
does what it's meant to do. You can modify these tests to add a join
which is non-unique to effectively duplicate the PK column.

17. In your select_distinct tests, can you move away from naming the
tables starting with select_distinct?  It makes reading queries pretty
hard.

e.g. explain (costs off) select distinct uk1, uk2 from
select_distinct_a where uk2 is not null;

When I first glanced that, I failed to see the underscores and the
query looked invalid.

18. Check the spelling if "erased". You have it spelt as "ereased" in
a couple of locations.

19. Please pay attention to the capitalisation of SQL keywords in the
test files you've modified. I understand we're very inconsistent in
this department in general, but we do at least try not to mix
capitalisation within the same file.  Basically, please upper case the
keywords in select_distinct.sql

20. In addition to the above, please try to wrap long SQL lines so
they're below 80 chars.

I'll review the patch in more detail once the above points have been addressed.

David

pgsql-hackers by date:

From: Andres Freund
Date: 31 March 2020, 01:40:07
Subject: Very outdated comments in heapam_index_build_range_scan.

From: Amit Kapila
Date: 31 March 2020, 02:20:45
Subject: Re: error context for vacuum to include block number

Re: [PATCH] Keeps tracking the uniqueness with UniqueKey - Mailing list pgsql-hackers

Previous

Next