Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers
From | David Rowley |
---|---|
Subject | Re: [PATCH] Erase the distinctClause if the result is unique by definition |
Date | |
Msg-id | CAApHDvq7i0=O97r4Y1pv68+tprVczKsXRsV28rM9H-rVPOfeNQ@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCH] Erase the distinctClause if the result is unique by definition (Andy Fan <zhihui.fan1213@gmail.com>) |
Responses |
Re: [PATCH] Erase the distinctClause if the result is unique by definition
Re: [PATCH] Erase the distinctClause if the result is unique by definition |
List | pgsql-hackers |
On Sat, 7 Mar 2020 at 00:47, Andy Fan <zhihui.fan1213@gmail.com> wrote: > Upload the newest patch so that the cfbot can pass. The last patch failed > because some explain without the (cost off). I've only really glanced at this patch, but I think we need to do this in a completely different way. I've been mentioning UniqueKeys around this mailing list for quite a while now [1]. To summarise the idea: 1. Add a new List field to RelOptInfo named unique_keys 2. During get_relation_info() process the base relation's unique indexes and add to the RelOptInfo's unique_keys list the indexed expressions from each unique index (this may need to be delayed until check_index_predicates() since predOK is only set there) 3. Perhaps in add_paths_to_joinrel(), or maybe when creating the join rel itself (I've not looked for the best location in detail), determine if the join can cause rows to be duplicated. If it can't, then add the UniqueKeys from that rel. For example: SELECT * FROM t1 INNER JOIN t2 ON t1.unique = t2.not_unique; would have the joinrel for {t1,t2} only take the unique keys from t2 (t1 can't duplicate t2 rows since it's an eqijoin and t1.unique has a unique index). If the condition was t1.unique = t2.unique then we could take the unique keys from both sides of the join, and with t1.non_unique = t2.non_unique, we can take neither. 4. When creating the GROUP BY paths (when there are no aggregates), don't bother doing anything if the input rel's unique keys are a subset of the GROUP BY clause. Otherwise, create the group by paths and tag the new unique keys onto the GROUP BY rel. 5. When creating the DISTINCT paths, don't bother if the input rel has unique keys are a subset of the distinct clause. 4 and 5 will mean that: SELECT DISTINCT non_unique FROM t1 GROUP BY non_unique will just uniquify once for the GROUP BY and not for the distinct. SELECT DISTINCT unique FROM t1 GROUP BY unique; won't do anything to uniquify the results. Because both 4 and 5 require that the uniquekeys are a subset of the distinct/group by clause, an empty uniquekey set would mean that the RelOptInfo returns no more than 1 row. That would allow your: SELECT DISTINCT max(non_unique) FROM t1; to skip doing the DISTINCT part. There's a separate effort in https://commitfest.postgresql.org/27/1741/ to implement some parts of the uniquekeys idea. However the implementation currently only covers adding the unique keys to Paths, not to RelOptInfos. I also believe that the existing code in analyzejoins.c should be edited to make use of unique keys rather than how it looks at unique indexes and group by / distinct clauses. [1] https://www.postgresql.org/search/?m=1&ln=pgsql-hackers&q=uniquekeys
pgsql-hackers by date: