Re: [PATCH] Erase the distinctClause if the result is unique by definition - Mailing list pgsql-hackers
From | David Rowley
---|---
Subject | Re: [PATCH] Erase the distinctClause if the result is unique by definition
Date |
Msg-id | CAApHDvrBXjAvH45dEAZFOk-hzOt1mJC7-fxZ2v49mc5njtA7VQ@mail.gmail.com
In response to | Re: [PATCH] Erase the distinctClause if the result is unique by definition (Andy Fan <zhihui.fan1213@gmail.com>)
Responses | Re: [PATCH] Erase the distinctClause if the result is unique by definition
List | pgsql-hackers
On Wed, 11 Mar 2020 at 17:23, Andy Fan <zhihui.fan1213@gmail.com> wrote:
> Now I am convinced that we should maintain UniquePath on RelOptInfo,
> I would see how to work with "Index Skip Scan" patch.

I've attached a very early proof of concept patch for unique keys.
The NULL detection stuff is not yet hooked up, so it'll currently do
the wrong thing for NULLable columns. I've left some code in there
with my current idea of how to handle that, but I'll need to add more
code both to look at the catalogue tables to see if there's a NOT
NULL constraint and also to check for strict quals that filter out
NULLs.

Additionally, I've not hooked up the collation checking stuff yet. I
just wanted to see if it would work ok for non-collatable types
first.

I've added a couple of lines to create_distinct_paths() to check if
the input_rel has the required UniqueKeys to skip doing the DISTINCT.
It seems to work, but my tests so far are limited to:

create table t1(a int primary key, b int);
create table t2(a int primary key, b int);

postgres=# -- t2 could duplicate t1, don't remove DISTINCT
postgres=# explain (costs off) select distinct t1.a from t1 inner join t2 on t1.a = t2.b;
            QUERY PLAN
----------------------------------
 HashAggregate
   Group Key: t1.a
   ->  Hash Join
         Hash Cond: (t2.b = t1.a)
         ->  Seq Scan on t2
         ->  Hash
               ->  Seq Scan on t1
(7 rows)

postgres=# -- neither rel can duplicate the other due to join on PK. Remove DISTINCT
postgres=# explain (costs off) select distinct t1.a from t1 inner join t2 on t1.a = t2.a;
         QUERY PLAN
----------------------------
 Hash Join
   Hash Cond: (t1.a = t2.a)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2
(5 rows)

postgres=# -- t2.a cannot duplicate t1 and t1.a is unique. Remove DISTINCT
postgres=# explain (costs off) select distinct t1.a from t1 inner join t2 on t1.b = t2.a;
         QUERY PLAN
----------------------------
 Hash Join
   Hash Cond: (t1.b = t2.a)
   ->  Seq Scan on t1
   ->  Hash
         ->  Seq Scan on t2
(5 rows)

postgres=# -- t1.b can duplicate t2.a. Don't remove DISTINCT
postgres=# explain (costs off) select distinct t2.a from t1 inner join t2 on t1.b = t2.a;
            QUERY PLAN
----------------------------------
 HashAggregate
   Group Key: t2.a
   ->  Hash Join
         Hash Cond: (t1.b = t2.a)
         ->  Seq Scan on t1
         ->  Hash
               ->  Seq Scan on t2
(7 rows)

postgres=# -- t1.a cannot duplicate t2.a. Remove DISTINCT.
postgres=# explain (costs off) select distinct t2.a from t1 inner join t2 on t1.a = t2.b;
         QUERY PLAN
----------------------------
 Hash Join
   Hash Cond: (t2.b = t1.a)
   ->  Seq Scan on t2
   ->  Hash
         ->  Seq Scan on t1
(5 rows)

I've also left a bunch of XXX comments for things that I know need
more thought. I believe we can propagate the joinrel's unique keys
where the patch is currently doing it. I understand that in
populate_joinrel_with_paths() we do things like swapping LEFT JOINs
for RIGHT JOINs and switching the input rels around, but we do so
only because it's equivalent, so I don't currently see why we can't
take the jointype from the SpecialJoinInfo. I need to know that, as
I'll need to ignore pushed-down RestrictInfos for outer joins.

I'm posting now as I know I've been mentioning this UniqueKeys idea
for quite a while, and if it's not something that's going to get off
the ground then it's better to figure that out now.
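To make the NULL hazard mentioned above concrete, here is a minimal
illustration (table t3 and these queries are only for illustration;
they are not part of the patch or of the tests above). A UNIQUE
constraint permits any number of NULLs, so a nullable unique column
is not truly distinct:

create table t3(a int primary key, b int unique);
insert into t3 values (1, NULL), (2, NULL);

-- b has a unique constraint, but both rows store NULL in b, so this
-- returns one row while dropping the DISTINCT would return two:
select distinct b from t3;

-- only a NOT NULL constraint, or a strict qual like the one below,
-- removes the hazard:
select distinct b from t3 where b is not null;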
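Similarly, on the outer-join point, a purely illustrative sketch (not
taken from the patch) of why the jointype matters when propagating a
joinrel's unique keys: an outer join NULL-extends the inner side, so
a column that is unique within its base relation can still appear
many times as NULL in the join output:

-- t2.a is t2's primary key, but every t1 row with no match in t2
-- produces a NULL for t2.a, so the join output can contain many
-- NULLs and the DISTINCT cannot be removed:
select distinct t2.a from t1 left join t2 on t1.b = t2.a;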