Re: PATCH: use foreign keys to improve join estimates v1 - Mailing list pgsql-hackers
From | Tomas Vondra |
---|---|
Subject | Re: PATCH: use foreign keys to improve join estimates v1 |
Date | |
Msg-id | 555A70DA.7050501@2ndquadrant.com Whole thread Raw |
In response to | Re: PATCH: use foreign keys to improve join estimates v1 (David Rowley <dgrowleyml@gmail.com>) |
Responses |
Re: PATCH: use foreign keys to improve join estimates v1
|
List | pgsql-hackers |
Hi David, On 05/17/15 14:31, David Rowley wrote: > Hi Tomas, > > I did glance at this patch a while back, but just thinking on it again. > > I think if you find any quals that are a part of *any* foreign key > between the 2 join tables, then you should be never assume these quals > to reduce the number of rows. I believe this should be the case even if > you don't fully match all columns of the foreign key. > > If we modify your example a little, let's say your foreign key between > fact and dim is made from 3 columns (a,b,c) > > If we do: > > EXPLAIN SELECT * FROM fact f JOIN dim d USING (a,b); > > Then we should always (under normal circumstances) find at least one > matching row, although in this case since the join qual for c is > missing, we could find more than 1 matching row. > > Without digging too deep here, I'd say that the best way to do this > would be to either have calc_joinrel_size_estimate() build a list of > restrictinfo items of all quals that are NOT part of any foreign key and > pass that trimmed list down to clauselist_selectivity() for selectivity > estimates. Or perhaps a better way would be just to > teach clauselist_selectivity() about foreign keys. Likely > clauselist_selectivity() would just have to skip over RestrictInfo items > that are part of a foreign key. I'm not particularly happy about the way the current patch tweaks the selectivity, but I think simply removing the clauses is not going to work, because that implies the (removed) conditions have selectivity 1.0 (so the estimate would match a cartesian product). So we really need to estimate the selectivity, we can't just throw them away. And that's essentially what the current patch does - it realizes the clauses match a FK, and then computes the estimate as 1/N where N is the cardinality of the table with PK. Another thing is that there may be multiple "candidate" foreign keys, and we can't just remove clauses matching at least one of them. Imagine e.g. a (somewhat artificial) example: CREATE TABLE parent ( a INT, b INT, c INT, d INT, PRIMARY KEY (a,b), UNIQUE (c,d) ); CREATE TABLE child ( a INT, b INT, c INT, d INT, FOREIGN KEY (a,b) REFERENCES parent (a,b), FOREIGN KEY (c,d) REFERENCES parent (c,b) ); and a join on (a,b,c,d). We can't just discard all the conditions, because (a,b) and (c,d) may 'mismatch'. We know (a,b) and (c,d) matches about 1/N of the 'parent' table, but we don't know selectivity for the whole join condition. And it may be more complex, e.g. there may be two overlapping FKeys, e.b. (a,b) and (b,c) - how do you split that? But this may be an overkill (multiple multi-column FK join), although if we could handle that reasonably, then why not ... someone out there certainly has schema like that ;-) What I think is somewhat more realistic is conditions matching only a subset of FK columns - for example with a FK on (a,b) the join may only use (a), for example. Then again - we can't just discard that, we need to estimate that (and use it to compute the actual selectivity). I agree that if we want to do anything fancy with this, we will have to teach clauselist_selectivity() about foreign keys. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
pgsql-hackers by date: