Re: join removal - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: join removal |
Date | |
Msg-id | 603c8f070907161927r74d930b3t671bda8786c8b41f@mail.gmail.com Whole thread Raw |
In response to | Re: join removal (Greg Stark <stark@mit.edu>) |
Responses |
Re: join removal
Re: join removal |
List | pgsql-hackers |
On Thu, Jul 16, 2009 at 9:02 PM, Greg Stark<stark@mit.edu> wrote: > I started looking at this patch and it looks pretty good as far as it > goes. But I think we can do a lot more. It seems to me the cases where > foreign key relationships exist are likely to be really big use cases. I agree. But that seems a lot harder, and this is useful all by itself because it can eliminate LEFT joins. Foreign key deductions will be necessary to eliminate inner joins and self-joins. I've been advised that when writing patches for PostgreSQL it's best to start with something small. :-) > I have one big worry though. Currently you're detecting the unique > property using the planner's path mechanism. I suppose that works, but > it's only an accident of the planner design that the path for the > unique index will always be there if it's the join condition. My > instinct is that this code should be going back to the raw index info > to prove this property. The only practical effect I can think of is > that the plan will have to be marked as being dependent on that index > and that will be hard to do if you haven't identified a specific index > you're basing it on. I had trouble figuring out where to hook in the logic. In an ideal world, it would be nice to detect that the join is removable earlier, but it's hard to do that, because it's not until we know the join order that we can test whether any attributes from the inner rel are used above the level of the join. But as it is the fact that the join can be removed will have to be rediscovered over and over again as planning progresses. As for going back to "the raw index info", that was kind of my instinct as well but I couldn't make it work. It seems that the IndexOptInfo structure only knows the column numbers of the index's keys, whereas the code that considers possible join strategies has only equivalence classes to work with, and I don't see how to match the two up. If we can figure out a way to do that it would probably be cleaner. > I would like to see a list of cases we plan to tackle preferably with > example queries, as a kind of checklist so we can knock them down one > by one. Right now it's unclear just how much of the problem space is > being solved. I don't know how many cases I personally plan to handle because I don't know how much time I'm going to have to work on this or whether I have the needed brainpower. But I can enumerate the cases that I know about where this is theoretically possible. - LEFT joins can be eliminated if the nullable side of the join can be proved unique over the join columns. The simplest and most common case is the one where there is a unique index on any (not necessarily proper) subset of the join columns, but it can also happen in any other case where we can prove that the inner rel is unique over (a subset of) the relevant columns, such as when the inner rel groups by those columns. There is an existing function query_is_distinct_for() that does something along these lines, but it operates on yet another different type of data structure (a Query, rather than a list of equivalence classes or alternatively a list of varattnos) and doesn't handle the unique-index case, which is probably the most important one for this optimization. - INNER joins are more complex because what happens on the inner side of the join can potentially wipe out rows from the result. With a LEFT join, it's sufficient to prove that the inner rel is at least unique enough, but for an INNER join, we have to prove that it's exactly UNIQUE enough. I think we can only provide this when the inner rel is a base relation with a unique index over EXACTLY (not a subset of) the relevant columns AND there is a foreign key relationship from the outer rel to the inner rel over the join columns. - Self-joins (whether they are inner, left, semi, or full) can be collapsed into a scan of the underlying base relation if the join columns on both sides include all the columns of the same unique index. All the quals from both sides have to be applied. > Incidentally, guess what other database just got this feature committed... > > http://askmonty.org/worklog/Client-BackLog/?tid=17 Hmm, well, it would be nice to have parity. This is a hugely important feature for the kinds of queries I do all day. ...Robert
pgsql-hackers by date: