Re: PlaceHolderVars versus join ordering - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: PlaceHolderVars versus join ordering |
Date | |
Msg-id | AANLkTimsMFQOHv=ihVUHwXu-O0okh1QQK6wKwLihjpK4@mail.gmail.com Whole thread Raw |
In response to | PlaceHolderVars versus join ordering (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: PlaceHolderVars versus join ordering
|
List | pgsql-hackers |
On Mon, Sep 27, 2010 at 1:15 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've been looking at the bug reported by Kirill Simonov here: > http://archives.postgresql.org/pgsql-bugs/2010-09/msg00265.php > This is pretty ugly :-(. What we have is > select * from A > left join (select some-expressions from B left join C ...) > The expressions are not guaranteed nullable; that is, they won't > automatically produce nulls from all-null rows for B and C. > This means that we have to evaluate the expressions at the join of > B and C, producing PlaceHolderVars, which then bubble up through > the left join with A, and will get replaced with nulls in any > null-extended row of that join. Then references to them in the final > select list will correctly show as nulls. > > The problem is that there isn't anything guaranteeing that the B/C join > will be formed before joining to A. In Kirill's example the planner > thinks it should commute the two joins, and then it evaluates the > PlaceHolderVars at the now-upper B/C join, meaning they fail to go to > nulls in rows where they should. > > This is a fundamental oversight in the original design of the > PlaceHolderVar mechanism. I'm not sure how come I didn't see that some > sort of join-order interlock was necessary, but I didn't. > > The most trustworthy fix that I can think of goes like this: > > 1. Add an extra pass over the parsetree early in query_planner() to > locate PlaceHolderVars, set up the PlaceHolderInfo list, and remember the > syntactic level of each actually-used PlaceHolderVar. It's slightly > annoying to have to make an extra tree traversal for this, but at least > we can skip it in the common case where no PlaceHolderVars were created. > (Currently, this processing happens as a side-effect during > distribute_qual_to_rels, but that's too late for what we need to do in > point 2. Also, making the PlaceHolderInfos immediately when the > PlaceHolderVars are created would be unpalatable since we don't know at > that stage whether the PlaceHolderVars are actually used above the > relevant outer join.) > > 2. In make_outerjoininfo, scan the PlaceHolderInfo list to see if there > are any placeholders that should be evaluated below the current outer join > and are needed above it. If so, enlarge the outer join's min_righthand > set to ensure that it won't get moved below where the placeholder has to > be evaluated. We can fold the existing fix_placeholder_eval_levels() > logic into this operation, which will make that code less fragile than it > is now. > > This is a larger change than I would prefer to back-patch, but the only > less-invasive alternative I can see is to lobotomize the PlaceHolderVar > mechanism entirely by reverting to 8.3-style logic wherein we prevented > pullup of sub-selects that would require introduction of placeholders. > That would undo a significant optimization feature of 8.4, one that > I believe we're now relying on for reasonable performance of some system > views. > > Thoughts, better ideas? Personally, I would rather back-patch a more invasive bug fix than a performance regression. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
pgsql-hackers by date: