Re: [HACKERS] postgres_fdw bug in 9.6 - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] postgres_fdw bug in 9.6 |
Date | |
Msg-id | 13242.1481582736@sss.pgh.pa.us Whole thread Raw |
In response to | Re: [HACKERS] postgres_fdw bug in 9.6 (Jeff Janes <jeff.janes@gmail.com>) |
Responses |
Re: [HACKERS] postgres_fdw bug in 9.6
Re: [HACKERS] postgres_fdw bug in 9.6 |
List | pgsql-hackers |
Jeff Janes <jeff.janes@gmail.com> writes: > I have a test case where I made the fdw connect back to itself, and > stripped out all the objects that I could and still reproduce the case. It > is large, 21MB compressed, 163MB uncompressed, so I am linking it here: > https://drive.google.com/open?id=0Bzqrh1SO9FcEZkpPM0JwUEMzcmc Thanks for the test case. I believe I understand the fundamental problem, or one of the fundamental problems --- I'm not exactly convinced there aren't several here. The key issue is that GetExistingLocalJoinPath believes it can return any random join path as what to use for the fdw_outerpath of a foreign join. You can get away with that, perhaps, as long as you only consider two-way joins. But in a nest of foreign joins, this fails to consider the interrelationships of join paths and their children. In particular, what we have happening in this example is that GetExistingLocalJoinPath seizes randomly on a MergePath for an upper join relation, clones it, sees that the outer child is a join ForeignPath, and replaces that outer child with its fdw_outerpath ... which was chosen equally cavalierly by some previous execution of GetExistingLocalJoinPath, and does not have the sort ordering expected by the MergePath. So we'd generate an invalid EPQ plan, except that the debug cross-check in create_mergejoin_plan notices the discrepancy. I believe there are probably more problems here, or at least if there aren't, it's not clear why not. Because of GetExistingLocalJoinPath's lack of curiosity about what's underneath the join pathnode it picks, it seems to me that it's possible for it to return a path tree that *isn't* all local joins. If we're looking at, say, a hash path for a 4-way join, whose left input is a hash path for a 3-way join, whose left input is a 2-way foreign join, what's stopping that from being returned as a "local" path tree? Likewise, it seems like the code is trying to reject any custom-path join types, or at least this barely-intelligible comment seems to imply that: * Just skip anything else. We don't know if corresponding * plan would build the output rowfrom whole-row references * of base relations and execute the EPQ checks. But this coding fails to notice any such join type that's below the level of the immediate two join inputs. We've probably managed to not notice this so far because foreign joins generally ought to dominate any local join method, so that there wouldn't often be cases where the surviving paths use local joins for input sub-joins. But Jeff's test case proves it can happen. I kind of wonder why this infrastructure exists at all; it's not the way I'd have foreseen handling EPQ for remote joins. However, while "throw it away and start again" might be good advice going forward, I suppose it won't be very popular for applying to 9.6. One way that we could make things better is to rely on the knowledge that EPQ isn't asked to evaluate joins for more than one row per input relation, and therefore using merge or hash join technology is really overkill. We could make a tree of simple nestloop joins, which aren't going to care about sort order, if we could identify the correct join clauses to apply. At least some of that could be laid off on the FDW, which if it's gotten this far at all, ought to know what join clauses need to be enforced by the foreign join. So I'm thinking a little bit in terms of "just collect the foreign scans for all the base rels included in the join and then build a cross-product nestloop join tree, applying all the join clauses at the top". This would have the signal value that it's guaranteed to succeed and so can be left for later, rather than having to expensively redo it at each level of join planning. (Hm, that does sound a lot like "throw it away and start again", doesn't it. But what we've got here is busted enough that I'm not sure there are good alternatives. Maybe for 9.6 we could just rewrite GetExistingLocalJoinPath, and limp along doing a lot of redundant computation during planning.) BTW, what's "existing" about the result of GetExistingLocalJoinPath? And for that matter, what's "outer" about the content of fdw_outerpath? Good luck figuring that out from the documentation of the field, all zero words of it. regards, tom lane
pgsql-hackers by date: