Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | 56249FA0.3010001@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: Foreign join pushdown vs EvalPlanQual
|
List | pgsql-hackers |
On 2015/10/17 12:22, Robert Haas wrote: > On Fri, Oct 16, 2015 at 9:51 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> Both you and Etsuro Fujita are proposing to fix this problem by >>> somehow making it the FDW's problem to reconstruct the tuple >>> previously produced by the join from whole-row images of the baserels. >>> But that's not looking back far enough: why are we asking for >>> whole-row images of the baserels when what we really want is a >>> whole-row image of the output of the join? The output of the join is >>> what we need to re-return. >> There are multiple components to the requirement though: >> 3. If so, form a join row and return that. Else return NULL. > Not check. > > Suppose we've got two foreign tables ft1 and ft2, using postgres_fdw. > There is a local table t. The user does something like UPDATE t SET > ... FROM ft1, ft2 WHERE t = ft1.a AND ft1.b = ft2.b AND .... The > query planner generates something like: > > Update > -> Join > -> Scan on t > -> Foreign Scan on <ft1, ft2> > > If an EPQ recheck occurs, the only thing that matters is that the > Foreign Scan return the right output row (or possibly now rows, if the > row it would have formed no longer matches the quals). It doesn't > matter how it does this. Let's say the columns actually needed by the > query from the ft1-ft2 join are ft1.a, ft1.b, ft2.a, and ft2.b. > Currently, the output of the foreign scan is something like: ft1.a, > ft1.b, ft2.a, ft.b, ft1.*, ft2.*. The EPQ recheck has access to ft1.* > and ft2.*, but it's not straightforward for postgres_fdw to regenerate > the join tuple from that. Maybe the pushed-down was a left join, > maybe it was a right join, maybe it was a full join. So some of the > columns could have gone to NULL. To figure it out, you need to build > a secondary plan tree that mimics the structure of the join you pushed > down, which is kinda hairy. As Tom mentioned, just recomputing the original join tuple is not good enough. We would need to rejoin the test tuples for the baserels even if ROW_MARK_COPY is in use. Consider: A=# BEGIN; A=# UPDATE t SET a = a + 1 WHERE b = 1; B=# SELECT * from t, ft1, ft2 WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE; A=# COMMIT; where the plan for the SELECT FOR UPDATE is LockRows -> Nested Loop -> Seq Scan on t -> Foreign Scan on <ft1, ft2> Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c= ft2.c AND ft1.a = $1 AND ft2.b = $2 If an EPQ recheck is invoked by the A's UPDATE, just recomputing the original join tuple from the whole-row image that you proposed would output an incorrect result in the EQP recheck since the value a in the updated version of a to-be-joined tuple in t would no longer match the value ft1.a extracted from the whole-row image if the A's UPDATE has committed successfully. So I think we would need to rejoin the tuples populated from the whole-row images for the baserels ft1 and ft2, by executing the secondary plan with the new parameter values for a and b. As for the secondary plan, I think we could create the corresponding local join execution path during GetForeignJoinPaths, (1) by looking at the pathlist of the joinrel RelOptInfo, which would have already contained some local join execution paths, as does the patch, or (2) by calling a helper function that creates a local join execution path from given outer/inner paths selected from the pathlists of the outerrel/innerrel RelOptInfos, as proposed be KaiGai-san before. ISTM that the latter would be better, so I plan to propose such a function as part of the postgres_fdw join pushdown patch for 9.6. > This example is of the early row locking case, but I think the story > is about the same if the FDW wants to do late row locking instead. If > there's an EPQ recheck, it could issue individual row re-fetches > against every base table and then re-do all the joins that it pushed > down locally. But it would be faster and cleaner, I think, to send > one query to the remote side that re-fetches all the rows at once, and > whose target list is exactly what we need, rather than whole row > targetlists for each baserel that then have to be rejiggered on our > side. I agree with you on that point. (In fact, I thought that too!) But considering that many FDWs including postgres_fdw use early row locking (ie, ROW_MARK_COPY) currently, I'd like to leave that for future work. >> I think what Kaigai-san and Etsuro-san are after is trying to find a way >> to reuse some of the existing EPQ machinery to help with that. This may >> not be practical, or it may end up being messier than a standalone >> implementation; but it's not silly on its face to want to reuse some of >> that code. > Yeah, I think we're all in agreement that reusing as much of the EPQ > machinery as is sensible is something we should do. We are not in > agreement on which parts of it need to be changed or extended. Agreed. Best regards, Etsuro Fujita
pgsql-hackers by date: