Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F80115B317@BPXM15GP.gisp.nec.co.jp |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses | Re: Foreign join pushdown vs EvalPlanQual |
List | pgsql-hackers |
> -----Original Message-----
> From: Etsuro Fujita [mailto:fujita.etsuro@lab.ntt.co.jp]
> Sent: Wednesday, October 21, 2015 12:31 PM
> To: Robert Haas
> Cc: Tom Lane; Kaigai Kouhei(海外 浩平); Kyotaro HORIGUCHI;
> pgsql-hackers@postgresql.org; Shigeru Hanada
> Subject: Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
>
> On 2015/10/20 13:11, Etsuro Fujita wrote:
> > On 2015/10/20 5:34, Robert Haas wrote:
> >> On Mon, Oct 19, 2015 at 3:45 AM, Etsuro Fujita
> >> <fujita.etsuro@lab.ntt.co.jp> wrote:
> >>> As Tom mentioned, just recomputing the original join tuple is not good
> >>> enough. We would need to rejoin the test tuples for the baserels even if
> >>> ROW_MARK_COPY is in use. Consider:
> >>>
> >>> A=# BEGIN;
> >>> A=# UPDATE t SET a = a + 1 WHERE b = 1;
> >>> B=# SELECT * from t, ft1, ft2
> >>>     WHERE t.a = ft1.a AND t.b = ft2.b AND ft1.c = ft2.c FOR UPDATE;
> >>> A=# COMMIT;
> >>>
> >>> where the plan for the SELECT FOR UPDATE is
> >>>
> >>> LockRows
> >>> -> Nested Loop
> >>>    -> Seq Scan on t
> >>>    -> Foreign Scan on <ft1, ft2>
> >>>         Remote SQL: SELECT * FROM ft1 JOIN ft2 WHERE ft1.c = ft2.c
> >>>         AND ft1.a = $1 AND ft2.b = $2
> >>>
> >>> If an EPQ recheck is invoked by the A's UPDATE, just recomputing the
> >>> original join tuple from the whole-row image that you proposed would
> >>> output an incorrect result in the EQP recheck since the value a in the
> >>> updated version of a to-be-joined tuple in t would no longer match the
> >>> value ft1.a extracted from the whole-row image if the A's UPDATE has
> >>> committed successfully. So I think we would need to rejoin the tuples
> >>> populated from the whole-row images for the baserels ft1 and ft2, by
> >>> executing the secondary plan with the new parameter values for a and b.
> >
> >> No. You just need to populate fdw_recheck_quals correctly, same as
> >> for the scan case.
> >
> > Yeah, I think we can probably do that for the case where a pushed-down
> > join clause is an inner-join one, but I'm not sure that we can do that
> > for the case where that clause is an outer-join one. Maybe I'm missing
> > something, though.
>
> As I said yesterday, that opinion of me is completely wrong. Sorry for
> the incorrectness. Let me explain a little bit more. I still think
> that even if ROW_MARK_COPY is in use, we would need to locally rejoin
> the tuples populated from the whole-row images for the foreign tables
> involved in a remote join, using a secondary plan. Consider eg,
>
> SELECT localtab.*, ft2 from localtab, ft1, ft2
> WHERE ft1.x = ft2.x AND ft1.y = localtab.y FOR UPDATE
>
> In this case, since the output of the foreign join would not include any
> ft1 columns, I don't think we could do the same thing as for the scan
> case, even if populating fdw_recheck_quals correctly.
>
As an aside, could you explain why you think so?
It is a significant point in this discussion if we want to reach consensus.

It looks to me that the explanation above mixes up the target list of the
user query with the target list of the remote query.
If the EPQ mechanism requires a joined tuple on ft1 and ft2, the FDW driver
can construct a remote query as follows:

  SELECT ft2, ft1.y, ft1.x, ft2.x FROM ft1, ft2 WHERE ft1.x = ft2.x FOR UPDATE

Thus, fdw_scan_tlist has four target entries, but the latter two are
resjunk=true, because the ForeignScan node drops these columns by projection
when it returns a tuple to the upper node.
On the other hand, the joined tuple we are talking about in this context is
the tuple prior to projection, formed according to fdw_scan_tlist.
So it contains all the information necessary to run the scan/join qualifiers
against the joined tuple. It is not affected by the target list of the user
query.

Even though I think the approach of reconstructing the joined tuple is a
reasonable solution here, this is not a fair reason to claim a disadvantage
in Robert's suggestion.
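The distinction between the pre-projection joined tuple and the projected
output can be illustrated with a toy model. This is only a sketch in Python,
not PostgreSQL code: TargetEntry, recheck, project and make_quals are
hypothetical names invented for the illustration, and the sample values are
made up. It assumes the four-entry target list described above, with the
last two entries marked resjunk.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class TargetEntry:
    """Toy stand-in for one fdw_scan_tlist entry (not PostgreSQL's struct)."""
    name: str
    resjunk: bool = False


# Assumed to mirror the remote query's target list: ft2, ft1.y, ft1.x, ft2.x.
FDW_SCAN_TLIST = [
    TargetEntry("ft2"),
    TargetEntry("ft1.y"),
    TargetEntry("ft1.x", resjunk=True),
    TargetEntry("ft2.x", resjunk=True),
]


def make_quals(localtab_y: Any) -> List[Callable[[Row], bool]]:
    """Build the scan/join quals; localtab_y plays the role of the parameter."""
    return [
        lambda r: r["ft1.x"] == r["ft2.x"],   # remote join clause
        lambda r: r["ft1.y"] == localtab_y,   # parameterized clause
    ]


def recheck(joined: Row, quals: List[Callable[[Row], bool]]) -> bool:
    """Evaluate every qual against the joined tuple *before* projection."""
    return all(q(joined) for q in quals)


def project(joined: Row, tlist: List[TargetEntry]) -> Row:
    """Drop resjunk columns, as done when returning a tuple to the upper node."""
    return {te.name: joined[te.name] for te in tlist if not te.resjunk}


# Hypothetical joined tuple formed according to FDW_SCAN_TLIST.
joined_tuple = {"ft2": ("ft2-row",), "ft1.y": 10, "ft1.x": 1, "ft2.x": 1}
```

In this model, recheck(joined_tuple, make_quals(10)) can read ft1.x and ft1.y
even though project(joined_tuple, FDW_SCAN_TLIST) no longer carries the
resjunk columns, which is the sense in which the joined tuple is independent
of the user query's target list.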
> And I think we
> would need to rejoin the tuples, using a local join execution plan,
> which would have the parameterization for the to-be-pushed-down clause
> ft1.y = localtab.y. I'm still missing something, though.
>
Also, please don't mix up "what we do" and "how we do".

Deciding which format of tuple the extension returns to the core backend is
a "what we do" discussion, because it determines the role of the interface.
If our consensus is to return a joined tuple, we need to design the interface
according to that consensus.
On the other hand, whether we should force every FDW/CSP extension to have an
alternative plan is a "how we do" discussion.
Once we reach consensus in the "what we do" discussion, there are various
options for meeting the requirement it sets; however, we cannot prioritize
"how we do" without "what we do".

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>