Re: Foreign join pushdown vs EvalPlanQual - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Foreign join pushdown vs EvalPlanQual |
Date | |
Msg-id | 5639D4DF.5020709@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Foreign join pushdown vs EvalPlanQual (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: Foreign join pushdown vs EvalPlanQual
Re: Foreign join pushdown vs EvalPlanQual |
List | pgsql-hackers |
On 2015/11/04 17:10, Kouhei Kaigai wrote: >> On 2015/10/28 6:04, Robert Haas wrote: >>> On Tue, Oct 20, 2015 at 12:39 PM, Etsuro Fujita >>> <fujita.etsuro@lab.ntt.co.jp> wrote: >>>> Sorry, my explanation was not correct. (Needed to take in caffeine.) What >>>> I'm concerned about is the following: >>>> >>>> SELECT * FROM localtab JOIN (ft1 LEFT JOIN ft2 ON ft1.x = ft2.x) ON >>>> localtab.id = ft1.id FOR UPDATE OF ft1 >>>> >>>> LockRows >>>> -> Nested Loop >>>> Join Filter: (localtab.id = ft1.id) >>>> -> Seq Scan on localtab >>>> -> Foreign Scan on <ft1, ft2> >>>> Remote SQL: SELECT * FROM ft1 LEFT JOIN ft2 WHERE ft1.x = ft2.x >>>> FOR UPDATE OF ft1 >>>> >>>> Assume that ft1 performs late row locking. >>> If the SQL includes "FOR UPDATE of ft1", then it clearly performs >>> early row locking. I assume you meant to omit that. >>>> If an EPQ recheck was invoked >>>> due to a concurrent transaction on the remote server that changed only the >>>> value x of the ft1 tuple previously retrieved, then we would have to >>>> generate a fake ft1/ft2-join tuple with nulls for ft2. (Assume that the ft2 >>>> tuple previously retrieved was not a null tuple.) However, I'm not sure how >>>> we can do that in ForeignRecheck; we can't know for example, which one is >>>> outer and which one is inner, without an alternative local join execution >>>> plan. Maybe I'm missing something, though. >>> I would expect it to issue a new query like: SELECT * FROM ft1 LEFT >>> JOIN ft2 WHERE ft1.x = ft2.x AND ft1.tid = $0 AND ft2.tid = $1. >> We assume here that ft1 uses late row locking, so I thought the above >> SQL should include "FOR UPDATE of ft1". But I still don't think that >> that is right; the SQL with "FOR UPDATE of ft1" wouldn't generate the >> fake ft1/ft2-join tuple with nulls for ft2, as expected. The reason for >> that is that the updated version of the ft1 tuple wouldn't satisfy the >> ft1.tid = $0 condition in an EPQ recheck, because the ctid for the >> updated version of the ft1 tuple has changed. (IIUC, I think that if we >> use a TID scan for ft1, the SQL would generate the expected result, >> because I think that the TID condition would be ignored in the EPQ >> recheck, but I don't think it's guaranteed to use a TID scan for ft1.) >> Maybe I'm missing something, though. > It looks to me, we should not use ctid system column to identify remote > row when postgres_fdw tries to support late row locking. > > The documentation says: > http://www.postgresql.org/docs/devel/static/fdw-callbacks.html#FDW-CALLBACKS-UPDATE > > UPDATE and DELETE operations are performed against rows previously > fetched by the table-scanning functions. The FDW may need extra information, > such as a row ID or the values of primary-key columns, to ensure that it can > identify the exact row to update or delete > > The "rowid" should not be changed once it is fetched from the remote side > until it is actually updated, deleted or locked, for correct identification. > If ctid is used for this purpose, it is safe only when remote row is locked > when it is fetched - it is exactly early row locking behavior, isn't it? Yeah, we should use early row locking for a target foreign table in UPDATE/DELETE. In case of SELECT FOR UPDATE, I think we are allowed to use ctid to identify target rows for late row locking, but I think the above SQL should be changed to something like this: SELECT * FROM (SELECT * FROM ft1 WHERE ft1.tid = $0 FOR UPDATE) ss1 LEFT JOIN (SELECT * FROM ft2 WHERE ft2.tid = $1) ss2 ON ss1.x = ss2.x >>> This should be significantly more efficient than fetching the base >>> rows from each of two tables with two separate queries. >> Maybe I think we could fix the SQL, so I have to admit that, but I'm >> just wondering (1) what would happen for the case when ft1 uses late row >> rocking and ft2 uses early row rocking and (2) that would be still more >> efficient than re-fetching only the base row from ft1. > It should be decision by FDW driver. It is not easy to estimate a certain > FDW driver mixes up early and late locking policy within a same remote join > query. Do you really want to support such a mysterious implementation? Yeah, the reason for that is because GetForeignRowMarkType allows that. > Or, do you expect all the FDW driver is enforced to return a joined tuple > if remote join case? No. That wouldn't make sense if at least one component table involved in a foreign join uses the rowmark type other than ROW_MARK_COPY. > It is different from my idea; it shall be an extra > optimization option if FDW can fetch a joined tuple at once, but not always. > So, if FDW driver does not support this optimal behavior, your driver can > fetch two base tables then run local alternative join (or something other). OK, so if we all agree that the joined-tuple optimization is just an option for the case where all the component tables use ROW_MARK_COPY, I'd propose to leave that for 9.6. Best regards, Etsuro Fujita
pgsql-hackers by date: