Re: Optimization for updating foreign tables in Postgres FDW - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Optimization for updating foreign tables in Postgres FDW |
Date | |
Msg-id | CA+Tgmob_X7gsU+Frmok28F=WErye5wAi3p10LT2ULEPzHUDh1A@mail.gmail.com Whole thread Raw |
In response to | Re: Optimization for updating foreign tables in Postgres FDW (Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp>) |
Responses |
Re: Optimization for updating foreign tables in Postgres
FDW
|
List | pgsql-hackers |
On Mon, Mar 7, 2016 at 7:53 AM, Etsuro Fujita <fujita.etsuro@lab.ntt.co.jp> wrote: > Another option to avoid such a hazard would be to remove the two changes > from ExecInitModifyTable and create ExecAuxRowMarks and junk filters even in > the pushdown case. I made the changes because we won't use ExecAuxRowMarks > in that case since we don't need to do EvalPlanQual rechecks and because we > won't use junk filters in that case since we do UPDATE/DELETE in the > subplan. But the creating cost is enough small, so simply removing the > changes seems like a good idea. Sure, that works. >> This issue crops up elsewhere as well. The changes to >> ExecModifyTable() have the same problem -- in that case, it might be >> wise to move the code that's going to have to be indented yet another >> level into a separate function. That code also says this: >> >> + /* No need to provide scan tuple to >> ExecProcessReturning. */ >> + slot = ExecProcessReturning(resultRelInfo, >> NULL, planSlot); >> >> ...but, uh, why not? The comment says what the code does, but what it >> should do is explain why it does it. > > As documented in IterateDMLPushdown in fdwhandler.sgml, the reason for that > is that in the pushdown case it's the IterateDMLPushdown's responsiblity to > get actually inserted/updated/deleted tuples and make those tuples available > to the ExecProcessReturning. I'll add comments. Comments are good things to have. :-) >> On a broader level, I'm not very happy with the naming this patch >> uses. Here's an example: >> >> + <para> >> + If an FDW supports optimizing foreign table updates, it still needs >> to >> + provide <function>PlanDMLPushdown</>, <function>BeginDMLPushdown</>, >> + <function>IterateDMLPushdown</> and <function>EndDMLPushdown</> >> + described below. >> + </para> >> >> "Optimizing foreign table updates" is both inaccurate (since it >> doesn't only optimize updates) and so vague as to be meaningless >> unless you already know what it means. The actual patch uses >> terminology like "fdwPushdowns" which is just as bad. We might push a >> lot of things to the foreign side -- sorts, joins, aggregates, limits >> -- and this is just one of them. Worse, "pushdown" is itself >> something of a term of art - will people who haven't been following >> all of the mammoth, multi-hundred-email threads on this topic know >> what that means? I think we need some better terminology here. >> >> The best thing that I can come up with offhand is "bulk modify". So >> we'd have PlanBulkModify, BeginBulkModify, IterateBulkModify, >> EndBulkModify, ExplainBulkModify. Other suggestions welcome. The >> ResultRelInfo flag could be ri_usesFDWBulkModify. > > I'm not sure that "bulk modify" is best. Yeah, this would improve the > performance especially in the bulk-modification case, but would improve the > performance even in the case where an UPDATE/DELETE modifies just a single > row. Let me explain using an example. Without the patch, we have the > following plan for an UPDATE on a foreign table that updates a single row: > > postgres=# explain verbose update foo set a = a + 1 where a = 1; > QUERY PLAN > ---------------------------------------------------------------------------------- > Update on public.foo (cost=100.00..101.05 rows=1 width=14) > Remote SQL: UPDATE public.foo SET a = $2 WHERE ctid = $1 > -> Foreign Scan on public.foo (cost=100.00..101.05 rows=1 width=14) > Output: (a + 1), b, ctid > Remote SQL: SELECT a, b, ctid FROM public.foo WHERE ((a = 1)) FOR > UPDATE > (5 rows) > > The plan requires two queries, SELECT and UPDATE, to do the update. > (Actually, the plan have additional overheads in creating a cursor for the > SELECT and establishing a prepared statement for the UPDATE.) But with the > patch, we have: > > postgres=# explain verbose update foo set a = a + 1 where a = 1; > QUERY PLAN > --------------------------------------------------------------------------- > Update on public.foo (cost=100.00..101.05 rows=1 width=14) > -> Foreign Update on public.foo (cost=100.00..101.05 rows=1 width=14) > Remote SQL: UPDATE public.foo SET a = (a + 1) WHERE ((a = 1)) > (3 rows) > > The optimized plan requires just a single UPDATE query to do that! So, even > in the single-row-modification case the patch could improve the performance. > > How about "Direct Modify"; PlanDirectModify, BeginDirectModify, > IterateDirectModify, EndDirectModify, ExplainDirectModify, and > ri_usesFDWDirectModify. Works for me! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: