Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server - Mailing list pgsql-hackers
From | Etsuro Fujita |
---|---|
Subject | Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server |
Date | |
Msg-id | 5BA4CFE5.5040502@lab.ntt.co.jp Whole thread Raw |
In response to | Re: Problem while updating a foreign table pointing to apartitioned table on foreign server (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>) |
Responses |
Re: Problem while updating a foreign table pointing to a partitionedtable on foreign server
|
List | pgsql-hackers |
(2018/09/18 21:14), Kyotaro HORIGUCHI wrote: > At Fri, 14 Sep 2018 22:01:39 +0900, Etsuro Fujita<fujita.etsuro@lab.ntt.co.jp> wrote in<5B9BB133.1060107@lab.ntt.co.jp> >> @@ -126,8 +173,18 @@ get_relation_info(PlannerInfo *root, Oid >> relationObjectId,\ >> bool inhparent, >> (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), >> errmsg("cannot access temporary or unlogged relations during >> r\ >> ecovery"))); >> >> + max_attrnum = RelationGetNumberOfAttributes(relation); >> + >> + /* Foreign table may have exanded this relation with junk columns */ >> + if (root->simple_rte_array[varno]->relkind == RELKIND_FOREIGN_TABLE) >> + { >> + AttrNumber maxattno = max_varattno(root->parse->targetList, varno); >> + if (max_attrnum< maxattno) >> + max_attrnum = maxattno; >> + } >> + >> rel->min_attr = FirstLowInvalidHeapAttributeNumber + 1; >> - rel->max_attr = RelationGetNumberOfAttributes(relation); >> + rel->max_attr = max_attrnum; >> rel->reltablespace = RelationGetForm(relation)->reltablespace; >> >> This breaks the fundamental assumption that rel->max_attr is equal to >> RelationGetNumberOfAttributes of that table. My concern is: this >> change would probably be a wart, so it would be bug-prone in future >> versions. > > Hmm. I believe that once RelOptInfo is created all attributes > defined in it is safely accessed. Is it a wrong assumption? The patch you proposed seems to fix the issue well for the current version of PG, but I'm a bit scared to have such an assumption (ie, to include columns in a rel's tlist that are not defined anywhere in the system catalogs). In future we might add eg, a lsyscache.c routine for some planning use that are given the attr number of a column as an argument, like get_attavgwidth, and if so, it would be easily conceivable that that routine would error out for such an undefined column. (get_attavgwidth would return 0, not erroring out, though.) > Actually RelationGetNumberOfAttributes is used in few distinct > places while planning. > build_physical_tlist is not used for > foreign relations. For UPDATE/DELETE, that function would not be called for a foreign target in the posetgres_fdw case, as CTID is requested (see use_physical_tlist), but otherwise that function may be called if possible. No? > If we don't accept the expanded tupdesc for base relations, the > another way I can find is transforming the foreign relation into > something another like a subquery, or allowing expansion of > attribute list of a base relation... Sorry, I don't understand this fully, but there seems to be the same concern as mentioned above. >> Another thing on the new version: >> >> @@ -1575,6 +1632,19 @@ build_physical_tlist(PlannerInfo *root, >> RelOptInfo *rel) >> relation = heap_open(rte->relid, NoLock); >> >> numattrs = RelationGetNumberOfAttributes(relation); >> + >> + /* >> + * Foreign tables may have expanded with some junk columns. Punt >> + * in the case. > ... >> I think this would disable the optimization on projection in foreign >> scans, causing performance regression. > > Well, in update/delete cases, create_plan_recurse on foreign scan > is called with CP_EXACT_TLIST in create_modifytable_plan That's not necessarily true; consider UPDATE/DELETE on a local join; in that case the topmost plan node for a subplan of a ModifyTable would be a join, and if that's a NestLoop, create_plan_recurse would call create_nestloop_plan, which would recursively call create_plan_recurse for its inner/outer subplans with flag=0, not CP_EXACT_TLIST. > so the > code path is not actually used. I think this is true for the postgres_fdw case; because use_physical_tlist would decide not to do build_physical_tlist for the reason mentioned above. BUT my question here is: why do we need the change to build_physical_tlist? >>> Since this uses fdw_scan_tlist so it is theoretically >>> back-patchable back to 9.6. >> >> IIRC, the fdw_scan_tlist stuff was introduced in PG9.5 as part of join >> pushdown infrastructure, so I think your patch can be back-patched to >> PG9.5, but I don't think that's enough; IIRC, this issue was >> introduced in PG9.3, so a solution for this should be back-patch-able >> to PG9.3, I think. > > In the previous version, fdw_scan_tlist is used to hold only > additional (junk) columns. I think that we can get rid of the > variable by scanning the full tlist for junk columns. Apparently > it's differnt patch for such versions. I'm not sure how much it > is invasive for now but will consider. Sorry, I don't fully understand this. Could you elaborate a bit more on this? >>> 0001-Add-test-for-postgres_fdw-foreign-parition-update.patch >>> >>> This should fail for unpatched postgres_fdw. (Just for demonstration) >> >> +CREATE TABLE p1 (a int, b int); >> +CREATE TABLE c1 (LIKE p1) INHERITS (p1); >> +CREATE TABLE c2 (LIKE p1) INHERITS (p1); >> +CREATE FOREIGN TABLE fp1 (a int, b int) >> + SERVER loopback OPTIONS (table_name 'p1'); >> +INSERT INTO c1 VALUES (0, 1); >> +INSERT INTO c2 VALUES (1, 1); >> +SELECT tableoid::int - (SELECT min(tableoid) FROM fp1)::int AS >> toiddiff, ctid, * FROM fp1; >> >> Does it make sense to evaluate toiddiff? I think that should always >> be 0. > > Right. it is checking that the values are not those of remote > table oids. Sorry, my explanation was not enough, but that seems to me more complicated than necessary. How about just evaluating tableoid::regclass, instead of toiddiff? > ======= >>> 0003-Fix-of-foreign-update-bug-of-PgFDW.patch >>> >>> Fix of postgres_fdw for this problem. >> >> Sorry, I have not looked at it closely yet, but before that I'd like >> to discuss the direction we go in. I'm not convinced that your >> approach is the right direction, so as promised, I wrote a patch using >> the Param-based approach, and compared the two approaches. Attached >> is a WIP patch for that, which includes the 0003 patch. I don't think >> there would be any warts as discussed above in the Param-based >> approach for now. (That approach modifies the planner so that the >> targetrel's tlist would contain Params as well as Vars/PHVs, so >> actually, it breaks the planner assumption that a rel's tlist would >> only include Vars/PHVs, but I don't find any issues on that at least >> for now. Will look into that in more detail.) And I don't think >> there would be any concern about performance regression, either. >> Maybe I'm missing something, though. >> >> What do you think about that? > > Hmm. It is beyond my understanding. Great work (for me)! I just implemented Tom's idea. I hope I did that correctly. > I confirmed that a FOREIGN_PARAM_EXEC is evaluated and stored > into the parent node. For the mentioned Merge/Sort/ForeignScan > case, Sort node takes the parameter value via projection. I > didn't know PARAM_EXEC works that way. I consulted nodeNestLoop > but not fully understood. > > So I think it works. I still don't think expanded tupledesc is > not wart but this is smarter than that. Addition to that, it > seems back-patchable. I must admit that yours is better. As mentioned above, I'm a bit scared of the idea that we include columns not defined anywhere in the system catalogs in a rel's tlist. For the reason mentioned above, I think we should avoid such a thing, IMO. >> Note about the attached: I tried to invent a utility for >> generate_new_param like SS_make_initplan_output_param as mentioned in >> [1], but since the FDW API doesn't pass PlannerInfo to the FDW, I >> think the FDW can't call the utility the same way. Instead, I >> modified the planner so that 1) the FDW adds Params without setting >> PARAM_EXEC Param IDs using a new function, and then 2) the core fixes >> the IDs. > > Agreed on not having PlannerInfo. I'll re-study this. Some > comments on this right now are the follows. Thanks for the comments! > It seems reserving the name remotetableoid, which doen't seem to > be used by users but not never. This has also been suggested by Tom [2]. > Maybe paramid space of FOREIGN_PARAM is not necessarily be the > same with ordinary params that needs signalling aid. Yeah, but I modified the planner so that it can distinguish one from the other; because I think it's better to avoid unneeded SS_finalize_plan processing when only generating foreign Params, and/or minimize the cost in set_plan_references by only converting foreign Params into simple Vars using search_indexed_tlist_for_non_var, which are both expensive. One thing I noticed is: in any approach, I think use_physical_tlist needs to be modified so that it disables doing build_physical_tlist for a foreign scan in the case where the FDW added resjunk columns for UPDATE/DELETE that are different from user/system columns of the foreign table; else such columns would not be emitted from the foreign scan. Best regards, Etsuro Fujita [2] https://www.postgresql.org/message-id/8627.1526591849%40sss.pgh.pa.us
pgsql-hackers by date: