Re: [v9.3] writable foreign tables - Mailing list pgsql-hackers
From | Kohei KaiGai |
---|---|
Subject | Re: [v9.3] writable foreign tables |
Date | |
Msg-id | CADyhKSV68j_4SiQpM0wW4w+Vf64iBofEj1oD-autSuesHx6DQg@mail.gmail.com Whole thread Raw |
In response to | Re: [v9.3] writable foreign tables ("Albe Laurenz" <laurenz.albe@wien.gv.at>) |
Responses |
Re: [v9.3] writable foreign tables
|
List | pgsql-hackers |
2012/11/20 Albe Laurenz <laurenz.albe@wien.gv.at>: > Kohei KaiGai wrote: >>>> This design tries to kill two-birds with one-stone. >>>> It enables to add multiple number of pseudo-columns, >>>> not only "rowid", and makes possible to push-down >>>> complex calculation of target list into external computing >>>> resource. >>>> >>>> For example, when user gives the following query: >>>> >>>> SELECT ((c1 - c2) * (c2 - c3))^2 FROM ftable >>>> >>>> it contains a complex calculation in the target-list, >>>> thus, it also takes CPU cycles of local process. >>>> >>>> If we can replace the "((c1 - c2) * (c2 - c3))^2" by >>>> a reference to a pseudo-column that also references >>>> the calculation result on external node, it effectively >>>> off-load CPU cycles. >>>> >>>> In this case, all we need to do is (1) acquire a slot >>>> for pseudo-column at GetForeignRelInfo (2) replace >>>> TargetEntry::expr by Var node that reference this >>>> pseudo-column. >>>> >>>> It makes sense for performance optimization, so I don't >>>> want to restrict this handler for "rowid" only. > >>> I understand. >>> >>> But I think that you still can do that with the change that >>> I suggest. I suggest that GetForeignRelInfo (or whatever the >>> name ends up being) gets the AttrNumber of the proposed "rowid" >>> column in addition to the parameters you need for what >>> you want to do. >>> >>> Then nothing would keep you from defining those >>> pseudo-columns. But all the setup necessary for the "rowid" >>> column could be moved out of the FDW. So for the 99% of all >>> FDW which are only interested in the rowid, things would >>> get much simpler and they don't all have to implement the >>> same code. > >> All we have to do at get_relation_info() to deal with pseudo- >> columns (including alternatives of complex calculation, not >> only "rowid") is just expansion of rel->max_attr. >> So, if FDW is not interested in something except for "rowid", >> it can just inform the caller "Yeah, we need just one slot for >> a pseudo-column of rowid". Otherwise, it can return another >> value to acquire the slot for arbitrary pseudo-column. >> I don't think it is a problematic design. >> >> However, I'm skeptical 99% of FDWs don't support target-list >> push-down. At least, it was very desired feature when I had >> a talk at PGconf.EU last month. :-) > > I agree that PostgreSQL should make this technique possible. > > My idea should not make this any more difficult. > >> So, if we rename it to GetForeignRelWidth, is it defined as >> follows? >> >> extern AttrNumber >> GetForeignRelWidth(PlannerInfo *root, >> RelOptInfo *baserel, >> Oid foreigntableid, >> bool inhparent, >> List *targetList); >> >> Right now, inhparent makes no sense because foreign table >> does not support table inheritance, but it seems to me we >> shall have this functionality near future. > > I am thinking of this declaration: > > extern bool > GetForeignRelWidth(PlannerInfo *root, > RelOptInfo *baserel, > Oid foreigntableid, > bool inhparent, > List *targetList, > AttrNumber rowidAttr); > > Let me illustrate my idea with some code. > > Here's your fileGetForeignRelInfo: > > > static void > fileGetForeignRelInfo(PlannerInfo *root, > RelOptInfo *baserel, > Oid foreigntableid, > bool inhparent, List *targetList) > { > FileFdwPlanState *fdw_private; > ListCell *cell; > > fdw_private = (FileFdwPlanState *) > palloc0(sizeof(FileFdwPlanState)); > > foreach(cell, targetList) > { > TargetEntry *tle = lfirst(cell); > > if (tle->resjunk && > tle->resname && strcmp(tle->resname, "rowid")==0) > { > Bitmapset *temp = NULL; > AttrNumber anum_rowid; > DefElem *defel; > > pull_varattnos((Node *)tle, baserel->relid, &temp); > anum_rowid = bms_singleton_member(temp) > + FirstLowInvalidHeapAttributeNumber; > /* adjust attr_needed of baserel */ > if (anum_rowid > baserel->max_attr) > baserel->max_attr = anum_rowid; > defel = makeDefElem("anum_rowid", > (Node *)makeInteger(anum_rowid)); > fdw_private->anum_rowid = defel; > } > } > baserel->fdw_private = fdw_private; > } > > > I hope that this can be reduced to: > > > static bool > fileGetForeignRelRowid(PlannerInfo *root, > RelOptInfo *baserel, > Oid foreigntableid, > bool inhparent, > List *targetList, > AttrNumber rowidAttr) > { > FileFdwPlanState *fdw_private; > fdw_private = (FileFdwPlanState *) > palloc0(sizeof(FileFdwPlanState)); > > defel = makeDefElem("anum_rowid", > (Node *)makeInteger(rowidAttr)); > fdw_private->anum_rowid = defel; > > baserel->fdw_private = fdw_private; > > return true; /* we'll use rowid, so please extend baserel->max_attr > */ > } > > > That wouldn't mean that the FDW cannot define any other > pseudo-columns in this function, just the case of rowid > would be simplified. > > Does that make any sense? > I think "bool" is not suitable data type to inform how many pseudo-columns are needed for this scan on foreign-table. Probably, it is helpful to provide a helper function that fetches an attribute- number of pseudo "rowid" column from the supplied targetlist. If we have GetPseudoRowidColumn() at foreign/foreign.c, the avove routine can be rewritten as: static AttrNumber fileGetForeignRelWidth(PlannerInfo *root, RelOptInfo *baserel, Relation foreignrel, bool inhparent, List *targetList) { FileFdwPlanState *fdw_private; AttrNumber nattrs = RelationGetNumberOfAttributes(foreignrel); AttrNumber anum_rowid; fdw_private = palloc0(sizeof(FileFdwPlanState)); anum_rowid = GetPseudoRowidColumn(..., targetList); if (anum_rowid> 0) { Assert(anum_rowid > nattrs); fdw_private->anum_rowid = makeDefElem("anum_rowid",(Node *)makeInteger(anum_rowid)); nattrs = anum_rowid; } baserel->fdw_private = fdw_private; return nattrs; } In case when FDW drive wants to push-down other target entry into foreign- side, thus, it needs multiple pseudo-columns, it is decision of the extension. In addition, it does not take API change in the future, if some more additional pseudo-column is required by some other new features. How about your opinion? Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
pgsql-hackers by date: