Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: [v9.5] Custom Plan API |
Date | |
Msg-id | CA+Tgmoa=tH5a6hpxEKFJzEkLJhZRYp5k-cTnLDXxfwUkm=6EpA@mail.gmail.com Whole thread Raw |
In response to | Re: [v9.5] Custom Plan API (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
List | pgsql-hackers |
On Thu, Sep 4, 2014 at 7:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > Regarding to the attached three patches: > [1] custom-path and hook > It adds register_custom_path_provider() interface for registration of > custom-path entrypoint. Callbacks are invoked on set_plain_rel_pathlist > to offer alternative scan path on regular relations. > I may need to explain the terms in use. I calls the path-node custom-path > that is the previous step of population of plan-node (like custom-scan > and potentially custom-join and so on). The node object created by > CreateCustomPlan() is called custom-plan because it is abstraction for > all the potential custom-xxx node; custom-scan is the first of all. I don't think it's a good thing that add_custom_path_type is declared as void (*)(void *) rather than having a real type. I suggest we add the path-creation callback function to CustomPlanMethods instead, like this: void (*CreateCustomScanPath)(PlannerInfo *root, RelOptInfo *baserel, RangeTblEntry *rte); Then, register_custom_path_provider() can just take CustomPathMethods * as an argument; and create_customscan_paths can just walk the list of CustomPlanMethods objects and call CreateCustomScanPath for each one where that is non-NULL. This conflates the path generation mechanism with the type of path getting generated a little bit, but I don't see any real downside to that. I don't see a reason why you'd ever want two different providers to offer the same type of custompath. Don't the changes to src/backend/optimizer/plan/createplan.c belong in patch #2? > [2] custom-scan node > It adds custom-scan node support. The custom-scan node is expected to > generate contents of a particular relation or sub-plan according to its > custom-logic. > Custom-scan provider needs to implement callbacks of CustomScanMethods > and CustomExecMethods. Once a custom-scan node is populated from > custom-path node, the backend calls back these methods in the planning > and execution stage. It looks to me like this patch is full of holdovers from its earlier life as a more-generic CustomPlan node. In particular, it contains numerous defenses against the case where scanrelid != 0. These are confusingly written as scanrelid > 0, but I think really they're just bogus altogether: if this is specifically a CustomScan, not a CustomPlan, then the relid should always be filled in. Please consider what can be simplified here. The comment in _copyCustomScan looks bogus to me. I think we should *require* a static method table. In create_custom_plan, you if (IsA(custom_plan, CustomScan)) { lots of stuff; } else elog(ERROR, ...). I think it would be clearer to write if (!IsA(custom_plan, CustomScan)) elog(ERROR, ...); lots of stuff; > Also, regarding to the use-case of multi-exec interface. > Below is an EXPLAIN output of PG-Strom. It shows the custom GpuHashJoin has > two sub-plans; GpuScan and MultiHash. > GpuHashJoin is stacked on the GpuScan. It is a case when these nodes utilize > multi-exec interface for more efficient data exchange between the nodes. > GpuScan already keeps a data structure that is suitable to send to/recv from > GPU devices and constructed on the memory segment being DMA available. > If we have to form a tuple, pass it via row-by-row interface, then deform it, > it will become a major performance degradation in this use case. > > postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10; > QUERY PLAN > ----------------------------------------------------------------------------------------------- > Custom (GpuHashJoin) (cost=10979.56..90064.15 rows=333 width=49) > pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid] > hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid)) > -> Custom (GpuScan) on t10 (cost=10000.00..88831.26 rows=3333327 width=16) > Host References: aid, bid, x > Device References: x > Device Filter: (x < 10::double precision) > -> Custom (MultiHash) (cost=464.56..464.56 rows=1000 width=41) > hash keys: aid, bid > -> Hash Join (cost=60.06..464.56 rows=1000 width=41) > Hash Cond: (t9.data = t8.data) > -> Index Scan using t9_pkey on t9 (cost=0.29..357.29 rows=10000 width=37) > -> Hash (cost=47.27..47.27 rows=1000 width=37) > -> Index Scan using t8_pkey on t8 (cost=0.28..47.27 rows=1000 width=37) > Planning time: 0.810 ms > (15 rows) Why can't the Custom(GpuHashJoin) node build the hash table internally instead of using a separate node? Also, for this patch we are only considering custom scan. Custom join is another patch. We don't need to provide infrastructure for that patch in this one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: