Home > mailing lists

Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: [v9.5] Custom Plan API
Date	September 8, 2014 16:40:11
Msg-id	CA+Tgmoa=tH5a6hpxEKFJzEkLJhZRYp5k-cTnLDXxfwUkm=6EpA@mail.gmail.com Whole thread Raw
In response to	Re: [v9.5] Custom Plan API (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List	pgsql-hackers

Tree view

On Thu, Sep 4, 2014 at 7:57 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> Regarding to the attached three patches:
> [1] custom-path and hook
> It adds register_custom_path_provider() interface for registration of
> custom-path entrypoint. Callbacks are invoked on set_plain_rel_pathlist
> to offer alternative scan path on regular relations.
> I may need to explain the terms in use. I calls the path-node custom-path
> that is the previous step of population of plan-node (like custom-scan
> and potentially custom-join and so on). The node object created by
> CreateCustomPlan() is called custom-plan because it is abstraction for
> all the potential custom-xxx node; custom-scan is the first of all.

I don't think it's a good thing that add_custom_path_type is declared
as void (*)(void *) rather than having a real type.  I suggest we add
the path-creation callback function to CustomPlanMethods instead, like
this:

void (*CreateCustomScanPath)(PlannerInfo *root, RelOptInfo *baserel,
RangeTblEntry *rte);

Then, register_custom_path_provider() can just take CustomPathMethods
* as an argument; and create_customscan_paths can just walk the list
of CustomPlanMethods objects and call CreateCustomScanPath for each
one where that is non-NULL.  This conflates the path generation
mechanism with the type of path getting generated a little bit, but I
don't see any real downside to that.  I don't see a reason why you'd
ever want two different providers to offer the same type of
custompath.

Don't the changes to src/backend/optimizer/plan/createplan.c belong in patch #2?

> [2] custom-scan node
> It adds custom-scan node support. The custom-scan node is expected to
> generate contents of a particular relation or sub-plan according to its
> custom-logic.
> Custom-scan provider needs to implement callbacks of CustomScanMethods
> and CustomExecMethods. Once a custom-scan node is populated from
> custom-path node, the backend calls back these methods in the planning
> and execution stage.

It looks to me like this patch is full of holdovers from its earlier
life as a more-generic CustomPlan node.  In particular, it contains
numerous defenses against the case where scanrelid != 0.  These are
confusingly written as scanrelid > 0, but I think really they're just
bogus altogether: if this is specifically a CustomScan, not a
CustomPlan, then the relid should always be filled in.  Please
consider what can be simplified here.

The comment in _copyCustomScan looks bogus to me.  I think we should
*require* a static method table.

In create_custom_plan, you if (IsA(custom_plan, CustomScan)) { lots of
stuff; } else elog(ERROR, ...).  I think it would be clearer to write
if (!IsA(custom_plan, CustomScan)) elog(ERROR, ...); lots of stuff;

> Also, regarding to the use-case of multi-exec interface.
> Below is an EXPLAIN output of PG-Strom. It shows the custom GpuHashJoin has
> two sub-plans; GpuScan and MultiHash.
> GpuHashJoin is stacked on the GpuScan. It is a case when these nodes utilize
> multi-exec interface for more efficient data exchange between the nodes.
> GpuScan already keeps a data structure that is suitable to send to/recv from
> GPU devices and constructed on the memory segment being DMA available.
> If we have to form a tuple, pass it via row-by-row interface, then deform it,
> it will become a major performance degradation in this use case.
>
> postgres=# explain select * from t10 natural join t8 natural join t9 where x < 10;
>                                           QUERY PLAN
> -----------------------------------------------------------------------------------------------
>  Custom (GpuHashJoin)  (cost=10979.56..90064.15 rows=333 width=49)
>    pseudo scan tlist: 1:(t10.bid), 3:(t10.aid), 4:<t10.x>, 2:<t8.data>, 5:[t8.aid], 6:[t9.bid]
>    hash clause 1: ((t8.aid = t10.aid) AND (t9.bid = t10.bid))
>    ->  Custom (GpuScan) on t10  (cost=10000.00..88831.26 rows=3333327 width=16)
>          Host References: aid, bid, x
>          Device References: x
>          Device Filter: (x < 10::double precision)
>    ->  Custom (MultiHash)  (cost=464.56..464.56 rows=1000 width=41)
>          hash keys: aid, bid
>          ->  Hash Join  (cost=60.06..464.56 rows=1000 width=41)
>                Hash Cond: (t9.data = t8.data)
>                ->  Index Scan using t9_pkey on t9  (cost=0.29..357.29 rows=10000 width=37)
>                ->  Hash  (cost=47.27..47.27 rows=1000 width=37)
>                      ->  Index Scan using t8_pkey on t8  (cost=0.28..47.27 rows=1000 width=37)
>  Planning time: 0.810 ms
> (15 rows)

Why can't the Custom(GpuHashJoin) node build the hash table internally
instead of using a separate node?

Also, for this patch we are only considering custom scan.  Custom join
is another patch.  We don't need to provide infrastructure for that
patch in this one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-hackers by date:

From: Jeff Janes
Date: 08 September 2014, 16:28:19
Subject: Re: PENDING_LIST_CLEANUP_SIZE - maximum size of GIN pending list Re: HEAD seems to generate larger WAL regarding GIN index

From: Merlin Moncure
Date: 08 September 2014, 16:42:13
Subject: Re: Scaling shared buffer eviction

Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers

Previous

Next