Home > mailing lists
Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers

From	Kouhei Kaigai
Subject	Re: [v9.5] Custom Plan API
Date	September 15, 2014 12:39:12
Msg-id	9A28C8860F777E439AA12E8AEA7694F8FE5CBE@BPXM15GP.gisp.nec.co.jp Whole thread Raw
In response to	[v9.5] Custom Plan API (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses	Re: [v9.5] Custom Plan API
List	pgsql-hackers
Tree view
> On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> >> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai
> >> <kaigai@ak.jp.nec.com>
> >> wrote:
> >> >> Don't the changes to src/backend/optimizer/plan/createplan.c
> >> >> belong in patch #2?
> >> >>
> >> > The borderline between #1 and #2 is little bit bogus. So, I moved
> >> > most of portion into #1, however, invocation of InitCustomScan
> >> > (that is a callback in CustomPlanMethod) in create_custom_plan() is
> still in #2.
> >>
> >> Eh, create_custom_scan() certainly looks like it is in #1 from here,
> >> or at least part of it is.  It calculates tlist and clauses and then
> >> does nothing with them.  That clearly can't be the right division.
> >>
> >> I think it would make sense to have create_custom_scan() compute
> >> tlist and clauses first, and then pass those to CreateCustomPlan().
> >> Then you don't need a separate InitCustomScan() - which is misnamed
> >> anyway, since it has nothing to do with ExecInitCustomScan().
> >>
> > The only reason why I put separate hooks here is, create_custom_scan()
> > needs to know exact size of the CustomScan node (including private
> > fields), however, it is helpful for extensions to kick its callback to
> > initialize the node next to the common initialization stuff.
> 
> Why does it need to know that?  I don't see that it's doing anything that
> requires knowing the size of that node, and if it is, I think it shouldn't
> be.  That should get delegated to the callback provided by the custom plan
> provider.
> 
Sorry, my explanation might be confusable. The create_custom_scan() does not
need to know the exact size of the CustomScan (or its inheritance) because of
the two separated hooks; one is node allocation time, the other is the tail
of the series of initialization.
If we have only one hook here, we need to have a mechanism to informs
create_custom_scan() an exact size of the CustomScan node; including private
fields managed by the provider, instead of the first hook on node allocation
time. In this case, node allocation shall be processed by create_custom_scan()
and it has to know exact size of the node to be allocated.

How do I implement the feature here? Is the combination of static node size
and callback on the tail more simple than the existing design that takes two
individual hooks on create_custom_scan()?

> > Regarding to the naming, how about GetCustomScan() instead of
> InitCustomScan()?
> > It follows the manner in create_foreignscan_plan().
> 
> I guess that's a bit better, but come to think of it, I'd really like to
> avoid baking in the assumption that the custom path provider has to return
> any particular type of plan node.  A good start would be to give it a name
> that doesn't imply that - e.g. PlanCustomPath().
> 
OK, I'll use this naming.

> >> > OK, I revised. Now custom-scan assumes it has a particular valid
> >> > relation to be scanned, so no code path with scanrelid == 0 at this
> moment.
> >> >
> >> > Let us revisit this scenario when custom-scan replaces relation-joins.
> >> > In this case, custom-scan will not be associated with a particular
> >> > base- relation, thus it needs to admit a custom-scan node with
> >> > scanrelid
> >> == 0.
> >>
> >> Yeah, I guess the question there is whether we'll want let CustomScan
> >> have scanrelid == 0 or require that CustomJoin be used there instead.
> >>
> > Right now, I cannot imagine a use case that requires individual
> > CustomJoin node because CustomScan with scanrelid==0 (that performs
> > like custom-plan rather than custom-scan in actually) is sufficient.
> >
> > If a CustomScan gets chosen instead of built-in join logics, it shall
> > looks like a relation scan on the virtual one that is consists of two
> > underlying relation. Callbacks of the CustomScan has a responsibility
> > to join underlying relations; that is invisible from the core executor.
> >
> > It seems to me CustomScan with scanrelid==0 is sufficient to implement
> > an alternative logic on relation joins, don't need an individual node
> > from the standpoint of executor.
> 
> That's valid logic, but it's not the only way to do it.  If we have CustomScan
> and CustomJoin, either of them will require some adaption to handle this
> case.  We can either allow a custom scan that isn't scanning any particular
> relation (i.e. scanrelid == 0), or we can allow a custom join that has no
> children.  I don't know which way will come out cleaner, and I think it's
> good to leave that decision to one side for now.
> 
Yep. I agree with you. It may not be productive discussion to conclude
this design topic right now. Let's assume CustomScan scans on a particular
relation (scanrelid != 0) on the first revision.

> >> >> Why can't the Custom(GpuHashJoin) node build the hash table
> >> >> internally instead of using a separate node?
> >> >>
> >> > It's possible, however, it prevents to check sub-plans using
> >> > EXPLAIN if we manage inner-plans internally. So, I'd like to have a
> >> > separate node being connected to the inner-plan.
> >>
> >> Isn't that just a matter of letting the EXPLAIN code print more stuff?
> >> Why can't it?
> >>
> > My GpuHashJoin takes multiple relations to load them a hash-table.
> > On the other hand, Plan node can have two underlying relations at most
> > (inner/outer). Outer-side is occupied by the larger relation, so it
> > needs to make multiple relations visible using inner-branch.
> > If CustomScan can has a list of multiple underlying plan-nodes, like
> > Append node, it can represent the structure above in straightforward
> > way, but I'm uncertain which is the better design.
> 
> Right.  I think the key point is that it is *possible* to make this work
> without a multiexec interface, and it seems like we're agreed that it is.
> Now perhaps we will decide that there is enough benefit in having multiexec
> support that we want to do it anyway, but it's clearly not a hard requirement,
> because it can be done without that in the way you describe here.  Let's
> leave to the future the decision as to how to proceed here; getting the
> basic thing done is hard enough.
> 
OK, let's postpone the discussion on the custom-join support.
Either of approaches (1. multi-exec support, or 2. multiple subplans like
Append) is sufficient for this purpose, and the multi-exec interface is
a way to implement it, not a goal.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
pgsql-hackers by date:
From: Alexander Korotkov
Date: 15 September 2014, 11:59:08
Subject: Re: PoC: Partial sort
From: Heikki Linnakangas
Date: 15 September 2014, 12:42:32
Subject: Re: WAL format and API changes (9.5)
Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers

Previous

Next