Re: [v9.5] Custom Plan API - Mailing list pgsql-hackers
From | Kouhei Kaigai |
---|---|
Subject | Re: [v9.5] Custom Plan API |
Date | |
Msg-id | 9A28C8860F777E439AA12E8AEA7694F8FE5CBE@BPXM15GP.gisp.nec.co.jp Whole thread Raw |
In response to | [v9.5] Custom Plan API (Kouhei Kaigai <kaigai@ak.jp.nec.com>) |
Responses |
Re: [v9.5] Custom Plan API
|
List | pgsql-hackers |
> On Thu, Sep 11, 2014 at 8:40 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote: > >> On Thu, Sep 11, 2014 at 11:24 AM, Kouhei Kaigai > >> <kaigai@ak.jp.nec.com> > >> wrote: > >> >> Don't the changes to src/backend/optimizer/plan/createplan.c > >> >> belong in patch #2? > >> >> > >> > The borderline between #1 and #2 is little bit bogus. So, I moved > >> > most of portion into #1, however, invocation of InitCustomScan > >> > (that is a callback in CustomPlanMethod) in create_custom_plan() is > still in #2. > >> > >> Eh, create_custom_scan() certainly looks like it is in #1 from here, > >> or at least part of it is. It calculates tlist and clauses and then > >> does nothing with them. That clearly can't be the right division. > >> > >> I think it would make sense to have create_custom_scan() compute > >> tlist and clauses first, and then pass those to CreateCustomPlan(). > >> Then you don't need a separate InitCustomScan() - which is misnamed > >> anyway, since it has nothing to do with ExecInitCustomScan(). > >> > > The only reason why I put separate hooks here is, create_custom_scan() > > needs to know exact size of the CustomScan node (including private > > fields), however, it is helpful for extensions to kick its callback to > > initialize the node next to the common initialization stuff. > > Why does it need to know that? I don't see that it's doing anything that > requires knowing the size of that node, and if it is, I think it shouldn't > be. That should get delegated to the callback provided by the custom plan > provider. > Sorry, my explanation might be confusable. The create_custom_scan() does not need to know the exact size of the CustomScan (or its inheritance) because of the two separated hooks; one is node allocation time, the other is the tail of the series of initialization. If we have only one hook here, we need to have a mechanism to informs create_custom_scan() an exact size of the CustomScan node; including private fields managed by the provider, instead of the first hook on node allocation time. In this case, node allocation shall be processed by create_custom_scan() and it has to know exact size of the node to be allocated. How do I implement the feature here? Is the combination of static node size and callback on the tail more simple than the existing design that takes two individual hooks on create_custom_scan()? > > Regarding to the naming, how about GetCustomScan() instead of > InitCustomScan()? > > It follows the manner in create_foreignscan_plan(). > > I guess that's a bit better, but come to think of it, I'd really like to > avoid baking in the assumption that the custom path provider has to return > any particular type of plan node. A good start would be to give it a name > that doesn't imply that - e.g. PlanCustomPath(). > OK, I'll use this naming. > >> > OK, I revised. Now custom-scan assumes it has a particular valid > >> > relation to be scanned, so no code path with scanrelid == 0 at this > moment. > >> > > >> > Let us revisit this scenario when custom-scan replaces relation-joins. > >> > In this case, custom-scan will not be associated with a particular > >> > base- relation, thus it needs to admit a custom-scan node with > >> > scanrelid > >> == 0. > >> > >> Yeah, I guess the question there is whether we'll want let CustomScan > >> have scanrelid == 0 or require that CustomJoin be used there instead. > >> > > Right now, I cannot imagine a use case that requires individual > > CustomJoin node because CustomScan with scanrelid==0 (that performs > > like custom-plan rather than custom-scan in actually) is sufficient. > > > > If a CustomScan gets chosen instead of built-in join logics, it shall > > looks like a relation scan on the virtual one that is consists of two > > underlying relation. Callbacks of the CustomScan has a responsibility > > to join underlying relations; that is invisible from the core executor. > > > > It seems to me CustomScan with scanrelid==0 is sufficient to implement > > an alternative logic on relation joins, don't need an individual node > > from the standpoint of executor. > > That's valid logic, but it's not the only way to do it. If we have CustomScan > and CustomJoin, either of them will require some adaption to handle this > case. We can either allow a custom scan that isn't scanning any particular > relation (i.e. scanrelid == 0), or we can allow a custom join that has no > children. I don't know which way will come out cleaner, and I think it's > good to leave that decision to one side for now. > Yep. I agree with you. It may not be productive discussion to conclude this design topic right now. Let's assume CustomScan scans on a particular relation (scanrelid != 0) on the first revision. > >> >> Why can't the Custom(GpuHashJoin) node build the hash table > >> >> internally instead of using a separate node? > >> >> > >> > It's possible, however, it prevents to check sub-plans using > >> > EXPLAIN if we manage inner-plans internally. So, I'd like to have a > >> > separate node being connected to the inner-plan. > >> > >> Isn't that just a matter of letting the EXPLAIN code print more stuff? > >> Why can't it? > >> > > My GpuHashJoin takes multiple relations to load them a hash-table. > > On the other hand, Plan node can have two underlying relations at most > > (inner/outer). Outer-side is occupied by the larger relation, so it > > needs to make multiple relations visible using inner-branch. > > If CustomScan can has a list of multiple underlying plan-nodes, like > > Append node, it can represent the structure above in straightforward > > way, but I'm uncertain which is the better design. > > Right. I think the key point is that it is *possible* to make this work > without a multiexec interface, and it seems like we're agreed that it is. > Now perhaps we will decide that there is enough benefit in having multiexec > support that we want to do it anyway, but it's clearly not a hard requirement, > because it can be done without that in the way you describe here. Let's > leave to the future the decision as to how to proceed here; getting the > basic thing done is hard enough. > OK, let's postpone the discussion on the custom-join support. Either of approaches (1. multi-exec support, or 2. multiple subplans like Append) is sufficient for this purpose, and the multi-exec interface is a way to implement it, not a goal. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
pgsql-hackers by date: