Thread: Custom Scan APIs (Re: Custom Plan node)
The attached patches provide a feature to implement a custom scan node that allows an extension to replace part of a plan tree with its own code instead of the built-in logic. In addition to the previous proposition, it enables us to integrate a custom scan as one of the candidate paths to be chosen by the optimizer.

Here are two patches. The first one (pgsql-v9.4-custom-scan-apis) offers the API itself plus a simple demonstration module that implements a regular table scan using an inequality operator on the ctid system column. The second one (pgsql-v9.4-custom-scan-remote-join) enhances postgres_fdw to support remote joins.

Below is an example that shows how a custom scan works.

We usually run a sequential scan even if the clause has an inequality operator that references the ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=43)
   Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension acting as a custom-scan provider suggests an alternative path; its cost is less than the sequential scan, so the optimizer chooses it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
                              QUERY PLAN
----------------------------------------------------------------------
 Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=43)
   Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, a more cost-effective plan still wins if one exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=43)
   Index Cond: (a = 200)
   Filter: (ctid > '(10,0)'::tid)
(3 rows)

Another worthwhile example is the remote-join enhancement for postgres_fdw, as follows. Both ft1 and ft2 are foreign tables managed by the same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
           WHERE f_leak(b) AND y like '%aaa%';
                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Custom Scan (postgres-fdw)  (cost=100.00..100.01 rows=0 width=72)
   Output: a, b, x, y
   Filter: f_leak(b)
   Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How it works
---------------------------
This patch adds two hooks (for base and join relations) in allpaths.c and joinpaths.c. They allow extensions to add alternative paths for scanning a base relation or joining two relations.

The callback routine can add a CustomPath using add_path() to inform the optimizer of the alternative scan path. Every custom-scan provider is identified by its name, registered beforehand using the following function:

  void register_custom_provider(const CustomProvider *provider);

CustomProvider is a set of a name string and function pointers for the callbacks.

Once a CustomPath is chosen, create_scan_plan() constructs a custom-scan plan and calls back into the extension to initialize the node. The rest is similar to a foreign scan, although some details differ. For example, a foreign scan is assumed to return tuples formed according to the table definition. A custom scan makes no such assumption, so the extension needs to set the tuple descriptor on the scan tuple slot of the ScanState structure by itself.

In the join case, a custom scan behaves like a regular scan, but it returns tuples already joined across the underlying relations. The patched postgres_fdw utilizes a hook in joinpaths.c to run the remote join.

------------
Issues
------------
I'm not 100% certain whether the arguments of add_join_path_hook are reasonable. I guess the first 7 arguments are the minimum necessity. The mergeclause_list and semifactors might be useful if someone tries to implement their own merge join or semi join. Also, I'm not familiar with the usage of path parameterization, but the last two arguments relate to it. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *joinrel,
+                                        RelOptInfo *outerrel,
+                                        RelOptInfo *innerrel,
+                                        JoinType jointype,
+                                        SpecialJoinInfo *sjinfo,
+                                        List *restrictlist,
+                                        List *mergeclause_list,
+                                        SemiAntiJoinFactors *semifactors,
+                                        Relids param_source_rels,
+                                        Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

When we replace a join with a custom scan, what is the best target for a Var node that referenced the relations under the join? Usually, Var->varno is given as the rtindex of the tables being joined; it is then replaced by OUTER_VAR or INNER_VAR at set_join_references(), which eventually determines the slot to be fetched in ExecEvalScalarVar(). On the other hand, if we replaced a join, we want the Var node to reference the scan tuple slot, neither the outer slot nor the inner slot. I tried adding a new CUSTOM_VAR that references the scan tuple slot. It is probably a straightforward way to run a remote join like a scan, but I'm not certain it is the best way.

I was also concerned that the FDW callbacks of postgres_fdw are designed to take a ForeignScanState argument. Because of this, the remote-join code could not call these routines, even though most of the custom-join logic is similar. So I'd like to rework postgres_fdw first, to factor out common routines that can be called from both the FDW portion and the remote-join portion. However, doing both in one patch makes reviewing hard due to the large changeset, so I'd like to do the code rework first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* A wiki page would probably help people understand the feature.
* postgres_fdw needs reworking to share common code between the FDW
  and remote-join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2013/10/3 Robert Haas <robertmhaas@gmail.com>:
>> Well, there were a lot of problems with your demonstration, which have
>> already been pointed out upthread. I'm skeptical about the idea of
>> simply replacing planner nodes wholesale, and Tom is outright opposed.
>> I think you'll do better to focus on a narrower case - I'd suggest
>> custom scan nodes - and leave the rest as a project for another time.
>>
> Thanks, it makes clear what we should target for v9.4 development.
> Towards the next commitfest, I'm planning to develop the following
> features:
> * A CustomScan node that can run custom code instead of the built-in
>   scan nodes.
> * Join pushdown in postgres_fdw using the hook located in
>   add_paths_to_joinrel(), for demonstration purposes.
> * Some new way to scan a relation; your suggested ctid scan with
>   less-than or greater-than qualifiers is a good example, also for
>   demonstration purposes.
>
> Probably, the above set of jobs will be the first chunk of this feature.
> Then, let's do other stuff like Append, Sort, Aggregate and so on
> later. It seems to me a reasonable strategy.
>
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
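[For orientation, here is a minimal sketch of how an extension would install the proposed hook from its _PG_init(), assuming the patched headers; the previous-hook chaining is the usual PostgreSQL convention, and the CustomPath construction API is patch-specific, so it is only indicated in a comment.]

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"    /* assumed home of add_join_path_hook */

PG_MODULE_MAGIC;

static add_join_path_hook_type prev_add_join_path_hook = NULL;

/* Callback offering an alternative join path; the signature follows
 * the typedef quoted above. */
static void
my_add_join_path(PlannerInfo *root, RelOptInfo *joinrel,
                 RelOptInfo *outerrel, RelOptInfo *innerrel,
                 JoinType jointype, SpecialJoinInfo *sjinfo,
                 List *restrictlist, List *mergeclause_list,
                 SemiAntiJoinFactors *semifactors,
                 Relids param_source_rels, Relids extra_lateral_rels)
{
    /* Preserve paths added by any previously installed hook. */
    if (prev_add_join_path_hook)
        prev_add_join_path_hook(root, joinrel, outerrel, innerrel,
                                jointype, sjinfo, restrictlist,
                                mergeclause_list, semifactors,
                                param_source_rels, extra_lateral_rels);

    /*
     * Build a CustomPath describing this provider's join strategy and
     * offer it to the optimizer, e.g. add_path(joinrel, (Path *) cpath).
     * The CustomPath construction API belongs to the patch, so it is
     * omitted here.
     */
}

void
_PG_init(void)
{
    prev_add_join_path_hook = add_join_path_hook;
    add_join_path_hook = my_add_join_path;
}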
Attachment
Hi,

I tried to write up a wiki page to introduce how custom scans work:

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments, please.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The attached patches provide a feature to implement a custom scan node
> that allows an extension to replace part of a plan tree with its own
> code instead of the built-in logic.
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
The attached patches are the revised custom-scan APIs.

- custom-scan.sgml was added to introduce how to write a custom-scan
  provider in the official documentation.
- Much code duplication in postgres_fdw.c was eliminated. I split some
  FDW handlers into two parts: a common portion and an FDW-specific one.
  The executor callbacks of the custom-scan code utilize the common
  portion, because most of the implementation is equivalent.

I'd like to see comments regarding the way Var references are handled for a custom scan that replaced a join of relations. The varno of a Var that references a join relation is the rtindex of either the right or left relation; setrefs.c then adjusts it, setting INNER_VAR or OUTER_VAR instead. However, this does not work if a custom scan that just references the result of a remote join query was chosen instead of a local join, because its result is usually set in the ps_ResultTupleSlot of PlanState, and so ExecEvalScalarVar must reference neither the inner nor the outer slot. Instead of the existing solution, I added one more special varno, CUSTOM_VARNO, which just references the result tuple slot of the target relation. If CUSTOM_VARNO is given, EXPLAIN (verbose) generates column names from the TupleDesc of the underlying ps_ResultTupleSlot. I'm not 100% certain it is the best approach for us, but it works well.

Also, I'm uncertain about the usage of param_info in the Path structure, even though I followed the manner of other portions. Please point out if my usage is not appropriate.

Thanks,

2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hi,
>
> I tried to write up a wiki page to introduce how custom scans work.
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
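[As an illustration of the EXPLAIN (verbose) behaviour described above, a minimal sketch, not the patch's actual code, of deriving a column name from the result slot's tuple descriptor; the field access follows the 9.4-era TupleDesc layout, and the helper name is made up.]

#include "postgres.h"
#include "access/tupdesc.h"
#include "executor/tuptable.h"

/* Name of the attribute that a CUSTOM_VARNO Var with this attno
 * resolves to, read from the TupleDesc of the node's
 * ps_ResultTupleSlot. */
static const char *
custom_varno_colname(TupleTableSlot *result_slot, AttrNumber attno)
{
    TupleDesc   tupdesc = result_slot->tts_tupleDescriptor;

    Assert(attno >= 1 && attno <= tupdesc->natts);
    return NameStr(tupdesc->attrs[attno - 1]->attname);
}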
Attachment
On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> The attached patches are the revised custom-scan APIs.
My initial review of this feature:
- The patches apply and build, but they produce a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
I'd recommend that you split the part-1 patch containing the ctidscan contrib module into its own patch. It is more than half of the patch, and it certainly stands on its own. IMO, ctidscan fits a very specific use case and would be better off being an extension instead of living in contrib.
> [snip]
Thanks for your review.

2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:
> My initial review of this feature:
> - The patches apply and build, but they produce a warning:
> ctidscan.c: In function ‘CTidInitCustomScanPlan’:
> ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
>
This variable was only used in an Assert() macro, so it causes a warning if you don't pass --enable-cassert to the configure script. Anyway, I adjusted the code to check the relid of the RelOptInfo directly.

> I'd recommend that you split the part-1 patch containing the ctidscan
> contrib module into its own patch. It is more than half of the patch,
> and it certainly stands on its own. IMO, ctidscan fits a very specific
> use case and would be better off being an extension instead of living
> in contrib.
>
OK, I split them off. Part 1 is the custom-scan API itself, part 2 is the ctidscan portion, and part 3 is the remote join on postgres_fdw.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
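[As an aside, this class of warning, a variable consumed only by Assert(), is common in builds without --enable-cassert. A sketch of both the marker-based fix PostgreSQL's c.h provides and the direct-check alternative KaiGai chose; the function and variable names are illustrative.]

#include "postgres.h"
#include "nodes/relation.h"    /* 9.4-era home of RelOptInfo */

static void
check_relid(RelOptInfo *baserel)
{
    /* Without the marker, gcc warns "unused variable" in non-cassert
     * builds, because Assert() then compiles away to nothing. */
    Index scan_relid PG_USED_FOR_ASSERTS_ONLY = baserel->relid;

    Assert(scan_relid > 0);

    /* The alternative taken in the revised patch: drop the local
     * variable and test the field directly. */
    Assert(baserel->relid > 0);
}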
Attachment
KaiGai,

On Tue, Nov 19, 2013 at 9:41 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> This variable was only used in an Assert() macro, so it causes a warning
> if you don't pass --enable-cassert to the configure script.
> Anyway, I adjusted the code to check the relid of the RelOptInfo directly.

The warning is now gone.

> OK, I split them off. Part 1 is the custom-scan API itself, part 2 is
> the ctidscan portion, and part 3 is the remote join on postgres_fdw.

Attached is a patch for the documentation. I think the documentation still needs a little more work, but it is pretty close. I can add some more detail to it once I finish adapting the hadoop_fdw to use the custom scan API and have a better understanding of all of the calls.
Attachment
contrib/ctidscan/ctidscan.c:44: indent with spaces.
contrib/ctidscan/ctidscan.c:250: indent with spaces.
contrib/ctidscan/ctidscan.c:266: trailing whitespace.
contrib/postgres_fdw/deparse.c:1044: indent with spaces.
contrib/postgres_fdw/postgres_fdw.c:940: indent with spaces.
src/backend/commands/explain.c:2097: indent with spaces.
src/backend/optimizer/plan/createplan.c:2097: trailing whitespace.
2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> OK, I split them off. Part 1 is the custom-scan API itself, part 2 is
> the ctidscan portion, and part 3 is the remote join on postgres_fdw.

These three patches can be applied with no conflict onto the 2013-11-27 HEAD, but some fixes are necessary to build, because commit 784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21) allows a FunctionScan node to have multiple function expressions, so the Node *funcexpr in CustomScan should be List *functions now.

I'll continue to review by applying the patches onto the 2013-11-19 HEAD.

-- 
Shigeru HANADA
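[A sketch of the adjustment Hanada points out; the struct below is illustrative only, not the patch's actual CustomScan definition.]

#include "nodes/plannodes.h"
#include "nodes/pg_list.h"

/* Illustrative only: commit 784e762e made FunctionScan carry a list of
 * function expressions, so a CustomScan replacing it must follow suit. */
typedef struct CustomScanSketch
{
    Scan        scan;       /* inherited Plan/Scan fields */
    /* before: Node *funcexpr;      a single function expression */
    List       *functions;  /* now: one entry per function expression */
} CustomScanSketch;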
Thanks for the series of checks. The attached ones are the revised patches.

I merged all the suggestions from Jim. Thanks, they made the documentation quality better. Also, I fixed up the cosmetic whitespace-vs-tab issues.

The actual code change is to follow the change to FunctionScan in the case where a CustomScan replaces a FunctionScan: it now stores a List * object instead of a single expression tree, to allow multiple functions. Nothing else was changed from the previous version.

Best regards,

2013/11/27 Shigeru Hanada <shigeru.hanada@gmail.com>:
> These three patches can be applied with no conflict onto the 2013-11-27
> HEAD, but some fixes are necessary to build, because commit
> 784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
> allows a FunctionScan node to have multiple function expressions, so the
> Node *funcexpr in CustomScan should be List *functions now.
>
> I'll continue to review by applying the patches onto the 2013-11-19 HEAD.
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
Hi KaiGai-san,

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The attached ones are the revised patches.
> [snip]

I first reviewed the postgres_fdw portion of the patches to learn the outline of Custom Plan. The wiki page is also a good textbook for the feature. I have some random comments about the basic design of Custom Plan:

(1) IIUC, add_join_path and add_scan_path are added to allow extensions to plug their code into the planner.

(2) The FDW framework has executor callbacks based on existing executor nodes. Is there any plan to integrate the two into one mechanism, or to wrap one with the other? I'm not sure that we should have two similar frameworks side by side.
# I'm sorry if I've missed the past discussion about this issue.

(3) Internal routines such as is_self_managed_relation and has_wholerow_reference seem to be useful for other FDWs. Is it possible to move them into core?

(4) postgres_fdw estimates the costs of a join by calculating local numbers. How about supporting remote estimation by throwing an EXPLAIN query when use_remote_estimate = true?

-- 
Shigeru HANADA
2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> I merged all the suggestions from Jim. Thanks, they made the documentation
> quality better. Also, I fixed up the cosmetic whitespace-vs-tab issues.

I found some typos in the documents and comments. Please see the attached patch for details.

-- 
Shigeru HANADA
Attachment
Hanada-san,

Thanks for your reviewing.

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:
> I first reviewed the postgres_fdw portion of the patches to learn the
> outline of Custom Plan. The wiki page is also a good textbook for the
> feature. I have some random comments about the basic design of Custom
> Plan:
>
> (1) IIUC, add_join_path and add_scan_path are added to allow extensions
> to plug their code into the planner.
>
Almost yes. More precisely, these hooks allow extensions to plug the paths they can provide into a particular join or scan. The planner will then choose the cheapest one according to the cost values.

> (2) The FDW framework has executor callbacks based on existing executor
> nodes. Is there any plan to integrate the two into one mechanism, or to
> wrap one with the other? I'm not sure that we should have two similar
> frameworks side by side.
> # I'm sorry if I've missed the past discussion about this issue.
>
Probably, FDW has a different role from the CustomScan API. As the name implies, an FDW acts as a bridge between a relation form and an opaque external data source, mediating between the two worlds on behalf of a foreign table. On the other hand, CustomScan allows providing alternative logic to scan or join particular relations, in addition to the built-in ones, but it does not act on behalf of foreign tables.

The existing FDW is designed to implement a scan on an intangible relation, so it can assume some things: a tuple returned from the FDW has a TupleDesc equivalent to the table definition, and it can always use ExecScan() in all cases. So, I don't think these two frameworks should be consolidated, because that would confuse existing extensions that assume FDW callbacks always have a particular foreign table definition.

> (3) Internal routines such as is_self_managed_relation and
> has_wholerow_reference seem to be useful for other FDWs. Is it possible
> to move them into core?
>
Probably, src/backend/foreign/foreign.c is a good host for them.

> (4) postgres_fdw estimates the costs of a join by calculating local
> numbers. How about supporting remote estimation by throwing an EXPLAIN
> query when use_remote_estimate = true?
>
I'm uncertain whether the cost value from a remote EXPLAIN represents the right difficulty on the local side: it indeed represents the difficulty of joining two relations on the remote side, but it does not represent the local work, which just fetches tuples from the result set of the remote query with the tables already joined. What is your opinion? Is the remote cost estimate comparable with the local value?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
Thanks for fixing my many careless mistakes. I didn't know "seek" was an irregular verb...

Best regards,

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:
> I found some typos in the documents and comments. Please see the
> attached patch for details.
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
The attached patches include the documentation fixups by Hanada-san, and the relocation of is_managed_relation (the portion that checks whether a relation is a foreign table managed by a particular FDW) and has_wholerow_reference.

I didn't touch the EXPLAIN logic, because I'm uncertain whether the cost of a remote join is reasonable as the cost of an alternative path to local joins.

Please check it. Thanks,

2013/12/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hanada-san,
>
> Thanks for your reviewing.
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
Hi KaiGai-san,

2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The attached patches include the documentation fixups by Hanada-san,
> and the relocation of is_managed_relation (the portion that checks
> whether a relation is a foreign table managed by a particular FDW) and
> has_wholerow_reference.
> I didn't touch the EXPLAIN logic, because I'm uncertain whether the
> cost of a remote join is reasonable as the cost of an alternative path
> to local joins.
>
> Please check it. Thanks,

The patches could be applied cleanly, but I saw a compiler warning about get_rel_relkind() in foreign.c; it's a minor issue. Please just add an #include of utils/lsyscache.h there.

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text formats, but it seems to me to be rather a "provider name". What is the string shown there?

2) It would be nice if we could see information about what the Custom Scan node replaced in the EXPLAIN output (even if only in verbose mode). I know that we can't show the plan tree below custom scan nodes, because the CS provider can't obtain the other candidates. But even just the relation names used in the join or the scan would help users understand what is going on in a Custom Scan.

Regards,
-- 
Shigeru HANADA
Hanada-san,

Thanks for your reviewing.

2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:
> The patches could be applied cleanly, but I saw a compiler warning
> about get_rel_relkind() in foreign.c; it's a minor issue. Please
> just add an #include of utils/lsyscache.h there.
>
Fixed.

> I have some more random comments about EXPLAIN.
>
> 1) You use "Operation" as the label of Custom Scan nodes in non-text
> formats, but it seems to me to be rather a "provider name". What is
> the string shown there?
>
I tried free-riding on the existing properties, but it does not make sense indeed, as you pointed out. I adjusted explain.c to show a "Custom-Provider" property for the Custom Scan node, as follows.

postgres=# explain(format xml) select * from t1 where ctid > '(4,0)'::tid;
                        QUERY PLAN
----------------------------------------------------------
 <explain xmlns="http://www.postgresql.org/2009/explain">+
   <Query>                                               +
     <Plan>                                              +
       <Node-Type>Custom Scan</Node-Type>                +
       <Custom-Provider>ctidscan</Custom-Provider>       +
       <Relation-Name>t1</Relation-Name>                 +
       <Alias>t1</Alias>                                 +
       <Startup-Cost>0.00</Startup-Cost>                 +
       <Total-Cost>12.30</Total-Cost>                    +
       <Plan-Rows>410</Plan-Rows>                        +
       <Plan-Width>36</Plan-Width>                       +
       <Filter>(ctid > '(4,0)'::tid)</Filter>            +
     </Plan>                                             +
   </Query>                                              +
 </explain>
(1 row)

> 2) It would be nice if we could see information about what the Custom
> Scan node replaced in the EXPLAIN output (even if only in verbose
> mode). I know that we can't show the plan tree below custom scan
> nodes, because the CS provider can't obtain the other candidates. But
> even just the relation names used in the join or the scan would help
> users understand what is going on in a Custom Scan.
>
Even though I agree that it would help users understand the plan, it is also a headache to implement, because the CustomScan node (and its superclass) does not carry information about which relations are underlying. Probably, this functionality would need to show the underlying relations in ExplainTargetRel() if the CustomScan node represents a scan instead of a join. What data source could produce the list of underlying relations here? So, if it is not a significant restriction for users, I'd like to work on this feature later.

The attached patch fixes up the minor warning around get_rel_relkind and the name of the custom-provider property. Please check it.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
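[For reference, emitting such a property from explain.c would plausibly reuse the existing ExplainPropertyText() helper; a minimal sketch, where the provider-name argument stands in for however the patched node actually exposes it.]

#include "commands/explain.h"

/* Sketch: print the provider name as the "Custom-Provider" property.
 * How the provider name is obtained from the node is patch-specific,
 * so it is passed in here. */
static void
show_custom_provider(const char *custom_provider_name, ExplainState *es)
{
    ExplainPropertyText("Custom-Provider", custom_provider_name, es);
}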
Attachment
Hi KaiGai-san,

2013/12/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:
>> The patches could be applied cleanly, but I saw a compiler warning
>> about get_rel_relkind() in foreign.c; it's a minor issue. Please
>> just add an #include of utils/lsyscache.h there.
>>
> Fixed.

Check.

>> 1) You use "Operation" as the label of Custom Scan nodes in non-text
>> formats, but it seems to me to be rather a "provider name". What is
>> the string shown there?
>>
> I adjusted explain.c to show a "Custom-Provider" property for the
> Custom Scan node, as follows.

The new name seems better; it is what the node expresses.

>> 2) It would be nice if we could see information about what the Custom
>> Scan node replaced in the EXPLAIN output (even if only in verbose
>> mode). [snip]
>>
> Even though I agree that it would help users understand the plan, it
> is also a headache to implement, because the CustomScan node (and its
> superclass) does not carry information about which relations are
> underlying. [snip]
> So, if it is not a significant restriction for users, I'd like to work
> on this feature later.

Agreed. It would be enough that custom scan providers can add arbitrary information, such as the "Remote SQL" of postgres_fdw, to the EXPLAIN result via a core API. Some kind of framework that helps authors of custom scan providers would be nice, but it should be considered after the first cut.

> The attached patch fixes up the minor warning around get_rel_relkind
> and the name of the custom-provider property. Please check it.

The patch can be applied onto the 2013-12-16 HEAD cleanly, and gives no unexpected error or warning.

I'm sorry to post separately, but I have some comments on the documentation.

(1) ctidscan
Is session_preload_libraries available to enable the feature, like the shared_*** and local_*** variants? According to my trial it works fine, like the two similar GUCs.

(2) postgres_fdw
JOIN push-down is a killer application of the Custom Scan Provider feature, so I think it's good to mention it in the "Remote Query Optimization" section.

The code for core and contrib seems fine, so I'll mark the patches "Ready for Committer" after the documentation enhancement.

Regards,
-- 
Shigeru HANADA
Hi Hanada-san,

(2013/12/16 14:15), Shigeru Hanada wrote:
> I'm sorry to post separately, but I have some comments on the documentation.
>
> (1) ctidscan
> Is session_preload_libraries available to enable the feature, like the
> shared_*** and local_*** variants? According to my trial it works fine,
> like the two similar GUCs.
>
It should be available; nothing is different from the two parameters that we have supported for a long time. Sorry, I missed mentioning the new feature.

> (2) postgres_fdw
> JOIN push-down is a killer application of the Custom Scan Provider
> feature, so I think it's good to mention it in the "Remote Query
> Optimization" section.
>
I added an explanation about remote join execution to that section. It will probably help users understand why a Custom Scan node appears there instead of a Join node. Thanks for your suggestion.

Best regards,
-- 
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai-san,

2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> (2013/12/16 14:15), Shigeru Hanada wrote:
>> (1) ctidscan
>> Is session_preload_libraries available to enable the feature, like the
>> shared_*** and local_*** variants? [snip]
>>
> It should be available; nothing is different from the two parameters
> that we have supported for a long time. Sorry, I missed mentioning the
> new feature.

Check.

>> (2) postgres_fdw
>> JOIN push-down is a killer application of the Custom Scan Provider
>> feature, so I think it's good to mention it in the "Remote Query
>> Optimization" section.
>>
> I added an explanation about remote join execution to that section.
> It will probably help users understand why a Custom Scan node appears
> there instead of a Join node. Thanks for your suggestion.

Check.

I think that these patches have been considered well enough to mark as "Ready for Committer".

Regards,
-- 
Shigeru HANADA
Hello,

The attached patches are rebased onto the latest git tree, with no functional changes from the previous revision in the November commitfest. Hanada-san volunteered to review the series of patches, including the portion for postgres_fdw, then marked it as "Ready for Committer" on the last commitfest. So, I hope a committer will also volunteer to review the patches for final checking.

* Part 1 - CustomScan APIs
This patch provides a set of interfaces for extensions to interact with the query optimizer and executor. The new add_scan_path_hook and add_join_path_hook allow offering alternative ways to scan a particular relation or to join particular relations. Once an alternative way is chosen by the optimizer, the associated callbacks are invoked from the executor. In that case, the extension has responsibility for returning a slot that holds a tuple scanned from the underlying relation (or an empty slot at end of scan).

* Part 2 - contrib/ctidscan
This patch provides a simple example implementation of the CustomScan API. It enables skipping pages when inequality operators are given on the ctid system column. That is at least better than a sequential full scan, so it usually beats SeqScan, though an index scan is much better.

* Part 3 - remote join implementation
This patch provides an example that replaces a join with a custom scan node running on the result set of a remote join query, on top of the existing postgres_fdw extension. The idea is that the result set of a remote query looks like a relation, albeit an intangible one; thus, it is feasible to replace a local join with a scan on the result set of a query executed on the remote host, if both relations to be joined belong to the same foreign server. This patch gives postgres_fdw the capability to run a join on the remote host.

Thanks,

2013/12/16 Shigeru Hanada <shigeru.hanada@gmail.com>:
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
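[To make the Part 2 idea concrete: a ctid inequality qual bounds which heap blocks can contain qualifying tuples, so the scan can skip the rest. A minimal sketch, not the module's actual code, of deriving the first block to scan from a lower-bound ctid.]

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * For a qual like "ctid > '(10,0)'::tid", no tuple located before the
 * bound's block can satisfy the clause, so a heap scan may start at
 * that block instead of block 0. This is the page-skipping that makes
 * ctidscan cheaper than a full sequential scan.
 */
static BlockNumber
first_block_to_scan(ItemPointer lower_bound)
{
    return ItemPointerGetBlockNumber(lower_bound);
}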
Hackers,

Is somebody available to volunteer to review the custom-scan patch?
Even though Hanada-san acknowledged it before, it seems to me this patch
has some potentially arguable implementation choices. Even if you don't
have enough time to review the whole of the code, it would help me if
you could comment on the following topics.

(1) Interface to add alternative paths instead of built-in join paths

This patch adds add_join_path_hook to add_paths_to_joinrel to allow
extensions to provide alternative scan paths in addition to the built-in
join paths. The custom-scan path being added is assumed to scan a
(virtual) relation that is the result set of the joined relations.
My concern is which arguments should be passed. This hook is declared as
follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments down to restrictlist clearly should be passed to
extensions, because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be
passed. It probably makes sense to pass param_source_rels and
extra_lateral_rels, so extensions can check whether a path is sensible
for parameterized paths (a sketch of that check follows this message).
On the other hand, I doubt whether mergeclause_list is useful to
deliver. (It may make sense if someone tries to implement their own
merge-join implementation??)

I'd like to see ideas to improve the current interface specification.

(2) CUSTOM_VAR for special Var references

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR       65000   /* reference to inner subplan */
 #define    OUTER_VAR       65001   /* reference to outer subplan */
 #define    INDEX_VAR       65002   /* reference to index column */
+#define    CUSTOM_VAR      65003   /* reference to custom column */

I newly added CUSTOM_VAR to handle the case where a custom scan
overrides a join between relations.
Var nodes within a join plan are adjusted to refer to either
ecxt_innertuple or ecxt_outertuple of ExprContext. That causes trouble
if a custom scan runs instead of a built-in join, because the tuples it
fetches are usually stored in ecxt_scantuple, so the Var nodes need a
varno that is neither inner nor outer.

The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
extensions to rewrite Var nodes within the custom-scan node to point at
ecxt_scantuple using CUSTOM_VAR, instead of inner or outer. For example,
a Var node with varno=CUSTOM_VAR and varattno=3 references the third
attribute of the tuple in ecxt_scantuple.
I think this is a reasonable solution, but I'm not 100% certain whether
people have a more graceful idea for implementing it.

If you have comments on the above two topics, or others, please share
your ideas.

Thanks,
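As a data point for the param_source_rels question: the built-in join
paths use it to reject useless parameterized paths, so an extension
would presumably want the same test. Below is a sketch mirroring the
opening of try_nestloop_path(), using only stock planner routines; the
helper name and the choice of cheapest paths are hypothetical:

#include "postgres.h"
#include "nodes/bitmapset.h"
#include "optimizer/pathnode.h"

/* Would a parameterized custom join path be worth building at all? */
static bool
custom_join_param_is_sensible(RelOptInfo *outerrel, RelOptInfo *innerrel,
                              Relids param_source_rels)
{
    Path       *outer_path = outerrel->cheapest_total_path;
    Path       *inner_path = innerrel->cheapest_total_path;
    Relids      required_outer;

    required_outer = calc_nestloop_required_outer(outer_path, inner_path);
    if (required_outer &&
        !bms_overlap(required_outer, param_source_rels))
    {
        /* parameterization cannot help any relation we care about */
        bms_free(required_outer);
        return false;
    }
    bms_free(required_outer);
    return true;
}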
KaiGai Kohei,

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit, and my first takeaway was that the patches
that actually use the new hooks were also making further changes to the
backend code, leaving me with the impression that the proposed interface
isn't terribly stable. Perhaps those changes should simply have been in
the first patch, but they weren't, and that certainly gave me pause.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs. While that's certainly a goal
that I think we all share, this seems to be intending to add a
completely different feature which happens to be usable for that. For
FDWs, wouldn't we only present the FDW with the paths where the foreign
tables for that FDW, or perhaps just a given foreign server, are being
joined?

Thanks,

Stephen
Hi Stephen,

Thanks for your comments.

> I looked through it a bit, and my first takeaway was that the patches
> that actually use the new hooks were also making further changes to
> the backend code, leaving me with the impression that the proposed
> interface isn't terribly stable. Perhaps those changes should simply
> have been in the first patch, but they weren't, and that certainly
> gave me pause.
>
Yes, the part-1 patch provides the interface portion that mediates
between the backend code and extension code. The part-2 and part-3
portions are contrib modules that implement their features on top of the
custom-scan API.

> I'm also not entirely convinced that this is the direction to go in
> when it comes to pushing down joins to FDWs. While that's certainly a
> goal that I think we all share, this seems to be intending to add a
> completely different feature which happens to be usable for that. For
> FDWs, wouldn't we only present the FDW with the paths where the
> foreign tables for that FDW, or perhaps just a given foreign server,
> are being joined?
>
FDW join push-down is one of the valuable use-cases of this interface,
but not the only one. As you might know, my motivation is to implement a
GPU acceleration feature on top of this interface, which offers
alternative ways to scan or join relations, and potentially to sort or
aggregate.
It would probably be too much of a stretch to implement a radix sort on
top of FDW. I'd like you to understand that the part-3 patch (FDW join
push-down) is a demonstration of the custom-scan interface in an
application, not something designed for a single special purpose.

Right now, I put all the logic that mediates between the custom-scan
interface and the FDW driver on the postgres_fdw side; it might be an
idea to have common code (like the logic that checks whether both
relations to be joined belong to the same foreign server) on the backend
side, as a kind of gateway between them.

As an aside, what should be the scope of the FDW interface?
In my understanding, it allows an extension to implement "something" on
behalf of a particular data structure declared with CREATE FOREIGN
TABLE. In other words, the extension's responsibility is to present a
view of that "something" according to PostgreSQL's internal data
structure, rather than the object itself.
On the other hand, the custom-scan interface allows extensions to
implement alternative methods to scan or join particular relations; it
does not play the role of a target referenced in queries. In other
words, it provides methods to access objects.
It is natural that both features look similar, because both of them let
extensions hook the planner and executor; however, their purposes are
different.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Jan 27, 2014 at 7:14 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> FDW join push-down is one of the valuable use-cases of this interface,
> but not the only one. As you might know, my motivation is to implement
> a GPU acceleration feature on top of this interface, which offers
> alternative ways to scan or join relations, and potentially to sort or
> aggregate.
I'm curious how this relates to the pluggable storage idea discussed here https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and here http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html
I haven't seen a specific proposal about how much functionality should be encapsulated by a pluggable storage system. But I wonder if that would be the best place for specialized table-scan code to end up?
2014-01-29 Christian Convey <christian.convey@gmail.com>:
> I'm curious how this relates to the pluggable storage idea discussed
> here https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and
> here
> http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html
>
> I haven't seen a specific proposal about how much functionality should
> be encapsulated by a pluggable storage system. But I wonder if that
> would be the best place for specialized table-scan code to end up?
>
If you are interested in designing your own storage layer (which thus
needs its own scan/write implementation), FDW is the option currently
available. It defines a set of interfaces that allow extensions to
generate "things to be there" on the fly. It does not force extensions
to act as a client of a remote database, even though it was originally
designed for dblink-like purposes. In other words, FDW is a feature that
translates a particular data source into something visible according to
the table definition. As long as the driver can mediate between the
table definition and the data format of your own storage layer, it
should work.

On the other hand, the custom-scan interface basically allows extensions
to implement an "alternative way to access" the data. If we have a wiser
way to scan or join relations than the built-in logic (yes, scanning the
result set of a remote join will be wiser than a local join on a couple
of remote scan results), this interface tells the backend "I have such a
wise strategy", and the planner will choose one of the paths, built-in
or additional, according to cost.

Let's get back to your question. This interface is, right now, not
designed to implement a pluggable storage layer. FDW is an option now,
and maybe a development item in the v9.5 cycle if we want regular tables
to be pluggable. Because I'm motivated to make my GPU acceleration
feature work on regular relations, I put my higher priority on the
interface that allows an extension to suggest "how to scan" them.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Folks,

Let me remind you of the custom-scan patches; this is the basis for
features such as the remote join in postgres_fdw, cache-only scan, the
(upcoming) GPU acceleration feature, and various other alternative ways
to scan/join relations.
Unfortunately, we could have only a small amount of discussion in this
commit fest, even though Hanada-san volunteered and moved the patches
into the "ready for committer" state at the November CF.

Before time is up, I'd like to ask the hackers' opinion about the
potentially arguable points (from my standpoint), in case they need to
be fixed up. One is the hook definition for adding alternative join
paths, and the other is a special varno used when a custom scan replaces
a join node. I'd like to hear your opinion about them while we can still
change the design if needed.

(1) Interface to add alternative paths in addition to built-in join paths

This patch adds add_join_path_hook to add_paths_to_joinrel to allow
extensions to provide alternative scan paths in addition to the built-in
join paths. The custom-scan path being added is assumed to scan a
(virtual) relation that is the result set of the joined relations.
My concern is which arguments should be passed. This hook is declared as
follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments down to restrictlist clearly should be passed to
extensions, because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be
passed. It probably makes sense to pass param_source_rels and
extra_lateral_rels, so extensions can check whether a path is sensible
for parameterized paths. On the other hand, I doubt whether
mergeclause_list is useful to deliver. (It may make sense if someone
tries to implement their own merge-join implementation??)

I'd like to see ideas to improve the current interface specification.

(2) CUSTOM_VAR for special Var references

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR       65000   /* reference to inner subplan */
 #define    OUTER_VAR       65001   /* reference to outer subplan */
 #define    INDEX_VAR       65002   /* reference to index column */
+#define    CUSTOM_VAR      65003   /* reference to custom column */

I newly added CUSTOM_VAR to handle the case where a custom scan
overrides a join between relations.
Var nodes within a join plan are adjusted to refer to either
ecxt_innertuple or ecxt_outertuple of ExprContext. That causes trouble
if a custom scan runs instead of a built-in join, because the tuples it
fetches are usually stored in ecxt_scantuple, so the Var nodes need a
varno that is neither inner nor outer.

The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
extensions to rewrite Var nodes within the custom-scan node to point at
ecxt_scantuple using CUSTOM_VAR, instead of inner or outer. For example,
a Var node with varno=CUSTOM_VAR and varattno=3 references the third
attribute of the tuple in ecxt_scantuple.
I think this is a reasonable solution, but I'm not 100% certain whether
people have a more graceful idea for implementing it.

If you have comments on the above two topics, or others, please share
your ideas.
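For reference, a sketch of what that Var rewrite could look like inside
a SetPlanRefCustomScan callback, using the stock
expression_tree_mutator() machinery. The custom_scan_vars bookkeeping
(the Vars that make up the scan tuple, in attribute order) is an
illustrative assumption, not part of the patch; CUSTOM_VAR is the new
constant added above:

#include "postgres.h"
#include "nodes/nodeFuncs.h"

typedef struct
{
    List   *custom_scan_vars;   /* hypothetical: Vars of the scan tuple */
} fixup_context;

static Node *
fixup_varnode_mutator(Node *node, fixup_context *context)
{
    if (node == NULL)
        return NULL;
    if (IsA(node, Var))
    {
        AttrNumber  attno = 1;
        ListCell   *lc;

        foreach(lc, context->custom_scan_vars)
        {
            if (equal(lfirst(lc), node))
            {
                /* redirect the reference to ecxt_scantuple */
                Var    *newvar = (Var *) copyObject(node);

                newvar->varno = CUSTOM_VAR;
                newvar->varattno = attno;
                return (Node *) newvar;
            }
            attno++;
        }
        elog(ERROR, "Var not found in custom scan tuple");
    }
    return expression_tree_mutator(node, fixup_varnode_mutator,
                                   (void *) context);
}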
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Kaigai-san,

Sorry to have left the thread for a while.

2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> (1) Interface to add alternative paths in addition to built-in join paths
>
> This patch adds add_join_path_hook to add_paths_to_joinrel to allow
> extensions to provide alternative scan paths in addition to the
> built-in join paths. [...]
> It probably makes sense to pass param_source_rels and
> extra_lateral_rels, so extensions can check whether a path is sensible
> for parameterized paths. On the other hand, I doubt whether
> mergeclause_list is useful to deliver. (It may make sense if someone
> tries to implement their own merge-join implementation??)
>
> I'd like to see ideas to improve the current interface specification.

I've read the code path that adds custom joins again, and felt that
providing semifactors is not necessary for the first cut, because it is
used only in initial_cost_nestloop (final_cost_nestloop receives
semifactors but does not use them), and external modules would not
become that smart before the 9.5 development cycle. It seems complex
enough that we should postpone deciding whether it's essential for
add_join_path_hook. Do you have any concrete use case for the parameter?

mergeclause_list and param_source_rels seem a little easier to use, but
in any case it should be documented how to use those parameters.

IMHO, a minimal interface seems better than a fully-fledged but
half-baked one, especially in the first cut.

--
Shigeru HANADA
2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Folks,
>
> Let me remind you of the custom-scan patches; this is the basis for
> features such as the remote join in postgres_fdw, cache-only scan, the
> (upcoming) GPU acceleration feature, and various other alternative
> ways to scan/join relations.
> Unfortunately, we could have only a small amount of discussion in this
> commit fest, even though Hanada-san volunteered and moved the patches
> into the "ready for committer" state at the November CF.

I found some cosmetic flaws and a missing .gitignore entry in the
patches. Please see the attached patch for details.

--
Shigeru HANADA
> > Likely, the arguments down to restrictlist should be passed to
> > extensions, because these are also arguments of
> > add_paths_to_joinrel(). [...]
>
> I've read the code path that adds custom joins again, and felt that
> providing semifactors is not necessary for the first cut, because it
> is used only in initial_cost_nestloop (final_cost_nestloop receives
> semifactors but does not use them), and external modules would not
> become that smart before the 9.5 development cycle. It seems complex
> enough that we should postpone deciding whether it's essential for
> add_join_path_hook. Do you have any concrete use case for the
> parameter?
>
The reason why I asked the question above is that I haven't been 100%
certain about its usage. Indeed, semifactors is applied in a limited
fashion, and is quite easy for an extension to reproduce later (using
clauselist_selectivity; see the sketch below) if it wants this factor.
So, I agree with removing semifactors here.

> mergeclause_list and param_source_rels seem a little easier to use,
> but in any case it should be documented how to use those parameters.
>
mergeclause_list might not be sufficient for extensions to determine
whether their own merge join is applicable here. See the comment below,
from the head of select_mergejoin_clauses:

| * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
| * this is a right/full join and there are nonmergejoinable join clauses.
| * The executor's mergejoin machinery cannot handle such cases, so we have
| * to avoid generating a mergejoin plan.  (Note that this flag does NOT
| * consider whether there are actually any mergejoinable clauses.  This is
| * correct because in some cases we need to build a clauseless mergejoin.
| * Simply returning NIL is therefore not enough to distinguish safe from
| * unsafe cases.)

It says that an empty mergeclause_list is not a sufficient check to
determine whether the mergejoin logic is applicable to the target join.
So, either of the following is an option for an extension that tries to
implement its own mergejoin logic: (1) put both mergejoin_allowed and
mergeclause_list in the arguments of the hook, or (2) redefine
select_mergejoin_clauses() as an extern function so the variables can be
reproduced on demand. Which one is preferable?

BTW, I found duplicate definitions of clause_sides_match_join(), which
is used by select_mergejoin_clauses(), in joinpath.c and analyzejoins.c.
One of them should be eliminated, I think.

For param_source_rels and extra_lateral_rels, I'll put source-code
comments around add_join_path_hook.
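Coming back to semifactors: as a rough sketch of the reproduction
mentioned above, an extension could rebuild the outer_match_frac side of
SemiAntiJoinFactors from the hook's own arguments with
clauselist_selectivity(), loosely following what the planner's
compute_semi_anti_join_factors() does. This is an approximation, not the
exact built-in computation:

#include "postgres.h"
#include "optimizer/cost.h"

/* fraction of outer rows expected to have at least one join partner */
static Selectivity
estimate_outer_match_frac(PlannerInfo *root, List *restrictlist,
                          SpecialJoinInfo *sjinfo)
{
    return clauselist_selectivity(root,
                                  restrictlist,
                                  0,            /* treat as join clauses */
                                  JOIN_SEMI,
                                  sjinfo);
}

The match_count side would need a similar, slightly longer estimate.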
The earlier half of try_(nestloop|hashjoin|mergejoin)_path is probably
useful as an example for extension implementations.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Folks,
>
> Let me remind you of the custom-scan patches; this is the basis for
> features such as the remote join in postgres_fdw, cache-only scan, the
> (upcoming) GPU acceleration feature, and various other alternative
> ways to scan/join relations.
> Unfortunately, we could have only a small amount of discussion in this
> commit fest, even though Hanada-san volunteered and moved the patches
> into the "ready for committer" state at the November CF.
Sorry for jumping into this late.
Instead of a custom node, it might be a better idea to improve the FDW infrastructure to push down joins. For starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think it might be better for the custom scan hooks to create existing nodes if those serve the purpose.
> Before time is up, I'd like to ask the hackers' opinion about the
> potentially arguable points (from my standpoint), in case they need to
> be fixed up. One is the hook definition for adding alternative join
> paths, and the other is a special varno used when a custom scan
> replaces a join node. I'd like to hear your opinion about them while
> we can still change the design if needed.
> (1) Interface to add alternative paths in addition to built-in join paths
>
> This patch adds add_join_path_hook to add_paths_to_joinrel to allow
> extensions to provide alternative scan paths in addition to the
> built-in join paths. The custom-scan path being added is assumed to
> scan a (virtual) relation that is the result set of the joined
> relations.
> My concern is which arguments should be passed. This hook is declared
> as follows:
>
> /* Hook for plugins to add custom join path, in addition to default ones */
> typedef void (*add_join_path_hook_type)(PlannerInfo *root,
>                                         RelOptInfo *joinrel,
>                                         RelOptInfo *outerrel,
>                                         RelOptInfo *innerrel,
>                                         JoinType jointype,
>                                         SpecialJoinInfo *sjinfo,
>                                         List *restrictlist,
>                                         List *mergeclause_list,
>                                         SemiAntiJoinFactors *semifactors,
>                                         Relids param_source_rels,
>                                         Relids extra_lateral_rels);
> extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
> The arguments down to restrictlist clearly should be passed to
> extensions, because these are also arguments of add_paths_to_joinrel().
> However, I'm not 100% certain whether the other arguments should be
> passed. It probably makes sense to pass param_source_rels and
> extra_lateral_rels, so extensions can check whether a path is sensible
> for parameterized paths. On the other hand, I doubt whether
> mergeclause_list is useful to deliver. (It may make sense if someone
> tries to implement their own merge-join implementation??)
>
> I'd like to see ideas to improve the current interface specification.
Since a custom node is an open implementation, it will be important to pass as much information down to the hooks as possible, lest the hooks be constrained. Since function signatures within the planner and optimizer change from time to time, the custom node hook signatures will also need to change from time to time. That might turn out to be a maintenance overhead.
BTW, is it a good idea for custom nodes to also affect other paths like append, group etc.? Will it need separate hooks for each of those?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
> Sorry for jumping into this late.
>
> Instead of a custom node, it might be a better idea to improve the FDW
> infrastructure to push down joins. For starters, is it possible for
> the custom scan node hooks to create a ForeignScan node? In general, I
> think it might be better for the custom scan hooks to create existing
> nodes if those serve the purpose.
>
It does not work well, because the existing FDW infrastructure is
designed to operate on foreign tables, not regular tables. We would
probably need to revise many of the assumptions in the surrounding code
if we redefined the purpose of the FDW infrastructure. For example,
ForeignScan is expected to return tuples according to a TupleDesc that
is exactly the same as the table definition. That does not fit the
requirement if we replace a join node with a ForeignScan, because the
TupleDesc of the joined relations is not predefined (see the sketch at
the end of this message).

I'd like to define these features as designed for individual purposes.
FDW is designed to mediate between an external data source and the
internal heap representation according to a foreign table definition. In
other words, its role is to generate the contents of a predefined
database object on the fly.
On the other hand, custom-scan is designed to implement alternative ways
to scan/join relations in addition to the methods supported by the
built-in features.
I'm motivated to implement a GPU acceleration feature that works
transparently for applications. Thus, it has to be capable of running on
regular tables, because most applications store their data in regular
tables, not foreign ones.

> Since a custom node is an open implementation, it will be important to
> pass as much information down to the hooks as possible, lest the hooks
> be constrained. Since function signatures within the planner and
> optimizer change from time to time, the custom node hook signatures
> will also need to change from time to time. That might turn out to be
> a maintenance overhead.
>
Yes, you are right. But it also creates maintenance overhead if a hook
has many arguments nobody uses.
It probably makes sense to list the arguments that cannot be reproduced
from other information, those that can be reproduced but only through
complicated steps, and those that can be reproduced easily.

Below is the information we cannot reproduce:
- PlannerInfo *root
- RelOptInfo *joinrel
- RelOptInfo *outerrel
- RelOptInfo *innerrel
- JoinType jointype
- SpecialJoinInfo *sjinfo
- List *restrictlist

Below is the information we can reproduce, but through complicated steps:
- List *mergeclause_list
- bool mergejoin_allowed
- Relids param_source_rels
- Relids extra_lateral_rels

Below is the information we can reproduce easily:
- SemiAntiJoinFactors *semifactors

I think the first two categories, or only the first (if functions to
reproduce the second group are exposed), should be passed to extensions;
the priority of the third group is not high.

> BTW, is it a good idea for custom nodes to also affect other paths
> like append, group etc.? Will it need separate hooks for each of
> those?
>
Yes, I plan to support the plan nodes above, in addition to scan/join
only.
The custom-scan node is a thin abstraction over general executor
behavior, so I believe it is not hard to enhance this node, without a
new plan node for each of them.
Of course, it will need separate hooks to add alternative paths at the
planning stage, but no individual plan nodes. (Sorry, it was unclear to
me what the "hook" meant.)
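To make the TupleDesc point concrete: since a custom scan that replaces
a join has no catalog-defined row type, the extension has to build the
scan slot's descriptor itself at executor startup, roughly as below.
ExecTypeFromTL() and ExecAssignScanType() are stock executor routines;
the callback name and the CustomScanState layout (embedding a ScanState
named ss) are assumptions for illustration:

#include "postgres.h"
#include "executor/executor.h"

/* hypothetical BeginCustomScan callback */
static void
myscan_begin(CustomScanState *css)
{
    TupleDesc   tupdesc;

    /* derive the scan tuple's shape from the plan's target list... */
    tupdesc = ExecTypeFromTL(css->ss.ps.plan->targetlist, false);
    /* ...and install it on the scan slot, since no table defines it */
    ExecAssignScanType(&css->ss, tupdesc);
}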
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Instead of a custom node, it might be a better idea to improve the
> > FDW infrastructure to push down joins. For starters, is it possible
> > for the custom scan node hooks to create a ForeignScan node? In
> > general, I think it might be better for the custom scan hooks to
> > create existing nodes if those serve the purpose.
> >
> It does not work well, because the existing FDW infrastructure is
> designed to operate on foreign tables, not regular tables. We would
> probably need to revise many of the assumptions in the surrounding
> code if we redefined the purpose of the FDW infrastructure. For
> example, ForeignScan is expected to return tuples according to a
> TupleDesc that is exactly the same as the table definition. That does
> not fit the requirement if we replace a join node with a ForeignScan,
> because the TupleDesc of the joined relations is not predefined.
If one wants to push joins, aggregates, or grouping across to other data sources capable of handling them, that will need to change. But, at the same time, letting the custom scan node decide that doesn't seem to be a very good idea, although through custom scan nodes we can see the potential in adding these features.
> I'd like to define these features as designed for individual purposes.
> FDW is designed to mediate between an external data source and the
> internal heap representation according to a foreign table definition.
> In other words, its role is to generate the contents of a predefined
> database object on the fly.
> On the other hand, custom-scan is designed to implement alternative
> ways to scan/join relations in addition to the methods supported by
> the built-in features.
>
> I'm motivated to implement a GPU acceleration feature that works
> transparently for applications. Thus, it has to be capable of running
> on regular tables, because most applications store their data in
> regular tables, not foreign ones.
It looks like my description was misleading. In some cases, the ultimate functionality of a particular instantiation of a custom scan node may already be available as a Plan node in PG, with the PG optimizer simply unable to plan the operation that way. In such cases, the custom scan node infrastructure should produce the corresponding Path node rather than implement that functionality itself.
> > Since a custom node is an open implementation, it will be important
> > to pass as much information down to the hooks as possible, lest the
> > hooks be constrained. Since function signatures within the planner
> > and optimizer change from time to time, the custom node hook
> > signatures will also need to change from time to time. That might
> > turn out to be a maintenance overhead.
> >
> Yes, you are right. But it also creates maintenance overhead if a hook
> has many arguments nobody uses.
> It probably makes sense to list the arguments that cannot be
> reproduced from other information, those that can be reproduced but
> only through complicated steps, and those that can be reproduced
> easily.
> Below is the information we cannot reproduce:
> - PlannerInfo *root
> - RelOptInfo *joinrel
> - RelOptInfo *outerrel
> - RelOptInfo *innerrel
> - JoinType jointype
> - SpecialJoinInfo *sjinfo
> - List *restrictlist
Most of this information is available through the corresponding RelOptInfo, or we should make RelOptInfo contain all the information related to every relation that is to be computed during the query. That way, any function which creates paths can just take that RelOptInfo as an argument and produce the path(s), and there is less chance that the function signatures change.
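A sketch of the slimmer hook shape this argues for, purely hypothetical:
if RelOptInfo (or a join-specific extension of it) carried jointype,
sjinfo, the restrict list and so on, the hook could stabilize at
something like:

/* hypothetical reduced hook: all join details hang off joinrel */
typedef void (*add_join_path_hook_type) (PlannerInfo *root,
                                         RelOptInfo *joinrel);
/* jointype, sjinfo, restrictlist etc. would be looked up via joinrel */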
> Below is the information we can reproduce, but through complicated steps:
> - List *mergeclause_list
> - bool mergejoin_allowed
> - Relids param_source_rels
> - Relids extra_lateral_rels
>
> Below is the information we can reproduce easily:
> - SemiAntiJoinFactors *semifactors
>
> I think the first two categories, or only the first (if functions to
> reproduce the second group are exposed), should be passed to
> extensions; the priority of the third group is not high.
> > BTW, is it a good idea for custom nodes to also affect other paths
> > like append, group etc.? Will it need separate hooks for each of
> > those?
> >
> Yes, I plan to support the plan nodes above, in addition to scan/join
> only.
> The custom-scan node is a thin abstraction over general executor
> behavior, so I believe it is not hard to enhance this node, without a
> new plan node for each of them.
> Of course, it will need separate hooks to add alternative paths at the
> planning stage, but no individual plan nodes. (Sorry, it was unclear
> to me what the "hook" meant.)
If we represent all the operations like grouping, sorting, and aggregation as some sort of relation, we can create paths for each such relation as we do now (I am heavily borrowing from Tom's idea of pathifying those operations). We would then need far fewer hooks in the custom scan node.
BTW, from the patch, this change does not look lightweight to me. I was expecting more of a list of hooks to be defined by the user, with this infrastructure just calling them at the appropriate places.
> -----Original Message-----
> From: Ashutosh Bapat [mailto:ashutosh.bapat@enterprisedb.com]
> Sent: Tuesday, February 25, 2014 5:59 PM
> To: Kohei KaiGai> Cc: Kaigai, Kouhei(海外, 浩平); Stephen Frost; Shigeru Hanada; Jim
> Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
>
>
>> On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>
>
> Folks,
>
> Let me remind the custom-scan patches; that is a basis feature of
> remote join of postgres_fdw, cache-only scan, (upcoming) GPU
> acceleration feature or various alternative ways to scan/join
> relations.
> Unfortunately, small amount of discussion we could have in this
> commit
> fest, even though Hanada-san volunteered to move the patches into
> "ready for committer" state at the CF-Nov.
>
>
>
> Sorry for jumping into this late.
>
> Instead of custom node, it might be better idea to improve FDW infrastructure
> to push join. For the starters, is it possible for the custom scan node
> hooks to create a ForeignScan node? In general, I think, it might be better
> for the custom scan hooks to create existing nodes if they serve the purpose.
>
>
>
>
> Prior to time-up, I'd like to ask hacker's opinion about its
> potential
> arguable points (from my standpoint) if it needs to be fixed up.
> One is hook definition to add alternative join path, and the other
> one
> is a special varno when a custom scan replace a join node.
> I'd like to see your opinion about them while we still have to change
> the design if needed.
>
> (1) Interface to add alternative paths in addition to built-in join
> paths
>
>
> This patch adds "add_join_path_hook" on add_paths_to_joinrel to
> allow
> extensions to provide alternative scan path in addition to the
> built-in
> join paths. Custom-scan path being added is assumed to perform to
> scan
> on a (virtual) relation that is a result set of joining relations.
> My concern is its arguments to be pushed. This hook is declared
> as follows:
>
> /* Hook for plugins to add custom join path, in addition to default
> ones */
> typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> RelOptInfo *joinrel,
> RelOptInfo *outerrel,
> RelOptInfo *innerrel,
> JoinType jointype,
> SpecialJoinInfo
> *sjinfo,
> List *restrictlist,
> List *mergeclause_list,
> SemiAntiJoinFactors
> *semifactors,
> Relids
> param_source_rels,
> Relids
> extra_lateral_rels);
> extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
> Likely, the arguments up to restrictlist should be passed to extensions,
> because these are also arguments of add_paths_to_joinrel().
> However, I'm not 100% certain how the other arguments should be handled.
> Probably, it makes sense to pass param_source_rels and
> extra_lateral_rels, to check whether the path is sensible for
> parameterized paths.
> On the other hand, I doubt whether mergeclause_list is useful to
> deliver.
> (It may make sense if someone tries to implement their own merge-join
> implementation??)
>
> I'd like to see ideas to improve the current interface specification.
>
>
>
>
> Since a custom node is an open implementation, it will be important to
> pass as much information down to the hooks as possible, lest the hooks be
> constrained. Since the function signatures within the planner/optimizer
> will change from time to time, the custom node hook signatures will need
> to change from time to time as well. That might turn out to be
> maintenance overhead.
>
>
> BTW, is it a good idea for custom nodes to also affect other paths like
> append, group etc.? Will they need separate hooks for each of those?
>
>
>
>
> (2) CUSTOM_VAR for special Var reference
>
> @@ -134,6 +134,7 @@ typedef struct Expr
>  #define    INNER_VAR    65000    /* reference to inner subplan */
>  #define    OUTER_VAR    65001    /* reference to outer subplan */
>  #define    INDEX_VAR    65002    /* reference to index column */
> +#define    CUSTOM_VAR   65003    /* reference to custom column */
>
> I newly added CUSTOM_VAR to handle the case when a custom scan overrides
> a join of relations.
> Var nodes within a join plan are adjusted to refer to either
> ecxt_innertuple or ecxt_outertuple of ExprContext. This causes trouble
> if a custom scan runs instead of a built-in join, because the tuples it
> fetches are usually stored in ecxt_scantuple, so the Var nodes also need
> a suitable varno that is neither inner nor outer.
>
> The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
> extensions to rewrite Var nodes within the custom-scan node to indicate
> ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
> For example, a Var node with varno=CUSTOM_VAR and varattno=3 means this
> node references the third attribute of the tuple in ecxt_scantuple.
> I think it is a reasonable solution; however, I'm not 100% certain
> whether people have a more graceful idea to implement it.
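
As one way to picture the callback's job, here is a sketch of the Var
rewrite an extension might perform from its SetPlanRefCustomScan callback.
The helper my_scantuple_attno() is purely illustrative; it stands for
whatever provider-specific logic maps a join Var onto the position of the
matching column in the tuple the provider actually emits:

    /* provider-specific lookup; illustrative prototype only */
    static AttrNumber my_scantuple_attno(Var *var, void *context);

    /* rewrite Vars that pointed at the replaced join's inner/outer
     * slots so they reference the scan tuple via CUSTOM_VAR instead */
    static Node *
    my_custom_var_mutator(Node *node, void *context)
    {
        if (node == NULL)
            return NULL;
        if (IsA(node, Var))
        {
            Var    *newvar = (Var *) copyObject(node);

            newvar->varno = CUSTOM_VAR;
            newvar->varattno = my_scantuple_attno(newvar, context);
            return (Node *) newvar;
        }
        return expression_tree_mutator(node, my_custom_var_mutator,
                                       context);
    }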
>
> If you have comments on the above two topics, or others, please give
> your ideas.
>
> Thanks,
>
>
> 2014-01-28 9:14 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>
> > Hi Stephen,
> >
> > Thanks for your comments.
> >
> >> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> >> > Is somebody available to volunteer to review the custom-scan
> patch?
> >>
> >> I looked through it a bit and my first take away from it was
> that the patches
> >> to actually use the new hooks were also making more changes to
> the backend
> >> code, leaving me with the impression that the proposed interface
> isn't
> >> terribly stable. Perhaps those changes should have just been
> in the first
> >> patch, but they weren't and that certainly gave me pause.
> >>
> > Yes, the part-1 patch provides the interface portion for interaction
> > between the backend code and extension code. The rest, the part-2 and
> > part-3 portions, are contrib modules that implement their features on
> > top of the custom-scan API.
> >
> >> I'm also not entirely convinced that this is the direction to
> go in when
> >> it comes to pushing down joins to FDWs. While that's certainly
> a goal that
> >> I think we all share, this seems to be intending to add a
> completely different
> >> feature which happens to be able to be used for that. For FDWs,
> wouldn't
> >> we only present the FDW with the paths where the foreign tables
> for that
> >> FDW, or perhaps just a given foreign server, are being joined?
> >>
> > FDW's join push-down is one of the valuable use-cases of this
> > interface, but not the only one. As you might know, my motivation is
> > to implement a GPU acceleration feature on top of this interface,
> > which offers an alternative way to scan or join relations, or
> > potentially to sort or aggregate.
> > Probably, it is too much of a stretch if we implement radix-sort on
> > top of FDW. I'd like you to understand that the part-3 patch (FDW's
> > join push-down) is a demonstration of the custom-scan interface for an
> > application, but not designed for a special purpose.
> >
> > Right now, I put all the logic for the interaction between CSI and the
> > FDW driver on the postgres_fdw side; it might be an idea to have
> > common code (like the logic to check whether both relations to be
> > joined belong to the same foreign server) on the backend side, as
> > something like a gateway between them.
> >
> >
> > As an aside, what should be the scope of the FDW interface?
> > In my understanding, it allows an extension to implement "something"
> > on behalf of a particular data structure declared with CREATE FOREIGN
> > TABLE.
> > In other words, the extension's responsibility is to generate a view
> > of "something" according to PostgreSQL's internal data structure,
> > instead of the object itself.
> > On the other hand, the custom-scan interface allows extensions to
> > implement alternative methods to scan or join particular relations,
> > but its role is not to act as a target being referenced in queries. In
> > other words, it provides methods to access objects.
> > It is natural that both features are similar, because both of them
> > intend for extensions to hook the planner and executor; however, their
> > purposes are different.
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project
> > KaiGai Kohei <kaigai@ak.jp.nec.com>
>
>
>
> --
>
> KaiGai Kohei <kaigai@kaigai.gr.jp>
>
>
>
> --
>
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
2014-02-25 20:32 GMT+09:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:
>
> On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>
>> > Sorry for jumping into this late.
>> >
>> > Instead of a custom node, it might be a better idea to improve the FDW
>> > infrastructure to push down joins. For starters, is it possible for the
>> > custom scan node hooks to create a ForeignScan node? In general, I think
>> > it might be better for the custom scan hooks to create existing nodes
>> > if they serve the purpose.
>> >
>> It does not work well because the existing FDW infrastructure is designed
>> to perform on foreign tables, not regular tables. Probably, we would need
>> to revise many of our assumptions in the surrounding code if we
>> re-defined the purpose of the FDW infrastructure. For example,
>> ForeignScan is expected to return a tuple according to a TupleDesc that
>> is exactly the same as the table definition.
>> It does not fit the requirement if we replace a join node by ForeignScan,
>> because the TupleDesc of joined relations is not predefined.
>>
> If one wants to push joins, aggregates, grouping across to other data
> sources capable of handling them, that will need to change. But, at the
> same time, letting the custom scan node be able to decide that doesn't
> seem to be a very good idea. Although, through custom scan nodes, we can
> see the potential in adding these features.
>
Of course, the existing form of the custom-scan node is designed to support
scans or joins of relations, as a first step. It will also need some
enhancement to support other classes of execution nodes in future versions.
I'm not certain why that is problematic.

>> I'd like to define these features as designed for individual purposes.
>> FDW is designed to mediate between an external data source and the
>> internal heap representation according to the foreign table definition.
>> In other words, its role is to generate the contents of a predefined
>> database object on the fly.
>> On the other hand, custom-scan is designed to implement alternative ways
>> to scan / join relations in addition to the methods supported by the
>> built-in feature.
>>
>> I'm motivated to implement a GPU acceleration feature that works
>> transparently for applications. Thus, it has to be capable on regular
>> tables, because most applications store data on regular tables, not
>> foreign ones.
>>
> It looks like my description was misleading. In some cases, it might be
> possible that the ultimate functionality of a particular instantiation of
> a custom scan node is already available as a Plan node in PG, but the PG
> optimizer is not able to optimize the operation that way. In such cases,
> the custom scan node infrastructure should produce the corresponding Path
> node and not implement that functionality itself.
>
You are suggesting that CustomSort, CustomAgg, CustomAppend and so on should
be supported in a future version, for better integration with the plan
optimizer. Right? It is probably a good idea if the optimizer needs to
identify a CustomXXXX node using its node tag, rather than something else
like the custom-scan provider name. Right now, the custom-scan feature
focuses on optimization of relation scans and joins as its first scope, and
does not need to identify the class of the corresponding Path node.
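
For illustration, here is a minimal sketch of the TupleDesc point above: a
provider that replaces a join has no catalog-defined row type, so it can
derive one from its own targetlist at executor startup. The variable names
(cscan for the CustomScan plan node, node for the provider's state
embedding a ScanState) are assumptions for the example, not names from the
patch.

    TupleDesc   tupdesc;

    /* derive the row type of the joined (virtual) relation from the
     * plan's targetlist; no catalog definition exists for it */
    tupdesc = ExecTypeFromTL(cscan->scan.plan.targetlist, false);

    /* install it as the descriptor of the scan tuple slot */
    ExecAssignScanType(&node->ss, tupdesc);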
On the upthread of this discussion, I initially proposed to have separate
CustomScan and CustomJoin nodes; however, our consensus was that CustomScan
can perform like a scan on the result set of joined relations, so I dropped
multiple node types from the first version.

>> > Since a custom node is an open implementation, it will be important to
>> > pass as much information down to the hooks as possible, lest the hooks
>> > be constrained. Since the function signatures within the
>> > planner/optimizer will change from time to time, the custom node hook
>> > signatures will need to change from time to time as well. That might
>> > turn out to be maintenance overhead.
>> >
>> Yes. You are also right. But it also makes maintenance overhead if the
>> hook has many arguments nobody uses.
>> Probably, it makes sense to list up the arguments that cannot be
>> reproduced from other information, those that can be reproduced but only
>> by complicated steps, and those that can be reproduced easily.
>>
>> Below is the information we cannot reproduce:
>> - PlannerInfo *root
>> - RelOptInfo *joinrel
>> - RelOptInfo *outerrel
>> - RelOptInfo *innerrel
>> - JoinType jointype
>> - SpecialJoinInfo *sjinfo
>> - List *restrictlist
>>
> Most of this information is available through the corresponding
> RelOptInfo, or we should make RelOptInfo contain all the information
> related to every relation required to be computed during the query. So,
> any function which creates paths can just take that RelOptInfo as an
> argument and produce the path/s. That way there is less chance that the
> function signatures change.
>
Uhmm... It is inconvenient to write extensions that way. I want the
variables in the first and second groups to be delivered to the hook, even
though they may have minor modifications in future releases. Relation joins
are at the heart of an RDBMS, so I'd like to believe these arguments are
among the most stable stuff.

>> Below is the information we can reproduce, but only by complicated steps:
>> - List *mergeclause_list
>> - bool mergejoin_allow
>> - Relids param_source_rels
>> - Relids extra_lateral_rels
>>
>> Below is the information we can reproduce easily:
>> - SemiAntiJoinFactors *semifactors
>>
>> I think the first two categories, or the first category (if functions to
>> reproduce the second group are exposed), should be passed to extensions;
>> however, the priority of the third group is not high.
>>
>> > BTW, is it a good idea for custom nodes to also affect other paths like
>> > append, group etc.? Will they need separate hooks for each of those?
>> >
>> Yes. I plan to support the above plan nodes, in addition to scan / join
>> only.
>> The custom-scan node is a thin abstraction of general executor behavior,
>> so I believe it is not hard to enhance this node, without a new plan node
>> for each of them.
>> Of course, it will need a separate hook to add alternative paths at the
>> planner stage, but no individual plan nodes. (Sorry, it was unclear to me
>> what the "hook" means.)
>>
> If we represent all the operations like grouping, sorting, and aggregation
> as some sort of relation, we can create paths for each of those relations
> like we do now (I am heavily borrowing from Tom's idea of pathifying those
> operations). We will need far fewer hooks in the custom scan node.
>
> BTW, from the patch, I do not see this change to be lightweight. I was
> expecting more of a list of hooks to be defined by the user, with this
> infrastructure just calling them at appropriate places.
>
Let's focus on scan and join, which we are currently working on. Even if we
need a separate node type for grouping or sorting, it will not be necessary
to construct the whole framework from scratch. For example, the definition
of the CustomProvider table will be reusable for other classes of
operations, because most of them are thin abstractions of the existing
executor's interface.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Kaigai-san,

2014-02-25 13:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> The reason why I asked the question above is, I haven't been 100% certain
> about its usage. Indeed, semifactors is applied in limited situations, but
> it is quite easy for an extension to reproduce later (using
> clauselist_selectivity) if the extension wants this factor. So, I agree
> with removing the semifactors here.

Agreed. It would be nice to mention how to obtain semifactors for people
who want to implement advanced join overriding.

>> mergeclause_list and param_source_rels seem a little easier to use, but
>> anyway it should be documented how to use those parameters.
>>
> The mergeclause_list might not be sufficient for extensions to determine
> whether their own mergejoin is applicable here. See the comment below,
> which is at the head of select_mergejoin_clauses.
>
> | * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
> | * this is a right/full join and there are nonmergejoinable join clauses.
> | * The executor's mergejoin machinery cannot handle such cases, so we have
> | * to avoid generating a mergejoin plan. (Note that this flag does NOT
> | * consider whether there are actually any mergejoinable clauses. This is
> | * correct because in some cases we need to build a clauseless mergejoin.
> | * Simply returning NIL is therefore not enough to distinguish safe from
> | * unsafe cases.)
> |
> It says, mergejoin_clause == NIL is not a sufficient check to determine
> whether the mergejoin logic is applicable on the target join.
>
> So, either of them is probably an option for an extension that tries to
> implement

Perhaps you mean "both of them"?

> their own mergejoin logic; (1) putting both mergejoin_allowed and
> mergeclause_list as arguments of the hook, or (2) re-defining
> select_mergejoin_clauses() as an extern function to reproduce the
> variables on demand. Which one is preferable?

I prefer (1), because exposing the inside of the planner might block
changing those internal functions. If (at the moment) that information is
enough for overriding merge join for a CSP, let's provide it as parameters.

--
Shigeru HANADA
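
For the archive, a sketch of what that reproduction could look like inside
a provider's join hook, assuming the arguments the hook already receives
and assuming compute_semi_anti_join_factors() (declared in
optimizer/cost.h, and applying clauselist_selectivity internally) keeps its
current extern signature:

    /* rebuild semifactors from arguments the hook already gets,
     * instead of receiving them through the hook itself */
    SemiAntiJoinFactors semifactors;

    compute_semi_anti_join_factors(root, outerrel, innerrel,
                                   jointype, sjinfo, restrictlist,
                                   &semifactors);
    /* semifactors.outer_match_frac and semifactors.match_count are
     * then available for costing SEMI/ANTI join variants */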
Hi Kaigai-san,

2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> (1) Interface to add alternative paths in addition to built-in join paths

I found that create_custom_path is not used at all in your patch. I revised
postgres_fdw.c to use it like this.

...
    /* Create join information which is stored as private information. */
    memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
    jinfo.fdw_server_oid = o_server_oid;
    jinfo.fdw_user_oid = o_user_oid;
    jinfo.relids = joinrel->relids;
    jinfo.jointype = jointype;
    jinfo.outer_rel = o_relinfo;
    jinfo.inner_rel = i_relinfo;
    jinfo.remote_conds = j_remote_conds;
    jinfo.local_conds = j_local_conds;

    /* OK, make a CustomScan node to run remote join */
    cpath = create_customscan_path(root, joinrel,
                                   0, 0, 0, /* estimate later */
                                   NIL,
                                   required_outer,
                                   "postgres-fdw",
                                   0,
                                   packPgRemoteJoinInfo(&jinfo));

    estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);

    add_path(joinrel, &cpath->path);
...

This seems to work fine. Is this the right approach? If so, this portion
would be a good example of replacing a local join with a custom scan for
authors of custom scan providers.

One thing I worry about is the case where you've intentionally avoided
calling create_customscan_path.

--
Shigeru HANADA
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > Instead of a custom node, it might be a better idea to improve the FDW
> > infrastructure to push down joins. For starters, is it possible for the
> > custom scan node hooks to create a ForeignScan node? In general, I think
> > it might be better for the custom scan hooks to create existing nodes
> > if they serve the purpose.
> >
> It does not work well because the existing FDW infrastructure is designed
> to perform on foreign tables, not regular tables. Probably, we would need
> to revise many of our assumptions in the surrounding code if we re-defined
> the purpose of the FDW infrastructure. For example, ForeignScan is
> expected to return a tuple according to a TupleDesc that is exactly the
> same as the table definition.
> It does not fit the requirement if we replace a join node by ForeignScan,
> because the TupleDesc of joined relations is not predefined.

I'm not following this logic at all- how are you defining "foreign" from
"regular"? Certainly, in-memory-only tables which are sitting out in some
non-persistent GPU memory aren't "regular" by any PG definition. Perhaps
you can't make ForeignScan suddenly work as a join-node replacement, but
I've not seen where anyone has proposed that (directly- I've implied it on
occasion where a remote view can be used, but that's not the same thing as
having proper push-down support for joins).

> I'd like to define these features as designed for individual purposes.

My previous complaint about this patch set has been precisely that each
piece seems to be custom-built and every patch needs more and more backend
changes. If every time someone wants to do something with this CustomScan
API, they need changes made to the backend code, then it's not a generally
useful external API. We really don't want to define such an external API
as then we have to deal with backwards compatibility, particularly when
it's all specialized to specific use cases which are all different.

> FDW is designed to mediate between an external data source and the
> internal heap representation according to the foreign table definition.
> In other words, its role is to generate the contents of a predefined
> database object on the fly.

There's certainly nothing in the FDW API which requires that the remote
side have an internal heap representation, as evidenced by the various
FDWs which already exist and certainly are not any kind of 'normal' heap.
Every query against the foreign relation goes through the FDW API and can
end up returning whatever the FDW author decides is appropriate to return
at that time, as long as it matches the tuple description- which is
absolutely necessary for any kind of sanity, imv.

> On the other hand, custom-scan is designed to implement alternative ways
> to scan / join relations in addition to the methods supported by the
> built-in feature.

I can see the usefulness in being able to push down aggregates or other
function-type calls to the remote side of an FDW and would love to see
work done along those lines, along with the ability to push down joins to
remote systems- but I'm not convinced that the claimed flexibility with
the CustomScan API is there, given the need to continue modifying the
backend code for each use-case, nor that there are particularly new and
inventive ways of saying "find me all the cases where set X overlaps with
set Y". I'm certainly open to the idea that we could have an FDW API which
allows us to ask exactly that question and let the remote side cost it out
and give us an answer for a pair of relations, but that isn't what this is.

Note also that in any kind of aggregation push-down we must be sure that
the function is well-defined and that the FDW is on the hook to ensure
that the returned data is the same as if we ran the same aggregate
function locally, otherwise the results of a query might differ based on
whether the aggregate was fired locally or remotely (which could be
influenced by costing- eg: the size of the relation or its statistics).

> I'm motivated to implement a GPU acceleration feature that works
> transparently for applications. Thus, it has to be capable on regular
> tables, because most applications store data on regular tables, not
> foreign ones.

You want to persist that data in the GPU across multiple calls though,
which makes it unlike any kind of regular PG table and much more like some
foreign table. Perhaps the data is initially loaded from a local table and
then updated on the GPU card in some way when the 'real' table is updated,
but neither of those makes it a "regular" PG table.

> > Since a custom node is an open implementation, it will be important to
> > pass as much information down to the hooks as possible, lest the hooks
> > be constrained. Since the function signatures within the
> > planner/optimizer will change from time to time, the custom node hook
> > signatures will need to change from time to time as well. That might
> > turn out to be maintenance overhead.

It's more than "from time-to-time", it was "for each use case in the given
patch set asking for this feature", which is why I'm pushing back on it.

> Yes. You are also right. But it also makes maintenance overhead if the
> hook has many arguments nobody uses.

I can agree with this- there should be a sensible API if we're going to do
this.

> Probably, it makes sense to list up the arguments that cannot be
> reproduced from other information, those that can be reproduced but only
> by complicated steps, and those that can be reproduced easily.

This really strikes me as the wrong approach for an FDW join-pushdown API,
which should be geared around giving the remote side an opportunity on a
case-by-case basis to cost out joins using whatever methods it has
available to implement them. I've outlined above the reasons I don't agree
with just making the entire planner/optimizer pluggable.

Thanks,

Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> Yes, the part-1 patch provides the interface portion for interaction
> between the backend code and extension code. The rest, the part-2 and
> part-3 portions, are contrib modules that implement their features on top
> of the custom-scan API.

Just to come back to this- the other two "contrib module" patches, at
least as I read over their initial submission, were *also* patching
portions of backend code which it was apparently discovered that they
needed. That's a good bit of my complaint regarding this approach.

> FDW's join push-down is one of the valuable use-cases of this interface,
> but not the only one. As you might know, my motivation is to implement a
> GPU acceleration feature on top of this interface, which offers an
> alternative way to scan or join relations, or potentially to sort or
> aggregate.

If you're looking to just use GPU acceleration for improving individual
queries, I would think that Robert's work around backend workers would be
a more appropriate way to go, with the ability to move a working set of
data from shared buffers and the on-disk representation of a relation over
to the GPU's memory, perform the operation, and then copy the results back.

If that's not possible or effective wrt performance, then I think we need
to look at managing the external GPU memory as a foreign system through an
FDW which happens to be updated through triggers or similar. The same
could potentially be done for memcached systems, etc.

"regular" PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go hunt
down the rows. How is that going to work here, if this is both a "regular"
table and stored off in a GPU's memory across subsequent queries or even
transactions?

> Right now, I put all the logic for the interaction between CSI and the
> FDW driver on the postgres_fdw side; it might be an idea to have common
> code (like the logic to check whether both relations to be joined belong
> to the same foreign server) on the backend side, as something like a
> gateway between them.

Yes, that's what I was suggesting above- we should be asking the FDWs on a
case-by-case basis how to cost out the join between foreign tables which
they are responsible for. Asking two different FDW servers to cost out a
join between their tables doesn't make any sense to me.

> As an aside, what should be the scope of the FDW interface?
> In my understanding, it allows an extension to implement "something" on
> behalf of a particular data structure declared with CREATE FOREIGN TABLE.

That's where it is today, but certainly not our end goal.

> In other words, the extension's responsibility is to generate a view of
> "something" according to PostgreSQL's internal data structure, instead of
> the object itself.

The result of the FDW call needs to be something which PG understands and
can work with, otherwise we wouldn't be able to, say, run PL/pgsql code on
the result, or pass it into some other aggregate which we decided was
cheaper to run locally. Being able to push down aggregates to the remote
side of an FDW certainly fits in quite well with that.

> On the other hand, the custom-scan interface allows extensions to
> implement alternative methods to scan or join particular relations, but
> its role is not to act as a target being referenced in queries. In other
> words, it provides methods to access objects.

The custom-scan interface still needs to produce "something" according to
PG's internal data structures, so it's not clear to me where you're going
with this.

> It is natural that both features are similar, because both of them intend
> for extensions to hook the planner and executor; however, their purposes
> are different.

I disagree as I don't really view FDWs as "hooks". A "hook" is more like a
trigger- sure, you can modify the data in transit, or throw an error if
you see an issue, but you don't get to redefine the world and throw out
what the planner or optimizer knows about the rest of what is going on in
the query.

Thanks,

Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > > Instead of a custom node, it might be a better idea to improve the
> > > FDW infrastructure to push down joins. For starters, is it possible
> > > for the custom scan node hooks to create a ForeignScan node? In
> > > general, I think it might be better for the custom scan hooks to
> > > create existing nodes if they serve the purpose.
> > >
> > It does not work well because the existing FDW infrastructure is
> > designed to perform on foreign tables, not regular tables. Probably, we
> > would need to revise many of our assumptions in the surrounding code if
> > we re-defined the purpose of the FDW infrastructure. For example,
> > ForeignScan is expected to return a tuple according to a TupleDesc that
> > is exactly the same as the table definition.
> > It does not fit the requirement if we replace a join node by
> > ForeignScan, because the TupleDesc of joined relations is not
> > predefined.
>
> I'm not following this logic at all- how are you defining "foreign" from
> "regular"? Certainly, in-memory-only tables which are sitting out in some
> non-persistent GPU memory aren't "regular" by any PG definition.
> Perhaps you can't make ForeignScan suddenly work as a join-node
> replacement, but I've not seen where anyone has proposed that (directly-
> I've implied it on occasion where a remote view can be used, but that's
> not the same thing as having proper push-down support for joins).
>
This regular one means usual tables. Even though a custom implementation
may reference a self-managed in-memory cache instead of the raw heap, the
table pointed to in the user's query shall be a usual table.
In the past, Hanada-san had proposed an enhancement of FDW to support
remote join, but it was eventually rejected.

> > I'd like to define these features as designed for individual purposes.
>
> My previous complaint about this patch set has been precisely that each
> piece seems to be custom-built and every patch needs more and more
> backend changes. If every time someone wants to do something with this
> CustomScan API, they need changes made to the backend code, then it's not
> a generally useful external API. We really don't want to define such an
> external API as then we have to deal with backwards compatibility,
> particularly when it's all specialized to specific use cases which are
> all different.
>
The changes to the backend are just for convenience. We may be able to
implement the functions to translate a Bitmapset from/to cstring form in
postgres_fdw, but does it make sense to maintain them individually?
I thought these functions were useful to have in the backend commonly, but
this is not a fundamental lack of functionality in the custom-scan
interface.

> > FDW is designed to mediate between an external data source and the
> > internal heap representation according to the foreign table definition.
> > In other words, its role is to generate the contents of a predefined
> > database object on the fly.
>
> There's certainly nothing in the FDW API which requires that the remote
> side have an internal heap representation, as evidenced by the various
> FDWs which already exist and certainly are not any kind of 'normal' heap.
> Every query against the foreign relation goes through the FDW API and can
> end up returning whatever the FDW author decides is appropriate to return
> at that time, as long as it matches the tuple description- which is
> absolutely necessary for any kind of sanity, imv.
>
Yes. That's my understanding of the role of the FDW driver.

> > On the other hand, custom-scan is designed to implement alternative
> > ways to scan / join relations in addition to the methods supported by
> > the built-in feature.
>
> I can see the usefulness in being able to push down aggregates or other
> function-type calls to the remote side of an FDW and would love to see
> work done along those lines, along with the ability to push down joins to
> remote systems- but I'm not convinced that the claimed flexibility with
> the CustomScan API is there, given the need to continue modifying the
> backend code for each use-case, nor that there are particularly new and
> inventive ways of saying "find me all the cases where set X overlaps with
> set Y".
> I'm certainly open to the idea that we could have an FDW API which allows
> us to ask exactly that question and let the remote side cost it out and
> give us an answer for a pair of relations, but that isn't what this is.
> Note also that in any kind of aggregation push-down we must be sure that
> the function is well-defined and that the FDW is on the hook to ensure
> that the returned data is the same as if we ran the same aggregate
> function locally, otherwise the results of a query might differ based on
> whether the aggregate was fired locally or remotely (which could be
> influenced by costing- eg: the size of the relation or its statistics).
>
I can also understand the usefulness of pushing joins or aggregation to
the remote side in the case of foreign table references. In a similar way,
it is also useful if we can push these CPU-intensive operations into
co-processors for regular table references.
As I mentioned above, the backend changes by the part-2/-3 patches are
just minor stuff, and I thought they should not be implemented by a
contrib module locally.
Regarding the conditions under which we can run remote aggregation, you
are right. As the current postgres_fdw pushes down qualifiers to the
remote side, we need to ensure the remote aggregate definition is
identical with the local one.

> > I'm motivated to implement a GPU acceleration feature that works
> > transparently for applications. Thus, it has to be capable on regular
> > tables, because most applications store data on regular tables, not
> > foreign ones.
>
> You want to persist that data in the GPU across multiple calls though,
> which makes it unlike any kind of regular PG table and much more like
> some foreign table. Perhaps the data is initially loaded from a local
> table and then updated on the GPU card in some way when the 'real' table
> is updated, but neither of those makes it a "regular" PG table.
>
No. What I want to implement is: read the regular table and transfer the
contents into the GPU's local memory for calculation, then receive its
calculation result. The in-memory cache (which I'm also working on) is
supplemental stuff, because disk access is much slower and a row-oriented
data structure is not suitable for SIMD-style instructions.

> > > Since a custom node is an open implementation, it will be important
> > > to pass as much information down to the hooks as possible, lest the
> > > hooks be constrained. Since the function signatures within the
> > > planner/optimizer will change from time to time, the custom node hook
> > > signatures will need to change from time to time as well. That might
> > > turn out to be maintenance overhead.
>
> It's more than "from time-to-time", it was "for each use case in the
> given patch set asking for this feature", which is why I'm pushing back
> on it.
>
My patch set didn't change the interface itself. All it added was
(probably) useful utility routines to be placed in the backend, rather
than contrib.

> > Yes. You are also right. But it also makes maintenance overhead if the
> > hook has many arguments nobody uses.
>
> I can agree with this- there should be a sensible API if we're going to
> do this.
>
> > Probably, it makes sense to list up the arguments that cannot be
> > reproduced from other information, those that can be reproduced but
> > only by complicated steps, and those that can be reproduced easily.
>
> This really strikes me as the wrong approach for an FDW join-pushdown
> API, which should be geared around giving the remote side an opportunity
> on a case-by-case basis to cost out joins using whatever methods it has
> available to implement them. I've outlined above the reasons I don't
> agree with just making the entire planner/optimizer pluggable.
>
I'm also inclined to have arguments that will provide enough information
for extensions to determine the best path for them.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> This regular one means usual tables. Even though a custom implementation
> may reference a self-managed in-memory cache instead of the raw heap, the
> table pointed to in the user's query shall be a usual table.
> In the past, Hanada-san had proposed an enhancement of FDW to support
> remote join, but it was eventually rejected.

I'm not aware of the specifics around that proposal but I don't believe
we, as a community, have decided to reject the idea in general.

> The changes to the backend are just for convenience. We may be able to
> implement the functions to translate a Bitmapset from/to cstring form in
> postgres_fdw, but does it make sense to maintain them individually?

Perhaps not.

> I thought these functions were useful to have in the backend commonly,
> but this is not a fundamental lack of functionality in the custom-scan
> interface.

Then perhaps they should be exposed more directly? I can understand
generally useful functionality being exposed in a way that anyone can use
it, but we need to avoid interfaces which can't be stable due to normal /
ongoing changes to the backend code.

> I can also understand the usefulness of pushing joins or aggregation to
> the remote side in the case of foreign table references. In a similar
> way, it is also useful if we can push these CPU-intensive operations into
> co-processors for regular table references.

That's fine, if we can get data to and from those co-processors
efficiently enough that it's worth doing so. If moving the data to the
GPU's memory will take longer than running the actual aggregation, then it
doesn't make any sense for regular tables because then we'd have to cache
the data in the GPU's memory in some way across multiple queries, which
isn't something we're set up to do.

> As I mentioned above, the backend changes by the part-2/-3 patches are
> just minor stuff, and I thought they should not be implemented by a
> contrib module locally.

Fine- then propose them as generally useful additions, not as patches
which are supposed to just be for contrib modules using an already defined
interface. If you can make a case for that then perhaps this is more
practical.

> Regarding the conditions under which we can run remote aggregation, you
> are right. As the current postgres_fdw pushes down qualifiers to the
> remote side, we need to ensure the remote aggregate definition is
> identical with the local one.

Of course.

> No. What I want to implement is: read the regular table and transfer the
> contents into the GPU's local memory for calculation, then receive its
> calculation result. The in-memory cache (which I'm also working on) is
> supplemental stuff, because disk access is much slower and a row-oriented
> data structure is not suitable for SIMD-style instructions.

Is that actually performant? Is it actually faster than processing the
data directly? The discussions that I've had with folks have cast a great
deal of doubt in my mind about just how well that kind of quick
turn-around to the GPU's memory actually works.

> > This really strikes me as the wrong approach for an FDW join-pushdown
> > API, which should be geared around giving the remote side an
> > opportunity on a case-by-case basis to cost out joins using whatever
> > methods it has available to implement them. I've outlined above the
> > reasons I don't agree with just making the entire planner/optimizer
> > pluggable.
>
> I'm also inclined to have arguments that will provide enough information
> for extensions to determine the best path for them.

For join push-down, I proposed above that we have an interface to the FDW
which allows us to ask it how much each join of the tables which are on a
given FDW's server would cost if the FDW did it vs. pulling it back and
doing it locally. We could also pass all of the relations to the FDW with
the various join-quals and try to get an answer to everything, but I'm
afraid that'd simply end up duplicating the logic of the optimizer into
every FDW, which would be counter-productive. Admittedly, getting the
costing right isn't easy either, but it's not clear to me how it'd make
sense for the local server to be doing costing for remote servers.

Thanks,

Stephen
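
To make the shape of that idea concrete, one could imagine an optional
FdwRoutine callback roughly along these lines; this is purely hypothetical
(no such member exists in FdwRoutine today), with the name and parameter
list chosen only for illustration:

    /* Hypothetical FdwRoutine addition: called once per join relation
     * whose component relations all belong to the same foreign server,
     * letting the FDW cost the pushed-down join itself and add a path
     * via add_path(), or decline by doing nothing. */
    typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
                                                  RelOptInfo *joinrel,
                                                  RelOptInfo *outerrel,
                                                  RelOptInfo *innerrel,
                                                  JoinType jointype,
                                                  List *restrictlist);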
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > Yes, the part-1 patch provides the interface portion for interaction
> > between the backend code and extension code. The rest, the part-2 and
> > part-3 portions, are contrib modules that implement their features on
> > top of the custom-scan API.
>
> Just to come back to this- the other two "contrib module" patches, at
> least as I read over their initial submission, were *also* patching
> portions of backend code which it was apparently discovered that they
> needed. That's a good bit of my complaint regarding this approach.
>
?? Sorry, are you still negative on the portions of the backend patched by
the part-2 and part-3 portions??

> > FDW's join push-down is one of the valuable use-cases of this
> > interface, but not the only one. As you might know, my motivation is to
> > implement a GPU acceleration feature on top of this interface, which
> > offers an alternative way to scan or join relations, or potentially to
> > sort or aggregate.
>
> If you're looking to just use GPU acceleration for improving individual
> queries, I would think that Robert's work around backend workers would be
> a more appropriate way to go, with the ability to move a working set of
> data from shared buffers and the on-disk representation of a relation
> over to the GPU's memory, perform the operation, and then copy the
> results back.
>
The approach is similar to Robert's work, except for adopting GPUs instead
of multicore CPUs. So, I tried to review his work to apply those
facilities to my extension also.

> If that's not possible or effective wrt performance, then I think we need
> to look at managing the external GPU memory as a foreign system through
> an FDW which happens to be updated through triggers or similar. The same
> could potentially be done for memcached systems, etc.
>
I hadn't imagined the idea of exposing the GPU's local memory. The
supplemental stuff I'm planning for data-load performance is just a cache
mechanism besides regular tables.

> "regular" PG tables, just to point out one issue, can be locked on a
> row-by-row basis, and we know exactly where in shared buffers to go hunt
> down the rows. How is that going to work here, if this is both a
> "regular" table and stored off in a GPU's memory across subsequent
> queries or even transactions?
>
It shall be handled on a "case-by-case" basis, I think. If a row-level
lock is required over the table scan, the custom-scan node shall return a
tuple located in the shared buffer, instead of the cached tuples. Of
course, it is an option for the custom-scan node to evaluate qualifiers by
GPU with the cached data and return tuples identified by the ctid of the
cached tuples. Anyway, it is not a significant problem.

> > Right now, I put all the logic for the interaction between CSI and the
> > FDW driver on the postgres_fdw side; it might be an idea to have common
> > code (like the logic to check whether both relations to be joined
> > belong to the same foreign server) on the backend side, as something
> > like a gateway between them.
>
> Yes, that's what I was suggesting above- we should be asking the FDWs on
> a case-by-case basis how to cost out the join between foreign tables
> which they are responsible for. Asking two different FDW servers to cost
> out a join between their tables doesn't make any sense to me.
>
OK, I'll move the portion that will be needed commonly for other FDWs into
the backend code.

> > As an aside, what should be the scope of the FDW interface?
> > In my understanding, it allows an extension to implement "something" on
> > behalf of a particular data structure declared with CREATE FOREIGN
> > TABLE.
>
> That's where it is today, but certainly not our end goal.
>
> > In other words, the extension's responsibility is to generate a view of
> > "something" according to PostgreSQL's internal data structure, instead
> > of the object itself.
>
> The result of the FDW call needs to be something which PG understands and
> can work with, otherwise we wouldn't be able to, say, run PL/pgsql code
> on the result, or pass it into some other aggregate which we decided was
> cheaper to run locally. Being able to push down aggregates to the remote
> side of an FDW certainly fits in quite well with that.
>
Yes. According to the previous discussion around postgres_fdw getting
merged, all we can trust on the remote side are built-in data types,
functions, operators, and other built-in stuff only.

> > On the other hand, the custom-scan interface allows extensions to
> > implement alternative methods to scan or join particular relations, but
> > its role is not to act as a target being referenced in queries. In
> > other words, it provides methods to access objects.
>
> The custom-scan interface still needs to produce "something" according to
> PG's internal data structures, so it's not clear to me where you're going
> with this.
>
The custom-scan node is intended to perform on regular relations, not only
foreign tables. That means a special feature (like GPU acceleration) can
perform transparently for most existing applications. Usually, they define
regular tables for their work at installation, not foreign tables. That is
the biggest concern for me.

> > It is natural that both features are similar, because both of them
> > intend for extensions to hook the planner and executor; however, their
> > purposes are different.
>
> I disagree as I don't really view FDWs as "hooks". A "hook" is more like
> a trigger- sure, you can modify the data in transit, or throw an error if
> you see an issue, but you don't get to redefine the world and throw out
> what the planner or optimizer knows about the rest of what is going on in
> the query.
>
I might have used the wrong wording. Anyway, I want plan nodes that enable
extensions to define their behavior; it is similar to ForeignScan, but
allowed to perform on regular relations. Also, not only custom-scan and
foreign-scan but any plan node works according to an interface to co-work
with other nodes, so it is not strange that both interfaces are similar.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
2014-02-26 16:46 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>> Just to come back to this- the other two "contrib module" patches, at
>> least as I read over their initial submission, were *also* patching
>> portions of backend code which it was apparently discovered that they
>> needed. That's a good bit of my complaint regarding this approach.
>>
> ?? Sorry, are you still negative on the portions of the backend patched
> by the part-2 and part-3 portions??

Perhaps he meant to separate the patches based on a feature-based rule. IMO
if exposing utilities is essential for the Custom Scan API in a practical
sense, IOW to implement and maintain an extension which implements the
Custom Scan API, they should go into the first patch. IIUC the two contrib
modules are also PoCs for the API, so the part-2/3 patches should contain
only changes against contrib and its documentation.

Besides that, some typo fixes are mixed into the part-2 patch. They should
go into the part-1 patch where the typos were introduced.

--
Shigeru HANADA
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > Just to come back to this- the other two "contrib module" patches, at
> > least as I read over their initial submission, were *also* patching
> > portions of backend code which it was apparently discovered that they
> > needed. That's a good bit of my complaint regarding this approach.
> >
> ?? Sorry, are you still negative on the portions of the backend patched
> by the part-2 and part-3 portions??

Pretty sure that I sent that prior to your last email, or at least before
I was to the end of it.

> > If you're looking to just use GPU acceleration for improving individual
> > queries, I would think that Robert's work around backend workers would
> > be a more appropriate way to go, with the ability to move a working set
> > of data from shared buffers and the on-disk representation of a
> > relation over to the GPU's memory, perform the operation, and then copy
> > the results back.
> >
> The approach is similar to Robert's work, except for adopting GPUs
> instead of multicore CPUs. So, I tried to review his work to apply those
> facilities to my extension also.

Good, I'd be very curious to hear how that might solve the issue for you,
instead of using the CustomScan approach..

> > "regular" PG tables, just to point out one issue, can be locked on a
> > row-by-row basis, and we know exactly where in shared buffers to go
> > hunt down the rows. How is that going to work here, if this is both a
> > "regular" table and stored off in a GPU's memory across subsequent
> > queries or even transactions?
> >
> It shall be handled on a "case-by-case" basis, I think. If a row-level
> lock is required over the table scan, the custom-scan node shall return a
> tuple located in the shared buffer, instead of the cached tuples. Of
> course, it is an option for the custom-scan node to evaluate qualifiers
> by GPU with the cached data and return tuples identified by the ctid of
> the cached tuples. Anyway, it is not a significant problem.

I think you're being a bit too hand-wavey here, but if we're talking about
pre-scanning the data using PG before sending it to the GPU and then only
performing a single statement on the GPU, we should be able to deal with
it. I'm worried about your ideas to try and cache things on the GPU
though, if you're not prepared to deal with locks happening in shared
memory on the rows you've got cached out on the GPU, or hint bits, or the
visibility map being updated, etc...

> OK, I'll move the portion that will be needed commonly for other FDWs
> into the backend code.

Alright- but realize that there may be objections there on the basis that
the code/structures which you're exposing aren't, and will not be, stable.
I'll have to go back and look at them myself, certainly, and their
history.

> Yes. According to the previous discussion around postgres_fdw getting
> merged, all we can trust on the remote side are built-in data types,
> functions, operators, and other built-in stuff only.

Well, we're going to need to expand that a bit for aggregates, I'm afraid,
but we should be able to define the API for those aggregates very tightly
based on what PG does today and require that any FDW purporting to provide
those aggregates do it the way PG does. Note that this doesn't solve all
the problems- we've got other issues with regard to pushing aggregates
down into FDWs that need to be solved.

> The custom-scan node is intended to perform on regular relations, not
> only foreign tables. That means a special feature (like GPU acceleration)
> can perform transparently for most existing applications. Usually, they
> define regular tables for their work at installation, not foreign tables.
> That is the biggest concern for me.

The line between a foreign table and a local one is becoming blurred
already, but still, if this is the goal then I really think the background
worker is where you should be focused, not on this Custom Scan API.
Consider that, once we've got proper background workers, we're going to
need new nodes which operate in parallel (or some other rejiggering of the
nodes- I don't pretend to know exactly what Robert is thinking here, and
I've apparently forgotten it if he's posted it somewhere) and those
interfaces may drive changes which would impact the Custom Scan API- or
worse, make us deprecate or regret having added it because now we'll need
to break backwards compatibility to add in the parallel node capability to
satisfy the more general non-GPU case.

> I might have used the wrong wording. Anyway, I want plan nodes that
> enable extensions to define their behavior; it is similar to ForeignScan,
> but allowed to perform on regular relations. Also, not only custom-scan
> and foreign-scan but any plan node works according to an interface to
> co-work with other nodes, so it is not strange that both interfaces are
> similar.

It sounds a lot like you're trying to define, external to PG, what Robert
is already trying to get going *internal* to PG, and I really don't want
to end up in a situation where we've got a solution for the uncommon case
but aren't able to address the common case due to risk of breaking
backwards compatibility...

Thanks,

Stephen
* Shigeru Hanada (shigeru.hanada@gmail.com) wrote: > Perhaps he meant to separate the patches based on a feature-based rule. IMO, > if exposing utilities is essential for the Custom Scan API in a practical > sense, IOW to implement and maintain an extension which implements the > Custom Scan API, they should go into the first patch. IIUC the two > contrib modules are also PoCs for the API, so the part-2/3 patches should > contain only changes against contrib and its documentation. That's what I was getting at, yes. > Besides that, some typo fixes are mixed into the part-2 patch. They should > go into the part-1 patch, where the typos were introduced. Agreed. Thanks, Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > > This regular one means usual tables. Even though custom implementation > > may reference self-managed in-memory cache instead of raw heap, the > > table pointed in user's query shall be a usual table. > > In the past, Hanada-san had proposed an enhancement of FDW to support > > remote-join but eventually rejected. > > I'm not aware of the specifics around that proposal but I don't believe > we, as a community, have decided to reject the idea in general. > IIUC, his approach was integration of join-pushdown within the FDW APIs; however, it does not mean the idea of remote join was rejected. I believe it is still one of our killer features if we can revise the implementation. Hanada-san, could you remind us why your proposition was rejected before? > > I thought these functions were useful to have in the backend commonly, > > but is not a fundamental functionality lacks of the custom-scan interface. > > Then perhaps they should be exposed more directly? I can understand > generally useful functionality being exposed in a way that anyone can use > it, but we need to avoid interfaces which can't be stable due to normal > / ongoing changes to the backend code. > The functions my patches want to expose are: - get_restriction_qual_cost() - fix_expr_common() And the functions my patches newly want are: - bms_to_string() - bms_from_string() The former two are defined as static functions because cost estimation is done at costsize.c and set-reference is done at setrefs.c; however, custom-scan breaks this assumption, so I moved them into public functions. They are useful to everyone, but each currently lives in a particular file. > > I can also understand the usefulness of join or aggregation into the > > remote side in case of foreign table reference. In similar way, it is > > also useful if we can push these CPU intensive operations into > > co-processors on regular table references. > > That's fine, if we can get data to and from those co-processors efficiently > enough that it's worth doing so. If moving the data to the GPU's memory > will take longer than running the actual aggregation, then it doesn't make > any sense for regular tables because then we'd have to cache the data in > the GPU's memory in some way across multiple queries, which isn't something > we're set up to do. > When I made a prototype implementation on top of FDW, using CUDA, it enabled sequential scan to run 10 times faster than SeqScan on regular tables, if the qualifiers are complex enough. The libraries to communicate with the GPU (OpenCL/CUDA) have an asynchronous data transfer mode using hardware DMA. It allows the cost of data transfer to be hidden by pipelining, if there are enough records to be transferred. Also, the recent trend in semiconductor devices is GPU integration with the CPU, sharing a common memory space. See Haswell from Intel, Kaveri from AMD, or Tegra K1 from NVIDIA. All of them share the same memory, so there is no need to transfer the data to be calculated. This trend is driven by physical law, because of the energy consumption of semiconductors. So, I'm optimistic about my idea. > > As I mentioned above, the backend changes by the part-2/-3 patches are > > just minor stuff, and I thought it should not be implemented by > > contrib module locally. > > Fine- then propose them as generally useful additions, not as patches which > are supposed to just be for contrib modules using an already defined > interface. If you can make a case for that then perhaps this is more > practical.
> The usage was found by the contrib module that wants to call static functions, or by the feature to translate an existing data structure to/from a cstring. But, anyway, does a separate patch make sense? > > No. What I want to implement is, read the regular table and transfer > > the contents into GPU's local memory for calculation, then receives > > its calculation result. The in-memory cache (also I'm working on) is > > supplemental stuff because disk access is much slower and row-oriented > > data structure is not suitable for SIMD style instructions. > > Is that actually performant? Is it actually faster than processing the > data directly? The discussions that I've had with folks have cast a great > deal of doubt in my mind about just how well that kind of quick turn-around > to the GPU's memory actually works. > See above. > > > This really strikes me as the wrong approach for an FDW > > > join-pushdown API, which should be geared around giving the remote > > > side an opportunity on a case-by-case basis to cost out joins using > > > whatever methods it has available to implement them. I've outlined > > > above the reasons I don't agree with just making the entire > planner/optimizer pluggable. > > > > > I'm also inclined to have arguments that will provide enough > > information for extensions to determine the best path for them. > > For join push-down, I proposed above that we have an interface to the FDW > which allows us to ask it how much each join of the tables which are on > a given FDW's server would cost if the FDW did it vs. pulling it back and > doing it locally. We could also pass all of the relations to the FDW with > the various join-quals and try to get an answer to everything, but I'm afraid > that'd simply end up duplicating the logic of the optimizer into every FDW, > which would be counter-productive. > Hmm... It seems to me we should follow the existing manner of constructing join paths, rather than special handling. Even if a query contains three or more foreign tables managed by the same server, they shall be consolidated into one remote join as long as its cost is less than the local ones. So, I'd like to bet on using the new add_join_path_hook to compute possible join paths (a sketch of such a callback follows this message). If a remote join implemented by custom-scan is cheaper than a local join, it shall be chosen; then the optimizer will try joining other foreign tables with this custom-scan node. If the remote join is still cheap, it shall be consolidated again. > Admittedly, getting the costing right isn't easy either, but it's not clear > to me how it'd make sense for the local server to be doing costing for remote > servers. > Right now, I ignore the cost to run the remote server and focus on the cost to transfer via the network. It might be an idea to discount the CPU cost of remote execution. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
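To make the add_join_path_hook discussion above concrete, here is a rough sketch of the shape such a callback might take. It is only a sketch under assumptions drawn from this thread: the hook signature, the CustomPath node, and the helpers same_foreign_server() and estimate_remote_join_cost() are imagined around the proposed patch, not stock PostgreSQL; only makeNode() and add_path() exist in the backend today.

    /*
     * Hedged sketch of an add_join_path_hook callback, per this thread.
     * The hook signature, CustomPath and the two helper functions are
     * assumptions based on the patch, not stock PostgreSQL APIs.
     */
    #include "postgres.h"
    #include "nodes/relation.h"
    #include "optimizer/pathnode.h"

    static void
    pgfdw_add_join_path(PlannerInfo *root, RelOptInfo *joinrel,
                        RelOptInfo *outerrel, RelOptInfo *innerrel,
                        JoinType jointype, List *restrictlist)
    {
        CustomPath *cpath;

        /* Only offer a remote join when both sides live on one server. */
        if (!same_foreign_server(root, outerrel, innerrel))
            return;

        cpath = makeNode(CustomPath);
        cpath->path.pathtype = T_CustomScan;
        cpath->path.parent = joinrel;

        /* Mostly network-transfer cost for now, per the discussion. */
        estimate_remote_join_cost(root, cpath, outerrel, innerrel,
                                  jointype, restrictlist);

        /* Compete with the local join paths; the cheapest path wins. */
        add_path(joinrel, &cpath->path);
    }

Because add_path() competes purely on cost, the consolidation KaiGai describes falls out naturally: once a two-way remote-join path wins for a joinrel, the optimizer considers joining it to further foreign tables through the same hook.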
> > > If you're looking to just use GPU acceleration for improving > > > individual queries, I would think that Robert's work around backend > > > workers would be a more appropriate way to go, with the ability to > > > move a working set of data from shared buffers and on-disk > > > representation of a relation over to the GPU's memory, perform the > operation, and then copy the results back. > > > > > The approach is similar to the Robert's work except for GPU adoption, > > instead of multicore CPUs. So, I tried to review his work to apply the > > facilities on my extension also. > > Good, I'd be very curious to hear how that might solve the issue for you, > instead of using the CustomScan approach.. > I (plan to) use custom-scan, of course. Once a relation is referenced and the optimizer decides GPU acceleration is cheaper, the associated custom-scan node reads the data from the underlying relation (or the in-memory cache if it exists), then moves it to the shared memory buffer to deliver it to the GPU management background worker, which launches asynchronous DMAs one by one. After that, the custom-scan node receives the filtered records via the shared-memory buffer, so it can construct tuples to be returned to the upper node. > > > "regular" PG tables, just to point out one issue, can be locked on a > > > row-by-row basis, and we know exactly where in shared buffers to go > > > hunt down the rows. How is that going to work here, if this is both > a "regular" > > > table and stored off in a GPU's memory across subsequent queries or > > > even transactions? > > > > > It shall be handled "case-by-case" basis, I think. If row-level lock > > is required over the table scan, custom-scan node shall return a tuple > > being located on the shared buffer, instead of the cached tuples. Of > > course, it is an option for custom-scan node to calculate qualifiers > > by GPU with cached data and returns tuples identified by ctid of the cached > tuples. > > Anyway, it is not a significant problem. > > I think you're being a bit too hand-wavey here, but if we're talking about > pre-scanning the data using PG before sending it to the GPU and then only > performing a single statement on the GPU, we should be able to deal with > it. It's what I want to implement. > I'm worried about your ideas to try and cache things on the GPU though, > if you're not prepared to deal with locks happening in shared memory on > the rows you've got cached out on the GPU, or hint bits, or the visibility > map being updated, etc... > It does not retain any state/information on the GPU side. Things related to PG internals are the CPU's job. > > OK, I'll move the portion that will be needed commonly for other FDWs > > into the backend code. > > Alright- but realize that there may be objections there on the basis that > the code/structures which you're exposing aren't, and will not be, stable. > I'll have to go back and look at them myself, certainly, and their history. > I see, but that is part of the process of getting code merged. > > Yes. According to the previous discussion around postgres_fdw getting > > merged, all we can trust on the remote side are built-in data types, > > functions, operators or other stuffs only. > > Well, we're going to need to expand that a bit for aggregates, I'm afraid, > but we should be able to define the API for those aggregates very tightly > based on what PG does today and require that any FDW purporting to provide > those aggregates do it the way PG does.
Note that this doesn't solve all > the problems- we've got other issues with regard to pushing aggregates down > into FDWs that need to be solved. > I see. It probably needs more detailed investigation. > > The custom-scan node is intended to perform on regular relations, not > > only foreign tables. It means a special feature (like GPU > > acceleration) can perform transparently for most of existing > > applications. Usually, it defines regular tables for their work on > > installation, not foreign tables. It is the biggest concern for me. > > The line between a foreign table and a local one is becoming blurred already, > but still, if this is the goal then I really think the background worker > is where you should be focused, not on this Custom Scan API. Consider that, > once we've got proper background workers, we're going to need new nodes > which operate in parallel (or some other rejiggering of the nodes- I don't > pretend to know exactly what Robert is thinking here, and I've apparently > forgotten it if he's posted it > somewhere) and those interfaces may drive changes which would impact the > Custom Scan API- or worse, make us deprecate or regret having added it > because now we'll need to break backwards compatibility to add in the > parallel node capability to satisfy the more general non-GPU case. > The custom-scan API is a thin abstraction over the plan node interface, not tightly coupled with a particular use case, like GPU, remote join and so on. So, I'm quite optimistic about its future maintainability. Also, please recall the discussion at the last developer meeting. The purpose of custom-scan (we didn't name it at that time) is to avoid unnecessary project forks by people who want to implement their own special features when no facilities to enhance the optimizer/executor are available. Even once we have an in-core parallel execution feature on CPUs, it still makes sense to allow some unique implementation that may be suitable for a specific domain. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
2014-02-26 17:31 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: > IIUC, his approach was integration of join-pushdown within the FDW APIs; > however, it does not mean the idea of remote join was rejected. > I believe it is still one of our killer features if we can revise the > implementation. > > Hanada-san, could you remind us why your proposition was rejected > before? IIUC it was not rejected, just returned-with-feedback. We could not reach consensus on how join-push-down should work. A discussion point was multiple paths for a joinrel, but it was not so serious a point. Here is the tail of the thread. http://www.postgresql.org/message-id/4F058241.2000606@enterprisedb.com >> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >>> >>> Hmm, so you're saying that the FDW function needs to be able to return >>> multiple paths for a single joinrel. Fair enough, and that's not >>> specific to remote joins. Even a single-table foreign scan could be >>> implemented differently depending on whether you prefer fast-start or >>> cheapest total. >> >> >> ... or ordered vs unordered, etc. Yeah, good point, we already got this >> wrong with the PlanForeignScan API. Good thing we didn't promise that >> would be stable. > > > This discussion withered down here... > > I think the advice to Shigeru-san is to work on the API. We didn't reach a > consensus on what exactly it should look like, but at least you need to be > able to return multiple paths for a single joinrel, and should look at > fixing the PlanForeignScan API to allow that too. And I gave up for lack of time, IOW, in order to finish more fundamental portions of the FDW API. http://www.postgresql.org/message-id/4F39FC1A.7090202@gmail.com -- Shigeru HANADA
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > IIUC, his approach was integration of join-pushdown within FDW APIs, > however, it does not mean the idea of remote-join is rejected. For my part, trying to consider doing remote joins *without* going through FDWs is just nonsensical. What are you joining remotely if not two foreign tables? With regard to the GPU approach, if that model works whereby the normal PG tuples are read off disk, fed over to the GPU, processed, then returned back to the user through PG, then I wouldn't consider it really a 'remote' join but rather simply a new execution node inside of PG which is planned and costed just like the others. We've been over the discussion already about trying to make that a pluggable system but the, very reasonable, push-back on that has been if it's really possible and really makes sense to be pluggable. It certainly doesn't *have* to be- PostgreSQL is written in C, as we all know, and plenty of C code talks to GPUs and shuffles memory around- and that's almost exactly what Robert is working on supporting with regular CPUs and PG backends already. In many ways, trying to conflate this idea of using-GPUs-to-do-work with the idea of remote-FDW-joins has really disillusioned me with regard to the CustomScan approach. > > Then perhaps they should be exposed more directly? I can understand > > generally useful functionality being exposed in a way that anyone can use > > it, but we need to avoid interfaces which can't be stable due to normal > > / ongoing changes to the backend code. > > > The functions my patches want to expose are: > - get_restriction_qual_cost() > - fix_expr_common() I'll try and find time to go look at these in more detail later this week. I have reservations about exposing the current estimates on costs as we may want to adjust them in the future- but such adjustments may need to be made in balance with other changes throughout the system and an external module which depends on one result from the qual costing might end up having problems with the costing changes because the extension author wasn't aware of the other changes happening in other areas of the costing. I'm talking about this from a "beyond-just-the-GUCs" point of view, I realize that the extension author could go look at the GUC settings, but it's entirely reasonable to believe we'll make changes to the default GUC settings along with how they're used in the future. > And, the functions my patches newly want are: > - bms_to_string() > - bms_from_string() Offhand, these look fine, if there's really an external use for them. Will try to look at them in more detail later. > > That's fine, if we can get data to and from those co-processors efficiently > > enough that it's worth doing so. If moving the data to the GPU's memory > > will take longer than running the actual aggregation, then it doesn't make > > any sense for regular tables because then we'd have to cache the data in > > the GPU's memory in some way across multiple queries, which isn't something > > we're set up to do. > > > When I made a prototype implementation on top of FDW, using CUDA, it enabled > to run sequential scan 10 times faster than SeqScan on regular tables, if > qualifiers are enough complex. > Library to communicate GPU (OpenCL/CUDA) has asynchronous data transfer > mode using hardware DMA. It allows to hide the cost of data transfer by > pipelining, if here is enough number of records to be transferred. 
That sounds very interesting and certainly figuring out the costing to support that model will be tricky. Also, shuffling the data around in that way will also be interesting. It strikes me that it'll be made more difficult if we're trying to do it through the limitations of a pre-defined API between the core code and an extension. > Also, the recent trend of semiconductor device is GPU integration with CPU, > that shares a common memory space. See, Haswell of Intel, Kaveri of AMD, or > Tegra K1 of nvidia. All of them shares same memory, so no need to transfer > the data to be calculated. This trend is dominated by physical law because > of energy consumption by semiconductor. So, I'm optimistic for my idea. And this just makes me wonder why the focus isn't on the background worker approach instead of trying to do this all in an extension. > The usage was found by the contrib module that wants to call static > functions, or feature to translate existing data structure to/from > cstring. But, anyway, does separated patch make sense? I haven't had a chance to go back and look into the functions in detail, but offhand I'd say the bms ones are probably fine while the others would need more research as to if they make sense to expose to an extension. > Hmm... It seems to me we should follow the existing manner to construct > join path, rather than special handling. Even if a query contains three or > more foreign tables managed by same server, it shall be consolidated into > one remote join as long as its cost is less than local ones. I'm not convinced that it's going to be that simple, but I'm certainly interested in the general idea. > So, I'd like to bed using the new add_join_path_hook to compute possible > join path. If remote join implemented by custom-scan is cheaper than local > join, it shall be chosen, then optimizer will try joining with other foreign > tables with this custom-scan node. If remote-join is still cheap, then it > shall be consolidated again. And I'm still unconvinced that trying to make this a hook and implemented by an extension makes sense. > > Admittedly, getting the costing right isn't easy either, but it's not clear > > to me how it'd make sense for the local server to be doing costing for remote > > servers. > > > Right now, I ignored the cost to run remote-server, focused on the cost to > transfer via network. It might be an idea to discount the CPU cost of remote > execution. Pretty sure we're going to need to consider the remote processing cost of the join as well.. Thanks, Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > I (plan to) use custom-scan of course. Once a relation is referenced > and optimizer decided GPU acceleration is cheaper, associated custom- > scan node read the data from underlying relation (or in-memory cache > if exists) then move to the shared memory buffer to deliver GPU > management background worker that launches asynchronous DMA one by one. > After that, custom-scan node receives filtered records via shared- > memory buffer, so it can construct tuples to be returned to the upper > node. Alright- but have you discussed this with Robert? We're going to be whacking things around for parallel support with new nodes and more built-in helper functionality for doing this work and I'm not anxious to have CustomScan end up being a legacy interface that we're required to pull forward because we accepted it before things had settled. > > I'm worried about your ideas to try and cache things on the GPU though, > > if you're not prepared to deal with locks happening in shared memory on > > the rows you've got cached out on the GPU, or hint bits, or the visibility > > map being updated, etc... > > > It does not remain any state/information on the GPU side. Things related > to PG internal stuff is job of CPU. Right, good, I'm glad to hear that this approach is for doing things at only an individual statement level and it's good to know that it can be performant at that level now. > > Well, we're going to need to expand that a bit for aggregates, I'm afraid, > > but we should be able to define the API for those aggregates very tightly > > based on what PG does today and require that any FDW purporting to provide > > those aggregates do it the way PG does. Note that this doesn't solve all > > the problems- we've got other issues with regard to pushing aggregates down > > into FDWs that need to be solved. > > > I see. It probably needs more detailed investigation. These issues will hopefully not be a problem (or at least, one that can be worked around) for non-FDW implementations which are part of core and implemented in a similar way to the existing aggregates.. Where the scan node could continue to be a simple SeqScan as it is today. > The custom-scan API is thin abstraction towards the plan node interface, > not tightly convinced with a particular use case, like GPU, remote-join > and so on. So, I'm quite optimistic for the future maintainability. I don't see how you can be when there hasn't been any discussion that I've seen about how parallel query execution is going to change things for us. > Also, please remind the discussion at the last developer meeting. > The purpose of custom-scan (we didn't name it at that time) is to avoid > unnecessary project branch for people who want to implement their own > special feature but no facilities to enhance optimizer/executor are > supported. > Even though we have in-core parallel execution feature by CPU, it also > makes sense to provide some unique implementation that may be suitable > for a specific region. The issue here is that we're going to be expected to maintain an interface once we provide it and so that isn't something we should be doing lightly. Particularly when it's as involved as this kind of change is with what's going on in the backend where we are nearly 100% sure to be changing things in the next release or two. Thanks, Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > > IIUC, his approach was integration of join-pushdown within FDW APIs, > > however, it does not mean the idea of remote-join is rejected. > > For my part, trying to consider doing remote joins *without* going through > FDWs is just nonsensical. What are you joining remotely if not two foreign > tables? > It is the case where two foreign tables would otherwise be joined locally. If a query has two foreign tables managed by the same server, this pair shall be found while the optimizer tries various possible combinations. > With regard to the GPU approach, if that model works whereby the > normal PG tuples are read off disk, fed over to the GPU, processed, then > returned back to the user through PG, then I wouldn't consider it really > a 'remote' join but rather simply a new execution node inside of PG which > is planned and costed just like the others. We've been over the discussion > already about trying to make that a pluggable system but the, very reasonable, > push-back on that has been if it's really possible and really makes sense > to be pluggable. It certainly doesn't *have* to be- PostgreSQL is written > in C, as we all know, and plenty of C code talks to GPUs and shuffles memory > around- and that's almost exactly what Robert is working on supporting with > regular CPUs and PG backends already. > > In many ways, trying to conflate this idea of using-GPUs-to-do-work with > the idea of remote-FDW-joins has really disillusioned me with regard to > the CustomScan approach. > Are you suggesting I focus on the GPU stuff, rather than killing two birds with one stone? That may be an approach; however, these have a common part, because the plan node for a remote join pops tuples up towards its upper node. From the viewpoint of the upper node, it looks like a black box that returns tuples joining two underlying relations. On the other hand, here is another black box that returns tuples scanning or joining underlying relations with GPU assistance. Neither implementation detail is visible to the upper node, but their external interface is common. The custom-scan node can provide a pluggable way for both use cases. Anyway, I'm not motivated by the remote-join feature any more than by the GPU-acceleration stuff. If it is better to drop the FDW remote-join stuff from the custom-scan scope, I won't insist on it. > > > Then perhaps they should be exposed more directly? I can understand > > > generally useful functionality being exposed in a way that anyone > > > can use it, but we need to avoid interfaces which can't be stable > > > due to normal / ongoing changes to the backend code. > > > > > The functions my patches want to expose are: > > - get_restriction_qual_cost() > > - fix_expr_common() > > I'll try and find time to go look at these in more detail later this week. > I have reservations about exposing the current estimates on costs as we > may want to adjust them in the future- but such adjustments may need to > be made in balance with other changes throughout the system and an external > module which depends on one result from the qual costing might end up having > problems with the costing changes because the extension author wasn't aware > of the other changes happening in other areas of the costing. > That is also my point. If the cost estimation logic is revised in the future, it causes a problem if an extension has cut and pasted the code.
> I'm talking about this from a "beyond-just-the-GUCs" point of view, I > realize that the extension author could go look at the GUC settings, but > it's entirely reasonable to believe we'll make changes to the default GUC > settings along with how they're used in the future. > Is the GUC something like a Boolean that shows whether the new costing model is applied or not? If so, an extension needs to keep two cost estimation logics within its code, doesn't it? If the GUC shows something like a weight, I also think it makes sense. > > And, the functions my patches newly want are: > > - bms_to_string() > > - bms_from_string() > > Offhand, these look fine, if there's really an external use for them. > Will try to look at them in more detail later. > At least, it makes sense to carry a bitmap data structure in the private field of custom-scan, because every plan node has to be safe under copyObject() (a minimal sketch of this follows this message). > > > That's fine, if we can get data to and from those co-processors > > > efficiently enough that it's worth doing so. If moving the data to > > > the GPU's memory will take longer than running the actual > > > aggregation, then it doesn't make any sense for regular tables > > > because then we'd have to cache the data in the GPU's memory in some > > > way across multiple queries, which isn't something we're set up to do. > > > > > When I made a prototype implementation on top of FDW, using CUDA, it > > enabled to run sequential scan 10 times faster than SeqScan on regular > > tables, if qualifiers are enough complex. > > Library to communicate GPU (OpenCL/CUDA) has asynchronous data > > transfer mode using hardware DMA. It allows to hide the cost of data > > transfer by pipelining, if here is enough number of records to be > transferred. > > That sounds very interesting and certainly figuring out the costing to > support that model will be tricky. Also, shuffling the data around in that > way will also be interesting. It strikes me that it'll be made more > difficult if we're trying to do it through the limitations of a pre-defined > API between the core code and an extension. > This data shuffling is done on the extension side, so it looks like core PG just picks up tuples from a box that handles the underlying table scan in some way. > > Also, the recent trend of semiconductor device is GPU integration with > > CPU, that shares a common memory space. See, Haswell of Intel, Kaveri > > of AMD, or Tegra K1 of nvidia. All of them shares same memory, so no > > need to transfer the data to be calculated. This trend is dominated by > > physical law because of energy consumption by semiconductor. So, I'm > optimistic for my idea. > > And this just makes me wonder why the focus isn't on the background worker > approach instead of trying to do this all in an extension. > The GPU portion of the above processors has a different instruction set from the CPU, so we cannot utilize its parallel execution capability even if we launch tons of background workers that run existing CPU instructions. > > Hmm... It seems to me we should follow the existing manner to > > construct join path, rather than special handling. Even if a query > > contains three or more foreign tables managed by same server, it shall > > be consolidated into one remote join as long as its cost is less than > local ones. > > I'm not convinced that it's going to be that simple, but I'm certainly > interested in the general idea.
> That is implemented in my part-3 patch: the add_join_path hook adds a custom-scan path that joins two foreign tables, a foreign table and a custom-scan, or two custom-scans, if all of them are managed by the same foreign server. As long as its execution cost is reasonable, it allows running a remote join that contains three or more relations. > > So, I'd like to bet on using the new add_join_path_hook to compute > > possible join paths. If remote join implemented by custom-scan is > > cheaper than local join, it shall be chosen, then optimizer will try > > joining with other foreign tables with this custom-scan node. If > > remote-join is still cheap, then it shall be consolidated again. > > And I'm still unconvinced that trying to make this a hook and implemented > by an extension makes sense. > The postgresAddJoinPaths() in my part-3 patch is doing that. Of course, some portion of its code might have been supported by the backend code. However, I don't think the overall design is less reasonable than special handling. > > > Admittedly, getting the costing right isn't easy either, but it's > > > not clear to me how it'd make sense for the local server to be doing > > > costing for remote servers. > > > > > Right now, I ignore the cost to run the remote server and focus on the > > cost to transfer via the network. It might be an idea to discount the CPU > > cost of remote execution. > > Pretty sure we're going to need to consider the remote processing cost of > the join as well.. > I also think so, even though it is not done yet. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
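To illustrate the copyObject() point above, here is a minimal sketch of how a custom-scan provider might carry a Bitmapset in its private field. It assumes the bms_to_string()/bms_from_string() helpers proposed in this thread (they are not stock PostgreSQL) and an assumed custom_private list field; lappend(), makeString() and strVal() are existing backend facilities.

    /*
     * Minimal sketch: keep a Bitmapset in a custom-scan's private list
     * in copyObject()-safe form, via the string helpers proposed in
     * this thread.  "custom_private" is an assumed field name.
     */
    #include "postgres.h"
    #include "nodes/bitmapset.h"
    #include "nodes/pg_list.h"
    #include "nodes/value.h"

    /* At plan-creation time: stash the relids as a copyable String node. */
    static List *
    stash_relids(List *custom_private, Bitmapset *relids)
    {
        return lappend(custom_private, makeString(bms_to_string(relids)));
    }

    /* At executor-startup time: rebuild the Bitmapset from the copied plan. */
    static Bitmapset *
    fetch_relids(List *custom_private)
    {
        return bms_from_string(strVal(linitial(custom_private)));
    }

Because the set travels as an ordinary String node, the plan tree survives copyObject() without any node-specific copy support.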
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > > I (plan to) use custom-scan of course. Once a relation is referenced > > and optimizer decided GPU acceleration is cheaper, associated custom- > > scan node read the data from underlying relation (or in-memory cache > > if exists) then move to the shared memory buffer to deliver GPU > > management background worker that launches asynchronous DMA one by one. > > After that, custom-scan node receives filtered records via shared- > > memory buffer, so it can construct tuples to be returned to the upper > > node. > > Alright- but have you discussed this with Robert? We're going to be > whacking things around for parallel support with new nodes and more built-in > helper functionality for doing this work and I'm not anxious to have > CustomScan end up being a legacy interface that we're required to pull > forward because we accepted it before things had settled. > I briefly introduced my idea of using GPUs to him at Ottawa last year, though I'm not certain he remembers it. At least, the idea of the custom-scan node came from the discussion at that time. > > The custom-scan API is thin abstraction towards the plan node > > interface, not tightly convinced with a particular use case, like GPU, > > remote-join and so on. So, I'm quite optimistic for the future > maintainability. > > I don't see how you can be when there hasn't been any discussion that I've > seen about how parallel query execution is going to change things for us. > If parallel query execution changes the whole structure of plan nodes, it will also affect the interface of custom-scan, because it is a thin abstraction of a plan node. However, if the parallel execution feature is implemented as a new plan node in addition to the existing ones, I cannot imagine a scenario that affects the structure of another plan node. > > Also, please remind the discussion at the last developer meeting. > > The purpose of custom-scan (we didn't name it at that time) is to > > avoid unnecessary project branch for people who want to implement > > their own special feature but no facilities to enhance > > optimizer/executor are supported. > > Even though we have in-core parallel execution feature by CPU, it also > > makes sense to provide some unique implementation that may be suitable > > for a specific region. > > The issue here is that we're going to be expected to maintain an interface > once we provide it and so that isn't something we should be doing lightly. > Particularly when it's as involved as this kind of change is with what's > going on in the backend where we are nearly 100% sure to be changing things > in the next release or two. > FDW APIs have also been revised several times in recent releases. If we could design a "perfect" interface from the beginning, that would be best, but it is usually hard to do. Also, the custom-scan interface is almost symmetric with the existing plan node structures, so its stability is relatively high, I think. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > * Stephen Frost (sfrost@snowman.net) wrote: > > I don't see how you can be when there hasn't been any discussion that I've > > seen about how parallel query execution is going to change things for us. > > > If parallel query execution changes whole of the structure of plan nodes, > it will also affect to the interface of custom-scan because it is a thin- > abstraction of plan-node. However, if parallel execution feature is > implemented as one of new plan node in addition to existing one, I cannot > imagine a scenario that affects to the structure of another plan node. Let's just say that I have doubts that we'll be able to implement parallel execution *without* changing the plan node interface in some way which will require, hopefully minor, changes to all of the nodes. The issue is that even a minor change would break the custom-scan API and we'd immediately be in the boat of dealing with complaints regarding backwards compatibility. Perhaps we can hand-wave that, and we've had some success changing hook APIs between major releases, but such changes may also be in ways which wouldn't break in obvious ways or even possibly be changes which have to be introduced into back-branches. Parallel query is going to be brand-new real soon and it's reasonable to think we'll need to make bug-fix changes to it after it's out which might even involve changes to the API which is developed for it. > > The issue here is that we're going to be expected to maintain an interface > > once we provide it and so that isn't something we should be doing lightly. > > Particularly when it's as involved as this kind of change is with what's > > going on in the backend where we are nearly 100% sure to be changing things > > in the next release or two. > > > FDW APIs are also revised several times in the recent releases. If we can > design "perfect" interface from the beginning, it's best but usually hard > to design. Sure, but FDWs also have a *much* broader set of use-cases, in my view, which is also why I was pushing to work on join-push-down to happen there instead of having this kind of a hook interface, which I don't think we'd want to directly expose as part of the 'official FDW API' as it ends up putting all the work on the FDW with little aid, making it terribly likely to end up with a bunch of duplicated code in the FDWs from the backend to deal with it, particularly for individuals writing FDWs who aren't familiar with what PG already has. > Also, custom-scan interface is almost symmetric with existing plan node > structures, so its stability is relatively high, I think. Perhaps it will come to pass that parallel query execution doesn't require any changes to the plan node structure, but that's not the horse that I'd bet on at this point. Thanks, Stephen
2014-03-01 0:36 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >> * Stephen Frost (sfrost@snowman.net) wrote: >> > I don't see how you can be when there hasn't been any discussion that I've >> > seen about how parallel query execution is going to change things for us. >> > >> If parallel query execution changes whole of the structure of plan nodes, >> it will also affect to the interface of custom-scan because it is a thin- >> abstraction of plan-node. However, if parallel execution feature is >> implemented as one of new plan node in addition to existing one, I cannot >> imagine a scenario that affects to the structure of another plan node. > > Let's just say that I have doubts that we'll be able to implement > parallel execution *without* changing the plan node interface in some > way which will require, hopefully minor, changes to all of the nodes. > The issue is that even a minor change would break the custom-scan API > and we'd immediately be in the boat of dealing with complaints regarding > backwards compatibility. Perhaps we can hand-wave that, and we've had > some success changing hook APIs between major releases, but such changes > may also be in ways which wouldn't break in obvious ways or even > possibly be changes which have to be introduced into back-branches. > Parallel query is going to be brand-new real soon and it's reasonable to > think we'll need to make bug-fix changes to it after it's out which > might even involve changes to the API which is developed for it. > Even if we change the plan-node interface in a future release, it should not be a change that makes the existing stuff impossible to implement. The custom-scan API is designed to provide an alternative way to scan or join relations, in addition to the existing logic like SeqScan or NestLoop. If such a change broke the plan-node interfaces so that the existing stuff could not be implemented, it would be problematic for everything, not only the custom-scan node. I don't think a change that makes it impossible to implement the existing logic would be merged. Likely, the new parallel execution feature can work together with the existing sequential logic and the custom-scan interface. BTW, this kind of discussion looks like a talk with a ghost, because we cannot see the new interface for parallel execution right now, so we cannot tangibly investigate whether it becomes a really serious backward incompatibility or not. My bet is on a minor one. I cannot imagine a plan-node interface that does not support the existing non-parallel SeqScan or NestLoop and so on. >> > The issue here is that we're going to be expected to maintain an interface >> > once we provide it and so that isn't something we should be doing lightly. >> > Particularly when it's as involved as this kind of change is with what's >> > going on in the backend where we are nearly 100% sure to be changing things >> > in the next release or two. >> > >> FDW APIs are also revised several times in the recent releases. If we can >> design "perfect" interface from the beginning, it's best but usually hard >> to design.
> > Sure, but FDWs also have a *much* broader set of use-cases, in my view, > which is also why I was pushing to work on join-push-down to happen > there instead of having this kind of a hook interface, which I don't > think we'd want to directly expose as part of the 'official FDW API' as > it ends up putting all the work on the FDW with little aid, making it > terribly likely to end up with a bunch of duplicated code in the FDWs > from the backend to deal with it, particularly for individuals writing > FDWs who aren't familiar with what PG already has. > It might not be a good idea to use postgres_fdw as the basis of a proof-of-concept to demonstrate that custom-scan can effectively host an alternative way to join, one that fetches the result set of a remote join as if it were a relation scan, even though it demonstrated that this is possible. So, I don't mind the part-3 portion (remote join of postgres_fdw on top of custom-scan) being dropped from the submission. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
KaiGai, * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: > BTW, this kind of discussion looks like a talk with a ghost because > we cannot see the new interface according to the parallel execution > right now, so we cannot have tangible investigation whether it becomes > really serious backward incompatibility, or not. Yeah, it would certainly be nice if we had all of the answers up-front. What I keep hoping for is that someone who has been working on this area (eg: Robert) would speak up... > My bet is minor one. I cannot imagine plan-node interface that does > not support existing non-parallel SeqScan or NestLoop and so on. Sure you can- because once we change the interface, we're probably going to go through and make everything use the new one rather than have to special-case things. That's more-or-less exactly my point here because having an external hook like CustomScan would make that kind of wholesale change more difficult. That does *not* mean I'm against using GPUs and GPU optimizations. What it means is that I'd rather see that done in core, which would allow us to simply change that interface along with the rest when doing wholesale changes and not have to worry about backwards compatibility and breaking other people's code. Thanks, Stephen
2014-03-01 22:38 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > KaiGai, > > * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: >> BTW, this kind of discussion looks like a talk with a ghost because >> we cannot see the new interface according to the parallel execution >> right now, so we cannot have tangible investigation whether it becomes >> really serious backward incompatibility, or not. > > Yeah, it would certainly be nice if we had all of the answers up-front. > What I keep hoping for is that someone who has been working on this area > (eg: Robert) would speak up... > I'd also like to see his opinion. >> My bet is minor one. I cannot imagine plan-node interface that does >> not support existing non-parallel SeqScan or NestLoop and so on. > > Sure you can- because once we change the interface, we're probably going > to go through and make everything use the new one rather than have to > special-case things. That's more-or-less exactly my point here because > having an external hook like CustomScan would make that kind of > wholesale change more difficult. > I think we should follow the general rule in the case of custom-scan also. In other words, it is the responsibility of the extension's author to keep up with the latest specification of the interfaces. For example, I have an extension module that is unable to work on the latest PG code because of interface changes to ProcessUtility_hook (see the example after this message). Is it a matter of backward incompatibility? Definitely not; it should be my job. > That does *not* mean I'm against using GPUs and GPU optimizations. What > it means is that I'd rather see that done in core, which would allow us > to simply change that interface along with the rest when doing wholesale > changes and not have to worry about backwards compatibility and breaking > other people's code. > I also have to introduce some previous background discussion. Now we have two options for GPU programming: CUDA or OpenCL. Both the libraries and the drivers are provided under proprietary licenses, so they do not fit the core implementation of PostgreSQL, but rather extensions that are installed on the user's responsibility. Because of this background, I brought up a discussion about pluggable planner/executor nodes (as a basis of GPU acceleration) at the last developer meeting, and then worked on the custom-scan node interface. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
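For context, the follow-up burden KaiGai mentions looks like this in practice. Below is the usual idiom for chaining ProcessUtility_hook, written against the signature it has around PostgreSQL 9.3/9.4; earlier releases passed a bool isTopLevel instead of the ProcessUtilityContext argument, which is exactly the kind of interface change an extension author must track. The function name is illustrative.

    /*
     * Standard ProcessUtility_hook chaining idiom (signature as of
     * roughly PostgreSQL 9.3/9.4).  my_ProcessUtility is an
     * illustrative name, not from any actual patch.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "tcop/utility.h"

    PG_MODULE_MAGIC;

    static ProcessUtility_hook_type prev_ProcessUtility = NULL;

    static void
    my_ProcessUtility(Node *parsetree, const char *queryString,
                      ProcessUtilityContext context, ParamListInfo params,
                      DestReceiver *dest, char *completionTag)
    {
        /* extension-specific handling of the utility statement goes here */

        if (prev_ProcessUtility)
            prev_ProcessUtility(parsetree, queryString, context,
                                params, dest, completionTag);
        else
            standard_ProcessUtility(parsetree, queryString, context,
                                    params, dest, completionTag);
    }

    void
    _PG_init(void)
    {
        prev_ProcessUtility = ProcessUtility_hook;
        ProcessUtility_hook = my_ProcessUtility;
    }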
KaiGai, * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: > Now we have two options for GPU programming: CUDA or OpenCL. > Both of libraries and drivers are provided under the proprietary license, > so it does not fit for the core implementation of PostgreSQL, but > extensions that shall be installed on user's responsibility. Being able to work with libraries which are not BSD licensed doesn't change the license under which PostgreSQL code is released. Nor does it require PostgreSQL to be licensed in any different way from how it is today. Where it would get a bit ugly, I agree, is for the packagers who have to decide if they want to build against those libraries or not. We might be able to make things a bit easier by having a startup-time determination of whether these nodes can be used or not (sketched after this message). This isn't unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible, a situation which causes a great deal of difficulty already due to the whole readline nonsense- but it's difficulty for the packagers, not for the PostgreSQL project, per se. > Because of the story, I brought up a discussion about pluggable > planner/executor node (as a basis of GPU acceleration) in the last > developer meeting, then has worked for custom-scan node interface. And I'm still unconvinced of this approach and worry that it's going to break more often than it works. That's my 2c on it, but I won't get in the way if someone else wants to step up and support it. As I mentioned up-thread, I'd really like to see FDW join push-down, FDW aggregate push-down, parallel query execution, and parallel remote-FDW execution and I don't see this CustomScan approach as the right answer to any of those. Thanks, Stephen
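A minimal sketch of the startup-time determination Stephen suggests, assuming the extension probes for the proprietary driver with dlopen() before registering any GPU paths; the library name and the gpu_available flag are illustrative assumptions, not taken from any actual patch.

    /*
     * Minimal sketch of a startup-time capability check: probe for the
     * proprietary GPU driver with dlopen() and only enable the GPU code
     * paths when it is present.  Library name and flag are assumptions.
     */
    #include <dlfcn.h>
    #include <stdbool.h>

    static bool gpu_available = false;

    static void
    probe_gpu_driver(void)
    {
        void *handle = dlopen("libcuda.so", RTLD_NOW);   /* assumed name */

        if (handle != NULL)
        {
            dlclose(handle);
            gpu_available = true;
        }
        /* otherwise: register no GPU paths; built-in scans still work */
    }

With a check like this, a packager can ship one binary: on systems without the driver, the extension simply never offers its custom paths.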
2014-03-02 9:51 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > KaiGai, > > * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: >> Now we have two options for GPU programming: CUDA or OpenCL. >> Both of libraries and drivers are provided under the proprietary license, >> so it does not fit for the core implementation of PostgreSQL, but >> extensions that shall be installed on user's responsibility. > > Being able to work with libraries which are not BSD licensed doesn't > change the license under which PostgreSQL code is released. Nor does it > require PostgreSQL to be licensed in any different way from how it is > today. Where it would get a bit ugly, I agree, is for the packagers who > have to decide if they want to build against those libraries or not. We > might be able to make things a bit easier by having a startup-time > determination of whether these nodes can be used or not. This isn't > unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible, > a situation which causes a great deal of difficulty already due to the > whole readline nonsense- but it's difficulty for the packagers, not for > the PostgreSQL project, per se. > As you mentioned, it is a headache for packagers, and it does not make sense for us if packagers disable the features that require proprietary drivers. In fact, Fedora / RHEL does not permit distributing software under a non-open-source license. Obviously, NVIDIA's CUDA is a library distributed under a proprietary license, thus out of scope for the Linux distributors. It also leads them to turn off features that must be linked with proprietary drivers. All we can do is implement these features as extensions, then offer an option for users to use them or not. >> Because of the story, I brought up a discussion about pluggable >> planner/executor node (as a basis of GPU acceleration) in the last >> developer meeting, then has worked for custom-scan node interface. > > And I'm still unconvinced of this approach and worry that it's going to > break more often than it works. That's my 2c on it, but I won't get in > the way if someone else wants to step up and support it. As I mentioned > up-thread, I'd really like to see FDW join push-down, FDW aggregate > push-down, parallel query execution, and parallel remote-FDW execution > and I don't see this CustomScan approach as the right answer to any of > those. > It's the right approach for FDW functionality enhancement; I never opposed it. What I'd like to implement is GPU acceleration that can perform on regular tables, not only foreign tables. Also, according to the backlog from the developer meeting, a pluggable executor node is also a feature required by the PG-XC folks to make their project work with upstream. I think the custom-scan feature is capable of hosting these requirements, and it does not prevent enhancing the FDW features. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: > As you mentioned, it is a headache for packagers, and does not make > sense for us if packager disabled the feature that requires proprietary > drivers. No, I disagree with that. I don't expect this use-case to be very common to begin with and telling individuals that they have to compile it themselves is certainly not out of the question. > In fact, Fedora / RHEL does not admit to distribute software > under the none open source software license. I'm pretty confident you can get RPMs for those distributions. > Obviously, nvidia's cuda > is a library being distributed under the proprietary license, thus out of > the scope for the Linux distributors. This also doesn't make any sense to me- certainly the CUDA libraries are available under Debian derivatives, along with open-source wrappers for them like pycuda. > It also leads them to turn off the > feature that shall be linked with proprietary drivers. > All we can do is to implement these features as extension, then offer > an option for users to use or not to use. No, we can tell individuals who want it that they're going to need to build with support for it. We don't offer OpenSSL as an extension (I certainly wish we did- and had a way to replace it w/ GNUTLS or one of the better licensed options). > What I'd like to implement is GPU acceleration that can perform on > regular tables, not only foreign tables. Also, regarding to the backlog > in the developer meeting, pluggable executor node is also required > feature by PG-XC folks to work their project with upstream. > I think custom-scan feature is capable to host these requirement, > and does not prevent enhancement FDW features. I think you're conflating things again- while it might be possible to use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown, *I* don't think it's the right approach. Regarding the PG-XC requirement, I expect they're looking for FDW join/aggregate-pushdown and also see that it *could* be done w/ CustomScan. We could punt on the whole thing and drop in hooks which could be used to replace everything done from the planner through to the executor and then anyone *could* implement any of the above, and parallel query too. That doesn't make it the right approach. Thanks, Stephen
On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost <sfrost@snowman.net> wrote: >> The custom-scan node is intended to perform on regular relations, not >> only foreign tables. It means a special feature (like GPU acceleration) >> can perform transparently for most of existing applications. Usually, >> it defines regular tables for their work on installation, not foreign >> tables. It is the biggest concern for me. > > The line between a foreign table and a local one is becoming blurred > already, but still, if this is the goal then I really think the > background worker is where you should be focused, not on this Custom > Scan API. Consider that, once we've got proper background workers, > we're going to need new nodes which operate in parallel (or some other > rejiggering of the nodes- I don't pretend to know exactly what Robert is > thinking here, and I've apparently forgotten it if he's posted it > somewhere) and those interfaces may drive changes which would impact the > Custom Scan API- or worse, make us deprecate or regret having added it > because now we'll need to break backwards compatibility to add in the > parallel node capability to satisfy the more general non-GPU case. This critique seems pretty odd to me. I haven't had the time to look at this patch set, but I don't see why anyone would want to use the background worker facility for GPU acceleration, which is what KaiGai's trying to accomplish here. Surely you want, if possible, to copy the data directly from the user backend into the GPU's working memory. What would the use of a background worker be? We definitely need background workers to make use of additional *CPUs*, but I don't see what good they are in leveraging *GPUs*. I seriously doubt there's any real conflict with parallelism here. Parallelism may indeed add more ways to scan a relation (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we shouldn't have a Custom Scan node too. Indeed, my principal concern about this patch set isn't that it's too specialized, as you seem to be worrying about, but that it's aiming to satisfy *too many* use cases. I think FDW join pushdown is a fairly different problem from adding a custom scan type, and I doubt one patch should try to solve both problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >> IIUC, his approach was integration of join-pushdown within FDW APIs, >> however, it does not mean the idea of remote-join is rejected. > > For my part, trying to consider doing remote joins *without* going > through FDWs is just nonsensical. That is, of course, true by definition, but I think it's putting the focus in the wrong place. It's possible that there are other cases when a scan might be a plausible path for a joinrel even if there are no foreign tables in play. For example, you could cache the joinrel output and then inject a cache scan as a path for the joinrel. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost <sfrost@snowman.net> wrote: > > The line between a foreign table and a local one is becoming blurred > > already, but still, if this is the goal then I really think the > > background worker is where you should be focused, not on this Custom > > Scan API. Consider that, once we've got proper background workers, > > we're going to need new nodes which operate in parallel (or some other > > rejiggering of the nodes- I don't pretend to know exactly what Robert is > > thinking here, and I've apparently forgotten it if he's posted it > > somewhere) and those interfaces may drive changes which would impact the > > Custom Scan API- or worse, make us deprecate or regret having added it > > because now we'll need to break backwards compatibility to add in the > > parallel node capability to satisfy the more general non-GPU case. > > This critique seems pretty odd to me. I haven't had the time to look > at this patch set, but I don't see why anyone would want to use the > background worker facility for GPU acceleration, which is what > KaiGai's trying to accomplish here. Eh, that didn't come out quite right. I had intended it to be more along the lines of "look at what Robert's doing". I was trying to point out that parallel query execution is coming soon thanks to the work on background worker and that parallel query execution might drive changes to the way nodes interact in the executor driving a need to change the API. In other words, CustomScan could easily end up being broken by that change and I'd rather we not have to worry about such breakage. > I seriously doubt there's any real conflict with parallelism here. > Parallelism may indeed add more ways to scan a relation > (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we > shouldn't have a Custom Scan node too. What about parallel execution through the tree itself, rather than just at specific end nodes like SeqScan and IndexScan? Or parallel aggregates? I agree that simple parallel SeqScan/IndexScan isn't going to change any of the interfaces, but surely we're going for more than that. Indeed, I'm wishing that I had found more time to spend on just a simple select-based Append node which could parallelize I/O across tablespaces and FDWs underneath the Append. > Indeed, my principle concern > about this patch set isn't that it's too specialized, as you seem to > be worrying about, but that it's aiming to satisfy *too many* use > cases. I think FDW join pushdown is a fairly different problem from > adding a custom scan type, and I doubt one patch should try to solve > both problems. Yeah, I've voiced those same concerns later in the thread also, specifically that this punts on nearly everything and expects the implementor to figure it all out. We should be able to do better wrt FDW join-pushdown, etc. Thanks, Stephen
On Fri, Feb 28, 2014 at 10:36 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >> * Stephen Frost (sfrost@snowman.net) wrote: >> > I don't see how you can be when there hasn't been any discussion that I've >> > seen about how parallel query execution is going to change things for us. >> > >> If parallel query execution changes whole of the structure of plan nodes, >> it will also affect to the interface of custom-scan because it is a thin- >> abstraction of plan-node. However, if parallel execution feature is >> implemented as one of new plan node in addition to existing one, I cannot >> imagine a scenario that affects to the structure of another plan node. > > Let's just say that I have doubts that we'll be able to implement > parallel execution *without* changing the plan node interface in some > way which will require, hopefully minor, changes to all of the nodes. > The issue is that even a minor change would break the custom-scan API > and we'd immediately be in the boat of dealing with complaints regarding > backwards compatibility. Perhaps we can hand-wave that, and we've had > some success changing hook APIs between major releases, but such changes > may also be in ways which wouldn't break in obvious ways or even > possibly be changes which have to be introduced into back-branches. > Parallel query is going to be brand-new real soon and it's reasonable to > think we'll need to make bug-fix changes to it after it's out which > might even involve changes to the API which is developed for it. For what it's worth, and I can't claim to have all the answers here, this doesn't match my expectation. I think we'll do two kinds of parallelism. One will be parallelism within nodes, like parallel sort or parallel seqscan. Any node we parallelize this way is likely to be heavily rewritten, or else to get a sister that looks quite different from the original. The other kind of parallelism will involve pushing a whole subtree of the plan into a different node. In this case we'll need to pass data between nodes in some different way (this was one of the major reasons I designed the shm_mq stuff) but the nodes themselves should change little if at all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 1, 2014 at 8:49 PM, Stephen Frost <sfrost@snowman.net> wrote: >> This critique seems pretty odd to me. I haven't had the time to look >> at this patch set, but I don't see why anyone would want to use the >> background worker facility for GPU acceleration, which is what >> KaiGai's trying to accomplish here. > > Eh, that didn't come out quite right. I had intended it to be more > along the lines of "look at what Robert's doing". > > I was trying to point out that parallel query execution is coming soon > thanks to the work on background worker and that parallel query > execution might drive changes to the way nodes interact in the executor > driving a need to change the API. In other words, CustomScan could > easily end up being broken by that change and I'd rather we not have to > worry about such breakage. I think the relation is pretty tangential. I'm worried about the possibility that the Custom Scan API is broken *ab initio*, but I'm not worried about a conflict with parallel query. >> I seriously doubt there's any real conflict with parallelism here. >> Parallelism may indeed add more ways to scan a relation >> (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we >> shouldn't have a Custom Scan node too. > > What about parallel execution through the tree itself, rather than just > at specific end nodes like SeqScan and IndexScan? Or parallel > aggregates? I agree that simple parallel SeqScan/IndexScan isn't going > to change any of the interfaces, but surely we're going for more than > that. Indeed, I'm wishing that I had found more time to spend on just > a simple select-based Append node which could parallelize I/O across > tablespaces and FDWs underneath the Append. Well, as I said in another recent reply that you probably got after sending this, when you split between nodes, that mostly just has to do with how you funnel the tuples from one node to another. The nodes themselves probably don't even need to know. Or at least that's what I'd hope. I don't see that parallelizing Append is any easier than any other problem in this space. There's no parallel I/O facility, so you need a background worker per append branch to wait on I/O. And you have all the problems of making sure that the workers have the same snapshot, making sure they don't self-deadlock, etc. that you have for any other case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2014-03-02 10:29 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: >> As you mentioned, it is a headache for packagers, and does not make >> sense for us if packager disabled the feature that requires proprietary >> drivers. > > No, I disagree with that. I don't expect this use-case to be very > common to begin with and telling individuals that they have to compile > it themselves is certainly not out of the question. > >> In fact, Fedora / RHEL does not admit to distribute software >> under the none open source software license. > > I'm pretty confident you can get RPMs for those distributions. > >> Obviously, nvidia's cuda >> is a library being distributed under the proprietary license, thus out of >> the scope for the Linux distributors. > > This also doesn't make any sense to me- certainly the CUDA libraries are > available under Debian derivatives, along with open-source wrappers for > them like pycuda. > >> It also leads them to turn off the >> feature that shall be linked with proprietary drivers. >> All we can do is to implement these features as extension, then offer >> an option for users to use or not to use. > > No, we can tell individuals who want it that they're going to need to > build with support for it. We don't offer OpenSSL as an extension (I > certainly wish we did- and had a way to replace it w/ GNUTLS or one of > the better licensed options). > I know there is some alternative ways. However, it requires users to take additional knowledge and setting up efforts, also loses the benefit to use distributor's Linux if alternative RPMs are required. I don't want to recommend users such a complicated setting up procedure. >> What I'd like to implement is GPU acceleration that can perform on >> regular tables, not only foreign tables. Also, regarding to the backlog >> in the developer meeting, pluggable executor node is also required >> feature by PG-XC folks to work their project with upstream. >> I think custom-scan feature is capable to host these requirement, >> and does not prevent enhancement FDW features. > > I think you're conflating things again- while it might be possible to > use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown, > *I* don't think it's the right approach. Regarding the PG-XC > requirement, I expect they're looking for FDW join/aggregate-pushdown > and also see that it *could* be done w/ CustomScan. > The reason why I submitted the part-3 patch (that enhances postgres_fdw for remote-join using custom-scan) is easy to demonstrate the usage of join-replacement by a special scan, with minimum scale of the patch to be reviewed. If we have another idea to demonstrate it, I don't stick on the remot- join feature on foreign tables. Regarding to the PG-XC, I didn't know their exact needs because I didn't attend the cluster meeting, but the someone mentioned about pluggable plan/exec node in this context. > We could punt on the whole thing and drop in hooks which could be used > to replace everything done from the planner through to the executor and > then anyone *could* implement any of the above, and parallel query too. > That doesn't make it the right approach. > That is a problem I pointed out in the last developer meeting. Because we have no way to enhance a part of plan / exec logic by extension, extension has to replace whole of the planner / executor using hooks. It is painful for authors of extensions. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Robert Haas (robertmhaas@gmail.com) wrote: > I don't see that parallelizing Append is any easier than any other > problem in this space. There's no parallel I/O facility, so you need > a background worker per append branch to wait on I/O. And you have > all the problems of making sure that the workers have the same > snapshot, making sure they don't self-deadlock, etc. that you have for > any other case. Erm, my thought was to use a select() loop which sends out I/O requests and then loops around waiting to see who finishes it. It doesn't parallelize the CPU cost of getting the rows back to the caller, but it'd at least parallelize the I/O, and if what's underneath is actually a remote FDW running a complex query (because the other side is actually a view), it would be a massive win to have all the remote FDWs executing concurrently instead of serially as we have today. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > For what it's worth, and I can't claim to have all the answers here, > this doesn't match my expectation. I think we'll do two kinds of > parallelism. One will be parallelism within nodes, like parallel sort > or parallel seqscan. Any node we parallelize this way is likely to be > heavily rewritten, or else to get a sister that looks quite different > from the original. Sure. > The other kind of parallelism will involve pushing > a whole subtree of the plan into a different node. In this case we'll > need to pass data between nodes in some different way (this was one of > the major reasons I designed the shm_mq stuff) but the nodes > themselves should change little if at all. It's that "some different way" of passing data between the nodes that makes me worry, but I hope you're right and we won't actually need to change the interfaces or the nodes very much. Thanks, Stephen
2014-03-02 10:38 GMT+09:00 Robert Haas <robertmhaas@gmail.com>: > On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost <sfrost@snowman.net> wrote: >> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >>> IIUC, his approach was integration of join-pushdown within FDW APIs, >>> however, it does not mean the idea of remote-join is rejected. >> >> For my part, trying to consider doing remote joins *without* going >> through FDWs is just nonsensical. > > That is, of course, true by definition, but I think it's putting the > focus in the wrong place. It's possible that there are other cases > when a scan might a plausible path for a joinrel even if there are no > foreign tables in play. For example, you could cache the joinrel > output and then inject a cache scan as a path for the joinrel. > That might be an idea to demonstrate usage of custom-scan node, rather than the (ad-hoc) enhancement of postgres_fdw. As I have discussed in another thread, it is available to switch heap reference by cache reference on the fly, it shall be a possible use- case for custom-scan node. So, I'm inclined to drop the portion for postgres_fdw in my submission to focus on custom-scan capability. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> I don't see that parallelizing Append is any easier than any other >> problem in this space. There's no parallel I/O facility, so you need >> a background worker per append branch to wait on I/O. And you have >> all the problems of making sure that the workers have the same >> snapshot, making sure they don't self-deadlock, etc. that you have for >> any other case. > > Erm, my thought was to use a select() loop which sends out I/O requests > and then loops around waiting to see who finishes it. It doesn't > parallelize the CPU cost of getting the rows back to the caller, but > it'd at least parallelize the I/O, and if what's underneath is actually > a remote FDW running a complex query (because the other side is actually > a view), it would be a massive win to have all the remote FDWs executing > concurrently instead of serially as we have today. I can't really make sense of this. In general, what's under each branch of an append node is an arbitrary plan tree, and the only operation you can count on being able to do for each is "get me the next tuple" (i.e. ExecProcNode). Append has no idea whether that involves disk I/O or for what blocks. But even if it did, there's no standard API for issuing an asynchronous read(), which is how we get blocks into shared buffers. We do have an API for requesting the prefetch of a block on platforms with posix_fadvise(), but can't find out whether it's completed using select(), and even if you could you still have to do the actual read() afterwards. For FDWs, one idea might be to kick off the remote query at ExecInitNode() time rather than ExecProcNode() time, at least if the remote query doesn't depend on parameters that aren't available until run time. That actually would allow multiple remote queries to run simultaneously or in parallel with local work. It would also run them in cases where the relevant plan node is never executed, which would be bad but perhaps rare enough not to worry about. Or we could add a new API like ExecPrefetchNode() that tells nodes to prepare to have tuples pulled, and they can do things like kick off asynchronous queries. But I still don't see any clean way for the Append node to find out which one is ready to return results first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Erm, my thought was to use a select() loop which sends out I/O requests > > and then loops around waiting to see who finishes it. It doesn't > > parallelize the CPU cost of getting the rows back to the caller, but > > it'd at least parallelize the I/O, and if what's underneath is actually > > a remote FDW running a complex query (because the other side is actually > > a view), it would be a massive win to have all the remote FDWs executing > > concurrently instead of serially as we have today. > > I can't really make sense of this. Sorry, that was a bit hand-wavey since I had posted about it previously here: http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net It'd clearly be more involved than "just build a select() loop" and would require adding an async mechanism. I had been thinking about this primairly with the idea of FDWs and you're right that it'd require more thought to deal with getting data into/through shared_buffers. Still, we seqscan into a ring buffer, I'd think we could make it work but it would require additional work. > For FDWs, one idea might be to kick off the remote query at > ExecInitNode() time rather than ExecProcNode() time, at least if the > remote query doesn't depend on parameters that aren't available until > run time. Right, I had speculated about that also (option #2 in my earlier email). > That actually would allow multiple remote queries to run > simultaneously or in parallel with local work. It would also run them > in cases where the relevant plan node is never executed, which would > be bad but perhaps rare enough not to worry about. This was my primary concern, along with the fact that we explicitly says "don't do that" in the docs for the FDW API. > Or we could add a > new API like ExecPrefetchNode() that tells nodes to prepare to have > tuples pulled, and they can do things like kick off asynchronous > queries. But I still don't see any clean way for the Append node to > find out which one is ready to return results first. Yeah, that's tricky. Thanks, Stephen
On Mon, Mar 3, 2014 at 10:43 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote: >> > Erm, my thought was to use a select() loop which sends out I/O requests >> > and then loops around waiting to see who finishes it. It doesn't >> > parallelize the CPU cost of getting the rows back to the caller, but >> > it'd at least parallelize the I/O, and if what's underneath is actually >> > a remote FDW running a complex query (because the other side is actually >> > a view), it would be a massive win to have all the remote FDWs executing >> > concurrently instead of serially as we have today. >> >> I can't really make sense of this. > > Sorry, that was a bit hand-wavey since I had posted about it previously > here: > http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net Huh, somehow I can't remember reading that... but I didn't think I had missed any posts, either. Evidently I did. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > > http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net > > Huh, somehow I can't remember reading that... but I didn't think I had > missed any posts, either. Evidently I did. You and everyone else- you'll note it got exactly zero responses.. Ah well. :) Thanks, Stephen
KaiGai, * Stephen Frost (sfrost@snowman.net) wrote: > And I'm still unconvinced of this approach and worry that it's going to > break more often than it works. That's my 2c on it, but I won't get in > the way if someone else wants to step up and support it. Alright, having heard from Robert (thanks!) regarding his thoughts (which are pretty similar to my own, though he doesn't anticipate issues with API changes), I'm going to step back a bit form the above position. > As I mentioned > up-thread, I'd really like to see FDW join push-down, FDW aggregate > push-down, parallel query execution, and parallel remote-FDW execution > and I don't see this CustomScan approach as the right answer to any of > those. In accordance with the above, what I'd like to see with this patch is removal of the postgres_fdw changes and any changes which were for that support. In addition, I'd like to understand why 'ctidscan' makes any sense to have as an example of what to use this for- if that's valuable, why wouldn't we simply implement that in core? I do want an example in contrib of how to properly use this capability, but I don't think that's it. For one thing, an example where you could have this CustomScan node calling other nodes underneath would be interesting. I realize the CTID scan can't do that directly but I would think your GPU-based system could; after all, if you're running a join or an aggregate with the GPU, the rows could come from nearly anything. Have you considered that, or is the expectation that users will just go off and access the heap and/or whatever indexes directly, like ctidscan does? How would such a requirement be handled? Thanks, Stephen
> > As I mentioned > > up-thread, I'd really like to see FDW join push-down, FDW aggregate > > push-down, parallel query execution, and parallel remote-FDW execution > > and I don't see this CustomScan approach as the right answer to any of > > those. > > In accordance with the above, what I'd like to see with this patch is removal > of the postgres_fdw changes and any changes which were for that support. > I don't argue this approach. It might be useful to *demonstrate* how custom- scan node works as replacement of join, however, > In addition, I'd like to understand why 'ctidscan' makes any sense to have > as an example of what to use this for- if that's valuable, why wouldn't > we simply implement that in core? I do want an example in contrib of how > to properly use this capability, but I don't think that's it. > Do you think it makes sense if my submission was only interface portion without working example? The purpose of ctidscan module is, similar to postgres_fdw, to demonstrate the usage of custom-scan interface with enough small code scale. If tons of code example were attached, nobody will want to review the patch. The "cache_scan" module that I and Haribabu are discussing in another thread also might be a good demonstration for custom-scan interface, however, its code scale is a bit larger than ctidscan. > For one thing, an example where you could have this CustomScan node calling > other nodes underneath would be interesting. I realize the CTID scan can't > do that directly but I would think your GPU-based system could; after all, > if you're running a join or an aggregate with the GPU, the rows could come > from nearly anything. Have you considered that, or is the expectation that > users will just go off and access the heap and/or whatever indexes directly, > like ctidscan does? How would such a requirement be handled? > In case when custom-scan node has underlying nodes, it shall be invoked using ExecProcNode as built-in node doing, then it will be able to fetch tuples come from underlying nodes. Of course, custom-scan provider can perform the tuples come from somewhere as if it came from underlying relation. It is responsibility of extension module. In some cases, it shall be required to return junk system attribute, like ctid, for row-level locks or table updating. It is also responsibility of the extension module (or, should not add custom- path if this custom-scan provider cannot perform as required). Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Mar 3, 2014 at 9:13 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote:> > Erm, my thought was to use a select() loop which sends out I/O requestsSorry, that was a bit hand-wavey since I had posted about it previously
> > and then loops around waiting to see who finishes it. It doesn't
> > parallelize the CPU cost of getting the rows back to the caller, but
> > it'd at least parallelize the I/O, and if what's underneath is actually
> > a remote FDW running a complex query (because the other side is actually
> > a view), it would be a massive win to have all the remote FDWs executing
> > concurrently instead of serially as we have today.
>
> I can't really make sense of this.
here:
http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net
It'd clearly be more involved than "just build a select() loop" and
would require adding an async mechanism. I had been thinking about this
primairly with the idea of FDWs and you're right that it'd require more
thought to deal with getting data into/through shared_buffers. Still,
we seqscan into a ring buffer, I'd think we could make it work but it
would require additional work.Right, I had speculated about that also (option #2 in my earlier email).
> For FDWs, one idea might be to kick off the remote query at
> ExecInitNode() time rather than ExecProcNode() time, at least if the
> remote query doesn't depend on parameters that aren't available until
> run time.
During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries to foreign servers, those would be fired while EXPLAINing a query as well. We want to avoid that. Instead, we can run EXPLAIN on that query at foreign server. But again, not all foreign servers would be able to EXPLAIN the query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), if it's for EXPLAIN (except for ANALYSE may be).
> That actually would allow multiple remote queries to runThis was my primary concern, along with the fact that we explicitly says
> simultaneously or in parallel with local work. It would also run them
> in cases where the relevant plan node is never executed, which would
> be bad but perhaps rare enough not to worry about.
"don't do that" in the docs for the FDW API.Yeah, that's tricky.
> Or we could add a
> new API like ExecPrefetchNode() that tells nodes to prepare to have
> tuples pulled, and they can do things like kick off asynchronous
> queries. But I still don't see any clean way for the Append node to
> find out which one is ready to return results first.
Thanks,
Stephen
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
According to the suggestion, I dropped the enhancement of postgres_fdw on top of custom-scan interface from my submission, and also move the supplemental functions into the part-1 portion (implementation of custom-scan interface). Even though the ctidscan module is under discussion, I still include this module because of its usefulness as demonstration / example of custom-scan node. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: Kaigai Kouhei(海外 浩平) > Sent: Tuesday, March 04, 2014 8:26 AM > To: 'Stephen Frost'; Kohei KaiGai > Cc: Shigeru Hanada; Jim Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter > Eisentraut > Subject: RE: Custom Scan APIs (Re: [HACKERS] Custom Plan node) > > > > As I mentioned > > > up-thread, I'd really like to see FDW join push-down, FDW aggregate > > > push-down, parallel query execution, and parallel remote-FDW > > > execution and I don't see this CustomScan approach as the right > > > answer to any of those. > > > > In accordance with the above, what I'd like to see with this patch is > > removal of the postgres_fdw changes and any changes which were for that > support. > > > I don't argue this approach. It might be useful to *demonstrate* how custom- > scan node works as replacement of join, however, > > > In addition, I'd like to understand why 'ctidscan' makes any sense to > > have as an example of what to use this for- if that's valuable, why > > wouldn't we simply implement that in core? I do want an example in > > contrib of how to properly use this capability, but I don't think that's > it. > > > Do you think it makes sense if my submission was only interface portion > without working example? The purpose of ctidscan module is, similar to > postgres_fdw, to demonstrate the usage of custom-scan interface with enough > small code scale. If tons of code example were attached, nobody will want > to review the patch. > The "cache_scan" module that I and Haribabu are discussing in another thread > also might be a good demonstration for custom-scan interface, however, its > code scale is a bit larger than ctidscan. > > > For one thing, an example where you could have this CustomScan node > > calling other nodes underneath would be interesting. I realize the > > CTID scan can't do that directly but I would think your GPU-based > > system could; after all, if you're running a join or an aggregate with > > the GPU, the rows could come from nearly anything. Have you > > considered that, or is the expectation that users will just go off and > > access the heap and/or whatever indexes directly, like ctidscan does? > How would such a requirement be handled? > > > In case when custom-scan node has underlying nodes, it shall be invoked > using ExecProcNode as built-in node doing, then it will be able to fetch > tuples come from underlying nodes. Of course, custom-scan provider can > perform the tuples come from somewhere as if it came from underlying relation. > It is responsibility of extension module. In some cases, it shall be required > to return junk system attribute, like ctid, for row-level locks or table > updating. > It is also responsibility of the extension module (or, should not add custom- > path if this custom-scan provider cannot perform as required). > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > <kaigai@ak.jp.nec.com>
Attachment
On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote: >> As I mentioned >> up-thread, I'd really like to see FDW join push-down, FDW aggregate >> push-down, parallel query execution, and parallel remote-FDW execution >> and I don't see this CustomScan approach as the right answer to any of >> those. > > In accordance with the above, what I'd like to see with this patch is > removal of the postgres_fdw changes and any changes which were for that > support. In addition, I'd like to understand why 'ctidscan' makes any > sense to have as an example of what to use this for- if that's valuable, > why wouldn't we simply implement that in core? I do want an example in > contrib of how to properly use this capability, but I don't think that's > it. I suggested that example to KaiGai at last year's PGCon. It may indeed be something we want to have in core, but right now we don't. More generally, I think this discussion is focusing on the wrong set of issues. The threshold issue for this patch is whether there is a set of hook points that enable a workable custom-scan functionality, and whether KaiGai has correctly identified them. In other words, I think we should be worrying about whether KaiGai's found all of the places that need to be modified to support a custom scan, and whether the modifications he's made to each of those places are correct and adequate. Whether he's picked the best possible example does not strike me as a matter of principal concern, and it's far too late to tell him he's got to go pick a different one at this point anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > Do you think it makes sense if my submission was only interface portion > without working example? No, we're pretty strongly against putting in interfaces which don't have working examples in contrib- for one thing, we want to know when we break it. > The purpose of ctidscan module is, similar to > postgres_fdw, to demonstrate the usage of custom-scan interface with > enough small code scale. If tons of code example were attached, nobody > will want to review the patch. I gathered that's why it was included. Is the plan to eventually submit something larger to go into -contrib which will use this interface? Or will it always be external? > The "cache_scan" module that I and Haribabu are discussing in another > thread also might be a good demonstration for custom-scan interface, > however, its code scale is a bit larger than ctidscan. That does sound interesting though I'm curious about the specifics... > > For one thing, an example where you could have this CustomScan node calling > > other nodes underneath would be interesting. I realize the CTID scan can't > > do that directly but I would think your GPU-based system could; after all, > > if you're running a join or an aggregate with the GPU, the rows could come > > from nearly anything. Have you considered that, or is the expectation that > > users will just go off and access the heap and/or whatever indexes directly, > > like ctidscan does? How would such a requirement be handled? > > > In case when custom-scan node has underlying nodes, it shall be invoked using > ExecProcNode as built-in node doing, then it will be able to fetch tuples > come from underlying nodes. Of course, custom-scan provider can perform the > tuples come from somewhere as if it came from underlying relation. It is > responsibility of extension module. In some cases, it shall be required to > return junk system attribute, like ctid, for row-level locks or table updating. > It is also responsibility of the extension module (or, should not add custom- > path if this custom-scan provider cannot perform as required). Right, tons of work to do to make it all fit together and play nice- what I was trying to get at is: has this actually been done? Is the GPU extension that you're talking about as the use-case for this been written? How does it handle all of the above? Or are we going through all these gyrations in vain hope that it'll actually all work when someone tries to use it for something real? Thanks, Stephen
On Tue, Mar 4, 2014 at 7:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote:I suggested that example to KaiGai at last year's PGCon. It may
>> As I mentioned
>> up-thread, I'd really like to see FDW join push-down, FDW aggregate
>> push-down, parallel query execution, and parallel remote-FDW execution
>> and I don't see this CustomScan approach as the right answer to any of
>> those.
>
> In accordance with the above, what I'd like to see with this patch is
> removal of the postgres_fdw changes and any changes which were for that
> support. In addition, I'd like to understand why 'ctidscan' makes any
> sense to have as an example of what to use this for- if that's valuable,
> why wouldn't we simply implement that in core? I do want an example in
> contrib of how to properly use this capability, but I don't think that's
> it.
indeed be something we want to have in core, but right now we don't.
More generally, I think this discussion is focusing on the wrong set
of issues. The threshold issue for this patch is whether there is a
set of hook points that enable a workable custom-scan functionality,
and whether KaiGai has correctly identified them. In other words, I
think we should be worrying about whether KaiGai's found all of the
places that need to be modified to support a custom scan, and whether
the modifications he's made to each of those places are correct and
adequate. Whether he's picked the best possible example does not
strike me as a matter of principal concern, and it's far too late to
tell him he's got to go pick a different one at this point anyway.
There are so many places in the planner and optimizer code, where we create various types of paths and the number of such paths is again significant, if not large. If we want the custom scan contrib module to work in all those cases (which seems to be the intention here), then we have to expose so many hooks. I don't think all of those hooks have been identified. Second problem is, the functions which create those paths have signatures difficult enough to be exposed as hooks. Take example of the join hook that was exposed. These function signatures do get changed from time to time and thus corresponding hooks need to be changed to. This is going to be a maintenance burden.
So, unless we have some way of exposing these hooks such that the definitions of the hooks are independent of the internal function signatures, supporting custom scan looks difficult.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
* Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries > to foreign servers, those would be fired while EXPLAINing a query as well. > We want to avoid that. Instead, we can run EXPLAIN on that query at foreign > server. But again, not all foreign servers would be able to EXPLAIN the > query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), > if it's for EXPLAIN (except for ANALYSE may be). Agreed that we wouldn't want to actually run a query when it's just being explain'd. If the FDW can't tell the difference then we'd need to address that, of course. A similar issue would, presumably, be around prepare/execute, though I haven't looked yet. These kinds of issues are why it was option '#2' instead of '#1'. :) I'm not sure they're able to be addressed. :/ Thanks, Stephen
2014-03-04 23:09 GMT+09:00 Robert Haas <robertmhaas@gmail.com>: > On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote: >>> As I mentioned >>> up-thread, I'd really like to see FDW join push-down, FDW aggregate >>> push-down, parallel query execution, and parallel remote-FDW execution >>> and I don't see this CustomScan approach as the right answer to any of >>> those. >> >> In accordance with the above, what I'd like to see with this patch is >> removal of the postgres_fdw changes and any changes which were for that >> support. In addition, I'd like to understand why 'ctidscan' makes any >> sense to have as an example of what to use this for- if that's valuable, >> why wouldn't we simply implement that in core? I do want an example in >> contrib of how to properly use this capability, but I don't think that's >> it. > > I suggested that example to KaiGai at last year's PGCon. It may > indeed be something we want to have in core, but right now we don't. > > More generally, I think this discussion is focusing on the wrong set > of issues. The threshold issue for this patch is whether there is a > set of hook points that enable a workable custom-scan functionality, > and whether KaiGai has correctly identified them. In other words, I > think we should be worrying about whether KaiGai's found all of the > places that need to be modified to support a custom scan, and whether > the modifications he's made to each of those places are correct and > adequate. Whether he's picked the best possible example does not > strike me as a matter of principal concern, and it's far too late to > tell him he's got to go pick a different one at this point anyway. > That is definitely the point to be discussed here. Even though I *believe* I could put the callbacks needed to implement alternative join / scan, it may lead different conclusion from other person's viewpoint. At least, I could implement a custom-scan as an alternative of join using postgres_fdw, however, it's uncertain whether I could cover all the possible case we should care about. So, I'd like to see comments from others. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Robert Haas (robertmhaas@gmail.com) wrote: > More generally, I think this discussion is focusing on the wrong set > of issues. The threshold issue for this patch is whether there is a > set of hook points that enable a workable custom-scan functionality, > and whether KaiGai has correctly identified them. Right- I was trying to hit on that in my email this morning. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: >> During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries >> to foreign servers, those would be fired while EXPLAINing a query as well. >> We want to avoid that. Instead, we can run EXPLAIN on that query at foreign >> server. But again, not all foreign servers would be able to EXPLAIN the >> query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), >> if it's for EXPLAIN (except for ANALYSE may be). > Agreed that we wouldn't want to actually run a query when it's just > being explain'd. If the FDW can't tell the difference then we'd need to > address that, of course. EXEC_FLAG_EXPLAIN_ONLY ... regards, tom lane
2014-03-04 23:10 GMT+09:00 Stephen Frost <sfrost@snowman.net>: >> The "cache_scan" module that I and Haribabu are discussing in another >> thread also might be a good demonstration for custom-scan interface, >> however, its code scale is a bit larger than ctidscan. > > That does sound interesting though I'm curious about the specifics... > This module caches a part of columns, but not all, thus allows to hold much larger number of records for a particular amount of RAM than the standard buffer cache. It is constructed on top of custom-scan node, and also performs a new hook for a callback on page vacuuming to invalidate its cache entry. (I originally designed this module for demonstration of on-vacuum hook because I already made ctidscan and postgres_fdw enhancement for custom-scan node, by the way.) >> > For one thing, an example where you could have this CustomScan node calling >> > other nodes underneath would be interesting. I realize the CTID scan can't >> > do that directly but I would think your GPU-based system could; after all, >> > if you're running a join or an aggregate with the GPU, the rows could come >> > from nearly anything. Have you considered that, or is the expectation that >> > users will just go off and access the heap and/or whatever indexes directly, >> > like ctidscan does? How would such a requirement be handled? >> > >> In case when custom-scan node has underlying nodes, it shall be invoked using >> ExecProcNode as built-in node doing, then it will be able to fetch tuples >> come from underlying nodes. Of course, custom-scan provider can perform the >> tuples come from somewhere as if it came from underlying relation. It is >> responsibility of extension module. In some cases, it shall be required to >> return junk system attribute, like ctid, for row-level locks or table updating. >> It is also responsibility of the extension module (or, should not add custom- >> path if this custom-scan provider cannot perform as required). > > Right, tons of work to do to make it all fit together and play nice- > what I was trying to get at is: has this actually been done? Is the GPU > extension that you're talking about as the use-case for this been > written? > Its chicken-and-egg problem, because implementation of the extension module fully depends on the interface from the backend. Unlike commit-fest, here is no deadline for my extension module, so I put higher priority on the submission of custom-scan node, than the extension. However, GPU extension is not fully theoretical stuff. I had implemented a prototype using FDW APIs, and it allowed to accelerate sequential scan if query has enough complicated qualifiers. See the movie (from 2:45). The table t1 is a regular table, and t2 is a foreign table. Both of them has same contents, however, response time of the query is much faster, if GPU acceleration is working. http://www.youtube.com/watch?v=xrUBffs9aJ0 So, I'm confident that GPU acceleration will have performance gain once it can run regular tables, not only foreign tables. > How does it handle all of the above? Or are we going through > all these gyrations in vain hope that it'll actually all work when > someone tries to use it for something real? > I don't talk something difficult. If junk attribute requires to return "ctid" of the tuple, custom-scan provider reads a tuple of underlying relation then includes a correct item pointer. 
If this custom-scan is designed to run on the cache, all it needs to do is reconstruct a tuple with correct item-pointer (thus this cache needs to have ctid also). It's all I did in the cache_scan module. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > >> During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries > >> to foreign servers, those would be fired while EXPLAINing a query as well. > >> We want to avoid that. Instead, we can run EXPLAIN on that query at foreign > >> server. But again, not all foreign servers would be able to EXPLAIN the > >> query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), > >> if it's for EXPLAIN (except for ANALYSE may be). > > > Agreed that we wouldn't want to actually run a query when it's just > > being explain'd. If the FDW can't tell the difference then we'd need to > > address that, of course. > > EXEC_FLAG_EXPLAIN_ONLY ... Yeah, figured there should be a way. Still not sure that kicking the query off from ExecInitNode() is a good idea though. Perhaps it could be optional somehow. I really like the idea of being able to make Append work in an async mode where it's pulling data from multiple sources at the same time, but it's a fair bit of work. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote: > > In accordance with the above, what I'd like to see with this patch is > > removal of the postgres_fdw changes and any changes which were for that > > support. In addition, I'd like to understand why 'ctidscan' makes any > > sense to have as an example of what to use this for- if that's valuable, > > why wouldn't we simply implement that in core? I do want an example in > > contrib of how to properly use this capability, but I don't think that's > > it. > > I suggested that example to KaiGai at last year's PGCon. It may > indeed be something we want to have in core, but right now we don't. Alright- so do you feel that the simple ctidscan use-case is a sufficient justification and example of how this can be generally useful that we should be adding these hooks to core..? I'm willing to work through the patch and clean it up this weekend if we agree that it's useful and unlikely to immediately be broken by expected changes.. Thanks, Stephen
I apologize for not having paid much attention to this thread so far. It kept getting stuck on my "to look at later" queue. Anyway, I've taken a preliminary look at the v7 patch now. While the patch seems roughly along the lines of what we talked about last PGCon, I share Stephen's unease about a lot of the details. It's not entirely clear that these hooks are really good for anything, and it's even less clear what APIs the hook functions should be expected to depend on. I really do not like the approach embodied in the later patches of "oh, we'll just expose whatever static planner functions seem convenient". That's not an approach that's available when doing actual external development of an extension, and even if it were that doesn't make it a good idea. The larger the exposed surface of functions the harder it is to know what's safe to change. Anyway, on to specifics: * Please drop the whole register_custom_provider/get_custom_provider API. There is no reason other than debugging for a provider to have a name at all, and if we expect providers to have unique names then that creates a collision risk for independently-created extensions. AFAICS, it's sufficient for a function hooked into one of the add-a-path hooks to include a pointer to a struct-of-function-pointers in the Path object it returns, and similarly the CustomScan Plan object can contain a pointer inserted when it's created. I don't object to having a name field in the function pointer structs for debugging reasons, but I don't want any lookups being done on it. * The function-struct pointers should be marked const in the referencing nodes, to indicate that the core code won't be modifying or copying them. In practice they'd probably be statically allocated constants in the extensions anyway. * The patch does lots of violence to the separation between planner and executor, starting with the decision to include nodes/relation.h in executor.h. That will not do at all. I see that you did that because you wanted to make ExecSupportsMarkRestore take a Path, but we need some other answer. One slightly grotty answer is to invent two different customscan Plan types, one that supports mark/restore and one that doesn't, so that ExecSupportsMarkRestore could still just look at the Plan type tag. (BTW, path->pathtype is supposed to contain the node tag of the Plan node that the path would produce. Putting T_CustomPath in it is entirely tone-deaf.) Another way would be to remove ExecSupportsMarkRestore in favor of some new function in the planner; but it's good to keep it in execAmi.c since that has other knowledge of which plan types support mark/restore. * More generally, I'm not convinced that exactly one Path type and exactly one Plan type is going to get us very far. It seems rather ugly to use the same Plan type for both scan and join nodes, and what will happen if somebody wants to build a custom Append node, or something else that has neither zero nor two subplans? * nodeCustom.h is being completely abused. That should only export the functions in nodeCustom.c, which are going to be pretty much one-liners anyway. The right place to put the function pointer struct definitions is someplace else. I'd be inclined to start by separating the function pointers into two structs, one for functions needed for a Path and one for functions needed for a Plan, so that you don't have this problem of having to import everything the planner knows into an executor header or vice versa. 
Most likely you could just put the Path function pointer struct declaration next to CustomPath in relation.h, and the one for Plans next to CustomPlan (or the variants thereof) in plannodes.h. * The set of fields provided in CustomScan seems nonsensical. I'm not even sure that it should be derived from Scan; that's okay for plan types that actually are scans of a base relation, but it's confusing overhead for anything that's say a join, or a custom sort node, or anything like that. Maybe one argument for multiple plan node types is that one would be derived from Scan and one directly from Plan. * More generally, what this definition for CustomScan exposes is that we have no idea whatever what fields a custom plan node might need. I'm inclined to think that what we should be assuming is that any custom path or plan node is really an object of a struct type known only to its providing extension, whose first field is the CustomPath or CustomPlan struct known to the core backend. (Think C++ subclassing.) This would imply that copyfuncs/equalfuncs/outfuncs support would have to be provided by the extension, which is in principle possible if we add function pointers for those operations to the struct linked to from the path/plan object. (Notationally this might be a bit of a pain, since the macros that we use in the functions in copyfuncs.c etc aren't public. Not sure if it's worth exposing those somewhere, or if people should just copy/paste them.) This approach would probably also avoid the need for the klugy bitmapset representation you propose in patch 3. * This also implies that create_customscan_plan is completely bogus. A custom plan provider *must* provide a callback function for that, because only it will know how big a node to palloc. There are far too many assumptions in create_customscan_plan anyway; I think there is probably nothing at all in that function that shouldn't be getting done by the custom provider instead. * Likewise, there is way too much hard-wired stuff in explain.c. It should not be optional whether a custom plan provider provides an explain support function, and that function should be doing pretty much everything involved in printing the node. * I don't believe in the hard-wired implementation in setrefs.c either. * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc). That's at best badly thought out, and at worst a source of new bugs. Under what circumstances is a custom plan node going to contain any Vars that don't reduce to either scan or input-plan variables? None, because that would imply that it was doing something unrelated to the requested query. * The API for add_join_path_hook seems overcomplex, as well as too full of implementation details that should remain private to joinpath.c. I particularly object to passing the mergeclause lists, which seem unlikely to be of interest for non-mergejoin plan types anyway. More generally, it seems likely that this hook is at the wrong level of abstraction; it forces the hook function to concern itself with a lot of stuff about join legality and parameterization (which I note your patch3 code fails to do; but that's not an optional thing). * After a bit of reflection I think the best thing to do might be to ditch add_join_path_hook for 9.4, on these grounds: 1. You've got enough to do to get the rest of the patch committable. 2. Like Stephen, I feel that the proposed usage as a substitute for FDW-based foreign join planning is not the direction we want to travel. 3. 
Without that, the use case for new join nodes seems pretty marginal, as opposed to say alternative sort-node implementations (a case not supported by this patch, except to the extent that you could use them in explicit-sort mergejoins if you duplicated large parts of joinpath.c). * Getting nitpicky ... what is the rationale for the doubled underscore in the CUSTOM__ flag names? That's just a typo waiting to happen IMO. * Why is there more than one call site for add_scan_path_hook? I don't see the need for the calling macro with the randomly inconsistent name, either. * The test arrangement for contrib/ctidscan is needlessly complex, and testing it in the core tests is a bogus idea anyway. Why not just let it contain its own test script like most other contrib modules? That's all I've got for now. I've not really looked at the extension code in either patch2 or patch3, just at the changes in the core code. regards, tom lane
On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote: > Alright- so do you feel that the simple ctidscan use-case is a > sufficient justification and example of how this can be generally > useful that we should be adding these hooks to core..? I'm willing to > work through the patch and clean it up this weekend if we agree that > it's useful and unlikely to immediately be broken by expected changes.. Yeah, I think it's useful. But based on Tom's concurrently-posted review, I think there's probably a good deal of work left here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Alright- so do you feel that the simple ctidscan use-case is a > > sufficient justification and example of how this can be generally > > useful that we should be adding these hooks to core..? I'm willing to > > work through the patch and clean it up this weekend if we agree that > > it's useful and unlikely to immediately be broken by expected changes.. > > Yeah, I think it's useful. But based on Tom's concurrently-posted > review, I think there's probably a good deal of work left here. Yeah, it certainly looks like it. KaiGai- will you have time to go over and address Tom's concerns..? Thanks! Stephen
2014-03-05 5:52 GMT+09:00 Stephen Frost <sfrost@snowman.net>:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > Alright- so do you feel that the simple ctidscan use-case is a
>> > sufficient justification and example of how this can be generally
>> > useful that we should be adding these hooks to core..? I'm willing to
>> > work through the patch and clean it up this weekend if we agree that
>> > it's useful and unlikely to immediately be broken by expected changes..
>>
>> Yeah, I think it's useful. But based on Tom's concurrently-posted
>> review, I think there's probably a good deal of work left here.
>
> Yeah, it certainly looks like it.
>
> KaiGai- will you have time to go over and address Tom's concerns..?
>
Yes, I will. Let me work through it during the latter half of this week and
the weekend; I'd like to submit a revised version by next Monday.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Tom, thanks for your detailed comments.

> I apologize for not having paid much attention to this thread so far.
> It kept getting stuck on my "to look at later" queue. Anyway, I've taken
> a preliminary look at the v7 patch now.
>
> While the patch seems roughly along the lines of what we talked about last
> PGCon, I share Stephen's unease about a lot of the details. It's not
> entirely clear that these hooks are really good for anything, and it's even
> less clear what APIs the hook functions should be expected to depend on.
> I really do not like the approach embodied in the later patches of "oh,
> we'll just expose whatever static planner functions seem convenient".
> That's not an approach that's available when doing actual external
> development of an extension, and even if it were that doesn't make it a
> good idea. The larger the exposed surface of functions the harder it is
> to know what's safe to change.
>
Hmm. Exposing a function needs a clear rationale, then, not merely
convenience.

> Anyway, on to specifics:
>
> * Please drop the whole register_custom_provider/get_custom_provider API.
> There is no reason other than debugging for a provider to have a name at
> all, and if we expect providers to have unique names then that creates a
> collision risk for independently-created extensions. AFAICS, it's
> sufficient for a function hooked into one of the add-a-path hooks to include
> a pointer to a struct-of-function-pointers in the Path object it returns,
> and similarly the CustomScan Plan object can contain a pointer inserted
> when it's created. I don't object to having a name field in the function
> pointer structs for debugging reasons, but I don't want any lookups being
> done on it.
>
One thing I was worrying about was how copyObject() and nodeToString()
would support the set of function-pointer tables around the custom-scan
node; however, you have suggested changing that assumption here, so I
think these functions become unnecessary.

> * The function-struct pointers should be marked const in the referencing
> nodes, to indicate that the core code won't be modifying or copying them.
> In practice they'd probably be statically allocated constants in the
> extensions anyway.
>
OK.

> * The patch does lots of violence to the separation between planner and
> executor, starting with the decision to include nodes/relation.h in
> executor.h. That will not do at all. I see that you did that because you
> wanted to make ExecSupportsMarkRestore take a Path, but we need some other
> answer. One slightly grotty answer is to invent two different customscan
> Plan types, one that supports mark/restore and one that doesn't, so that
> ExecSupportsMarkRestore could still just look at the Plan type tag.
> (BTW, path->pathtype is supposed to contain the node tag of the Plan node
> that the path would produce. Putting T_CustomPath in it is entirely
> tone-deaf.) Another way would be to remove ExecSupportsMarkRestore in
> favor of some new function in the planner; but it's good to keep it in
> execAmi.c since that has other knowledge of which plan types support
> mark/restore.
>
OK, I'll add a derivative plan node type, CustomScanMarkRestore for
instance. Probably it should be populated in create_customscan_plan()
according to a flag set on the CustomPath.

> * More generally, I'm not convinced that exactly one Path type and exactly
> one Plan type is going to get us very far.
> It seems rather ugly to use
> the same Plan type for both scan and join nodes, and what will happen if
> somebody wants to build a custom Append node, or something else that has
> neither zero nor two subplans?
>
In the previous discussion, a CustomJoin node was considered pointless
because we know only a limited number of join algorithms (nest-loop,
hash-join and merge-join), unlike the wide variety of logic for scanning
relations. Also, IIUC, people did not want to add custom-something node
types for each built-in type. So we concluded at that time to use a
CustomScan node to replace built-in joins and scans.
Regarding the Append case, either CustomScan needs to be enhanced to hold
a list of subplans, or we add an individual CustomAppend node, or an
opaque "CustomPlan" may be sufficient if it handles EXPLAIN by itself.

> * nodeCustom.h is being completely abused. That should only export the
> functions in nodeCustom.c, which are going to be pretty much one-liners
> anyway. The right place to put the function pointer struct definitions
> is someplace else. I'd be inclined to start by separating the function
> pointers into two structs, one for functions needed for a Path and one for
> functions needed for a Plan, so that you don't have this problem of having
> to import everything the planner knows into an executor header or vice versa.
> Most likely you could just put the Path function pointer struct declaration
> next to CustomPath in relation.h, and the one for Plans next to CustomPlan
> (or the variants thereof) in plannodes.h.
>
Yes. I didn't have a clear idea of where we should put the interface
definitions. Probably InitCustomScanPlan (or rather CreateCustomScanPlan)
should move to relation.h, and the rest of the callbacks to plannodes.h.

> * The set of fields provided in CustomScan seems nonsensical. I'm not even
> sure that it should be derived from Scan; that's okay for plan types that
> actually are scans of a base relation, but it's confusing overhead for
> anything that's say a join, or a custom sort node, or anything like that.
> Maybe one argument for multiple plan node types is that one would be derived
> from Scan and one directly from Plan.
>
The reason why CustomScan is derived from Scan is that some backend code
wants to know the rtindex of the relation referenced by the CustomScan.
The scanrelid of Scan is used in three places: nodeCustom.c, setrefs.c and
explain.c. The usage in nodeCustom.c is just for service routines, and the
usage in setrefs.c can be moved to the extension per your suggestion.
We need to look at the usage in explain.c: ExplainPreScanNode() walks the
nodes to collect the relids referenced in the query. If we don't want a
callback for that specific usage, it is a reasonable choice to show the
backend the scanrelid associated with the CustomScan.
Would it be a confusing rule if the extension had to set scanrelid to 0
when the CustomScan does not scan any particular relation?

> * More generally, what this definition for CustomScan exposes is that we
> have no idea whatever what fields a custom plan node might need. I'm
> inclined to think that what we should be assuming is that any custom path
> or plan node is really an object of a struct type known only to its providing
> extension, whose first field is the CustomPath or CustomPlan struct known
> to the core backend. (Think C++ subclassing.)
> This would imply that
> copyfuncs/equalfuncs/outfuncs support would have to be provided by the
> extension, which is in principle possible if we add function pointers for
> those operations to the struct linked to from the path/plan object.
> (Notationally this might be a bit of a pain, since the macros that we use
> in the functions in copyfuncs.c etc aren't public. Not sure if it's worth
> exposing those somewhere, or if people should just copy/paste them.) This
> approach would probably also avoid the need for the klugy bitmapset
> representation you propose in patch 3.
>
That's a breakthrough! Probably the Path node needs a callback for
outfuncs.c, and the Plan node needs callbacks for copyfuncs.c and
outfuncs.c. I think the prototypes of these callback functions are not
specific to the custom-scan node, so they should be declared in
nodes/nodes.h.

> * This also implies that create_customscan_plan is completely bogus.
> A custom plan provider *must* provide a callback function for that, because
> only it will know how big a node to palloc. There are far too many
> assumptions in create_customscan_plan anyway; I think there is probably
> nothing at all in that function that shouldn't be getting done by the custom
> provider instead.
>
OK. InitCustomScanPlan will probably become CreateCustomScanPlan,
returning an object palloc()'ed with an arbitrary size.

> * Likewise, there is way too much hard-wired stuff in explain.c. It should
> not be optional whether a custom plan provider provides an explain support
> function, and that function should be doing pretty much everything involved
> in printing the node.
>
Probably the hunk around the show_customscan_info() call can be moved
entirely to the extension side. If so, I'd like ExplainNode() to become an
extern function, because that would allow extensions to print their
underlying plan nodes.

> * I don't believe in the hard-wired implementation in setrefs.c either.
>
Are you saying the hard-wired portion in setrefs.c can be moved to the
extension side? If fix_scan_expr() became an extern function, I think that
would be feasible.

> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
> That's at best badly thought out, and at worst a source of new bugs.
> Under what circumstances is a custom plan node going to contain any Vars
> that don't reduce to either scan or input-plan variables? None, because
> that would imply that it was doing something unrelated to the requested
> query.
>
I'd also like to get rid of it, if we have an alternative way to solve the
problem. The varno of a Var node originally holds the rt-index of the
relation being referenced, and setrefs.c then adjusts it according to the
offset and the type of relation referenced. A Var node that references a
joined relation is replaced with either INNER_VAR or OUTER_VAR according
to the location of the underlying relation. That eventually makes
ExecEvalScalarVar() reference either ecxt_innertuple or ecxt_outertuple,
which is problematic if we have already consolidated the tuples from both
sides into one. For example, the enhanced postgres_fdw fetches the result
set of a remote join query, so a single tuple contains fields from both
sides. In this case, what varno would be suitable?

> * The API for add_join_path_hook seems overcomplex, as well as too full
> of implementation details that should remain private to joinpath.c.
> I particularly object to passing the mergeclause lists, which seem unlikely
> to be of interest for non-mergejoin plan types anyway.
> More generally, it seems likely that this hook is at the wrong level of
> abstraction; it forces the hook function to concern itself with a lot of
> stuff about join legality and parameterization (which I note your patch3
> code fails to do; but that's not an optional thing).
>
I'd like to hear your idea of where the hook should be located, and what
kind of abstraction is suitable.

> * After a bit of reflection I think the best thing to do might be to ditch
> add_join_path_hook for 9.4, on these grounds:
> 1. You've got enough to do to get the rest of the patch committable.
> 2. Like Stephen, I feel that the proposed usage as a substitute for FDW-based
> foreign join planning is not the direction we want to travel.
> 3. Without that, the use case for new join nodes seems pretty marginal,
> as opposed to say alternative sort-node implementations (a case not
> supported by this patch, except to the extent that you could use them in
> explicit-sort mergejoins if you duplicated large parts of joinpath.c).
>
You are suggesting an alternative merge-join path that uses a sort node on
top of a custom-scan node, aren't you? Probably such a path would be
constructed using the existing MergeJoin path with one or two CustomScan
nodes doing the sorting, or the scanning and sorting. Do I understand your
suggestion correctly?

> * Getting nitpicky ... what is the rationale for the doubled underscore
> in the CUSTOM__ flag names? That's just a typo waiting to happen IMO.
>
Sorry, I intended the double underscore as a separator after the prefix
portion of the label.

> * Why is there more than one call site for add_scan_path_hook? I don't
> see the need for the calling macro with the randomly inconsistent name,
> either.
>
Where would be the best place to call it? Even though I cannot imagine
extensions implementing sub-query or CTE scans, their paths are
constructed during set_rel_size(), so I had to put the hook into each of
the set_xxxx_pathlist() functions.

> * The test arrangement for contrib/ctidscan is needlessly complex, and
> testing it in the core tests is a bogus idea anyway. Why not just let it
> contain its own test script like most other contrib modules?
>
I thought a regression test of the CustomScan interface would be useful in
the core tests. However, implementing it as a usual contrib regression
test is a reasonable suggestion, so I'll adjust it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
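To illustrate the direction discussed above, here is a rough sketch of how
such a CreateCustomScanPlan callback might look, reusing the hypothetical
MyCustomPlan shape from the earlier sketch; the function name and the tag
T_CustomPlan are illustrative, not the patch's actual code:

    static Plan *
    my_create_custom_plan(PlannerInfo *root, CustomPath *best_path)
    {
        /* the provider, not the core, decides how big the node is */
        MyCustomPlan *cplan = (MyCustomPlan *) palloc0(sizeof(MyCustomPlan));

        NodeSetTag(cplan, T_CustomPlan);
        cplan->cplan.plan.startup_cost = best_path->path.startup_cost;
        cplan->cplan.plan.total_cost = best_path->path.total_cost;
        cplan->cplan.plan.plan_rows = best_path->path.rows;

        /*
         * Building the targetlist and quals is the provider's job too; a
         * scan-like provider would typically derive them from
         * best_path->path.parent->reltargetlist and the relation's
         * baserestrictinfo list.
         */

        return &cplan->cplan.plan;
    }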
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
>> * Please drop the whole register_custom_provider/get_custom_provider API.

> One thing I was worrying about is how copyObject() and nodeToString() support
> set of function pointer tables around custom-scan node, however, you suggested
> to change the assumption here. So, I think these functions become unnecessary.

If we allow the extension to control copyObject behavior, it can do what
it likes with the function-struct pointer. I think the typical case would
be that it's a simple pointer to a never-copied static constant. But you
could imagine that it's a pointer to a struct embedded further down in the
custom path or plan node, if the extension wants different functions for
different plans.

If we had to support stringToNode() for paths or plans, things would get
much more complicated, but we don't (and there are already lots of other
things that would be difficult for that).

> The reason why CustomScan is derived from Scan is, some of backend code
> wants to know rtindex of the relation to be referenced by this CustomScan.
> The scanrelid of Scan is used in three points: nodeCustom.c, setrefs.c and
> explain.c. The usage in nodeCustom.c is just for service routines, and the
> usage in setrefs.c can be moved to the extension according to your suggestion.
> We need to investigate the usage in explain.c; ExplainPreScanNode() walks
> around the nodes to collect relids referenced in this query. If we don't
> want to put a callback for this specific usage, it is a reasonable choice
> to show the backend the associated scanrelid of CustomScan.

I think we have to add another callback for that :-(. It's a pain since
it's such a trivial point; but the existing code cannot support a custom
node referencing more than one RTE, which seems possible for join or
append type cases.

> Probably, the hunk around show_customscan_info() call can be entirely moved
> to the extension side. If so, I want ExplainNode() being an extern function,
> because it allows extensions to print underlying plan-nodes.

I haven't looked at what explain.c would have to expose to make this
workable, but yeah, we will probably have to export a few things.

> Are you saying the hard-wired portion in setrefs.c can be moved to the
> extension side? If fix_scan_expr() become extern function, I think it
> is feasible.

My recollection is that fix_scan_expr() is a bit specialized. Not sure
exactly what we'd have to export there --- but we'd have to do it anyway.
What you had in the patch was a hook that could be called, but no way for
it to do what it would likely need to do.

>> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).

> In case of Var-node that references joined-relations, it shall be replaced to
> either INNER_VAR or OUTER_VAR according the location of underlying
> relations. It eventually makes ExecEvalScalarVar() to reference either
> ecxt_innertuple or ecxt_outertuple, however, it is problematic if we already
> consolidated tuples come from the both side into one.

So? If you did that, then you wouldn't have renumbered the Vars as
INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
is needed, then there would also be a need for an additional tuple slot
in executor contexts, which you haven't provided.

> For example, the enhanced postgres_fdw fetches the result set of remote
> join query, thus a tuple contains the fields come from both side.
> In this case, what varno shall be suitable to put?

That would be a scan situation, and the vars could reference the scan
tuple slot. Which in fact was the implementation you were using, so how
is CUSTOM_VAR adding anything?

>> * Why is there more than one call site for add_scan_path_hook? I don't
>> see the need for the calling macro with the randomly inconsistent name,
>> either.

> Where is the best place to do? Even though I cannot imagine the situation
> to run sub-query or cte by extensions, its path is constructed during
> set_rel_size(), so I had to put the hook for each set_xxxx_pathlist()
> functions.

Hm. We could still call the hook in set_rel_pathlist, if we were to get
rid of the individual calls to set_cheapest and do that in one spot at the
bottom of set_rel_pathlist (after the hook call). Calling set_cheapest in
one place seems more consistent anyway.

regards, tom lane
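A sketch of that consolidated arrangement, against 9.4-era allpaths.c (the
hook signature is approximated from the discussion; this is an
illustration, not the committed code):

    static void
    set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     Index rti, RangeTblEntry *rte)
    {
        /*
         * ... dispatch to set_plain_rel_pathlist(), set_subquery_pathlist(),
         * set_function_pathlist() and friends exactly as today, except that
         * none of them call set_cheapest() anymore ...
         */

        /* one hook invocation covers every kind of base relation */
        if (add_scan_path_hook)
            (*add_scan_path_hook) (root, rel, rte);

        /* and the cheapest path is selected in exactly one place */
        set_cheapest(rel);
    }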
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> >> * Please drop the whole register_custom_provider/get_custom_provider API.
>
> > One thing I was worrying about is how copyObject() and nodeToString()
> > support set of function pointer tables around custom-scan node,
> > however, you suggested to change the assumption here. So, I think these
> > functions become unnecessary.
>
> If we allow the extension to control copyObject behavior, it can do what
> it likes with the function-struct pointer. I think the typical case would
> be that it's a simple pointer to a never-copied static constant. But you
> could imagine that it's a pointer to a struct embedded further down in the
> custom path or plan node, if the extension wants different functions for
> different plans.
>
> If we had to support stringToNode() for paths or plans, things would get
> much more complicated, but we don't (and there are already lots of other
> things that would be difficult for that).
>
I expected to include simple function pointers for copying and text-output
as follows:

    typedef struct {
        Plan plan;
          :
        NodeCopy_function    node_copy;
        NodeTextOut_function node_textout;
    } Custom;

Then, the sub-routine in copyfuncs.c would be:

    static Custom *
    _copyCustom(const Custom *from)
    {
        return from->node_copy(from);
    }

Do we share the same image here? It allows the Custom node to have
polymorphism over the node it extends. Sorry, I got a little confused: is
the function-struct pointer you mentioned something different from a usual
function pointer?

> > The reason why CustomScan is derived from Scan is, some of backend
> > code wants to know rtindex of the relation to be referenced by this
> > CustomScan.
> > The scanrelid of Scan is used in three points: nodeCustom.c, setrefs.c
> > and explain.c. The usage in nodeCustom.c is just for service routines,
> > and the usage in setrefs.c can be moved to the extension according to
> > your suggestion.
> > We need to investigate the usage in explain.c; ExplainPreScanNode()
> > walks around the nodes to collect relids referenced in this query. If
> > we don't want to put a callback for this specific usage, it is a
> > reasonable choice to show the backend the associated scanrelid of
> > CustomScan.
>
> I think we have to add another callback for that :-(. It's a pain since
> it's such a trivial point; but the existing code cannot support a custom
> node referencing more than one RTE, which seems possible for join or append
> type cases.
>
This is a more generic approach; I like it. It can probably remove the
Scan characteristics of CustomScan from the core backend's point of view.
It would then perform as an opaque Plan node that may scan, join, sort or
do something else, so a more appropriate name may be CustomPlan, or simply
Custom.

> > Probably, the hunk around show_customscan_info() call can be entirely
> > moved to the extension side. If so, I want ExplainNode() being an
> > extern function, because it allows extensions to print underlying
> > plan-nodes.
>
> I haven't looked at what explain.c would have to expose to make this workable,
> but yeah, we will probably have to export a few things.
>
OK.

> > Are you saying the hard-wired portion in setrefs.c can be moved to the
> > extension side? If fix_scan_expr() become extern function, I think it
> > is feasible.
>
> My recollection is that fix_scan_expr() is a bit specialized. Not sure
> exactly what we'd have to export there --- but we'd have to do it anyway.
> What you had in the patch was a hook that could be called, but no way for
> it to do what it would likely need to do.
>
It probably needs to be exported. It walks over the supplied node tree and
eventually calls record_plan_function_dependency() for each function it
finds; that should not be reinvented on the extension side.
Anyway, my rework of the patch will make clear which static functions need
to be exposed. Please wait a while.

> >> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
>
> > In case of Var-node that references joined-relations, it shall be
> > replaced to either INNER_VAR or OUTER_VAR according the location of
> > underlying relations. It eventually makes ExecEvalScalarVar() to
> > reference either ecxt_innertuple or ecxt_outertuple, however, it is
> > problematic if we already consolidated tuples come from the both side
> > into one.
>
> So? If you did that, then you wouldn't have renumbered the Vars as
> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
> is needed, then there would also be a need for an additional tuple slot
> in executor contexts, which you haven't provided.
>
> > For example, the enhanced postgres_fdw fetches the result set of
> > remote join query, thus a tuple contains the fields come from both side.
> > In this case, what varno shall be suitable to put?
>
> That would be a scan situation, and the vars could reference the scan tuple
> slot. Which in fact was the implementation you were using, so how is
> CUSTOM_VAR adding anything?
>
Let me sort out the points. If the custom node performs as a join node
with two underlying relations, and thus can retrieve two tuples, there is
no problem, because INNER_VAR/OUTER_VAR can reference the individual tuple
slots. Likewise, if the custom node performs as a scan node with one
underlying relation, there is no problem, because all the Var nodes in the
target list reference attributes of that particular relation.
The confusing scenario is a custom node that performs as a scan node as an
alternative to a built-in join: the Var nodes in its target list may
reference multiple relations, yet it has only one tuple slot, as the
remote join in postgres_fdw does. In this case, we may be able to use the
existing INNER/OUTER/INDEX varnos if we renumber the varattnos and use an
appropriate slot, instead of adding a special varno.
I'd like to investigate a little more, but I'm inclined to conclude that
CUSTOM_VAR might not be necessary, as you suggested.

> >> * Why is there more than one call site for add_scan_path_hook? I
> >> don't see the need for the calling macro with the randomly
> >> inconsistent name, either.
>
> > Where is the best place to do? Even though I cannot imagine the
> > situation to run sub-query or cte by extensions, its path is
> > constructed during set_rel_size(), so I had to put the hook for each
> > set_xxxx_pathlist() functions.
>
> Hm. We could still call the hook in set_rel_pathlist, if we were to get
> rid of the individual calls to set_cheapest and do that in one spot at the
> bottom of set_rel_pathlist (after the hook call). Calling set_cheapest
> in one place seems more consistent anyway.
>
OK, I'll try to move the set_cheapest() calls, currently distributed
across several functions, into set_rel_pathlist().

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
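If fix_scan_expr() were exported as discussed, the extension-side setrefs
work for a scan-like custom node might reduce to something like the
following (the callback shape is hypothetical; fix_scan_expr's signature is
as in 9.4's setrefs.c):

    static void
    my_set_custom_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
    {
        MyCustomPlan *cplan = (MyCustomPlan *) plan;

        /* adjust the Vars in the standard Plan fields */
        plan->targetlist = (List *)
            fix_scan_expr(root, (Node *) plan->targetlist, rtoffset);
        plan->qual = (List *)
            fix_scan_expr(root, (Node *) plan->qual, rtoffset);

        /*
         * ... and in any private expression trees; fix_scan_expr() also
         * calls record_plan_function_dependency() for each function it
         * finds, so the extension does not have to reinvent that.
         */
        cplan->my_private = (List *)
            fix_scan_expr(root, (Node *) cplan->my_private, rtoffset);
    }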
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> I expected to include simple function pointers for copying and text-output
> as follows:

> typedef struct {
>     Plan plan;
>       :
>     NodeCopy_function    node_copy;
>     NodeTextOut_function node_textout;
> } Custom;

I was thinking more like

typedef struct CustomPathFuncs {
    const char *name;       /* used for debugging purposes only */
    NodeCopy_function node_copy;
    NodeTextOut_function node_textout;
    ... etc etc etc ...
} CustomPathFuncs;

typedef struct CustomPath {
    Path path;
    const CustomPathFuncs *funcs;
    ... maybe a few more fields here, but not too darn many ...
} CustomPath;

and similarly for CustomPlan.

The advantage of this way is it's very cheap for (what I expect will be)
the common case where an extension has a fixed set of support functions
for its custom paths and plans. It just declares a static constant
CustomPathFuncs struct, and puts a pointer to that into its paths.

If an extension really needs to set the support functions on a per-object
basis, it can do this:

typedef struct MyCustomPath {
    CustomPath cpath;
    CustomPathFuncs funcs;
    ... more fields ...
} MyCustomPath;

and then initialization of a MyCustomPath would include

    mypath->cpath.funcs = &mypath->funcs;
    mypath->funcs.node_copy = MyCustomPathCopy;
    ... etc etc ...

In this case we're arguably wasting one pointer worth of space in the
path, but considering the number of function pointers such a path will be
carrying, I don't think that's much of an objection.

>> So? If you did that, then you wouldn't have renumbered the Vars as
>> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
>> is needed, then there would also be a need for an additional tuple slot
>> in executor contexts, which you haven't provided.

> For example, the enhanced postgres_fdw fetches the result set of
> remote join query, thus a tuple contains the fields come from both side.
> In this case, what varno shall be suitable to put?

Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
solution. Consider for example a join where both tables supply columns
named "id" --- if you put them both in one tupledesc then there's no
non-kluge way to identify them.

Possibly the route to a solution involves adding another plan-node
callback function that ruleutils.c would use for printing Vars in custom
join nodes. Or maybe we could let the Vars keep their original RTE
numbers, though that would complicate life at execution time.

Anyway, if we're going to punt on add_join_path_hook for the time being,
this problem can probably be left to solve later. It won't arise for
simple table-scan cases, nor for single-input plan nodes such as sorts.

regards, tom lane
> I was thinking more like
>
> typedef struct CustomPathFuncs {
>     const char *name;   /* used for debugging purposes only */
>     NodeCopy_function node_copy;
>     NodeTextOut_function node_textout;
>     ... etc etc etc ...
> } CustomPathFuncs;
>
> typedef struct CustomPath {
>     Path path;
>     const CustomPathFuncs *funcs;
>     ... maybe a few more fields here, but not too darn many ...
> } CustomPath;
>
> and similarly for CustomPlan.
>
> The advantage of this way is it's very cheap for (what I expect will be)
> the common case where an extension has a fixed set of support functions
> for its custom paths and plans. It just declares a static constant
> CustomPathFuncs struct, and puts a pointer to that into its paths.
>
> If an extension really needs to set the support functions on a per-object
> basis, it can do this:
>
> typedef struct MyCustomPath {
>     CustomPath cpath;
>     CustomPathFuncs funcs;
>     ... more fields ...
> } MyCustomPath;
>
> and then initialization of a MyCustomPath would include
>
>     mypath->cpath.funcs = &mypath->funcs;
>     mypath->funcs.node_copy = MyCustomPathCopy;
>     ... etc etc ...
>
> In this case we're arguably wasting one pointer worth of space in the path,
> but considering the number of function pointers such a path will be carrying,
> I don't think that's much of an objection.
>
That is exactly what I expected, and I have no objection here. Thanks for
the clarification.

> >> So? If you did that, then you wouldn't have renumbered the Vars as
> >> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if
> >> it is needed, then there would also be a need for an additional tuple
> >> slot in executor contexts, which you haven't provided.
>
> > For example, the enhanced postgres_fdw fetches the result set of
> > remote join query, thus a tuple contains the fields come from both side.
> > In this case, what varno shall be suitable to put?
>
> Not sure what we'd do for the general case, but CUSTOM_VAR isn't the solution.
> Consider for example a join where both tables supply columns named "id"
> --- if you put them both in one tupledesc then there's no non-kluge way
> to identify them.
>
> Possibly the route to a solution involves adding another plan-node callback
> function that ruleutils.c would use for printing Vars in custom join nodes.
> Or maybe we could let the Vars keep their original RTE numbers, though that
> would complicate life at execution time.
>
My preference is the former, because complications at execution time could
cause performance degradation. Once two tuples get joined in the custom
node, only the extension knows which relation is the origin of a
particular attribute in the unified tuple. So it seems reasonable that the
extension has to provide a hint to resolve the Var names.
Probably this would be another callback that provides a translation table
for Var nodes that reference the custom plan but originate from either
subtree (much like translated_vars in prepunion.c).
For example, assume a Var node with INDEX_VAR in the tlist of the custom
plan. At execution time it references ecxt_scantuple, and this tuple slot
keeps a joined tuple originating from two relations. If its varattno=9
came from the column varno=1/varattno=3, we would like to print its
original name. If we have a translation table like translated_vars, it
allows us to translate an attribute number on the custom plan into the
original one. Even though it might be an abuse of INDEX_VAR, it seems to
me a good idea.
Also, I don't like to redefine the meaning of INNER_VAR/OUTER_VAR, because
a custom plan may have both left and right subtrees, so it makes sense to
keep supporting the case where both tuple slots are available.

> Anyway, if we're going to punt on add_join_path_hook for the time being,
> this problem can probably be left to solve later. It won't arise for simple
> table-scan cases, nor for single-input plan nodes such as sorts.
>
Yes, it is only a problem if the number of input plans is larger than 1.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
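To make the translation-table idea concrete, here is a small sketch
(entirely hypothetical names) of tracing varattno=9 of the custom node
back to varno=1/varattno=3, as in the example above:

    /* built at plan-creation time: one entry per output column of the
     * custom node, each entry being the original Var it was computed from */
    static List *
    my_build_translated_vars(void)
    {
        List   *vars = NIL;

        /* ... for each output column, append its origin; e.g. the entry
         * for a column that came from varno 1 / varattno 3 (an int4): */
        vars = lappend(vars, makeVar(1, 3, INT4OID, -1, InvalidOid, 0));
        return vars;
    }

    /* resolver: given the attno of an INDEX_VAR on the custom node, return
     * the original Var so that EXPLAIN / ruleutils.c can print its name */
    static Var *
    my_resolve_custom_var(List *translated_vars, AttrNumber attno)
    {
        return (Var *) list_nth(translated_vars, attno - 1);
    }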
Hello,

The attached two patches are the revised custom-plan interface and an
example usage that implements the existing MergeJoin on top of this
interface.

According to the discussion last week, I revised the portions where the
custom node was expected to perform a particular kind of task, like
scanning a relation, by introducing polymorphism through a set of
callbacks set by the custom-plan provider. The core backend can now handle
the custom-plan node as just an abstract plan node, with no built-in
assumptions. Even though the subject of this message says "custom-scan",
I'd like to name the interface "custom-plan" instead, because it is now
entirely up to the extension whether it scans a particular relation.

The definitions of the CustomXXXX data types were simplified:

    typedef struct CustomPath
    {
        Path        path;
        const struct CustomPathMethods *methods;
    } CustomPath;

    typedef struct CustomPlan
    {
        Plan        plan;
        const struct CustomPlanMethods *methods;
    } CustomPlan;

    typedef struct CustomPlanState
    {
        PlanState   ps;
        const CustomPlanMethods *methods;
    } CustomPlanState;

Each type has a base class and a set of function pointers that
characterize the behavior of the custom-plan node. In usual use cases, the
extension is expected to extend these classes to keep the private data
fields needed to implement its own functionality.

Most of the methods are designed to work as a thin layer over existing
planner / executor functions, so the custom-plan provider is responsible
for implementing its methods to communicate with the core backend the same
way the built-in nodes do.

Regarding the topics we discussed last week:

* CUSTOM_VAR has gone.

The reason why CUSTOM_VAR was needed is that we have to handle EXPLAIN
output (including the column names being referenced) even if a custom-plan
node replaced a join but has no underlying subplans on its left/right
subtrees. A typical situation is the remote-join implementation that I
tried as an extension of postgres_fdw on top of the previous interface: it
retrieves a flat result set of the remote join execution, and thus has no
local subplan. On the other hand, EXPLAIN tries to find the "actual" Var
node in the underlying subplan if a Var node has a special varno
(INNER/OUTER/INDEX).

I added a special method to solve this. The GetSpecialCustomVar method is
called if a Var node of the custom plan has a special varno; the
custom-plan provider can then inform the core backend of the expression
node referenced by this Var node. This allows resolving the column name
without walking the subtrees recursively, which enables a custom-plan node
to replace a part of the plan tree. The method is optional, so the
existing behavior applies if the custom-plan provider does not do anything
special.

* Functions to be exposed from static declarations

Right now, static functions are exposed somewhat arbitrarily, on demand,
so we need more investigation into which functions are needed and which
are not. In my trial, the part-2 patch (MergeJoin on top of the
custom-plan interface) needed the class of functions that recursively walk
a subplan tree to be exposed, like ExplainPreScanNode,
create_plan_recurse, set_plan_refs, fix_expr_common or finalize_plan.
When a custom plan performs like the built-in Append node, it keeps a list
of subplans in its private field, so the core backend cannot know of their
existence; it is then unable to initialize the subplans, produce EXPLAIN
output, and so on. It does not make sense to reimplement all of that on
the extension side.
Also, createplan.c has many useful functions to construct plan nodes;
however, most of them are static, because all the built-in plan nodes are
constructed by the routines in this file and there was no need to expose
them. I think the functions in createplan.c that are called by the
create_xxxx_plan() functions to construct plan nodes should be exposed for
the convenience of extensions.

* Definition of add_join_path_hook

I did not find a way to improve the definition and location of this hook,
so it still sits at the tail of add_paths_to_joinrel(). Its definition was
adjusted a bit according to the feedback on pgsql-hackers: I omitted the
"mergeclause_list" and "semifactors" from the argument list. Indeed, these
are specific to the built-in MergeJoin logic and easy to reproduce.

* Hook location of add_scan_path_hook

I moved the add_scan_path_hook and set_cheapest() calls into
set_base_rel_pathlists() from the various caller locations, typically the
set_xxxx_pathlist() functions. This consolidates the place where custom
paths are added for base relations.

* CustomMergeJoin as a proof of concept

The contrib module in the part-2 portion is a merge-join implementation on
top of the custom-plan interface, even though 99% of its implementation is
identical to the built-in one. Its purpose is to demonstrate that custom
join logic can be implemented using the custom-plan interface, even when
the custom-plan node has underlying subplans, unlike my previous examples.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
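A rough sketch of how a provider might implement the GetSpecialCustomVar
method just described; the signature is approximated from the description
above, not copied from the patch, and pseudo_tlist is a hypothetical
private field holding one expression per output column:

    static Node *
    my_get_special_custom_var(CustomPlanState *cps, Var *varnode)
    {
        MyCustomPlanState *mps = (MyCustomPlanState *) cps;

        Assert(IS_SPECIAL_VARNO(varnode->varno));

        /* hand back the expression this output column was computed from,
         * so EXPLAIN can resolve column names without walking a subplan */
        return (Node *) list_nth(mps->pseudo_tlist, varnode->varattno - 1);
    }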
Hello,

I adjusted the custom-plan interface patch a little for the cache-only
scan patch, which is a demonstration module for the vacuum-page hook on
top of the custom-plan interface.

fix_scan_expr() looks useful to me for custom-plan providers that want to
implement their own relation-scan logic, even though they could implement
it themselves using the already-exposed fix_expr_common().

Also, I removed the hardcoded portion from nodeCustom.c, although it may
make sense to provide a few template functions to be called by custom-plan
providers, performing the usual common jobs like construction of the
expression context, assignment of the result slot, opening relations, and
so on. I thought of this idea while implementing the BeginCustomPlan
handler. (These template functions are not in the attached patch yet.)
What is your opinion?

The major portion of this patch is not changed from v10.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
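For illustration, the common jobs mentioned above, written out the way a
provider's BeginCustomPlan handler might perform them using the ordinary
executor helpers. MyCustomPlanState, my_plan_methods and my_table_oid are
hypothetical; the callback signature is a sketch, not the patch's API:

    typedef struct MyCustomPlanState
    {
        CustomPlanState cps;
        Relation        scan_rel;   /* extension-private state */
    } MyCustomPlanState;

    static CustomPlanState *
    my_begin_custom_plan(CustomPlan *node, EState *estate, int eflags)
    {
        MyCustomPlanState *mps = palloc0(sizeof(MyCustomPlanState));

        mps->cps.ps.plan = (Plan *) node;
        mps->cps.ps.state = estate;
        mps->cps.methods = &my_plan_methods;

        /* construct an expression context, as ExecInitNode does for built-ins */
        ExecAssignExprContext(estate, &mps->cps.ps);

        /* initialize the result slot and its tuple descriptor */
        ExecInitResultTupleSlot(estate, &mps->cps.ps);
        ExecAssignResultTypeFromTL(&mps->cps.ps);

        /* initialize child expressions (targetlist / quals) */
        mps->cps.ps.targetlist = (List *)
            ExecInitExpr((Expr *) node->plan.targetlist, &mps->cps.ps);
        mps->cps.ps.qual = (List *)
            ExecInitExpr((Expr *) node->plan.qual, &mps->cps.ps);

        /* open the underlying relation, if this node scans one */
        mps->scan_rel = heap_open(my_table_oid, AccessShareLock);

        return &mps->cps;
    }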
Hello,

> > * Definition of add_join_path_hook
> >
> > I didn't have idea to improve the definition and location of this
> > hook, so it is still on the tail of the add_paths_to_joinrel().
> > Its definition was a bit adjusted according to the feedback on the
> > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
> > from the argument list. Indeed, these are specific to the built-in
> > MergeJoin logic and easy to reproduce.
> >
After the submission, I'm still investigating a better way to put a hook
for adding custom join paths.

Regarding the comment from Tom:

| * The API for add_join_path_hook seems overcomplex, as well as too full
| of implementation details that should remain private to joinpath.c.
| I particularly object to passing the mergeclause lists, which seem unlikely
| to be of interest for non-mergejoin plan types anyway.
| More generally, it seems likely that this hook is at the wrong level of
| abstraction; it forces the hook function to concern itself with a lot of
| stuff about join legality and parameterization (which I note your patch3
| code fails to do; but that's not an optional thing).
|
The earlier half is already done; my trouble is the latter portion.
The overall jobs around add_join_path_hook, at the tail of
add_paths_to_joinrel(), are (see the sketch after this message):

1. Construction of parameterized-path information, saved in
   param_source_rel and extra_lateral_rels.
2. Trying to add mergejoin paths with an underlying Sort node.
3. Trying to add mergejoin/nestloop paths without an underlying Sort node.
4. Trying to add hashjoin paths.

It seems to me the checks for join legality and parameterization are built
into the individual routines for each join algorithm. (What does the "join
legality check" actually mean here?) It probably makes sense to provide a
common utility function to be called back from the extension, if we can
find a portion common to all the join logics; however, I don't have a
clear idea of how to carve out that common portion. Which jobs can be done
independently of the join algorithm?

I'd like to see ideas on this issue. Of course, I also think it is still
an option to leave this to the extension in the initial version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
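For reference, the hook as it stands in v11 has roughly the following
shape; the exact argument list is reconstructed from the description
above, so treat it as an approximation, and note that the final `if`
fragment sits inside add_paths_to_joinrel():

    typedef void (*add_join_path_hook_type) (PlannerInfo *root,
                                             RelOptInfo *joinrel,
                                             RelOptInfo *outerrel,
                                             RelOptInfo *innerrel,
                                             JoinType jointype,
                                             SpecialJoinInfo *sjinfo,
                                             List *restrictlist,
                                             Relids param_source_rel,
                                             Relids extra_lateral_rels);
    extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

    /* called once, at the tail of add_paths_to_joinrel(), after the
     * built-in merge/nestloop/hash logic (jobs 2..4 above) has run */
    if (add_join_path_hook)
        (*add_join_path_hook) (root, joinrel, outerrel, innerrel,
                               jointype, sjinfo, restrictlist,
                               param_source_rel, extra_lateral_rels);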
> > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > <kaigai@ak.jp.nec.com> > > > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > > Sent: Wednesday, March 12, 2014 1:55 PM > > To: Tom Lane > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; > > Robert Haas; PgHacker; Peter Eisentraut > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node) > > > > Hello, > > > > The attached two patches are the revised custom-plan interface and > > example usage that implements existing MergeJoin on top of this interface. > > > > According to the discussion last week, I revised the portion where > > custom-node is expected to perform a particular kind of task, like > > scanning a relation, by putting polymorphism with a set of callbacks > > set by custom-plan provider. > > So, the core backend can handle this custom-plan node just an > > abstracted plan-node with no anticipation. > > Even though the subject of this message says "custom-scan", I'd like > > to name the interface "custom-plan" instead, because it became fully > > arbitrary of extension whether it scan on a particular relation. > > > > Definition of CustomXXXX data types were simplified: > > > > typedef struct CustomPath > > { > > Path path; > > const struct CustomPathMethods *methods; > > } CustomPath; > > > > typedef struct CustomPlan > > { > > Plan plan; > > const struct CustomPlanMethods *methods; > > } CustomPlan; > > > > typedef struct CustomPlanState > > { > > PlanState ps; > > const CustomPlanMethods *methods; > > } CustomPlanState; > > > > Each types have a base class and a set of function pointers that > > characterize the behavior of this custom-plan node. > > In usual use-cases, extension is expected to extend these classes to > > keep their private data fields needed to implement its own > functionalities. > > > > Most of the methods are designed to work as a thin layer towards > > existing planner / executor functions, so custom-plan provides has to > > be responsible to implement its method to communicate with core > > backend as built-in ones doing. > > > > Regarding to the topic we discussed last week, > > > > * CUSTOM_VAR has gone. > > The reason why CUSTOM_VAR was needed is, we have to handle EXPLAIN > > command output (including column names being referenced) even if a > > custom-plan node replaced a join but has no underlying subplans on > left/right subtrees. > > A typical situation like this is a remote-join implementation that I > > tried to extend postgres_fdw on top of the previous interface. > > It retrieves a flat result set of the remote join execution, thus has > > no subplan locally. On the other hand, EXPLAIN tries to find out > > "actual" Var node from the underlying subplan if a Var node has > > special varno (INNER/OUTER/INDEX). > > I put a special method to solve the problem. GetSpecialCustomVar > > method is called if a certain Var node of custom-plan has a special > > varno, then custom-plan provider can inform the core backend an > > expression node to be referenced by this Var node. > > It allows to solve the column name without recursive walking on the > > subtrees, so it enables a custom-plan node that replaces a part of > plan-tree. > > This method is optional, so available to adopt existing way if > > custom-plan provider does not do anything special. 
> > > > > > * Functions to be exposed, from static declaration > > > > Right now, static functions are randomly exposed on demand. > > So, we need more investigation which functions are needed, and which > > others are not. > > According to my trial, the part-2 patch that is MergeJoin on top of > > the custom-plan interface, class of functions that recursively walk on > > subplan tree have to be exposed. Like, ExplainPreScanNode, > > create_plan_recurse, set_plan_refs, fix_expr_common or finalize_plan. > > In case when custom-plan performs like built-in Append node, it keeps > > a list of sub-plans in its private field, so the core backend cannot > > know existence of sub-plans, thus its unavailable to make subplan, > > unavailable to output EXPLAIN and so on. > > It does not make sense to reworking on the extension side again. > > Also, createplan.c has many useful functions to construct plan-node, > > however, most of them are static because all the built-in plan-node > > are constructed by the routines in this file, we didn't need to expose > > them to others. I think, functions in createplan.c being called by > > create_xxxx_plan() functions to construct plan-node should be exposed > > for extension's convenient. > > > > > > * Definition of add_join_path_hook > > > > I didn't have idea to improve the definition and location of this > > hook, so it is still on the tail of the add_paths_to_joinrel(). > > Its definition was a bit adjusted according to the feedback on the > > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors" > > from the argument list. Indeed, these are specific to the built-in > > MergeJoin logic and easy to reproduce. > > > > > > * Hook location of add_scan_path_hook > > > > I moved the add_scan_path_hook and set_cheapest() into > > set_base_rel_pathlists() from various caller locations; > > set_xxxx_pathlist() functions typically. > > It enabled to consolidate the location to add custom-path for base > > relations. > > > > > > * CustomMergeJoin as a proof-of-concept > > > > The contrib module in the part-2 portion is, a merge-join > > implementation on top of custom-plan interface, even though 99% of its > > implementation is identical with built-in ones. > > Its purpose is to demonstrate a custom join logic can be implemented > > using custom-plan interface, even if custom-plan node has underlying > > sub-plans unlike previous my examples. > > > > Thanks, > > -- > > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > > <kaigai@ak.jp.nec.com> > > > > > > > -----Original Message----- > > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > > Sent: Friday, March 07, 2014 3:09 AM > > > To: Kaigai Kouhei(海外 浩平) > > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; > > > Robert Haas; PgHacker; Peter Eisentraut > > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node) > > > > > > Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: > > > > I expected to include simple function pointers for copying and > > > > text-output as follows: > > > > > > > typedef struct { > > > > Plan plan; > > > > : > > > > NodeCopy_function node_copy; > > > > NodeTextOut_function node_textout; > > > > } Custom; > > > > > > I was thinking more like > > > > > > typedef struct CustomPathFuncs { > > > const char *name; /* used for debugging purposes only */ > > > NodeCopy_function node_copy; > > > NodeTextOut_function node_textout; > > > ... etc etc etc ... 
> > >   } CustomPathFuncs;
> > >
> > >   typedef struct CustomPath {
> > >       Path path;
> > >       const CustomPathFuncs *funcs;
> > >       ... maybe a few more fields here, but not too darn many ...
> > >   } CustomPath;
> > >
> > > and similarly for CustomPlan.
> > >
> > > The advantage of this way is it's very cheap for (what I expect will
> > > be) the common case where an extension has a fixed set of support
> > > functions for its custom paths and plans. It just declares a static
> > > constant CustomPathFuncs struct, and puts a pointer to that into its
> > > paths.
> > >
> > > If an extension really needs to set the support functions on a
> > > per-object basis, it can do this:
> > >
> > >   typedef struct MyCustomPath {
> > >       CustomPath cpath;
> > >       CustomPathFuncs funcs;
> > >       ... more fields ...
> > >   } MyCustomPath;
> > >
> > > and then initialization of a MyCustomPath would include
> > >
> > >   mypath->cpath.funcs = &mypath->funcs;
> > >   mypath->funcs.node_copy = MyCustomPathCopy;
> > >   ... etc etc ...
> > >
> > > In this case we're arguably wasting one pointer worth of space in
> > > the path, but considering the number of function pointers such a
> > > path will be carrying, I don't think that's much of an objection.
> > >
> > > >> So? If you did that, then you wouldn't have renumbered the Vars
> > > >> as INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at
> > > >> all; if it is needed, then there would also be a need for an
> > > >> additional tuple slot in executor contexts, which you haven't
> > > >> provided.
> > >
> > > > For example, the enhanced postgres_fdw fetches the result set of a
> > > > remote join query, so a tuple contains fields that come from both
> > > > sides. In this case, what varno would be suitable to put there?
> > >
> > > Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
> > > solution.
> > > Consider for example a join where both tables supply columns named
> > > "id" --- if you put them both in one tupledesc then there's no
> > > non-kluge way to identify them.
> > >
> > > Possibly the route to a solution involves adding another plan-node
> > > callback function that ruleutils.c would use for printing Vars in
> > > custom join nodes.
> > > Or maybe we could let the Vars keep their original RTE numbers,
> > > though that would complicate life at execution time.
> > >
> > > Anyway, if we're going to punt on add_join_path_hook for the time
> > > being, this problem can probably be left to solve later. It won't
> > > arise for simple table-scan cases, nor for single-input plan nodes
> > > such as sorts.
> > >
> > > 			regards, tom lane
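For concreteness, here is a minimal extension-side sketch of the pattern
discussed above: a path struct embedding CustomPath as its first member
plus private fields, pointing at one statically allocated methods table.
The names MyScanPath, my_textout and the CustomPathMethods field order are
illustrative assumptions, not the patch's actual definitions.

    #include "postgres.h"
    #include "nodes/relation.h"     /* Path, RelOptInfo */

    /* Sketch only: the layout of CustomPathMethods and the callback
     * signatures below are hypothetical, not the patch's real API. */
    typedef struct MyScanPath
    {
        CustomPath  cpath;          /* base "class" must come first */
        List       *ctid_quals;     /* private state of this provider */
    } MyScanPath;

    static void my_textout(StringInfo str, const CustomPath *path);

    static const CustomPathMethods myscan_path_methods = {
        "myscan",                   /* name, used for debugging */
        my_textout,                 /* node_textout; further callback
                                     * slots omitted in this sketch */
    };

    static MyScanPath *
    create_myscan_path(PlannerInfo *root, RelOptInfo *rel)
    {
        /* palloc0 zeroes the struct, so unset fields stay NULL/0 */
        MyScanPath *mypath = (MyScanPath *) palloc0(sizeof(MyScanPath));

        mypath->cpath.path.parent = rel;
        mypath->cpath.methods = &myscan_path_methods;
        /* cost fields and private state would be filled in here */
        return mypath;
    }

Because the methods table is a static constant shared by every path the
provider creates, each path pays only one pointer of overhead, which is
the cheapness Tom describes above.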
Hello,

Because of a patch conflict with the latest master branch, I rebased the custom-plan interface patches; there is no functional difference from v11.

A brief summary of the current approach, as revised from my original submission through the discussion on pgsql-hackers:

The plan node was renamed CustomPlan, instead of CustomScan, because it dropped all the hardcoded portions that assumed the custom node would act as an alternative scan or join method; that assumption would prevent the custom node from acting as anything else, such as a sort or append.
Following Tom's suggestion, I put a structure containing several function pointers on the new CustomPlan node, and an extension allocates an object that extends the CustomPlan node.
It looks like polymorphism in an object-oriented programming language: the core backend knows only the abstract set of methods defined in the tables of function pointers, and the extension implements its own logic in the callbacks, using private state in the extended object.

Some issues are still under discussion:

* Design of add_join_path_hook

Tom suggested that the core backend could support checking join legality and parameterization; however, it looks to me as though the existing code performs its join-legality checks within the function for each individual join method, so I am not certain we can have a common legality check for all the (potential) alternative join implementations.
The part-2 patch is a demonstration module that implements the existing merge-join logic, but on top of this interface.

* Functions to be exposed to extensions

Some utility functions useful for implementing plan nodes are declared static, because most of the machinery is implemented within createplan.c, setrefs.c and so on, and there was no need to expose it elsewhere. With the custom-plan interface, however, an extension becomes a caller of these functions even though it is implemented outside the core.
In my investigation, the class of functions that recursively walk a subplan tree has to be exposed: ExplainPreScanNode, create_plan_recurse, set_plan_refs, fix_expr_common and finalize_plan, for example. Do we have other criteria for deciding which functions should be exposed?

Any help and feedback are welcome.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Thursday, March 20, 2014 10:46 AM
> To: Kaigai Kouhei(海外 浩平); Tom Lane
> Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
> Haas; PgHacker; Peter Eisentraut
> Subject: RE: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> Hello,
>
> > > * Definition of add_join_path_hook
> > >
> > > I had no idea how to improve the definition and location of this
> > > hook, so it is still at the tail of add_paths_to_joinrel(). Its
> > > definition was adjusted a bit according to the feedback on
> > > pgsql-hackers: I omitted "mergeclause_list" and "semifactors" from
> > > the argument list. Indeed, these are specific to the built-in
> > > MergeJoin logic and easy to reproduce.
>
> After the submission, I'm still investigating a better way to put a
> hook for adding custom join paths.
> Regarding the comment from Tom:
> | * The API for add_join_path_hook seems overcomplex, as well as too
> | full of implementation details that should remain private to
> | joinpath.c. I particularly object to passing the mergeclause lists,
> | which seem unlikely to be of interest for non-mergejoin plan types
> | anyway.
> | More generally, it seems likely that this hook is at the wrong level
> | of abstraction; it forces the hook function to concern itself with a
> | lot of stuff about join legality and parameterization (which I note
> | your patch3 code fails to do; but that's not an optional thing).
> |
> The earlier half was already done; my trouble is the latter portion.
>
> The overall jobs of add_join_path_hook are:
> 1. Construction of parameterized-path information, saved in
>    param_source_rel and extra_lateral_rels.
> 2. Trying to add mergejoin paths with an underlying Sort node.
> 3. Trying to add mergejoin/nestloop paths without an underlying Sort
>    node.
> 4. Trying to add hashjoin paths.
>
> It seems to me the checks for join legality and parameterization are
> built into the individual routines for each join algorithm.
> (What does the "join legality check" actually mean?)
>
> It probably makes sense to provide a common utility function to be
> called back from the extension if we can find a common part shared by
> all the join methods; however, I don't have a clear idea of how to cut
> out that common portion. What jobs can be done independently of the
> join algorithm?
>
> I'd like to see ideas around this issue. Of course, I also think it is
> still an option to leave this to the extension in the initial version.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> <kaigai@ak.jp.nec.com>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org
> > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> > Sent: Monday, March 17, 2014 9:29 AM
> > To: Kaigai Kouhei(海外 浩平); Tom Lane
> > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > Robert Haas; PgHacker; Peter Eisentraut
> > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> >
> > Hello,
> >
> > I adjusted the custom-plan interface patch a little for the
> > cache-only scan patch, which is a demonstration module for the
> > vacuum-page hook on top of the custom-plan interface.
> >
> > fix_scan_expr() looks useful to me for custom-plan providers that
> > want to implement their own relation-scan logic, even though they
> > could implement it using the already-exposed fix_expr_common().
> >
> > Also, I removed the hardcoded portion from nodeCustom.c, although it
> > may make sense to provide a few template functions to be called by
> > custom-plan providers, performing the usual common jobs such as
> > constructing the expression context, assigning the result slot, and
> > opening relations. I had this idea while implementing the
> > BeginCustomPlan handler. (These template functions are not in the
> > attached patch yet.) What is your opinion?
> >
> > The major portion of this patch is not changed from v10.
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > <kaigai@ak.jp.nec.com>
> >
> > > [snip: re-quote of the March 12 and March 7 messages shown in full
> > > earlier in the thread]
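To make the hook placement concrete, here is a sketch of a provider
offering an alternative path for a base relation, as described above. The
hook's exact signature and the cost numbers are assumptions for
illustration; add_path() is the existing optimizer entry point, and
MyScanPath/myscan_path_methods refer to the earlier sketch.

    #include "postgres.h"
    #include "nodes/relation.h"         /* RelOptInfo, Path */
    #include "optimizer/pathnode.h"     /* add_path() */

    /* Sketch only: the hook signature is assumed from the patch
     * description, not copied from it. */
    static void
    myscan_add_scan_path(PlannerInfo *root, RelOptInfo *rel,
                         RangeTblEntry *rte)
    {
        MyScanPath *mypath;

        if (rte->rtekind != RTE_RELATION)
            return;                     /* plain relations only */

        mypath = create_myscan_path(root, rel);     /* earlier sketch */
        mypath->cpath.path.startup_cost = 0.0;
        mypath->cpath.path.total_cost = rel->rows * 0.01;  /* toy cost */

        /* add_path() lets the optimizer compare this path against the
         * built-in ones and keep whichever is cheapest */
        add_path(rel, (Path *) mypath);
    }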
On 24 March 2014 10:25, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> A brief summary of the current approach, as revised from my original
> submission through the discussion on pgsql-hackers:
>
> The plan node was renamed CustomPlan, instead of CustomScan, because it
> dropped all the hardcoded portions that assumed the custom node would
> act as an alternative scan or join method; that assumption would
> prevent the custom node from acting as anything else, such as a sort or
> append.
> Following Tom's suggestion, I put a structure containing several
> function pointers on the new CustomPlan node, and an extension
> allocates an object that extends the CustomPlan node.
> It looks like polymorphism in an object-oriented programming language:
> the core backend knows only the abstract set of methods defined in the
> tables of function pointers, and the extension implements its own logic
> in the callbacks, using private state in the extended object.

I just wanted to add some review comments here. I also apologise for not reviewing this earlier; I had misunderstood the maturity of the patch and had assumed it was a request for comments/WIP.

Overall, I very much support the concept of providing for alternate scans. I like the placement of calls in the optimizer and we'll be able to do much with that. Other comments follow, in the order I consider them important.

* There is no declarative structure for this at all. I was expecting to see a way to declare that a specific table might have an alternate scan path, but we just call the plugin always and it has to separately make a cache lookup to see if anything extra is needed. The Index AM allows us to perform scans, yet indexes are very clearly declared and easily and clearly identifiable. We need the same thing for alternate plans.

* There is no mechanism at all for maintaining other data structures. Are we supposed to use the Index AM? Triggers? Or? The lack of clarity there is disturbing, though I could be simply missing something big and obvious.

* There is no catalog support. Complex APIs in Postgres typically have a structure like pg_am which allows the features to be clearly identified. I'd be worried about our ability to keep track of so many calls in such pivotal places without that. I want to be able to know what a plugin is doing, especially when it will likely come in binary form. I don't see an easy way to have plugins partially override each other or work together. What happens when I want to use Mr. X's clever new join plugin at the same time as Mr. Y's GPU accelerator?

* How do we control security? What stops the Custom Scan API from overriding privileges? Shouldn't the alternate data structures be recognised as objects so we can grant privileges? Or do we simply say that if an alternate data structure is linked to a heap then it has implied privileges? It would be a shame to implement better security in one patch and then ignore it in another (from the same author).

All of the above I can let pass in this release, but in the longer term we need to look for more structure around these ideas so we can manage and control what happens. The way this is now is quite raw - suitable for R&D but not for longer-term production usage by a wider audience, IMHO. I wouldn't like to make commitments about the longevity of this API either; if we accept it, it should have a big "may change radically" sign hanging on it.
Having said that, I am interested in progress here and I accept that things will look like this at this stage of the ideas process, so these are not things to cause delay.

Some things I would like to see changed in this release are...

* It's not clear to me when we load/create the alternate data structures. That can't happen at _init time. I was expecting this to look like an infrastructure for unlogged indexes, but it doesn't look like that either.

* The name Custom makes me nervous. It sounds too generic, as if the design or objectives for this are a little unclear. AlternateScan sounds like a better name, since it's clearer that we are scanning an alternate data structure rather than the main heap.

* The prune hook makes me feel very uneasy. It seems a weirdly specific implementation detail, made stranger by the otherwise lack of data-maintenance API calls. Calling that for every dirty page sounds like an issue, and my patch-rejection indicator is flashing red around that.

Two additional use cases I will be looking to explore will be:

* Code to make Mat Views recognised as alternate scan targets
* Code to allow queries access to sampled data rather than the fully detailed data, if the result would be within acceptable tolerance for the user

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> [ assorted comments about custom-scan patch, but particularly ]

> * The prune hook makes me feel very uneasy. It seems a weirdly specific
> implementation detail, made stranger by the otherwise lack of
> data-maintenance API calls. Calling that for every dirty page sounds
> like an issue, and my patch-rejection indicator is flashing red around
> that.

Yeah. After a fast review of the custom-scan and cache-scan patches, it seems to me that my original fears are largely confirmed: the custom scan patch is not going to be sufficient to allow development of any truly new plan type. Yeah, you can plug in some new execution node types, but actually doing anything interesting is going to require patching other parts of the system. Are we going to say to all comers, "sure, we'll put a hook call anywhere you like, just ask"? I can't see this as being the way to go.

Another way of describing the problem is that it's not clear where the API boundaries are for potential users of a custom-scan feature. (Simon said several things that are closely related to this point.) One thing I don't like at all about the patch is its willingness to turn anything whatsoever into a publicly exported function, which basically says that the design attitude is there *are* no boundaries. But that's not going to lead to anything maintainable. We're certainly not going to want to guarantee that these suddenly-exported functions will all now have stable APIs forevermore.

Overall I concur with Simon's conclusion that this might be of interest for R&D purposes, but it's hard to see anyone wanting to support a production feature built on this. It would be only marginally less painful than supporting a patch that just adds the equivalent code to the backend in the traditional way. So I'm feeling that this was kind of a dead end. It was worth doing the legwork to see if this sort of approach could be useful, but the answer seems like "no".

			regards, tom lane
> On 24 March 2014 10:25, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > A brief summary of the current approach, as revised from my original
> > submission through the discussion on pgsql-hackers:
> >
> > The plan node was renamed CustomPlan, instead of CustomScan, because
> > it dropped all the hardcoded portions that assumed the custom node
> > would act as an alternative scan or join method; that assumption
> > would prevent the custom node from acting as anything else, such as a
> > sort or append.
> > Following Tom's suggestion, I put a structure containing several
> > function pointers on the new CustomPlan node, and an extension
> > allocates an object that extends the CustomPlan node.
> > It looks like polymorphism in an object-oriented programming
> > language: the core backend knows only the abstract set of methods
> > defined in the tables of function pointers, and the extension
> > implements its own logic in the callbacks, using private state in the
> > extended object.
>
> I just wanted to add some review comments here. I also apologise for
> not reviewing this earlier; I had misunderstood the maturity of the
> patch and had assumed it was a request for comments/WIP.
>
Thanks for your interest and the many comments.

> Overall, I very much support the concept of providing for alternate
> scans. I like the placement of calls in the optimizer and we'll be able
> to do much with that. Other comments follow, in the order I consider
> them important.
>
> * There is no declarative structure for this at all. I was expecting
> to see a way to declare that a specific table might have an alternate
> scan path, but we just call the plugin always and it has to separately
> make a cache lookup to see if anything extra is needed. The Index AM
> allows us to perform scans, yet indexes are very clearly declared and
> easily and clearly identifiable. We need the same thing for alternate
> plans.
>
> * There is no mechanism at all for maintaining other data structures.
> Are we supposed to use the Index AM? Triggers? Or? The lack of clarity
> there is disturbing, though I could be simply missing something big and
> obvious.
>
> * There is no catalog support. Complex APIs in Postgres typically have
> a structure like pg_am which allows the features to be clearly
> identified. I'd be worried about our ability to keep track of so many
> calls in such pivotal places without that. I want to be able to know
> what a plugin is doing, especially when it will likely come in binary
> form. I don't see an easy way to have plugins partially override each
> other or work together. What happens when I want to use Mr. X's clever
> new join plugin at the same time as Mr. Y's GPU accelerator?
>
It was an implementation choice. I just followed the usual PG hook convention, which expects a loaded extension to save the original function pointer and call it in turn from its own hook routine (see the sketch after this message). The optimizer therefore walks the chain of extensions when multiple custom providers are loaded.

Even though I initially chose this design, it is an option to add catalog support that tracks the registered custom-scan providers and their metadata: which function generates paths, which flags are set, what kinds of relations are supported, and so on. The optimizer could then skip extensions that obviously do not support the target relation.

> * How do we control security? What stops the Custom Scan API from
> overriding privileges? Shouldn't the alternate data structures be
> recognised as objects so we can grant privileges?
> Or do we simply say that if an alternate data structure is linked to a
> heap then it has implied privileges? It would be a shame to implement
> better security in one patch and then ignore it in another (from the
> same author).
>
In general, we have no mechanism to prevent a C-binary extension from overriding the privilege mechanism. An extension can override (existing) hooks and modify the requiredPerms bits of a RangeTblEntry, which would eventually cause a privilege bypass. But that is neutral with respect to the custom-scan API itself. Even when we implement alternative scan logic on this API, the core backend still checks the required privileges in ExecCheckRTPerms, called at the head of the executor (unless the extension author does something strange).

> All of the above I can let pass in this release, but in the longer term
> we need to look for more structure around these ideas so we can manage
> and control what happens. The way this is now is quite raw - suitable
> for R&D but not for longer-term production usage by a wider audience,
> IMHO. I wouldn't like to make commitments about the longevity of this
> API either; if we accept it, it should have a big "may change
> radically" sign hanging on it. Having said that, I am interested in
> progress here and I accept that things will look like this at this
> stage of the ideas process, so these are not things to cause delay.
>
> Some things I would like to see changed in this release are...
>
> * It's not clear to me when we load/create the alternate data
> structures. That can't happen at _init time. I was expecting this to
> look like an infrastructure for unlogged indexes, but it doesn't look
> like that either.
>
I expected the *_preload_libraries GUCs to load the extensions. If we had catalog support, an extension would be loaded before the first invocation, when the optimizer asks the registered provider whether it can offer an alternative scan. I love the idea.

> * The name Custom makes me nervous. It sounds too generic, as if the
> design or objectives for this are a little unclear. AlternateScan
> sounds like a better name, since it's clearer that we are scanning an
> alternate data structure rather than the main heap.
>
I have no particular preference about the name.

> * The prune hook makes me feel very uneasy. It seems a weirdly
> specific implementation detail, made stranger by the otherwise lack of
> data-maintenance API calls. Calling that for every dirty page sounds
> like an issue, and my patch-rejection indicator is flashing red around
> that.
>
All I want is cache invalidation at the time vacuum runs, and the proposed prune hook may not be the only answer. When an extension manages its own cached data structure, what means do we have to invalidate it? I am not wedded to the existing proposal.

> Two additional use cases I will be looking to explore will be:
>
> * Code to make Mat Views recognised as alternate scan targets
> * Code to allow queries access to sampled data rather than the fully
> detailed data, if the result would be within acceptable tolerance for
> the user
>
Let me investigate how to implement these. The idea around materialized views is probably the simpler one.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
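As referenced in the reply above, the usual hook-chaining convention looks
roughly like this: save the previous pointer at load time, then call it
before (or after) your own logic. The hook variable name and signature
here are assumed from the patch description; the _PG_init entry point and
PG_MODULE_MAGIC are the standard extension machinery.

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    /* add_scan_path_hook_type and the hook signature are assumed from
     * the patch, not taken verbatim. */
    static add_scan_path_hook_type prev_add_scan_path = NULL;

    static void
    my_add_scan_path(PlannerInfo *root, RelOptInfo *rel,
                     RangeTblEntry *rte)
    {
        /* call whoever was registered before us, so providers stack */
        if (prev_add_scan_path)
            prev_add_scan_path(root, rel, rte);

        /* ... then consider adding our own CustomPath here ... */
    }

    void
    _PG_init(void)
    {
        prev_add_scan_path = add_scan_path_hook;
        add_scan_path_hook = my_add_scan_path;
    }

This is why the optimizer effectively walks a chain of providers: each
extension is responsible for forwarding the call to the one it displaced.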
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > [ assorted comments about custom-scan patch, but particularly ]
>
> > * The prune hook makes me feel very uneasy. It seems a weirdly
> > specific implementation detail, made stranger by the otherwise lack
> > of data-maintenance API calls. Calling that for every dirty page
> > sounds like an issue, and my patch-rejection indicator is flashing
> > red around that.
>
> Yeah. After a fast review of the custom-scan and cache-scan patches, it
> seems to me that my original fears are largely confirmed: the custom
> scan patch is not going to be sufficient to allow development of any
> truly new plan type. Yeah, you can plug in some new execution node
> types, but actually doing anything interesting is going to require
> patching other parts of the system. Are we going to say to all comers,
> "sure, we'll put a hook call anywhere you like, just ask"? I can't see
> this as being the way to go.
>
There are two different points to be discussed here; one is generic to the custom-plan API, and the other is specific to my cache-only scan implementation.

Because the existing plan/exec nodes are all built-in, and some of the functionality is consolidated into particular source files (like createplan.c and setrefs.c), it causes no problem that commonly called functions are declared static. The custom-plan API changes this assumption: it allows some portion of the jobs of createplan.c or setrefs.c to be performed externally, so the commonly used functions need to become external. Because of trial and error during development, I could not list all the functions that need to be public at once. However, this is not a fundamental problem; it should be solvable through the discussion on pgsql-hackers.

Regarding the portion specific to the cache-only scan: the same issue would arise for any extension that wants to track vacuuming, independently of custom scans. An extension usually uses multiple hooks and interfaces to implement the feature it wants. In the case of the cache-only scan, unfortunately, PG lacks a way to track heap vacuuming, even though that is needed to invalidate the cached data. This is an issue unrelated to the custom-scan API; we would see the same problem in an extension that merely counts the number of records being vacuumed.

> Another way of describing the problem is that it's not clear where the
> API boundaries are for potential users of a custom-scan feature. (Simon
> said several things that are closely related to this point.) One thing
> I don't like at all about the patch is its willingness to turn anything
> whatsoever into a publicly exported function, which basically says that
> the design attitude is there *are* no boundaries. But that's not going
> to lead to anything maintainable. We're certainly not going to want to
> guarantee that these suddenly-exported functions will all now have
> stable APIs forevermore.
>
I'd like to have *several* existing static functions as (almost) stable APIs, but not all of them. Indeed, my patch may have picked static functions somewhat at random to redefine as external; however, that does not mean the custom-plan feature ultimately requires all of them to be external. According to my investigation, there are two classes of functions to be exposed:
- functions that recursively walk a plan/exec node tree
  (e.g. create_plan_recurse), and
- functions that adjust the internal state of the core backend
  (e.g. fix_expr_common).
At least, such functions are not the majority. I don't think this should be a strong blocker for the new feature.
(I may have oversights, of course; please point them out.)

> Overall I concur with Simon's conclusion that this might be of interest
> for R&D purposes, but it's hard to see anyone wanting to support a
> production feature built on this. It would be only marginally less
> painful than supporting a patch that just adds the equivalent code to
> the backend in the traditional way.
>
As we adjusted the FDW APIs through their first several releases, any kind of interface takes time to stabilize. Even if this one *initially* sticks to R&D purposes (I don't deny it), it can be brushed up to production quality. I think a feature that starts out serving R&D purposes is a good starting point.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> [ assorted comments about custom-scan patch, but particularly ]
>
>> * The prune hook makes me feel very uneasy. It seems a weirdly
>> specific implementation detail, made stranger by the otherwise lack of
>> data-maintenance API calls. Calling that for every dirty page sounds
>> like an issue, and my patch-rejection indicator is flashing red around
>> that.
>
> Yeah. After a fast review of the custom-scan and cache-scan patches, it
> seems to me that my original fears are largely confirmed: the custom
> scan patch is not going to be sufficient to allow development of any
> truly new plan type. Yeah, you can plug in some new execution node
> types, but actually doing anything interesting is going to require
> patching other parts of the system. Are we going to say to all comers,
> "sure, we'll put a hook call anywhere you like, just ask"? I can't see
> this as being the way to go.

Without prejudice to the rest of what you said, this argument doesn't hold much water with me. I mean, anything that our extensibility mechanism doesn't support today will require new hooks, but does that mean we're never going to add any more hooks? I sure hope not.

When hooks are proposed here, we evaluate them on their merits and attempt to judge the likelihood that a hook in a particular place will be useful, but generally we're not averse to adding them, and as long as the paths aren't too performance-critical, I don't think we should be averse to adding them. We have a great system today for letting people add new data types and things of that sort, but anything that penetrates more deeply into the heart of the system pretty much can't be done; this is why various companies, such as our respective employers, have developed and maintained forks of the PostgreSQL code base instead of just hooking into the existing code. We probably can't solve that problem completely, but that doesn't mean we should throw in the towel.

And in particular, I think it's pretty normal that a new facility like custom scans might create additional demand for new hooks. If something was completely impossible before, and the new facility makes it almost-possible, then why shouldn't someone ask for a hook there? A prune hook probably has no business in the custom scan patch proper, but whether it's a good idea or a bad one should be decided on the merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah. After a fast review of the custom-scan and cache-scan patches,
>> it seems to me that my original fears are largely confirmed: the
>> custom scan patch is not going to be sufficient to allow development
>> of any truly new plan type. Yeah, you can plug in some new execution
>> node types, but actually doing anything interesting is going to
>> require patching other parts of the system.

> Without prejudice to the rest of what you said, this argument doesn't
> hold much water with me. I mean, anything that our extensibility
> mechanism doesn't support today will require new hooks, but does that
> mean we're never going to add any more hooks? I sure hope not.

No, that's not what I said. ISTM that the argument for the custom-scan API is that it allows interesting new things to be done *without further modifying the core code*. But the example application (cache_scan) fails to demonstrate that, and indeed seems to be a counterexample. Whether we'd accept cache_scan on its own merits is a separate question. The problem for me is that custom-scan isn't showing that it can support what was claimed without doing serious damage to modularity and maintainability of the core code.

What this may mean is that we need more attention to refactoring of the core code. But just removing "static" from any function that looks like it might be handy isn't my idea of well-considered refactoring. More the opposite in fact: if those things turn into APIs that we have to support, it's going to kill any ability to do such refactoring.

A concrete example here is setrefs.c, whose responsibilities tend to change from release to release. I think if we committed custom-scan as is, we'd have great difficulty changing setrefs.c's transformations ever again, at least if we hoped to not break users of the custom-scan API. I'm not sure what the solution is --- but turning setrefs into a white box instead of a black box isn't it.

			regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> A concrete example here is setrefs.c, whose responsibilities tend to
> change from release to release. I think if we committed custom-scan
> as is, we'd have great difficulty changing setrefs.c's transformations
> ever again, at least if we hoped to not break users of the custom-scan
> API. I'm not sure what the solution is --- but turning setrefs into
> a white box instead of a black box isn't it.

Yeah, this was my (general) complaint as well, and the answer that I kept getting back is "well, it's ok, you can still break it between major releases and the custom scan users will just have to deal with it". I'm a bit on the fence about that, itself, but the other half of that coin is that we could end up with parts of the *core* code that think it's ok to go pulling in these functions, once they're exposed, and that could end up making things quite ugly and difficult to maintain going forward.

Thanks,

Stephen
On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah. After a fast review of the custom-scan and cache-scan patches,
>>> it seems to me that my original fears are largely confirmed: the
>>> custom scan patch is not going to be sufficient to allow development
>>> of any truly new plan type. Yeah, you can plug in some new execution
>>> node types, but actually doing anything interesting is going to
>>> require patching other parts of the system.
>
>> Without prejudice to the rest of what you said, this argument doesn't
>> hold much water with me. I mean, anything that our extensibility
>> mechanism doesn't support today will require new hooks, but does that
>> mean we're never going to add any more hooks? I sure hope not.
>
> No, that's not what I said. ISTM that the argument for the custom-scan
> API is that it allows interesting new things to be done *without
> further modifying the core code*. But the example application
> (cache_scan) fails to demonstrate that, and indeed seems to be a
> counterexample. Whether we'd accept cache_scan on its own merits is a
> separate question. The problem for me is that custom-scan isn't showing
> that it can support what was claimed without doing serious damage to
> modularity and maintainability of the core code.

I think there's two separate things in there, one of which I agree with and one of which I disagree with. I agree that we must avoid damaging the modularity and maintainability of the core code; I don't agree that custom-scan needs to be able to do interesting things with zero additional changes to the core code. If we come up with three interesting applications for custom scan that require 5 new hooks between them, I'll consider that a major success - assuming those hooks don't unduly limit future changes we may wish to make in the core code. I think your concern about exposing APIs that may not be terribly stable is well-founded, but I don't think that means we shouldn't expose *anything*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2014-04-15 11:07:11 -0400, Robert Haas wrote:
> On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> [ discussion ]

What I think this discussion shows is that this patch isn't ready for 9.4. The first iteration of the patch came in on 2013-11-06; imo that's pretty damn late for a relatively complex patch. And obviously we don't have agreement on the course forward.

I don't think we need to stop discussing, but I think it's pretty clear that this isn't 9.4 material. And that it's far from "Ready for Committer".

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> What I think this discussion shows is that this patch isn't ready for
> 9.4. The first iteration of the patch came in on 2013-11-06; imo that's
> pretty damn late for a relatively complex patch. And obviously we don't
> have agreement on the course forward.

> I don't think we need to stop discussing, but I think it's pretty clear
> that this isn't 9.4 material. And that it's far from "Ready for
> Committer".

Yeah. I'm still not exactly convinced that custom-scan will ever allow independent development of new plan types (which, with all due respect to Robert, is what it was being sold as last year in Ottawa). But I'm not opposed in principle to committing it, if we can find a way to have a cleaner API for things like setrefs.c. It seems like late-stage planner processing in general is an issue for this patch (createplan.c and subselect.c are also looking messy). EXPLAIN isn't too great either.

I'm not sure exactly what to do about those cases, but I wonder whether things would get better if we had the equivalent of expression_tree_walker/mutator capability for plan nodes. The state of affairs in setrefs and subselect, at least, is a bit reminiscent of the bad old days when we had lots of different bespoke code for traversing expression trees.

			regards, tom lane
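For reference, the expression-level idiom Tom alludes to looks like this
today; expression_tree_walker is the existing function declared in
nodes/nodeFuncs.h, while count_vars_walker is a made-up example walker
that counts Var nodes. A plan-node equivalent would presumably follow the
same shape.

    #include "postgres.h"
    #include "nodes/nodeFuncs.h"    /* expression_tree_walker() */

    /* Hypothetical walker: count the Var nodes in an expression tree. */
    static bool
    count_vars_walker(Node *node, int *count)
    {
        if (node == NULL)
            return false;           /* nothing to do here */
        if (IsA(node, Var))
            (*count)++;
        /* recurse into all known substructure of this node type; the
         * walker itself never needs to know the per-node details */
        return expression_tree_walker(node, count_vars_walker,
                                      (void *) count);
    }

The appeal of this pattern is that per-node traversal knowledge lives in
one core routine, so callers (including extensions) stay insulated from
node-layout changes between releases.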
> On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Yeah. After a fast review of the custom-scan and cache-scan
> >>> patches, it seems to me that my original fears are largely
> >>> confirmed: the custom scan patch is not going to be sufficient to
> >>> allow development of any truly new plan type. Yeah, you can plug in
> >>> some new execution node types, but actually doing anything
> >>> interesting is going to require patching other parts of the system.
> >
> >> Without prejudice to the rest of what you said, this argument
> >> doesn't hold much water with me. I mean, anything that our
> >> extensibility mechanism doesn't support today will require new
> >> hooks, but does that mean we're never going to add any more hooks? I
> >> sure hope not.
> >
> > No, that's not what I said. ISTM that the argument for the
> > custom-scan API is that it allows interesting new things to be done
> > *without further modifying the core code*. But the example
> > application (cache_scan) fails to demonstrate that, and indeed seems
> > to be a counterexample. Whether we'd accept cache_scan on its own
> > merits is a separate question. The problem for me is that custom-scan
> > isn't showing that it can support what was claimed without doing
> > serious damage to modularity and maintainability of the core code.
>
> I think there's two separate things in there, one of which I agree with
> and one of which I disagree with. I agree that we must avoid damaging
> the modularity and maintainability of the core code; I don't agree that
> custom-scan needs to be able to do interesting things with zero
> additional changes to the core code. If we come up with three
> interesting applications for custom scan that require 5 new hooks
> between them, I'll consider that a major success - assuming those hooks
> don't unduly limit future changes we may wish to make in the core code.
> I think your concern about exposing APIs that may not be terribly
> stable is well-founded, but I don't think that means we shouldn't
> expose *anything*.
>
I agree 100%. We usually change hook definitions release by release, and it is the author's responsibility to follow the newer interface if he continues to maintain his extension on the newer release. It is probably a gray area, neither black nor white: a perfectly designed interface might be nice, but it would allow no further evolution. Of course, that does not justify a poorly designed interface; the important thing is to find the best way available at this moment. That may take core refactoring, not just exposing static functions; what I tried to implement was the only way I could find.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
> Andres Freund <andres@2ndquadrant.com> writes:
> > What I think this discussion shows is that this patch isn't ready for
> > 9.4. The first iteration of the patch came in on 2013-11-06; imo
> > that's pretty damn late for a relatively complex patch. And obviously
> > we don't have agreement on the course forward.
> > I don't think we need to stop discussing, but I think it's pretty
> > clear that this isn't 9.4 material. And that it's far from "Ready for
> > Committer".
>
Yep, today is the expected feature-freeze date for v9.4. Unfortunately, it is a little too late to get this into v9.4.

> Yeah. I'm still not exactly convinced that custom-scan will ever allow
> independent development of new plan types (which, with all due respect
> to Robert, is what it was being sold as last year in Ottawa). But I'm
> not opposed in principle to committing it, if we can find a way to have
> a cleaner API for things like setrefs.c. It seems like late-stage
> planner processing in general is an issue for this patch (createplan.c
> and subselect.c are also looking messy). EXPLAIN isn't too great
> either.
>
> I'm not sure exactly what to do about those cases, but I wonder whether
> things would get better if we had the equivalent of
> expression_tree_walker/mutator capability for plan nodes. The state of
> affairs in setrefs and subselect, at least, is a bit reminiscent of the
> bad old days when we had lots of different bespoke code for traversing
> expression trees.
>
Hmm. If we had something like expression_tree_walker/mutator for plan nodes, we could pass a walker/mutator function pointer instead of exposing the static functions that do the recursive jobs. If a custom-plan provider (one that has sub-plans) received a callback with the walker/mutator pointer, all it would have to do for its sub-plans is call this new plan-tree walking support routine with the supplied walker/mutator. That seems a simpler design than what I did.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
>> Yeah. I'm still not exactly convinced that custom-scan will ever allow
>> independent development of new plan types (which, with all due respect
>> to Robert, is what it was being sold as last year in Ottawa). But I'm
>> not opposed in principle to committing it, if we can find a way to
>> have a cleaner API for things like setrefs.c. It seems like late-stage
>> planner processing in general is an issue for this patch
>> (createplan.c and subselect.c are also looking messy). EXPLAIN isn't
>> too great either.
>>
>> I'm not sure exactly what to do about those cases, but I wonder
>> whether things would get better if we had the equivalent of
>> expression_tree_walker/mutator capability for plan nodes. The state of
>> affairs in setrefs and subselect, at least, is a bit reminiscent of
>> the bad old days when we had lots of different bespoke code for
>> traversing expression trees.
>>
> Hmm. If we had something like expression_tree_walker/mutator for plan
> nodes, we could pass a walker/mutator function pointer instead of
> exposing the static functions that do the recursive jobs.
> If a custom-plan provider (one that has sub-plans) received a callback
> with the walker/mutator pointer, all it would have to do for its
> sub-plans is call this new plan-tree walking support routine with the
> supplied walker/mutator. That seems a simpler design than what I did.
>
I tried to write similar walker/mutator functions for the plan-node tree; however, I could not make these routines simple enough, because the jobs of the walker/mutator functions are not uniform, so the caller side must also carry large switch-case branches.

I picked setrefs.c for my investigation. set_plan_refs() applies fix_scan_list() to the expression trees appearing in a plan node if the node is derived from Scan; however, it applies set_join_references() to subclasses of Join, and set_dummy_tlist_references() to some other plan nodes. This implies that a walker/mutator for Plan nodes has to apply a different operation according to the type of the plan node, and I am not certain how many different forms are needed. (In addition, set_plan_refs() usually behaves like a walker, but often behaves as a mutator in the case of a trivial subquery.)

I'm expecting a function like the one below. It calls the plan_walker function for each plan node, and also calls the expr_walker function for each expression node on those plan nodes:

  bool
  plan_tree_walker(Plan *plan,
                   bool (*plan_walker) (),
                   bool (*expr_walker) (),
                   void *context)

I'd like to hear whether some other form would be better for this routine.

One alternative idea, to give a custom-plan provider a chance to handle its subplans, is to hand the provider function pointers (1) to handle the recursion over the plan tree and (2) to set up the backend's internal state. In the case of setrefs.c, set_plan_refs() and fix_expr_common() are the minimum necessities for extensions. That would also remove the need to export static functions.

How about your thoughts?
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
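To illustrate the non-uniformity described above, here is a rough sketch
of the shape such a plan_tree_walker might take; every detail beyond the
Plan fields (targetlist, qual, lefttree, righttree) is illustrative, and
the per-node switch is exactly the branching the message complains about.

    #include "postgres.h"
    #include "nodes/plannodes.h"    /* Plan, Append, nodeTag() */

    /* Sketch only: names and per-node handling are illustrative. */
    static bool
    plan_tree_walker(Plan *plan,
                     bool (*plan_walker) (Plan *plan, void *context),
                     bool (*expr_walker) (Node *node, void *context),
                     void *context)
    {
        if (plan == NULL)
            return false;

        /* expression trees present on every plan node */
        if (expr_walker((Node *) plan->targetlist, context) ||
            expr_walker((Node *) plan->qual, context))
            return true;

        /* node-type-specific substructure needs its own handling,
         * which is what makes the routine non-uniform */
        switch (nodeTag(plan))
        {
            case T_Append:
                /* would have to walk ((Append *) plan)->appendplans */
                break;
            default:
                break;
        }

        /* common left/right children */
        if (plan->lefttree && plan_walker(plan->lefttree, context))
            return true;
        if (plan->righttree && plan_walker(plan->righttree, context))
            return true;
        return false;
    }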