Thread: Custom Scan APIs (Re: Custom Plan node)
The attached patches provide a feature to implement a custom scan node that allows an extension to replace part of a plan tree with its own code instead of the built-in logic. In addition to the previous proposition, it enables us to integrate a custom scan as one of the candidate paths to be chosen by the optimizer.

Here are two patches. The first one (pgsql-v9.4-custom-scan-apis) offers the API itself plus a simple demonstration module that implements a regular table scan using an inequality operator on the ctid system column. The second one (pgsql-v9.4-custom-scan-remote-join) enhances postgres_fdw to support remote joins.

Below is an example that shows how a custom scan works.

We usually run a sequential scan even if the clause has an inequality operator that references the ctid system column.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
                       QUERY PLAN
--------------------------------------------------------
 Seq Scan on t1  (cost=0.00..209.00 rows=3333 width=43)
   Filter: (ctid > '(10,0)'::tid)
(2 rows)

An extension acting as a custom-scan provider suggests an alternative path; its cost is less than the sequential scan, so the optimizer chooses it.

postgres=# LOAD 'ctidscan';
LOAD
postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid;
                              QUERY PLAN
----------------------------------------------------------------------
 Custom Scan (ctidscan) on t1  (cost=0.00..100.00 rows=3333 width=43)
   Filter: (ctid > '(10,0)'::tid)
(2 rows)

Of course, a more cost-effective plan still wins if one exists.

postgres=# EXPLAIN SELECT ctid,* FROM t1 WHERE ctid > '(10,0)'::tid AND a = 200;
                            QUERY PLAN
-------------------------------------------------------------------
 Index Scan using t1_pkey on t1  (cost=0.29..8.30 rows=1 width=43)
   Index Cond: (a = 200)
   Filter: (ctid > '(10,0)'::tid)
(3 rows)

Another worthwhile example is the remote-join enhancement for postgres_fdw, as follows. Both ft1 and ft2 are foreign tables managed by the same foreign server.

postgres=# EXPLAIN (verbose) SELECT * FROM ft1 JOIN ft2 ON a = x
           WHERE f_leak(b) AND y like '%aaa%';
                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Custom Scan (postgres-fdw)  (cost=100.00..100.01 rows=0 width=72)
   Output: a, b, x, y
   Filter: f_leak(b)
   Remote SQL: SELECT r1.a, r1.b, r2.x, r2.y FROM (public.ft1 r1 JOIN public.ft2 r2 ON ((r1.a = r2.x))) WHERE ((r2.y ~~ '%aaa%'::text))
(4 rows)

---------------------------
How it works
---------------------------
This patch adds two hooks (for base and join relations) in allpaths.c and joinpaths.c. They allow extensions to add alternative paths for scanning a base relation or joining two relations.

The callback routine can add a CustomPath using add_path() to inform the optimizer of the alternative scan path. Every custom-scan provider is identified by its name, registered beforehand using the following function:

  void register_custom_provider(const CustomProvider *provider);

CustomProvider is a set of a name string and function pointers for the callbacks.

Once a CustomPath is chosen, create_scan_plan() constructs a custom-scan plan and calls back into the extension to initialize the node. The rest is similar to a foreign scan, although some details differ. For example, a foreign scan is assumed to return tuples formed according to the table definition. A custom scan makes no such assumption, so the extension needs to set the tuple descriptor on the scan tuple slot of the ScanState structure by itself.

In the join case, a custom scan behaves like a regular scan, but it returns tuples already joined across the underlying relations. The patched postgres_fdw utilizes a hook in joinpaths.c to run the remote join.

------------
Issues
------------
I'm not 100% certain whether the arguments of add_join_path_hook are reasonable. I guess the first 7 arguments are the minimum necessity. The mergeclause_list and semifactors might be useful if someone tries to implement their own merge join or semi join. Also, I'm not familiar with the usage of path parameterization, but the last two arguments relate to it. Where is the best code to learn about its usage?

+/* Hook for plugins to add custom join path, in addition to default ones */
+typedef void (*add_join_path_hook_type)(PlannerInfo *root,
+                                        RelOptInfo *joinrel,
+                                        RelOptInfo *outerrel,
+                                        RelOptInfo *innerrel,
+                                        JoinType jointype,
+                                        SpecialJoinInfo *sjinfo,
+                                        List *restrictlist,
+                                        List *mergeclause_list,
+                                        SemiAntiJoinFactors *semifactors,
+                                        Relids param_source_rels,
+                                        Relids extra_lateral_rels);
+extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

When we replace a join with a custom scan, what is the best target for a Var node that referenced the relations under the join? Usually, Var->varno is given as the rtindex of the tables being joined; it is then replaced by OUTER_VAR or INNER_VAR at set_join_references(), which eventually determines the slot to be fetched in ExecEvalScalarVar(). On the other hand, if we replaced a join, we want the Var node to reference the scan tuple slot, neither the outer slot nor the inner slot. I tried adding a new CUSTOM_VAR that references the scan tuple slot. It is probably a straightforward way to run a remote join like a scan, but I'm not certain it is the best way.

I was also concerned that the FDW callbacks of postgres_fdw are designed to take a ForeignScanState argument. Because of this, the remote-join code could not call these routines, even though most of the custom-join logic is similar. So I'd like to rework postgres_fdw first, to factor out common routines that can be called from both the FDW portion and the remote-join portion. However, doing both in one patch makes reviewing hard due to the large changeset, so I'd like to do the code rework first.

----------------
Jobs to do
----------------
* SGML documentation like fdwhandler.sgml is still under construction.
* A wiki page would probably help people understand the feature.
* postgres_fdw needs reworking to share common code between the FDW
  and remote-join portions.

Thanks,

2013/10/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2013/10/3 Robert Haas <robertmhaas@gmail.com>:
>> Well, there were a lot of problems with your demonstration, which have
>> already been pointed out upthread. I'm skeptical about the idea of
>> simply replacing planner nodes wholesale, and Tom is outright opposed.
>> I think you'll do better to focus on a narrower case - I'd suggest
>> custom scan nodes - and leave the rest as a project for another time.
>>
> Thanks, it makes clear what we should target for v9.4 development.
> Towards the next commitfest, I'm planning to develop the following
> features:
> * A CustomScan node that can run custom code instead of the built-in
>   scan nodes.
> * Join pushdown in postgres_fdw using the hook located in
>   add_paths_to_joinrel(), for demonstration purposes.
> * Some new way to scan a relation; your suggested ctid scan with
>   less-than or greater-than qualifiers is a good example, also for
>   demonstration purposes.
>
> Probably, the above set of jobs will be the first chunk of this feature.
> Then, let's do other stuff like Append, Sort, Aggregate and so on
> later. It seems to me a reasonable strategy.
>
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
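[For orientation, here is a minimal sketch of how an extension would install the proposed hook from its _PG_init(), assuming the patched headers; the previous-hook chaining is the usual PostgreSQL convention, and the CustomPath construction API is patch-specific, so it is only indicated in a comment.]

#include "postgres.h"
#include "fmgr.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"    /* assumed home of add_join_path_hook */

PG_MODULE_MAGIC;

static add_join_path_hook_type prev_add_join_path_hook = NULL;

/* Callback offering an alternative join path; the signature follows
 * the typedef quoted above. */
static void
my_add_join_path(PlannerInfo *root, RelOptInfo *joinrel,
                 RelOptInfo *outerrel, RelOptInfo *innerrel,
                 JoinType jointype, SpecialJoinInfo *sjinfo,
                 List *restrictlist, List *mergeclause_list,
                 SemiAntiJoinFactors *semifactors,
                 Relids param_source_rels, Relids extra_lateral_rels)
{
    /* Preserve paths added by any previously installed hook. */
    if (prev_add_join_path_hook)
        prev_add_join_path_hook(root, joinrel, outerrel, innerrel,
                                jointype, sjinfo, restrictlist,
                                mergeclause_list, semifactors,
                                param_source_rels, extra_lateral_rels);

    /*
     * Build a CustomPath describing this provider's join strategy and
     * offer it to the optimizer, e.g. add_path(joinrel, (Path *) cpath).
     * The CustomPath construction API belongs to the patch, so it is
     * omitted here.
     */
}

void
_PG_init(void)
{
    prev_add_join_path_hook = add_join_path_hook;
    add_join_path_hook = my_add_join_path;
}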
Attachment
Hi,

I tried to write up a wiki page to introduce how custom scans work:

https://wiki.postgresql.org/wiki/CustomScanAPI

Any comments, please.

2013/11/6 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The attached patches provide a feature to implement a custom scan node
> that allows an extension to replace part of a plan tree with its own
> code instead of the built-in logic.
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
The attached patches are the revised custom-scan APIs.

- custom-scan.sgml was added to introduce how to write a custom-scan
  provider in the official documentation.
- Much code duplication in postgres_fdw.c was eliminated. I split some
  FDW handlers into two parts: a common portion and an FDW-specific one.
  The executor callbacks of the custom-scan code utilize the common
  portion, because most of the implementation is equivalent.

I'd like to see comments regarding the way Var references are handled for a custom scan that replaced a join of relations. The varno of a Var that references a join relation is the rtindex of either the right or left relation; setrefs.c then adjusts it, setting INNER_VAR or OUTER_VAR instead. However, this does not work if a custom scan that just references the result of a remote join query was chosen instead of a local join, because its result is usually set in the ps_ResultTupleSlot of PlanState, and so ExecEvalScalarVar must reference neither the inner nor the outer slot. Instead of the existing solution, I added one more special varno, CUSTOM_VARNO, which just references the result tuple slot of the target relation. If CUSTOM_VARNO is given, EXPLAIN (verbose) generates column names from the TupleDesc of the underlying ps_ResultTupleSlot. I'm not 100% certain it is the best approach for us, but it works well.

Also, I'm uncertain about the usage of param_info in the Path structure, even though I followed the manner of other portions. Please point out if my usage is not appropriate.

Thanks,

2013/11/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hi,
>
> I tried to write up a wiki page to introduce how custom scans work.
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
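[As an illustration of the EXPLAIN (verbose) behaviour described above, a minimal sketch, not the patch's actual code, of deriving a column name from the result slot's tuple descriptor; the field access follows the 9.4-era TupleDesc layout, and the helper name is made up.]

#include "postgres.h"
#include "access/tupdesc.h"
#include "executor/tuptable.h"

/* Name of the attribute that a CUSTOM_VARNO Var with this attno
 * resolves to, read from the TupleDesc of the node's
 * ps_ResultTupleSlot. */
static const char *
custom_varno_colname(TupleTableSlot *result_slot, AttrNumber attno)
{
    TupleDesc   tupdesc = result_slot->tts_tupleDescriptor;

    Assert(attno >= 1 && attno <= tupdesc->natts);
    return NameStr(tupdesc->attrs[attno - 1]->attname);
}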
Attachment
On Mon, Nov 18, 2013 at 7:25 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> The attached patches are the revised custom-scan APIs.
My initial review of this feature:
- The patches apply and build, but they produce a warning:
ctidscan.c: In function ‘CTidInitCustomScanPlan’:
ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
I'd recommend that you split the part-1 patch containing the ctidscan contrib module into its own patch. It is more than half of the patch, and it certainly stands on its own. IMO, ctidscan fits a very specific use case and would be better off being an extension instead of living in contrib.
> [snip]
Thanks for your review.

2013/11/19 Jim Mlodgenski <jimmy76@gmail.com>:
> My initial review of this feature:
> - The patches apply and build, but they produce a warning:
> ctidscan.c: In function ‘CTidInitCustomScanPlan’:
> ctidscan.c:362:9: warning: unused variable ‘scan_relid’ [-Wunused-variable]
>
This variable was only used in an Assert() macro, so it causes a warning if you don't pass --enable-cassert to the configure script. Anyway, I adjusted the code to check the relid of the RelOptInfo directly.

> I'd recommend that you split the part-1 patch containing the ctidscan
> contrib module into its own patch. It is more than half of the patch,
> and it certainly stands on its own. IMO, ctidscan fits a very specific
> use case and would be better off being an extension instead of living
> in contrib.
>
OK, I split them off. Part 1 is the custom-scan API itself, part 2 is the ctidscan portion, and part 3 is the remote join on postgres_fdw.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
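[As an aside, this class of warning, a variable consumed only by Assert(), is common in builds without --enable-cassert. A sketch of both the marker-based fix PostgreSQL's c.h provides and the direct-check alternative KaiGai chose; the function and variable names are illustrative.]

#include "postgres.h"
#include "nodes/relation.h"    /* 9.4-era home of RelOptInfo */

static void
check_relid(RelOptInfo *baserel)
{
    /* Without the marker, gcc warns "unused variable" in non-cassert
     * builds, because Assert() then compiles away to nothing. */
    Index scan_relid PG_USED_FOR_ASSERTS_ONLY = baserel->relid;

    Assert(scan_relid > 0);

    /* The alternative taken in the revised patch: drop the local
     * variable and test the field directly. */
    Assert(baserel->relid > 0);
}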
Attachment
KaiGai,

On Tue, Nov 19, 2013 at 9:41 AM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> This variable was only used in an Assert() macro, so it causes a warning
> if you don't pass --enable-cassert to the configure script.
> Anyway, I adjusted the code to check the relid of the RelOptInfo directly.

The warning is now gone.

> OK, I split them off. Part 1 is the custom-scan API itself, part 2 is
> the ctidscan portion, and part 3 is the remote join on postgres_fdw.

Attached is a patch for the documentation. I think the documentation still needs a little more work, but it is pretty close. I can add some more detail to it once I finish adapting the hadoop_fdw to use the custom scan API and have a better understanding of all of the calls.
Attachment
contrib/ctidscan/ctidscan.c:44: indent with spaces.
contrib/ctidscan/ctidscan.c:250: indent with spaces.
contrib/ctidscan/ctidscan.c:266: trailing whitespace.
contrib/postgres_fdw/deparse.c:1044: indent with spaces.
contrib/postgres_fdw/postgres_fdw.c:940: indent with spaces.
src/backend/commands/explain.c:2097: indent with spaces.
src/backend/optimizer/plan/createplan.c:2097: trailing whitespace.
2013/11/19 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> OK, I split them off. Part 1 is the custom-scan API itself, part 2 is
> the ctidscan portion, and part 3 is the remote join on postgres_fdw.

These three patches can be applied with no conflict onto the 2013-11-27 HEAD, but some fixes are necessary to build, because commit 784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21) allows a FunctionScan node to have multiple function expressions, so the Node *funcexpr in CustomScan should be List *functions now.

I'll continue to review by applying the patches onto the 2013-11-19 HEAD.

-- 
Shigeru HANADA
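[A sketch of the adjustment Hanada points out; the struct below is illustrative only, not the patch's actual CustomScan definition.]

#include "nodes/plannodes.h"
#include "nodes/pg_list.h"

/* Illustrative only: commit 784e762e made FunctionScan carry a list of
 * function expressions, so a CustomScan replacing it must follow suit. */
typedef struct CustomScanSketch
{
    Scan        scan;       /* inherited Plan/Scan fields */
    /* before: Node *funcexpr;      a single function expression */
    List       *functions;  /* now: one entry per function expression */
} CustomScanSketch;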
Thanks for the series of checks. The attached ones are the revised patches.

I merged all the suggestions from Jim. Thanks, they made the documentation quality better. Also, I fixed up the cosmetic whitespace-vs-tab issues.

The actual code change is to follow the change to FunctionScan in the case where a CustomScan replaces a FunctionScan: it now stores a List * object instead of a single expression tree, to allow multiple functions. Nothing else was changed from the previous version.

Best regards,

2013/11/27 Shigeru Hanada <shigeru.hanada@gmail.com>:
> These three patches can be applied with no conflict onto the 2013-11-27
> HEAD, but some fixes are necessary to build, because commit
> 784e762e886e6f72f548da86a27cd2ead87dbd1c (committed on 2013-11-21)
> allows a FunctionScan node to have multiple function expressions, so the
> Node *funcexpr in CustomScan should be List *functions now.
>
> I'll continue to review by applying the patches onto the 2013-11-19 HEAD.
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
Hi KaiGai-san,

2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The attached ones are the revised patches.
> [snip]

I first reviewed the postgres_fdw portion of the patches to learn the outline of Custom Plan. The wiki page is also a good textbook for the feature. I have some random comments about the basic design of Custom Plan:

(1) IIUC, add_join_path and add_scan_path are added to allow extensions to plug their code into the planner.

(2) The FDW framework has executor callbacks based on existing executor nodes. Is there any plan to integrate the two into one mechanism, or to wrap one with the other? I'm not sure that we should have two similar frameworks side by side.
# I'm sorry if I've missed the past discussion about this issue.

(3) Internal routines such as is_self_managed_relation and has_wholerow_reference seem to be useful for other FDWs. Is it possible to move them into core?

(4) postgres_fdw estimates the costs of a join by calculating local numbers. How about supporting remote estimation by throwing an EXPLAIN query when use_remote_estimate = true?

-- 
Shigeru HANADA
2013/11/29 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> I merged all the suggestions from Jim. Thanks, they made the documentation
> quality better. Also, I fixed up the cosmetic whitespace-vs-tab issues.

I found some typos in the documents and comments. Please see the attached patch for details.

-- 
Shigeru HANADA
Attachment
Hanada-san,

Thanks for your reviewing.

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:
> I first reviewed the postgres_fdw portion of the patches to learn the
> outline of Custom Plan. The wiki page is also a good textbook for the
> feature. I have some random comments about the basic design of Custom
> Plan:
>
> (1) IIUC, add_join_path and add_scan_path are added to allow extensions
> to plug their code into the planner.
>
Almost yes. More precisely, these hooks allow extensions to plug the paths they can provide into a particular join or scan. The planner will then choose the cheapest one according to the cost values.

> (2) The FDW framework has executor callbacks based on existing executor
> nodes. Is there any plan to integrate the two into one mechanism, or to
> wrap one with the other? I'm not sure that we should have two similar
> frameworks side by side.
> # I'm sorry if I've missed the past discussion about this issue.
>
Probably, FDW has a different role from the CustomScan API. As the name implies, an FDW acts as a bridge between a relation form and an opaque external data source, mediating between the two worlds on behalf of a foreign table. On the other hand, CustomScan allows providing alternative logic to scan or join particular relations, in addition to the built-in ones, but it does not act on behalf of foreign tables.

The existing FDW is designed to implement a scan on an intangible relation, so it can assume some things: a tuple returned from the FDW has a TupleDesc equivalent to the table definition, and it can always use ExecScan() in all cases. So, I don't think these two frameworks should be consolidated, because that would confuse existing extensions that assume FDW callbacks always have a particular foreign table definition.

> (3) Internal routines such as is_self_managed_relation and
> has_wholerow_reference seem to be useful for other FDWs. Is it possible
> to move them into core?
>
Probably, src/backend/foreign/foreign.c is a good host for them.

> (4) postgres_fdw estimates the costs of a join by calculating local
> numbers. How about supporting remote estimation by throwing an EXPLAIN
> query when use_remote_estimate = true?
>
I'm uncertain whether the cost value from a remote EXPLAIN represents the right difficulty on the local side: it indeed represents the difficulty of joining two relations on the remote side, but it does not represent the local work, which just fetches tuples from the result set of the remote query with the tables already joined. What is your opinion? Is the remote cost estimate comparable with the local value?

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
Thanks for fixing my many careless mistakes. I didn't know "seek" was an irregular verb...

Best regards,

2013/12/4 Shigeru Hanada <shigeru.hanada@gmail.com>:
> I found some typos in the documents and comments. Please see the
> attached patch for details.
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
The attached patches include the documentation fixups by Hanada-san, and the relocation of is_managed_relation (the portion that checks whether a relation is a foreign table managed by a particular FDW) and has_wholerow_reference.

I didn't touch the EXPLAIN logic, because I'm uncertain whether the cost of a remote join is reasonable as the cost of an alternative path to local joins.

Please check it. Thanks,

2013/12/5 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Hanada-san,
>
> Thanks for your reviewing.
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
Attachment
Hi KaiGai-san,

2013/12/8 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> The attached patches include the documentation fixups by Hanada-san,
> and the relocation of is_managed_relation (the portion that checks
> whether a relation is a foreign table managed by a particular FDW) and
> has_wholerow_reference.
> I didn't touch the EXPLAIN logic, because I'm uncertain whether the
> cost of a remote join is reasonable as the cost of an alternative path
> to local joins.
>
> Please check it. Thanks,

The patches could be applied cleanly, but I saw a compiler warning about get_rel_relkind() in foreign.c; it's a minor issue. Please just add an #include of utils/lsyscache.h there.

I have some more random comments about EXPLAIN.

1) You use "Operation" as the label of Custom Scan nodes in non-text formats, but it seems to me to be rather a "provider name". What is the string shown there?

2) It would be nice if we could see information about what the Custom Scan node replaced in the EXPLAIN output (even if only in verbose mode). I know that we can't show the plan tree below custom scan nodes, because the CS provider can't obtain the other candidates. But even just the relation names used in the join or the scan would help users understand what is going on in a Custom Scan.

Regards,
-- 
Shigeru HANADA
Hanada-san,

Thanks for your reviewing.

2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:
> The patches could be applied cleanly, but I saw a compiler warning
> about get_rel_relkind() in foreign.c; it's a minor issue. Please
> just add an #include of utils/lsyscache.h there.
>
Fixed.

> I have some more random comments about EXPLAIN.
>
> 1) You use "Operation" as the label of Custom Scan nodes in non-text
> formats, but it seems to me to be rather a "provider name". What is
> the string shown there?
>
I tried free-riding on the existing properties, but it does not make sense indeed, as you pointed out. I adjusted explain.c to show a "Custom-Provider" property for the Custom Scan node, as follows.

postgres=# explain(format xml) select * from t1 where ctid > '(4,0)'::tid;
                        QUERY PLAN
----------------------------------------------------------
 <explain xmlns="http://www.postgresql.org/2009/explain">+
   <Query>                                               +
     <Plan>                                              +
       <Node-Type>Custom Scan</Node-Type>                +
       <Custom-Provider>ctidscan</Custom-Provider>       +
       <Relation-Name>t1</Relation-Name>                 +
       <Alias>t1</Alias>                                 +
       <Startup-Cost>0.00</Startup-Cost>                 +
       <Total-Cost>12.30</Total-Cost>                    +
       <Plan-Rows>410</Plan-Rows>                        +
       <Plan-Width>36</Plan-Width>                       +
       <Filter>(ctid > '(4,0)'::tid)</Filter>            +
     </Plan>                                             +
   </Query>                                              +
 </explain>
(1 row)

> 2) It would be nice if we could see information about what the Custom
> Scan node replaced in the EXPLAIN output (even if only in verbose
> mode). I know that we can't show the plan tree below custom scan
> nodes, because the CS provider can't obtain the other candidates. But
> even just the relation names used in the join or the scan would help
> users understand what is going on in a Custom Scan.
>
Even though I agree that it would help users understand the plan, it is also a headache to implement, because the CustomScan node (and its superclass) does not carry information about which relations are underlying. Probably, this functionality would need to show the underlying relations in ExplainTargetRel() if the CustomScan node represents a scan instead of a join. What data source could produce the list of underlying relations here? So, if it is not a significant restriction for users, I'd like to work on this feature later.

The attached patch fixes up the minor warning around get_rel_relkind and the name of the custom-provider property. Please check it.

Thanks,
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
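[For reference, emitting such a property from explain.c would plausibly reuse the existing ExplainPropertyText() helper; a minimal sketch, where the provider-name argument stands in for however the patched node actually exposes it.]

#include "commands/explain.h"

/* Sketch: print the provider name as the "Custom-Provider" property.
 * How the provider name is obtained from the node is patch-specific,
 * so it is passed in here. */
static void
show_custom_provider(const char *custom_provider_name, ExplainState *es)
{
    ExplainPropertyText("Custom-Provider", custom_provider_name, es);
}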
Attachment
Hi KaiGai-san,

2013/12/11 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> 2013/12/10 Shigeru Hanada <shigeru.hanada@gmail.com>:
>> The patches could be applied cleanly, but I saw a compiler warning
>> about get_rel_relkind() in foreign.c; it's a minor issue. Please
>> just add an #include of utils/lsyscache.h there.
>>
> Fixed.

Check.

>> 1) You use "Operation" as the label of Custom Scan nodes in non-text
>> formats, but it seems to me to be rather a "provider name". What is
>> the string shown there?
>>
> I adjusted explain.c to show a "Custom-Provider" property for the
> Custom Scan node, as follows.

The new name seems better; it is what the node expresses.

>> 2) It would be nice if we could see information about what the Custom
>> Scan node replaced in the EXPLAIN output (even if only in verbose
>> mode). [snip]
>>
> Even though I agree that it would help users understand the plan, it
> is also a headache to implement, because the CustomScan node (and its
> superclass) does not carry information about which relations are
> underlying. [snip]
> So, if it is not a significant restriction for users, I'd like to work
> on this feature later.

Agreed. It would be enough that custom scan providers can add arbitrary information, such as the "Remote SQL" of postgres_fdw, to the EXPLAIN result via a core API. Some kind of framework that helps authors of custom scan providers would be nice, but it should be considered after the first cut.

> The attached patch fixes up the minor warning around get_rel_relkind
> and the name of the custom-provider property. Please check it.

The patch can be applied onto the 2013-12-16 HEAD cleanly, and gives no unexpected error or warning.

I'm sorry to post separately, but I have some comments on the documentation.

(1) ctidscan
Is session_preload_libraries available to enable the feature, like the shared_*** and local_*** variants? According to my trial it works fine, like the two similar GUCs.

(2) postgres_fdw
JOIN push-down is a killer application of the Custom Scan Provider feature, so I think it's good to mention it in the "Remote Query Optimization" section.

The code for core and contrib seems fine, so I'll mark the patches "Ready for Committer" after the documentation enhancement.

Regards,
-- 
Shigeru HANADA
Hi Hanada-san,

(2013/12/16 14:15), Shigeru Hanada wrote:
> I'm sorry to post separately, but I have some comments on the documentation.
>
> (1) ctidscan
> Is session_preload_libraries available to enable the feature, like the
> shared_*** and local_*** variants? According to my trial it works fine,
> like the two similar GUCs.
>
It should be available; nothing is different from the two parameters that we have supported for a long time. Sorry, I missed mentioning the new feature.

> (2) postgres_fdw
> JOIN push-down is a killer application of the Custom Scan Provider
> feature, so I think it's good to mention it in the "Remote Query
> Optimization" section.
>
I added an explanation about remote join execution to that section. It will probably help users understand why a Custom Scan node appears there instead of a Join node. Thanks for your suggestion.

Best regards,
-- 
OSS Promotion Center / The PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
Attachment
KaiGai-san,

2013/12/16 KaiGai Kohei <kaigai@ak.jp.nec.com>:
> (2013/12/16 14:15), Shigeru Hanada wrote:
>> (1) ctidscan
>> Is session_preload_libraries available to enable the feature, like the
>> shared_*** and local_*** variants? [snip]
>>
> It should be available; nothing is different from the two parameters
> that we have supported for a long time. Sorry, I missed mentioning the
> new feature.

Check.

>> (2) postgres_fdw
>> JOIN push-down is a killer application of the Custom Scan Provider
>> feature, so I think it's good to mention it in the "Remote Query
>> Optimization" section.
>>
> I added an explanation about remote join execution to that section.
> It will probably help users understand why a Custom Scan node appears
> there instead of a Join node. Thanks for your suggestion.

Check.

I think that these patches have been considered well enough to mark as "Ready for Committer".

Regards,
-- 
Shigeru HANADA
Hello,

The attached patches are rebased onto the latest git tree, with no functional changes from the previous revision in the November commitfest. Hanada-san volunteered to review the series of patches, including the portion for postgres_fdw, then marked it as "Ready for Committer" on the last commitfest. So, I hope a committer will also volunteer to review the patches for final checking.

* Part 1 - CustomScan APIs
This patch provides a set of interfaces for extensions to interact with the query optimizer and executor. The new add_scan_path_hook and add_join_path_hook allow offering alternative ways to scan a particular relation or to join particular relations. Once an alternative way is chosen by the optimizer, the associated callbacks are invoked from the executor. In that case, the extension has responsibility for returning a slot that holds a tuple scanned from the underlying relation (or an empty slot at end of scan).

* Part 2 - contrib/ctidscan
This patch provides a simple example implementation of the CustomScan API. It enables skipping pages when inequality operators are given on the ctid system column. That is at least better than a sequential full scan, so it usually beats SeqScan, though an index scan is much better.

* Part 3 - remote join implementation
This patch provides an example that replaces a join with a custom scan node running on the result set of a remote join query, on top of the existing postgres_fdw extension. The idea is that the result set of a remote query looks like a relation, albeit an intangible one; thus, it is feasible to replace a local join with a scan on the result set of a query executed on the remote host, if both relations to be joined belong to the same foreign server. This patch gives postgres_fdw the capability to run a join on the remote host.

Thanks,

2013/12/16 Shigeru Hanada <shigeru.hanada@gmail.com>:
> [snip]
-- 
KaiGai Kohei <kaigai@kaigai.gr.jp>
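[To make the Part 2 idea concrete: a ctid inequality qual bounds which heap blocks can contain qualifying tuples, so the scan can skip the rest. A minimal sketch, not the module's actual code, of deriving the first block to scan from a lower-bound ctid.]

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * For a qual like "ctid > '(10,0)'::tid", no tuple located before the
 * bound's block can satisfy the clause, so a heap scan may start at
 * that block instead of block 0. This is the page-skipping that makes
 * ctidscan cheaper than a full sequential scan.
 */
static BlockNumber
first_block_to_scan(ItemPointer lower_bound)
{
    return ItemPointerGetBlockNumber(lower_bound);
}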
Hackers,

Is somebody available to volunteer to review the custom-scan patch?
Even though Hanada-san acknowledged it before, it seems to me this patch
has some potentially arguable implementation choices. Even if you don't
have enough time to review the whole of the code, it would help me if
you could comment on the following topics.

(1) Interface to add alternative paths instead of built-in join paths

This patch adds add_join_path_hook to add_paths_to_joinrel to allow
extensions to provide alternative scan paths in addition to the built-in
join paths. The custom-scan path being added is assumed to scan a
(virtual) relation that is the result set of the joined relations.
My concern is which arguments should be passed. This hook is declared as
follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments down to restrictlist clearly should be passed to
extensions, because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be
passed. It probably makes sense to pass param_source_rels and
extra_lateral_rels, so extensions can check whether a path is sensible
for parameterized paths (a sketch of that check follows this message).
On the other hand, I doubt whether mergeclause_list is useful to
deliver. (It may make sense if someone tries to implement their own
merge-join implementation??)

I'd like to see ideas to improve the current interface specification.

(2) CUSTOM_VAR for special Var references

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR       65000   /* reference to inner subplan */
 #define    OUTER_VAR       65001   /* reference to outer subplan */
 #define    INDEX_VAR       65002   /* reference to index column */
+#define    CUSTOM_VAR      65003   /* reference to custom column */

I newly added CUSTOM_VAR to handle the case where a custom scan
overrides a join between relations.
Var nodes within a join plan are adjusted to refer to either
ecxt_innertuple or ecxt_outertuple of ExprContext. That causes trouble
if a custom scan runs instead of a built-in join, because the tuples it
fetches are usually stored in ecxt_scantuple, so the Var nodes need a
varno that is neither inner nor outer.

The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
extensions to rewrite Var nodes within the custom-scan node to point at
ecxt_scantuple using CUSTOM_VAR, instead of inner or outer. For example,
a Var node with varno=CUSTOM_VAR and varattno=3 references the third
attribute of the tuple in ecxt_scantuple.
I think this is a reasonable solution, but I'm not 100% certain whether
people have a more graceful idea for implementing it.

If you have comments on the above two topics, or others, please share
your ideas.

Thanks,
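As a data point for the param_source_rels question: the built-in join
paths use it to reject useless parameterized paths, so an extension
would presumably want the same test. Below is a sketch mirroring the
opening of try_nestloop_path(), using only stock planner routines; the
helper name and the choice of cheapest paths are hypothetical:

#include "postgres.h"
#include "nodes/bitmapset.h"
#include "optimizer/pathnode.h"

/* Would a parameterized custom join path be worth building at all? */
static bool
custom_join_param_is_sensible(RelOptInfo *outerrel, RelOptInfo *innerrel,
                              Relids param_source_rels)
{
    Path       *outer_path = outerrel->cheapest_total_path;
    Path       *inner_path = innerrel->cheapest_total_path;
    Relids      required_outer;

    required_outer = calc_nestloop_required_outer(outer_path, inner_path);
    if (required_outer &&
        !bms_overlap(required_outer, param_source_rels))
    {
        /* parameterization cannot help any relation we care about */
        bms_free(required_outer);
        return false;
    }
    bms_free(required_outer);
    return true;
}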
KaiGai Kohei,

* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> Is somebody available to volunteer to review the custom-scan patch?

I looked through it a bit, and my first takeaway was that the patches
that actually use the new hooks were also making further changes to the
backend code, leaving me with the impression that the proposed interface
isn't terribly stable. Perhaps those changes should simply have been in
the first patch, but they weren't, and that certainly gave me pause.

I'm also not entirely convinced that this is the direction to go in when
it comes to pushing down joins to FDWs. While that's certainly a goal
that I think we all share, this seems to be intending to add a
completely different feature which happens to be usable for that. For
FDWs, wouldn't we only present the FDW with the paths where the foreign
tables for that FDW, or perhaps just a given foreign server, are being
joined?

Thanks,

Stephen
Hi Stephen,

Thanks for your comments.

> I looked through it a bit, and my first takeaway was that the patches
> that actually use the new hooks were also making further changes to
> the backend code, leaving me with the impression that the proposed
> interface isn't terribly stable. Perhaps those changes should simply
> have been in the first patch, but they weren't, and that certainly
> gave me pause.
>
Yes, the part-1 patch provides the interface portion that mediates
between the backend code and extension code. The part-2 and part-3
portions are contrib modules that implement their features on top of the
custom-scan API.

> I'm also not entirely convinced that this is the direction to go in
> when it comes to pushing down joins to FDWs. While that's certainly a
> goal that I think we all share, this seems to be intending to add a
> completely different feature which happens to be usable for that. For
> FDWs, wouldn't we only present the FDW with the paths where the
> foreign tables for that FDW, or perhaps just a given foreign server,
> are being joined?
>
FDW join push-down is one of the valuable use-cases of this interface,
but not the only one. As you might know, my motivation is to implement a
GPU acceleration feature on top of this interface, which offers
alternative ways to scan or join relations, and potentially to sort or
aggregate.
It would probably be too much of a stretch to implement a radix sort on
top of FDW. I'd like you to understand that the part-3 patch (FDW join
push-down) is a demonstration of the custom-scan interface in an
application, not something designed for a single special purpose.

Right now, I put all the logic that mediates between the custom-scan
interface and the FDW driver on the postgres_fdw side; it might be an
idea to have common code (like the logic that checks whether both
relations to be joined belong to the same foreign server) on the backend
side, as a kind of gateway between them.

As an aside, what should be the scope of the FDW interface?
In my understanding, it allows an extension to implement "something" on
behalf of a particular data structure declared with CREATE FOREIGN
TABLE. In other words, the extension's responsibility is to present a
view of that "something" according to PostgreSQL's internal data
structure, rather than the object itself.
On the other hand, the custom-scan interface allows extensions to
implement alternative methods to scan or join particular relations; it
does not play the role of a target referenced in queries. In other
words, it provides methods to access objects.
It is natural that both features look similar, because both of them let
extensions hook the planner and executor; however, their purposes are
different.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Jan 27, 2014 at 7:14 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> FDW join push-down is one of the valuable use-cases of this interface,
> but not the only one. As you might know, my motivation is to implement
> a GPU acceleration feature on top of this interface, which offers
> alternative ways to scan or join relations, and potentially to sort or
> aggregate.
I'm curious how this relates to the pluggable storage idea discussed here https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and here http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html
I haven't seen a specific proposal about how much functionality should be encapsulated by a pluggable storage system. But I wonder if that would be the best place for specialized table-scan code to end up?
2014-01-29 Christian Convey <christian.convey@gmail.com>:
> I'm curious how this relates to the pluggable storage idea discussed
> here https://wiki.postgresql.org/wiki/PgCon_2013_Developer_Meeting and
> here
> http://www.databasesoup.com/2013/05/postgresql-new-development-priorities-2.html
>
> I haven't seen a specific proposal about how much functionality should
> be encapsulated by a pluggable storage system. But I wonder if that
> would be the best place for specialized table-scan code to end up?
>
If you are interested in designing your own storage layer (which thus
needs its own scan/write implementation), FDW is the option currently
available. It defines a set of interfaces that allow extensions to
generate "things to be there" on the fly. It does not force extensions
to act as a client of a remote database, even though it was originally
designed for dblink-like purposes. In other words, FDW is a feature that
translates a particular data source into something visible according to
the table definition. As long as the driver can mediate between the
table definition and the data format of your own storage layer, it
should work.

On the other hand, the custom-scan interface basically allows extensions
to implement an "alternative way to access" the data. If we have a wiser
way to scan or join relations than the built-in logic (yes, scanning the
result set of a remote join will be wiser than a local join on a couple
of remote scan results), this interface tells the backend "I have such a
wise strategy", and the planner will choose one of the paths, built-in
or additional, according to cost.

Let's get back to your question. This interface is, right now, not
designed to implement a pluggable storage layer. FDW is an option now,
and maybe a development item in the v9.5 cycle if we want regular tables
to be pluggable. Because I'm motivated to make my GPU acceleration
feature work on regular relations, I put my higher priority on the
interface that allows an extension to suggest "how to scan" them.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Folks,

Let me remind you of the custom-scan patches; this is the basis for
features such as the remote join in postgres_fdw, cache-only scan, the
(upcoming) GPU acceleration feature, and various other alternative ways
to scan/join relations.
Unfortunately, we could have only a small amount of discussion in this
commit fest, even though Hanada-san volunteered and moved the patches
into the "ready for committer" state at the November CF.

Before time is up, I'd like to ask the hackers' opinion about the
potentially arguable points (from my standpoint), in case they need to
be fixed up. One is the hook definition for adding alternative join
paths, and the other is a special varno used when a custom scan replaces
a join node. I'd like to hear your opinion about them while we can still
change the design if needed.

(1) Interface to add alternative paths in addition to built-in join paths

This patch adds add_join_path_hook to add_paths_to_joinrel to allow
extensions to provide alternative scan paths in addition to the built-in
join paths. The custom-scan path being added is assumed to scan a
(virtual) relation that is the result set of the joined relations.
My concern is which arguments should be passed. This hook is declared as
follows:

/* Hook for plugins to add custom join path, in addition to default ones */
typedef void (*add_join_path_hook_type)(PlannerInfo *root,
                                        RelOptInfo *joinrel,
                                        RelOptInfo *outerrel,
                                        RelOptInfo *innerrel,
                                        JoinType jointype,
                                        SpecialJoinInfo *sjinfo,
                                        List *restrictlist,
                                        List *mergeclause_list,
                                        SemiAntiJoinFactors *semifactors,
                                        Relids param_source_rels,
                                        Relids extra_lateral_rels);
extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

The arguments down to restrictlist clearly should be passed to
extensions, because these are also arguments of add_paths_to_joinrel().
However, I'm not 100% certain whether the other arguments should be
passed. It probably makes sense to pass param_source_rels and
extra_lateral_rels, so extensions can check whether a path is sensible
for parameterized paths. On the other hand, I doubt whether
mergeclause_list is useful to deliver. (It may make sense if someone
tries to implement their own merge-join implementation??)

I'd like to see ideas to improve the current interface specification.

(2) CUSTOM_VAR for special Var references

@@ -134,6 +134,7 @@ typedef struct Expr
 #define    INNER_VAR       65000   /* reference to inner subplan */
 #define    OUTER_VAR       65001   /* reference to outer subplan */
 #define    INDEX_VAR       65002   /* reference to index column */
+#define    CUSTOM_VAR      65003   /* reference to custom column */

I newly added CUSTOM_VAR to handle the case where a custom scan
overrides a join between relations.
Var nodes within a join plan are adjusted to refer to either
ecxt_innertuple or ecxt_outertuple of ExprContext. That causes trouble
if a custom scan runs instead of a built-in join, because the tuples it
fetches are usually stored in ecxt_scantuple, so the Var nodes need a
varno that is neither inner nor outer.

The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
extensions to rewrite Var nodes within the custom-scan node to point at
ecxt_scantuple using CUSTOM_VAR, instead of inner or outer. For example,
a Var node with varno=CUSTOM_VAR and varattno=3 references the third
attribute of the tuple in ecxt_scantuple.
I think this is a reasonable solution, but I'm not 100% certain whether
people have a more graceful idea for implementing it.

If you have comments on the above two topics, or others, please share
your ideas.
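For reference, a sketch of what that Var rewrite could look like inside
a SetPlanRefCustomScan callback, using the stock
expression_tree_mutator() machinery. The custom_scan_vars bookkeeping
(the Vars that make up the scan tuple, in attribute order) is an
illustrative assumption, not part of the patch; CUSTOM_VAR is the new
constant added above:

#include "postgres.h"
#include "nodes/nodeFuncs.h"

typedef struct
{
    List   *custom_scan_vars;   /* hypothetical: Vars of the scan tuple */
} fixup_context;

static Node *
fixup_varnode_mutator(Node *node, fixup_context *context)
{
    if (node == NULL)
        return NULL;
    if (IsA(node, Var))
    {
        AttrNumber  attno = 1;
        ListCell   *lc;

        foreach(lc, context->custom_scan_vars)
        {
            if (equal(lfirst(lc), node))
            {
                /* redirect the reference to ecxt_scantuple */
                Var    *newvar = (Var *) copyObject(node);

                newvar->varno = CUSTOM_VAR;
                newvar->varattno = attno;
                return (Node *) newvar;
            }
            attno++;
        }
        elog(ERROR, "Var not found in custom scan tuple");
    }
    return expression_tree_mutator(node, fixup_varnode_mutator,
                                   (void *) context);
}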
Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Kaigai-san,

Sorry to have left the thread for a while.

2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> (1) Interface to add alternative paths in addition to built-in join paths
>
> This patch adds add_join_path_hook to add_paths_to_joinrel to allow
> extensions to provide alternative scan paths in addition to the
> built-in join paths. [...]
> It probably makes sense to pass param_source_rels and
> extra_lateral_rels, so extensions can check whether a path is sensible
> for parameterized paths. On the other hand, I doubt whether
> mergeclause_list is useful to deliver. (It may make sense if someone
> tries to implement their own merge-join implementation??)
>
> I'd like to see ideas to improve the current interface specification.

I've read the code path that adds custom joins again, and felt that
providing semifactors is not necessary for the first cut, because it is
used only in initial_cost_nestloop (final_cost_nestloop receives
semifactors but does not use them), and external modules would not
become that smart before the 9.5 development cycle. It seems complex
enough that we should postpone deciding whether it's essential for
add_join_path_hook. Do you have any concrete use case for the parameter?

mergeclause_list and param_source_rels seem a little easier to use, but
in any case it should be documented how to use those parameters.

IMHO, a minimal interface seems better than a fully-fledged but
half-baked one, especially in the first cut.

--
Shigeru HANADA
2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> Folks,
>
> Let me remind you of the custom-scan patches; this is the basis for
> features such as the remote join in postgres_fdw, cache-only scan, the
> (upcoming) GPU acceleration feature, and various other alternative
> ways to scan/join relations.
> Unfortunately, we could have only a small amount of discussion in this
> commit fest, even though Hanada-san volunteered and moved the patches
> into the "ready for committer" state at the November CF.

I found some cosmetic flaws and a missing .gitignore entry in the
patches. Please see the attached patch for details.

--
Shigeru HANADA
> > Likely, the arguments down to restrictlist should be passed to
> > extensions, because these are also arguments of
> > add_paths_to_joinrel(). [...]
>
> I've read the code path that adds custom joins again, and felt that
> providing semifactors is not necessary for the first cut, because it
> is used only in initial_cost_nestloop (final_cost_nestloop receives
> semifactors but does not use them), and external modules would not
> become that smart before the 9.5 development cycle. It seems complex
> enough that we should postpone deciding whether it's essential for
> add_join_path_hook. Do you have any concrete use case for the
> parameter?
>
The reason why I asked the question above is that I haven't been 100%
certain about its usage. Indeed, semifactors is applied in a limited
fashion, and is quite easy for an extension to reproduce later (using
clauselist_selectivity; see the sketch below) if it wants this factor.
So, I agree with removing semifactors here.

> mergeclause_list and param_source_rels seem a little easier to use,
> but in any case it should be documented how to use those parameters.
>
mergeclause_list might not be sufficient for extensions to determine
whether their own merge join is applicable here. See the comment below,
from the head of select_mergejoin_clauses:

| * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
| * this is a right/full join and there are nonmergejoinable join clauses.
| * The executor's mergejoin machinery cannot handle such cases, so we have
| * to avoid generating a mergejoin plan.  (Note that this flag does NOT
| * consider whether there are actually any mergejoinable clauses.  This is
| * correct because in some cases we need to build a clauseless mergejoin.
| * Simply returning NIL is therefore not enough to distinguish safe from
| * unsafe cases.)

It says that an empty mergeclause_list is not a sufficient check to
determine whether the mergejoin logic is applicable to the target join.
So, either of the following is an option for an extension that tries to
implement its own mergejoin logic: (1) put both mergejoin_allowed and
mergeclause_list in the arguments of the hook, or (2) redefine
select_mergejoin_clauses() as an extern function so the variables can be
reproduced on demand. Which one is preferable?

BTW, I found duplicate definitions of clause_sides_match_join(), which
is used by select_mergejoin_clauses(), in joinpath.c and analyzejoins.c.
One of them should be eliminated, I think.

For param_source_rels and extra_lateral_rels, I'll put source-code
comments around add_join_path_hook.
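Coming back to semifactors: as a rough sketch of the reproduction
mentioned above, an extension could rebuild the outer_match_frac side of
SemiAntiJoinFactors from the hook's own arguments with
clauselist_selectivity(), loosely following what the planner's
compute_semi_anti_join_factors() does. This is an approximation, not the
exact built-in computation:

#include "postgres.h"
#include "optimizer/cost.h"

/* fraction of outer rows expected to have at least one join partner */
static Selectivity
estimate_outer_match_frac(PlannerInfo *root, List *restrictlist,
                          SpecialJoinInfo *sjinfo)
{
    return clauselist_selectivity(root,
                                  restrictlist,
                                  0,            /* treat as join clauses */
                                  JOIN_SEMI,
                                  sjinfo);
}

The match_count side would need a similar, slightly longer estimate.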
The earlier half of try_(nestloop|hashjoin|mergejoin)_path is probably
useful as an example for extension implementations.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
> Folks,
>
> Let me remind you of the custom-scan patches; this is the basis for
> features such as the remote join in postgres_fdw, cache-only scan, the
> (upcoming) GPU acceleration feature, and various other alternative
> ways to scan/join relations.
> Unfortunately, we could have only a small amount of discussion in this
> commit fest, even though Hanada-san volunteered and moved the patches
> into the "ready for committer" state at the November CF.
Sorry for jumping into this late.
Instead of a custom node, it might be a better idea to improve the FDW infrastructure to push down joins. For starters, is it possible for the custom scan node hooks to create a ForeignScan node? In general, I think it might be better for the custom scan hooks to create existing nodes if those serve the purpose.
> Before time is up, I'd like to ask the hackers' opinion about the
> potentially arguable points (from my standpoint), in case they need to
> be fixed up. One is the hook definition for adding alternative join
> paths, and the other is a special varno used when a custom scan
> replaces a join node. I'd like to hear your opinion about them while
> we can still change the design if needed.
> (1) Interface to add alternative paths in addition to built-in join paths
>
> This patch adds add_join_path_hook to add_paths_to_joinrel to allow
> extensions to provide alternative scan paths in addition to the
> built-in join paths. The custom-scan path being added is assumed to
> scan a (virtual) relation that is the result set of the joined
> relations.
> My concern is which arguments should be passed. This hook is declared
> as follows:
>
> /* Hook for plugins to add custom join path, in addition to default ones */
> typedef void (*add_join_path_hook_type)(PlannerInfo *root,
>                                         RelOptInfo *joinrel,
>                                         RelOptInfo *outerrel,
>                                         RelOptInfo *innerrel,
>                                         JoinType jointype,
>                                         SpecialJoinInfo *sjinfo,
>                                         List *restrictlist,
>                                         List *mergeclause_list,
>                                         SemiAntiJoinFactors *semifactors,
>                                         Relids param_source_rels,
>                                         Relids extra_lateral_rels);
> extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
> The arguments down to restrictlist clearly should be passed to
> extensions, because these are also arguments of add_paths_to_joinrel().
> However, I'm not 100% certain whether the other arguments should be
> passed. It probably makes sense to pass param_source_rels and
> extra_lateral_rels, so extensions can check whether a path is sensible
> for parameterized paths. On the other hand, I doubt whether
> mergeclause_list is useful to deliver. (It may make sense if someone
> tries to implement their own merge-join implementation??)
>
> I'd like to see ideas to improve the current interface specification.
Since a custom node is an open implementation, it will be important to pass as much information down to the hooks as possible, lest the hooks be constrained. Since function signatures within the planner and optimizer change from time to time, the custom node hook signatures will also need to change from time to time. That might turn out to be a maintenance overhead.
BTW, is it a good idea for custom nodes to also affect other paths like append, group etc.? Will it need separate hooks for each of those?
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
> Sorry for jumping into this late.
>
> Instead of a custom node, it might be a better idea to improve the FDW
> infrastructure to push down joins. For starters, is it possible for
> the custom scan node hooks to create a ForeignScan node? In general, I
> think it might be better for the custom scan hooks to create existing
> nodes if those serve the purpose.
>
It does not work well, because the existing FDW infrastructure is
designed to operate on foreign tables, not regular tables. We would
probably need to revise many of the assumptions in the surrounding code
if we redefined the purpose of the FDW infrastructure. For example,
ForeignScan is expected to return tuples according to a TupleDesc that
is exactly the same as the table definition. That does not fit the
requirement if we replace a join node with a ForeignScan, because the
TupleDesc of the joined relations is not predefined (see the sketch at
the end of this message).

I'd like to define these features as designed for individual purposes.
FDW is designed to mediate between an external data source and the
internal heap representation according to a foreign table definition. In
other words, its role is to generate the contents of a predefined
database object on the fly.
On the other hand, custom-scan is designed to implement alternative ways
to scan/join relations in addition to the methods supported by the
built-in features.
I'm motivated to implement a GPU acceleration feature that works
transparently for applications. Thus, it has to be capable of running on
regular tables, because most applications store their data in regular
tables, not foreign ones.

> Since a custom node is an open implementation, it will be important to
> pass as much information down to the hooks as possible, lest the hooks
> be constrained. Since function signatures within the planner and
> optimizer change from time to time, the custom node hook signatures
> will also need to change from time to time. That might turn out to be
> a maintenance overhead.
>
Yes, you are right. But it also creates maintenance overhead if a hook
has many arguments nobody uses.
It probably makes sense to list the arguments that cannot be reproduced
from other information, those that can be reproduced but only through
complicated steps, and those that can be reproduced easily.

Below is the information we cannot reproduce:
- PlannerInfo *root
- RelOptInfo *joinrel
- RelOptInfo *outerrel
- RelOptInfo *innerrel
- JoinType jointype
- SpecialJoinInfo *sjinfo
- List *restrictlist

Below is the information we can reproduce, but through complicated steps:
- List *mergeclause_list
- bool mergejoin_allowed
- Relids param_source_rels
- Relids extra_lateral_rels

Below is the information we can reproduce easily:
- SemiAntiJoinFactors *semifactors

I think the first two categories, or only the first (if functions to
reproduce the second group are exposed), should be passed to extensions;
the priority of the third group is not high.

> BTW, is it a good idea for custom nodes to also affect other paths
> like append, group etc.? Will it need separate hooks for each of
> those?
>
Yes, I plan to support the plan nodes above, in addition to scan/join
only.
The custom-scan node is a thin abstraction over general executor
behavior, so I believe it is not hard to enhance this node, without a
new plan node for each of them.
Of course, it will need separate hooks to add alternative paths at the
planning stage, but no individual plan nodes. (Sorry, it was unclear to
me what the "hook" meant.)
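To make the TupleDesc point concrete: since a custom scan that replaces
a join has no catalog-defined row type, the extension has to build the
scan slot's descriptor itself at executor startup, roughly as below.
ExecTypeFromTL() and ExecAssignScanType() are stock executor routines;
the callback name and the CustomScanState layout (embedding a ScanState
named ss) are assumptions for illustration:

#include "postgres.h"
#include "executor/executor.h"

/* hypothetical BeginCustomScan callback */
static void
myscan_begin(CustomScanState *css)
{
    TupleDesc   tupdesc;

    /* derive the scan tuple's shape from the plan's target list... */
    tupdesc = ExecTypeFromTL(css->ss.ps.plan->targetlist, false);
    /* ...and install it on the scan slot, since no table defines it */
    ExecAssignScanType(&css->ss, tupdesc);
}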
Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
> > Instead of a custom node, it might be a better idea to improve the
> > FDW infrastructure to push down joins. For starters, is it possible
> > for the custom scan node hooks to create a ForeignScan node? In
> > general, I think it might be better for the custom scan hooks to
> > create existing nodes if those serve the purpose.
> >
> It does not work well, because the existing FDW infrastructure is
> designed to operate on foreign tables, not regular tables. We would
> probably need to revise many of the assumptions in the surrounding
> code if we redefined the purpose of the FDW infrastructure. For
> example, ForeignScan is expected to return tuples according to a
> TupleDesc that is exactly the same as the table definition. That does
> not fit the requirement if we replace a join node with a ForeignScan,
> because the TupleDesc of the joined relations is not predefined.
If one wants to push joins, aggregates, or grouping across to other data sources capable of handling them, that will need to change. But, at the same time, letting the custom scan node decide that doesn't seem to be a very good idea, although through custom scan nodes we can see the potential in adding these features.
> I'd like to define these features as designed for individual purposes.
> FDW is designed to mediate between an external data source and the
> internal heap representation according to a foreign table definition.
> In other words, its role is to generate the contents of a predefined
> database object on the fly.
> On the other hand, custom-scan is designed to implement alternative
> ways to scan/join relations in addition to the methods supported by
> the built-in features.
>
> I'm motivated to implement a GPU acceleration feature that works
> transparently for applications. Thus, it has to be capable of running
> on regular tables, because most applications store their data in
> regular tables, not foreign ones.
It looks like my description was misleading. In some cases, the ultimate functionality of a particular instantiation of a custom scan node may already be available as a Plan node in PG, with the PG optimizer simply unable to plan the operation that way. In such cases, the custom scan node infrastructure should produce the corresponding Path node rather than implement that functionality itself.
> > Since a custom node is an open implementation, it will be important
> > to pass as much information down to the hooks as possible, lest the
> > hooks be constrained. Since function signatures within the planner
> > and optimizer change from time to time, the custom node hook
> > signatures will also need to change from time to time. That might
> > turn out to be a maintenance overhead.
> >
> Yes, you are right. But it also creates maintenance overhead if a hook
> has many arguments nobody uses.
> It probably makes sense to list the arguments that cannot be
> reproduced from other information, those that can be reproduced but
> only through complicated steps, and those that can be reproduced
> easily.
> Below is the information we cannot reproduce:
> - PlannerInfo *root
> - RelOptInfo *joinrel
> - RelOptInfo *outerrel
> - RelOptInfo *innerrel
> - JoinType jointype
> - SpecialJoinInfo *sjinfo
> - List *restrictlist
Most of this information is available through the corresponding RelOptInfo, or we should make RelOptInfo contain all the information related to every relation that is to be computed during the query. That way, any function which creates paths can just take that RelOptInfo as an argument and produce the path(s), and there is less chance that the function signatures change.
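A sketch of the slimmer hook shape this argues for, purely hypothetical:
if RelOptInfo (or a join-specific extension of it) carried jointype,
sjinfo, the restrict list and so on, the hook could stabilize at
something like:

/* hypothetical reduced hook: all join details hang off joinrel */
typedef void (*add_join_path_hook_type) (PlannerInfo *root,
                                         RelOptInfo *joinrel);
/* jointype, sjinfo, restrictlist etc. would be looked up via joinrel */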
> Below is the information we can reproduce, but through complicated steps:
> - List *mergeclause_list
> - bool mergejoin_allowed
> - Relids param_source_rels
> - Relids extra_lateral_rels
>
> Below is the information we can reproduce easily:
> - SemiAntiJoinFactors *semifactors
>
> I think the first two categories, or only the first (if functions to
> reproduce the second group are exposed), should be passed to
> extensions; the priority of the third group is not high.
> > BTW, is it a good idea for custom nodes to also affect other paths
> > like append, group etc.? Will it need separate hooks for each of
> > those?
> >
> Yes, I plan to support the plan nodes above, in addition to scan/join
> only.
> The custom-scan node is a thin abstraction over general executor
> behavior, so I believe it is not hard to enhance this node, without a
> new plan node for each of them.
> Of course, it will need separate hooks to add alternative paths at the
> planning stage, but no individual plan nodes. (Sorry, it was unclear
> to me what the "hook" meant.)
If we represent all the operations like grouping, sorting, and aggregation as some sort of relation, we can create paths for each such relation as we do now (I am heavily borrowing from Tom's idea of pathifying those operations). We would then need far fewer hooks in the custom scan node.
BTW, from the patch, this change does not look lightweight to me. I was expecting more of a list of hooks to be defined by the user, with this infrastructure just calling them at the appropriate places.
> -----Original Message-----
> From: Ashutosh Bapat [mailto:ashutosh.bapat@enterprisedb.com]
> Sent: Tuesday, February 25, 2014 5:59 PM
> To: Kohei KaiGai> Cc: Kaigai, Kouhei(海外, 浩平); Stephen Frost; Shigeru Hanada; Jim
> Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter Eisentraut
> Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
>
>
>> On Sun, Feb 23, 2014 at 6:54 PM, Kohei KaiGai <kaigai@kaigai.gr.jp> wrote:
>
>
> Folks,
>
> Let me remind the custom-scan patches; that is a basis feature of
> remote join of postgres_fdw, cache-only scan, (upcoming) GPU
> acceleration feature or various alternative ways to scan/join
> relations.
> Unfortunately, small amount of discussion we could have in this
> commit
> fest, even though Hanada-san volunteered to move the patches into
> "ready for committer" state at the CF-Nov.
>
>
>
> Sorry for jumping into this late.
>
> Instead of custom node, it might be better idea to improve FDW infrastructure
> to push join. For the starters, is it possible for the custom scan node
> hooks to create a ForeignScan node? In general, I think, it might be better
> for the custom scan hooks to create existing nodes if they serve the purpose.
>
>
>
>
> Prior to time-up, I'd like to ask hacker's opinion about its
> potential
> arguable points (from my standpoint) if it needs to be fixed up.
> One is hook definition to add alternative join path, and the other
> one
> is a special varno when a custom scan replace a join node.
> I'd like to see your opinion about them while we still have to change
> the design if needed.
>
> (1) Interface to add alternative paths in addition to built-in join
> paths
>
>
> This patch adds "add_join_path_hook" on add_paths_to_joinrel to
> allow
> extensions to provide alternative scan path in addition to the
> built-in
> join paths. Custom-scan path being added is assumed to perform to
> scan
> on a (virtual) relation that is a result set of joining relations.
> My concern is its arguments to be pushed. This hook is declared
> as follows:
>
> /* Hook for plugins to add custom join path, in addition to default
> ones */
> typedef void (*add_join_path_hook_type)(PlannerInfo *root,
> RelOptInfo *joinrel,
> RelOptInfo *outerrel,
> RelOptInfo *innerrel,
> JoinType jointype,
> SpecialJoinInfo
> *sjinfo,
> List *restrictlist,
> List *mergeclause_list,
> SemiAntiJoinFactors
> *semifactors,
> Relids
> param_source_rels,
> Relids
> extra_lateral_rels);
> extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;
>
> Likely, the arguments up to restrictlist should be passed to extensions,
> because these are also arguments of add_paths_to_joinrel().
> However, I'm not 100% certain how the other arguments should be handled.
> Probably, it makes sense to pass param_source_rels and
> extra_lateral_rels, to check whether the path is sensible for
> parameterized paths.
> On the other hand, I doubt whether mergeclause_list is useful to
> deliver.
> (It may make sense if someone tries to implement their own merge-join
> implementation??)
>
> I'd like to see ideas to improve the current interface specification.
>
>
>
>
> Since a custom node is an open implementation, it will be important to
> pass as much information down to the hooks as possible, lest the hooks be
> constrained. Since the function signatures within the planner/optimizer
> will change from time to time, the custom node hook signatures will need
> to change from time to time as well. That might turn out to be
> maintenance overhead.
>
>
> BTW, is it a good idea for custom nodes to also affect other paths like
> append, group etc.? Will they need separate hooks for each of those?
>
>
>
>
> (2) CUSTOM_VAR for special Var reference
>
> @@ -134,6 +134,7 @@ typedef struct Expr
>  #define    INNER_VAR    65000    /* reference to inner subplan */
>  #define    OUTER_VAR    65001    /* reference to outer subplan */
>  #define    INDEX_VAR    65002    /* reference to index column */
> +#define    CUSTOM_VAR   65003    /* reference to custom column */
>
> I newly added CUSTOM_VAR to handle the case when a custom scan overrides
> a join of relations.
> Var nodes within a join plan are adjusted to refer to either
> ecxt_innertuple or ecxt_outertuple of ExprContext. This causes trouble
> if a custom scan runs instead of a built-in join, because the tuples it
> fetches are usually stored in ecxt_scantuple, so the Var nodes also need
> a suitable varno that is neither inner nor outer.
>
> The SetPlanRefCustomScan callback, invoked from set_plan_refs, allows
> extensions to rewrite Var nodes within the custom-scan node to indicate
> ecxt_scantuple using CUSTOM_VAR, instead of inner or outer.
> For example, a Var node with varno=CUSTOM_VAR and varattno=3 means this
> node references the third attribute of the tuple in ecxt_scantuple.
> I think it is a reasonable solution; however, I'm not 100% certain
> whether people have a more graceful idea to implement it.
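
As one way to picture the callback's job, here is a sketch of the Var
rewrite an extension might perform from its SetPlanRefCustomScan callback.
The helper my_scantuple_attno() is purely illustrative; it stands for
whatever provider-specific logic maps a join Var onto the position of the
matching column in the tuple the provider actually emits:

    /* provider-specific lookup; illustrative prototype only */
    static AttrNumber my_scantuple_attno(Var *var, void *context);

    /* rewrite Vars that pointed at the replaced join's inner/outer
     * slots so they reference the scan tuple via CUSTOM_VAR instead */
    static Node *
    my_custom_var_mutator(Node *node, void *context)
    {
        if (node == NULL)
            return NULL;
        if (IsA(node, Var))
        {
            Var    *newvar = (Var *) copyObject(node);

            newvar->varno = CUSTOM_VAR;
            newvar->varattno = my_scantuple_attno(newvar, context);
            return (Node *) newvar;
        }
        return expression_tree_mutator(node, my_custom_var_mutator,
                                       context);
    }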
>
> If you have comments on the above two topics, or others, please give
> your ideas.
>
> Thanks,
>
>
> 2014-01-28 9:14 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>
> > Hi Stephen,
> >
> > Thanks for your comments.
> >
> >> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> >> > Is somebody available to volunteer to review the custom-scan
> patch?
> >>
> >> I looked through it a bit and my first take away from it was
> that the patches
> >> to actually use the new hooks were also making more changes to
> the backend
> >> code, leaving me with the impression that the proposed interface
> isn't
> >> terribly stable. Perhaps those changes should have just been
> in the first
> >> patch, but they weren't and that certainly gave me pause.
> >>
> > Yes, the part-1 patch provides the interface portion for interaction
> > between the backend code and extension code. The rest, the part-2 and
> > part-3 portions, are contrib modules that implement their features on
> > top of the custom-scan API.
> >
> >> I'm also not entirely convinced that this is the direction to
> go in when
> >> it comes to pushing down joins to FDWs. While that's certainly
> a goal that
> >> I think we all share, this seems to be intending to add a
> completely different
> >> feature which happens to be able to be used for that. For FDWs,
> wouldn't
> >> we only present the FDW with the paths where the foreign tables
> for that
> >> FDW, or perhaps just a given foreign server, are being joined?
> >>
> > FDW's join push-down is one of the valuable use-cases of this
> > interface, but not the only one. As you might know, my motivation is
> > to implement a GPU acceleration feature on top of this interface,
> > which offers an alternative way to scan or join relations, or
> > potentially to sort or aggregate.
> > Probably, it is too much of a stretch if we implement radix-sort on
> > top of FDW. I'd like you to understand that the part-3 patch (FDW's
> > join push-down) is a demonstration of the custom-scan interface for an
> > application, but not designed for a special purpose.
> >
> > Right now, I put all the logic for the interaction between CSI and the
> > FDW driver on the postgres_fdw side; it might be an idea to have
> > common code (like the logic to check whether both relations to be
> > joined belong to the same foreign server) on the backend side, as
> > something like a gateway between them.
> >
> >
> > As an aside, what should be the scope of the FDW interface?
> > In my understanding, it allows an extension to implement "something"
> > on behalf of a particular data structure declared with CREATE FOREIGN
> > TABLE.
> > In other words, the extension's responsibility is to generate a view
> > of "something" according to PostgreSQL's internal data structure,
> > instead of the object itself.
> > On the other hand, the custom-scan interface allows extensions to
> > implement alternative methods to scan or join particular relations,
> > but its role is not to act as a target being referenced in queries. In
> > other words, it provides methods to access objects.
> > It is natural that both features are similar, because both of them
> > intend for extensions to hook the planner and executor; however, their
> > purposes are different.
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project
> > KaiGai Kohei <kaigai@ak.jp.nec.com>
>
>
>
> --
>
> KaiGai Kohei <kaigai@kaigai.gr.jp>
>
>
>
> --
>
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
2014-02-25 20:32 GMT+09:00 Ashutosh Bapat <ashutosh.bapat@enterprisedb.com>:
>
> On Tue, Feb 25, 2014 at 3:39 PM, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>>
>> > Sorry for jumping into this late.
>> >
>> > Instead of a custom node, it might be a better idea to improve the FDW
>> > infrastructure to push down joins. For starters, is it possible for the
>> > custom scan node hooks to create a ForeignScan node? In general, I think
>> > it might be better for the custom scan hooks to create existing nodes
>> > if they serve the purpose.
>> >
>> It does not work well because the existing FDW infrastructure is designed
>> to perform on foreign tables, not regular tables. Probably, we would need
>> to revise many of our assumptions in the surrounding code if we
>> re-defined the purpose of the FDW infrastructure. For example,
>> ForeignScan is expected to return a tuple according to a TupleDesc that
>> is exactly the same as the table definition.
>> It does not fit the requirement if we replace a join node by ForeignScan,
>> because the TupleDesc of joined relations is not predefined.
>>
> If one wants to push joins, aggregates, grouping across to other data
> sources capable of handling them, that will need to change. But, at the
> same time, letting the custom scan node be able to decide that doesn't
> seem to be a very good idea. Although, through custom scan nodes, we can
> see the potential in adding these features.
>
Of course, the existing form of the custom-scan node is designed to support
scans or joins of relations, as a first step. It will also need some
enhancement to support other classes of execution nodes in future versions.
I'm not certain why that is problematic.

>> I'd like to define these features as designed for individual purposes.
>> FDW is designed to mediate between an external data source and the
>> internal heap representation according to the foreign table definition.
>> In other words, its role is to generate the contents of a predefined
>> database object on the fly.
>> On the other hand, custom-scan is designed to implement alternative ways
>> to scan / join relations in addition to the methods supported by the
>> built-in feature.
>>
>> I'm motivated to implement a GPU acceleration feature that works
>> transparently for applications. Thus, it has to be capable on regular
>> tables, because most applications store data on regular tables, not
>> foreign ones.
>>
> It looks like my description was misleading. In some cases, it might be
> possible that the ultimate functionality of a particular instantiation of
> a custom scan node is already available as a Plan node in PG, but the PG
> optimizer is not able to optimize the operation that way. In such cases,
> the custom scan node infrastructure should produce the corresponding Path
> node and not implement that functionality itself.
>
You are suggesting that CustomSort, CustomAgg, CustomAppend and so on should
be supported in a future version, for better integration with the plan
optimizer. Right? It is probably a good idea if the optimizer needs to
identify a CustomXXXX node using its node tag, rather than something else
like the custom-scan provider name. Right now, the custom-scan feature
focuses on optimization of relation scans and joins as its first scope, and
does not need to identify the class of the corresponding Path node.
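
For illustration, here is a minimal sketch of the TupleDesc point above: a
provider that replaces a join has no catalog-defined row type, so it can
derive one from its own targetlist at executor startup. The variable names
(cscan for the CustomScan plan node, node for the provider's state
embedding a ScanState) are assumptions for the example, not names from the
patch.

    TupleDesc   tupdesc;

    /* derive the row type of the joined (virtual) relation from the
     * plan's targetlist; no catalog definition exists for it */
    tupdesc = ExecTypeFromTL(cscan->scan.plan.targetlist, false);

    /* install it as the descriptor of the scan tuple slot */
    ExecAssignScanType(&node->ss, tupdesc);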
On the upthread of this discussion, I initially proposed to have separate
CustomScan and CustomJoin nodes; however, our consensus was that CustomScan
can perform like a scan on the result set of joined relations, so I dropped
multiple node types from the first version.

>> > Since a custom node is an open implementation, it will be important to
>> > pass as much information down to the hooks as possible, lest the hooks
>> > be constrained. Since the function signatures within the
>> > planner/optimizer will change from time to time, the custom node hook
>> > signatures will need to change from time to time as well. That might
>> > turn out to be maintenance overhead.
>> >
>> Yes. You are also right. But it also makes maintenance overhead if the
>> hook has many arguments nobody uses.
>> Probably, it makes sense to list up the arguments that cannot be
>> reproduced from other information, those that can be reproduced but only
>> by complicated steps, and those that can be reproduced easily.
>>
>> Below is the information we cannot reproduce:
>> - PlannerInfo *root
>> - RelOptInfo *joinrel
>> - RelOptInfo *outerrel
>> - RelOptInfo *innerrel
>> - JoinType jointype
>> - SpecialJoinInfo *sjinfo
>> - List *restrictlist
>>
> Most of this information is available through the corresponding
> RelOptInfo, or we should make RelOptInfo contain all the information
> related to every relation required to be computed during the query. So,
> any function which creates paths can just take that RelOptInfo as an
> argument and produce the path/s. That way there is less chance that the
> function signatures change.
>
Uhmm... It is inconvenient to write extensions that way. I want the
variables in the first and second groups to be delivered to the hook, even
though they may have minor modifications in future releases. Relation joins
are at the heart of an RDBMS, so I'd like to believe these arguments are
among the most stable stuff.

>> Below is the information we can reproduce, but only by complicated steps:
>> - List *mergeclause_list
>> - bool mergejoin_allow
>> - Relids param_source_rels
>> - Relids extra_lateral_rels
>>
>> Below is the information we can reproduce easily:
>> - SemiAntiJoinFactors *semifactors
>>
>> I think the first two categories, or the first category (if functions to
>> reproduce the second group are exposed), should be passed to extensions;
>> however, the priority of the third group is not high.
>>
>> > BTW, is it a good idea for custom nodes to also affect other paths like
>> > append, group etc.? Will they need separate hooks for each of those?
>> >
>> Yes. I plan to support the above plan nodes, in addition to scan / join
>> only.
>> The custom-scan node is a thin abstraction of general executor behavior,
>> so I believe it is not hard to enhance this node, without a new plan node
>> for each of them.
>> Of course, it will need a separate hook to add alternative paths at the
>> planner stage, but no individual plan nodes. (Sorry, it was unclear to me
>> what the "hook" means.)
>>
> If we represent all the operations like grouping, sorting, and aggregation
> as some sort of relation, we can create paths for each of those relations
> like we do now (I am heavily borrowing from Tom's idea of pathifying those
> operations). We will need far fewer hooks in the custom scan node.
>
> BTW, from the patch, I do not see this change to be lightweight. I was
> expecting more of a list of hooks to be defined by the user, with this
> infrastructure just calling them at appropriate places.
>
Let's focus on scan and join, which we are currently working on. Even if we
need a separate node type for grouping or sorting, it will not be necessary
to construct the whole framework from scratch. For example, the definition
of the CustomProvider table will be reusable for other classes of
operations, because most of them are thin abstractions of the existing
executor's interface.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Hi Kaigai-san,

2014-02-25 13:28 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
> The reason why I asked the question above is, I haven't been 100% certain
> about its usage. Indeed, semifactors is applied in limited situations, but
> it is quite easy for an extension to reproduce later (using
> clauselist_selectivity) if the extension wants this factor. So, I agree
> with removing the semifactors here.

Agreed. It would be nice to mention how to obtain semifactors for people
who want to implement advanced join overriding.

>> mergeclause_list and param_source_rels seem a little easier to use, but
>> anyway it should be documented how to use those parameters.
>>
> The mergeclause_list might not be sufficient for extensions to determine
> whether their own mergejoin is applicable here. See the comment below,
> which is at the head of select_mergejoin_clauses.
>
> | * *mergejoin_allowed is normally set to TRUE, but it is set to FALSE if
> | * this is a right/full join and there are nonmergejoinable join clauses.
> | * The executor's mergejoin machinery cannot handle such cases, so we have
> | * to avoid generating a mergejoin plan. (Note that this flag does NOT
> | * consider whether there are actually any mergejoinable clauses. This is
> | * correct because in some cases we need to build a clauseless mergejoin.
> | * Simply returning NIL is therefore not enough to distinguish safe from
> | * unsafe cases.)
> |
> It says, mergejoin_clause == NIL is not a sufficient check to determine
> whether the mergejoin logic is applicable on the target join.
>
> So, either of them is probably an option for an extension that tries to
> implement

Perhaps you mean "both of them"?

> their own mergejoin logic; (1) putting both mergejoin_allowed and
> mergeclause_list as arguments of the hook, or (2) re-defining
> select_mergejoin_clauses() as an extern function to reproduce the
> variables on demand. Which one is preferable?

I prefer (1), because exposing the inside of the planner might block
changing those internal functions. If (at the moment) that information is
enough for overriding merge join for a CSP, let's provide it as parameters.

--
Shigeru HANADA
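
For the archive, a sketch of what that reproduction could look like inside
a provider's join hook, assuming the arguments the hook already receives
and assuming compute_semi_anti_join_factors() (declared in
optimizer/cost.h, and applying clauselist_selectivity internally) keeps its
current extern signature:

    /* rebuild semifactors from arguments the hook already gets,
     * instead of receiving them through the hook itself */
    SemiAntiJoinFactors semifactors;

    compute_semi_anti_join_factors(root, outerrel, innerrel,
                                   jointype, sjinfo, restrictlist,
                                   &semifactors);
    /* semifactors.outer_match_frac and semifactors.match_count are
     * then available for costing SEMI/ANTI join variants */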
Hi Kaigai-san,

2014-02-23 22:24 GMT+09:00 Kohei KaiGai <kaigai@kaigai.gr.jp>:
> (1) Interface to add alternative paths in addition to built-in join paths

I found that create_custom_path is not used at all in your patch. I revised
postgres_fdw.c to use it like this.

...
    /* Create join information which is stored as private information. */
    memset(&jinfo, 0, sizeof(PgRemoteJoinInfo));
    jinfo.fdw_server_oid = o_server_oid;
    jinfo.fdw_user_oid = o_user_oid;
    jinfo.relids = joinrel->relids;
    jinfo.jointype = jointype;
    jinfo.outer_rel = o_relinfo;
    jinfo.inner_rel = i_relinfo;
    jinfo.remote_conds = j_remote_conds;
    jinfo.local_conds = j_local_conds;

    /* OK, make a CustomScan node to run remote join */
    cpath = create_customscan_path(root, joinrel,
                                   0, 0, 0, /* estimate later */
                                   NIL,
                                   required_outer,
                                   "postgres-fdw",
                                   0,
                                   packPgRemoteJoinInfo(&jinfo));

    estimate_remote_join_cost(root, cpath, &jinfo, sjinfo);

    add_path(joinrel, &cpath->path);
...

This seems to work fine. Is this the right approach? If so, this portion
would be a good example of replacing a local join with a custom scan for
authors of custom scan providers.

One thing I worry about is the case where you've intentionally avoided
calling create_customscan_path.

--
Shigeru HANADA
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > Instead of a custom node, it might be a better idea to improve the FDW
> > infrastructure to push down joins. For starters, is it possible for the
> > custom scan node hooks to create a ForeignScan node? In general, I think
> > it might be better for the custom scan hooks to create existing nodes
> > if they serve the purpose.
> >
> It does not work well because the existing FDW infrastructure is designed
> to perform on foreign tables, not regular tables. Probably, we would need
> to revise many of our assumptions in the surrounding code if we re-defined
> the purpose of the FDW infrastructure. For example, ForeignScan is
> expected to return a tuple according to a TupleDesc that is exactly the
> same as the table definition.
> It does not fit the requirement if we replace a join node by ForeignScan,
> because the TupleDesc of joined relations is not predefined.

I'm not following this logic at all- how are you defining "foreign" from
"regular"? Certainly, in-memory-only tables which are sitting out in some
non-persistent GPU memory aren't "regular" by any PG definition. Perhaps
you can't make ForeignScan suddenly work as a join-node replacement, but
I've not seen where anyone has proposed that (directly- I've implied it on
occasion where a remote view can be used, but that's not the same thing as
having proper push-down support for joins).

> I'd like to define these features as designed for individual purposes.

My previous complaint about this patch set has been precisely that each
piece seems to be custom-built and every patch needs more and more backend
changes. If every time someone wants to do something with this CustomScan
API, they need changes made to the backend code, then it's not a generally
useful external API. We really don't want to define such an external API
as then we have to deal with backwards compatibility, particularly when
it's all specialized to specific use cases which are all different.

> FDW is designed to mediate between an external data source and the
> internal heap representation according to the foreign table definition.
> In other words, its role is to generate the contents of a predefined
> database object on the fly.

There's certainly nothing in the FDW API which requires that the remote
side have an internal heap representation, as evidenced by the various
FDWs which already exist and certainly are not any kind of 'normal' heap.
Every query against the foreign relation goes through the FDW API and can
end up returning whatever the FDW author decides is appropriate to return
at that time, as long as it matches the tuple description- which is
absolutely necessary for any kind of sanity, imv.

> On the other hand, custom-scan is designed to implement alternative ways
> to scan / join relations in addition to the methods supported by the
> built-in feature.

I can see the usefulness in being able to push down aggregates or other
function-type calls to the remote side of an FDW and would love to see
work done along those lines, along with the ability to push down joins to
remote systems- but I'm not convinced that the claimed flexibility with
the CustomScan API is there, given the need to continue modifying the
backend code for each use-case, nor that there are particularly new and
inventive ways of saying "find me all the cases where set X overlaps with
set Y". I'm certainly open to the idea that we could have an FDW API which
allows us to ask exactly that question and let the remote side cost it out
and give us an answer for a pair of relations, but that isn't what this is.

Note also that in any kind of aggregation push-down we must be sure that
the function is well-defined and that the FDW is on the hook to ensure
that the returned data is the same as if we ran the same aggregate
function locally, otherwise the results of a query might differ based on
whether the aggregate was fired locally or remotely (which could be
influenced by costing- eg: the size of the relation or its statistics).

> I'm motivated to implement a GPU acceleration feature that works
> transparently for applications. Thus, it has to be capable on regular
> tables, because most applications store data on regular tables, not
> foreign ones.

You want to persist that data in the GPU across multiple calls though,
which makes it unlike any kind of regular PG table and much more like some
foreign table. Perhaps the data is initially loaded from a local table and
then updated on the GPU card in some way when the 'real' table is updated,
but neither of those makes it a "regular" PG table.

> > Since a custom node is an open implementation, it will be important to
> > pass as much information down to the hooks as possible, lest the hooks
> > be constrained. Since the function signatures within the
> > planner/optimizer will change from time to time, the custom node hook
> > signatures will need to change from time to time as well. That might
> > turn out to be maintenance overhead.

It's more than "from time-to-time", it was "for each use case in the given
patch set asking for this feature", which is why I'm pushing back on it.

> Yes. You are also right. But it also makes maintenance overhead if the
> hook has many arguments nobody uses.

I can agree with this- there should be a sensible API if we're going to do
this.

> Probably, it makes sense to list up the arguments that cannot be
> reproduced from other information, those that can be reproduced but only
> by complicated steps, and those that can be reproduced easily.

This really strikes me as the wrong approach for an FDW join-pushdown API,
which should be geared around giving the remote side an opportunity on a
case-by-case basis to cost out joins using whatever methods it has
available to implement them. I've outlined above the reasons I don't agree
with just making the entire planner/optimizer pluggable.

Thanks,

Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> Yes, the part-1 patch provides the interface portion for interaction
> between the backend code and extension code. The rest, the part-2 and
> part-3 portions, are contrib modules that implement their features on top
> of the custom-scan API.

Just to come back to this- the other two "contrib module" patches, at
least as I read over their initial submission, were *also* patching
portions of backend code which it was apparently discovered that they
needed. That's a good bit of my complaint regarding this approach.

> FDW's join push-down is one of the valuable use-cases of this interface,
> but not the only one. As you might know, my motivation is to implement a
> GPU acceleration feature on top of this interface, which offers an
> alternative way to scan or join relations, or potentially to sort or
> aggregate.

If you're looking to just use GPU acceleration for improving individual
queries, I would think that Robert's work around backend workers would be
a more appropriate way to go, with the ability to move a working set of
data from shared buffers and the on-disk representation of a relation over
to the GPU's memory, perform the operation, and then copy the results back.

If that's not possible or effective wrt performance, then I think we need
to look at managing the external GPU memory as a foreign system through an
FDW which happens to be updated through triggers or similar. The same
could potentially be done for memcached systems, etc.

"regular" PG tables, just to point out one issue, can be locked on a
row-by-row basis, and we know exactly where in shared buffers to go hunt
down the rows. How is that going to work here, if this is both a "regular"
table and stored off in a GPU's memory across subsequent queries or even
transactions?

> Right now, I put all the logic for the interaction between CSI and the
> FDW driver on the postgres_fdw side; it might be an idea to have common
> code (like the logic to check whether both relations to be joined belong
> to the same foreign server) on the backend side, as something like a
> gateway between them.

Yes, that's what I was suggesting above- we should be asking the FDWs on a
case-by-case basis how to cost out the join between foreign tables which
they are responsible for. Asking two different FDW servers to cost out a
join between their tables doesn't make any sense to me.

> As an aside, what should be the scope of the FDW interface?
> In my understanding, it allows an extension to implement "something" on
> behalf of a particular data structure declared with CREATE FOREIGN TABLE.

That's where it is today, but certainly not our end goal.

> In other words, the extension's responsibility is to generate a view of
> "something" according to PostgreSQL's internal data structure, instead of
> the object itself.

The result of the FDW call needs to be something which PG understands and
can work with, otherwise we wouldn't be able to, say, run PL/pgsql code on
the result, or pass it into some other aggregate which we decided was
cheaper to run locally. Being able to push down aggregates to the remote
side of an FDW certainly fits in quite well with that.

> On the other hand, the custom-scan interface allows extensions to
> implement alternative methods to scan or join particular relations, but
> its role is not to act as a target being referenced in queries. In other
> words, it provides methods to access objects.

The custom-scan interface still needs to produce "something" according to
PG's internal data structures, so it's not clear to me where you're going
with this.

> It is natural that both features are similar, because both of them intend
> for extensions to hook the planner and executor; however, their purposes
> are different.

I disagree as I don't really view FDWs as "hooks". A "hook" is more like a
trigger- sure, you can modify the data in transit, or throw an error if
you see an issue, but you don't get to redefine the world and throw out
what the planner or optimizer knows about the rest of what is going on in
the query.

Thanks,

Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > > Instead of a custom node, it might be a better idea to improve the
> > > FDW infrastructure to push down joins. For starters, is it possible
> > > for the custom scan node hooks to create a ForeignScan node? In
> > > general, I think it might be better for the custom scan hooks to
> > > create existing nodes if they serve the purpose.
> > >
> > It does not work well because the existing FDW infrastructure is
> > designed to perform on foreign tables, not regular tables. Probably, we
> > would need to revise many of our assumptions in the surrounding code if
> > we re-defined the purpose of the FDW infrastructure. For example,
> > ForeignScan is expected to return a tuple according to a TupleDesc that
> > is exactly the same as the table definition.
> > It does not fit the requirement if we replace a join node by
> > ForeignScan, because the TupleDesc of joined relations is not
> > predefined.
>
> I'm not following this logic at all- how are you defining "foreign" from
> "regular"? Certainly, in-memory-only tables which are sitting out in some
> non-persistent GPU memory aren't "regular" by any PG definition.
> Perhaps you can't make ForeignScan suddenly work as a join-node
> replacement, but I've not seen where anyone has proposed that (directly-
> I've implied it on occasion where a remote view can be used, but that's
> not the same thing as having proper push-down support for joins).
>
This regular one means usual tables. Even though a custom implementation
may reference a self-managed in-memory cache instead of the raw heap, the
table pointed to in the user's query shall be a usual table.
In the past, Hanada-san had proposed an enhancement of FDW to support
remote join, but it was eventually rejected.

> > I'd like to define these features as designed for individual purposes.
>
> My previous complaint about this patch set has been precisely that each
> piece seems to be custom-built and every patch needs more and more
> backend changes. If every time someone wants to do something with this
> CustomScan API, they need changes made to the backend code, then it's not
> a generally useful external API. We really don't want to define such an
> external API as then we have to deal with backwards compatibility,
> particularly when it's all specialized to specific use cases which are
> all different.
>
The changes to the backend are just for convenience. We may be able to
implement the functions to translate a Bitmapset from/to cstring form in
postgres_fdw, but does it make sense to maintain them individually?
I thought these functions were useful to have in the backend commonly, but
this is not a fundamental lack of functionality in the custom-scan
interface.

> > FDW is designed to mediate between an external data source and the
> > internal heap representation according to the foreign table definition.
> > In other words, its role is to generate the contents of a predefined
> > database object on the fly.
>
> There's certainly nothing in the FDW API which requires that the remote
> side have an internal heap representation, as evidenced by the various
> FDWs which already exist and certainly are not any kind of 'normal' heap.
> Every query against the foreign relation goes through the FDW API and can
> end up returning whatever the FDW author decides is appropriate to return
> at that time, as long as it matches the tuple description- which is
> absolutely necessary for any kind of sanity, imv.
>
Yes. That's my understanding of the role of the FDW driver.

> > On the other hand, custom-scan is designed to implement alternative
> > ways to scan / join relations in addition to the methods supported by
> > the built-in feature.
>
> I can see the usefulness in being able to push down aggregates or other
> function-type calls to the remote side of an FDW and would love to see
> work done along those lines, along with the ability to push down joins to
> remote systems- but I'm not convinced that the claimed flexibility with
> the CustomScan API is there, given the need to continue modifying the
> backend code for each use-case, nor that there are particularly new and
> inventive ways of saying "find me all the cases where set X overlaps with
> set Y".
> I'm certainly open to the idea that we could have an FDW API which allows
> us to ask exactly that question and let the remote side cost it out and
> give us an answer for a pair of relations, but that isn't what this is.
> Note also that in any kind of aggregation push-down we must be sure that
> the function is well-defined and that the FDW is on the hook to ensure
> that the returned data is the same as if we ran the same aggregate
> function locally, otherwise the results of a query might differ based on
> whether the aggregate was fired locally or remotely (which could be
> influenced by costing- eg: the size of the relation or its statistics).
>
I can also understand the usefulness of pushing joins or aggregation to
the remote side in the case of foreign table references. In a similar way,
it is also useful if we can push these CPU-intensive operations into
co-processors for regular table references.
As I mentioned above, the backend changes by the part-2/-3 patches are
just minor stuff, and I thought they should not be implemented by a
contrib module locally.
Regarding the conditions under which we can run remote aggregation, you
are right. As the current postgres_fdw pushes down qualifiers to the
remote side, we need to ensure the remote aggregate definition is
identical with the local one.

> > I'm motivated to implement a GPU acceleration feature that works
> > transparently for applications. Thus, it has to be capable on regular
> > tables, because most applications store data on regular tables, not
> > foreign ones.
>
> You want to persist that data in the GPU across multiple calls though,
> which makes it unlike any kind of regular PG table and much more like
> some foreign table. Perhaps the data is initially loaded from a local
> table and then updated on the GPU card in some way when the 'real' table
> is updated, but neither of those makes it a "regular" PG table.
>
No. What I want to implement is: read the regular table and transfer the
contents into the GPU's local memory for calculation, then receive its
calculation result. The in-memory cache (which I'm also working on) is
supplemental stuff, because disk access is much slower and a row-oriented
data structure is not suitable for SIMD-style instructions.

> > > Since a custom node is an open implementation, it will be important
> > > to pass as much information down to the hooks as possible, lest the
> > > hooks be constrained. Since the function signatures within the
> > > planner/optimizer will change from time to time, the custom node hook
> > > signatures will need to change from time to time as well. That might
> > > turn out to be maintenance overhead.
>
> It's more than "from time-to-time", it was "for each use case in the
> given patch set asking for this feature", which is why I'm pushing back
> on it.
>
My patch set didn't change the interface itself. All it added was
(probably) useful utility routines to be placed in the backend, rather
than contrib.

> > Yes. You are also right. But it also makes maintenance overhead if the
> > hook has many arguments nobody uses.
>
> I can agree with this- there should be a sensible API if we're going to
> do this.
>
> > Probably, it makes sense to list up the arguments that cannot be
> > reproduced from other information, those that can be reproduced but
> > only by complicated steps, and those that can be reproduced easily.
>
> This really strikes me as the wrong approach for an FDW join-pushdown
> API, which should be geared around giving the remote side an opportunity
> on a case-by-case basis to cost out joins using whatever methods it has
> available to implement them. I've outlined above the reasons I don't
> agree with just making the entire planner/optimizer pluggable.
>
I'm also inclined to have arguments that will provide enough information
for extensions to determine the best path for them.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> This regular one means usual tables. Even though a custom implementation
> may reference a self-managed in-memory cache instead of the raw heap, the
> table pointed to in the user's query shall be a usual table.
> In the past, Hanada-san had proposed an enhancement of FDW to support
> remote join, but it was eventually rejected.

I'm not aware of the specifics around that proposal but I don't believe
we, as a community, have decided to reject the idea in general.

> The changes to the backend are just for convenience. We may be able to
> implement the functions to translate a Bitmapset from/to cstring form in
> postgres_fdw, but does it make sense to maintain them individually?

Perhaps not.

> I thought these functions were useful to have in the backend commonly,
> but this is not a fundamental lack of functionality in the custom-scan
> interface.

Then perhaps they should be exposed more directly? I can understand
generally useful functionality being exposed in a way that anyone can use
it, but we need to avoid interfaces which can't be stable due to normal /
ongoing changes to the backend code.

> I can also understand the usefulness of pushing joins or aggregation to
> the remote side in the case of foreign table references. In a similar
> way, it is also useful if we can push these CPU-intensive operations into
> co-processors for regular table references.

That's fine, if we can get data to and from those co-processors
efficiently enough that it's worth doing so. If moving the data to the
GPU's memory will take longer than running the actual aggregation, then it
doesn't make any sense for regular tables because then we'd have to cache
the data in the GPU's memory in some way across multiple queries, which
isn't something we're set up to do.

> As I mentioned above, the backend changes by the part-2/-3 patches are
> just minor stuff, and I thought they should not be implemented by a
> contrib module locally.

Fine- then propose them as generally useful additions, not as patches
which are supposed to just be for contrib modules using an already defined
interface. If you can make a case for that then perhaps this is more
practical.

> Regarding the conditions under which we can run remote aggregation, you
> are right. As the current postgres_fdw pushes down qualifiers to the
> remote side, we need to ensure the remote aggregate definition is
> identical with the local one.

Of course.

> No. What I want to implement is: read the regular table and transfer the
> contents into the GPU's local memory for calculation, then receive its
> calculation result. The in-memory cache (which I'm also working on) is
> supplemental stuff, because disk access is much slower and a row-oriented
> data structure is not suitable for SIMD-style instructions.

Is that actually performant? Is it actually faster than processing the
data directly? The discussions that I've had with folks have cast a great
deal of doubt in my mind about just how well that kind of quick
turn-around to the GPU's memory actually works.

> > This really strikes me as the wrong approach for an FDW join-pushdown
> > API, which should be geared around giving the remote side an
> > opportunity on a case-by-case basis to cost out joins using whatever
> > methods it has available to implement them. I've outlined above the
> > reasons I don't agree with just making the entire planner/optimizer
> > pluggable.
>
> I'm also inclined to have arguments that will provide enough information
> for extensions to determine the best path for them.

For join push-down, I proposed above that we have an interface to the FDW
which allows us to ask it how much each join of the tables which are on a
given FDW's server would cost if the FDW did it vs. pulling it back and
doing it locally. We could also pass all of the relations to the FDW with
the various join-quals and try to get an answer to everything, but I'm
afraid that'd simply end up duplicating the logic of the optimizer into
every FDW, which would be counter-productive. Admittedly, getting the
costing right isn't easy either, but it's not clear to me how it'd make
sense for the local server to be doing costing for remote servers.

Thanks,

Stephen
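
To make the shape of that idea concrete, one could imagine an optional
FdwRoutine callback roughly along these lines; this is purely hypothetical
(no such member exists in FdwRoutine today), with the name and parameter
list chosen only for illustration:

    /* Hypothetical FdwRoutine addition: called once per join relation
     * whose component relations all belong to the same foreign server,
     * letting the FDW cost the pushed-down join itself and add a path
     * via add_path(), or decline by doing nothing. */
    typedef void (*GetForeignJoinPaths_function) (PlannerInfo *root,
                                                  RelOptInfo *joinrel,
                                                  RelOptInfo *outerrel,
                                                  RelOptInfo *innerrel,
                                                  JoinType jointype,
                                                  List *restrictlist);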
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > Yes, the part-1 patch provides the interface portion for interaction
> > between the backend code and extension code. The rest, the part-2 and
> > part-3 portions, are contrib modules that implement their features on
> > top of the custom-scan API.
>
> Just to come back to this- the other two "contrib module" patches, at
> least as I read over their initial submission, were *also* patching
> portions of backend code which it was apparently discovered that they
> needed. That's a good bit of my complaint regarding this approach.
>
?? Sorry, are you still negative on the portions of the backend patched by
the part-2 and part-3 portions??

> > FDW's join push-down is one of the valuable use-cases of this
> > interface, but not the only one. As you might know, my motivation is to
> > implement a GPU acceleration feature on top of this interface, which
> > offers an alternative way to scan or join relations, or potentially to
> > sort or aggregate.
>
> If you're looking to just use GPU acceleration for improving individual
> queries, I would think that Robert's work around backend workers would be
> a more appropriate way to go, with the ability to move a working set of
> data from shared buffers and the on-disk representation of a relation
> over to the GPU's memory, perform the operation, and then copy the
> results back.
>
The approach is similar to Robert's work, except for adopting GPUs instead
of multicore CPUs. So, I tried to review his work to apply those
facilities to my extension also.

> If that's not possible or effective wrt performance, then I think we need
> to look at managing the external GPU memory as a foreign system through
> an FDW which happens to be updated through triggers or similar. The same
> could potentially be done for memcached systems, etc.
>
I hadn't imagined the idea of exposing the GPU's local memory. The
supplemental stuff I'm planning for data-load performance is just a cache
mechanism besides regular tables.

> "regular" PG tables, just to point out one issue, can be locked on a
> row-by-row basis, and we know exactly where in shared buffers to go hunt
> down the rows. How is that going to work here, if this is both a
> "regular" table and stored off in a GPU's memory across subsequent
> queries or even transactions?
>
It shall be handled on a "case-by-case" basis, I think. If a row-level
lock is required over the table scan, the custom-scan node shall return a
tuple located in the shared buffer, instead of the cached tuples. Of
course, it is an option for the custom-scan node to evaluate qualifiers by
GPU with the cached data and return tuples identified by the ctid of the
cached tuples. Anyway, it is not a significant problem.

> > Right now, I put all the logic for the interaction between CSI and the
> > FDW driver on the postgres_fdw side; it might be an idea to have common
> > code (like the logic to check whether both relations to be joined
> > belong to the same foreign server) on the backend side, as something
> > like a gateway between them.
>
> Yes, that's what I was suggesting above- we should be asking the FDWs on
> a case-by-case basis how to cost out the join between foreign tables
> which they are responsible for. Asking two different FDW servers to cost
> out a join between their tables doesn't make any sense to me.
>
OK, I'll move the portion that will be needed commonly for other FDWs into
the backend code.

> > As an aside, what should be the scope of the FDW interface?
> > In my understanding, it allows an extension to implement "something" on
> > behalf of a particular data structure declared with CREATE FOREIGN
> > TABLE.
>
> That's where it is today, but certainly not our end goal.
>
> > In other words, the extension's responsibility is to generate a view of
> > "something" according to PostgreSQL's internal data structure, instead
> > of the object itself.
>
> The result of the FDW call needs to be something which PG understands and
> can work with, otherwise we wouldn't be able to, say, run PL/pgsql code
> on the result, or pass it into some other aggregate which we decided was
> cheaper to run locally. Being able to push down aggregates to the remote
> side of an FDW certainly fits in quite well with that.
>
Yes. According to the previous discussion around postgres_fdw getting
merged, all we can trust on the remote side are built-in data types,
functions, operators, and other built-in stuff only.

> > On the other hand, the custom-scan interface allows extensions to
> > implement alternative methods to scan or join particular relations, but
> > its role is not to act as a target being referenced in queries. In
> > other words, it provides methods to access objects.
>
> The custom-scan interface still needs to produce "something" according to
> PG's internal data structures, so it's not clear to me where you're going
> with this.
>
The custom-scan node is intended to perform on regular relations, not only
foreign tables. That means a special feature (like GPU acceleration) can
perform transparently for most existing applications. Usually, they define
regular tables for their work at installation, not foreign tables. That is
the biggest concern for me.

> > It is natural that both features are similar, because both of them
> > intend for extensions to hook the planner and executor; however, their
> > purposes are different.
>
> I disagree as I don't really view FDWs as "hooks". A "hook" is more like
> a trigger- sure, you can modify the data in transit, or throw an error if
> you see an issue, but you don't get to redefine the world and throw out
> what the planner or optimizer knows about the rest of what is going on in
> the query.
>
I might have used the wrong wording. Anyway, I want plan nodes that enable
extensions to define their behavior; it is similar to ForeignScan, but
allowed to perform on regular relations. Also, not only custom-scan and
foreign-scan but any plan node works according to an interface to co-work
with other nodes, so it is not strange that both interfaces are similar.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
2014-02-26 16:46 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>:
>> Just to come back to this- the other two "contrib module" patches, at
>> least as I read over their initial submission, were *also* patching
>> portions of backend code which it was apparently discovered that they
>> needed. That's a good bit of my complaint regarding this approach.
>>
> ?? Sorry, are you still negative on the portions of the backend patched
> by the part-2 and part-3 portions??

Perhaps he meant to separate the patches based on a feature-based rule. IMO
if exposing utilities is essential for the Custom Scan API in a practical
sense, IOW to implement and maintain an extension which implements the
Custom Scan API, they should go into the first patch. IIUC the two contrib
modules are also PoCs for the API, so the part-2/3 patches should contain
only changes against contrib and its documentation.

Besides that, some typo fixes are mixed into the part-2 patch. They should
go into the part-1 patch where the typos were introduced.

--
Shigeru HANADA
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote:
> > Just to come back to this- the other two "contrib module" patches, at
> > least as I read over their initial submission, were *also* patching
> > portions of backend code which it was apparently discovered that they
> > needed. That's a good bit of my complaint regarding this approach.
> >
> ?? Sorry, are you still negative on the portions of the backend patched
> by the part-2 and part-3 portions??

Pretty sure that I sent that prior to your last email, or at least before
I was to the end of it.

> > If you're looking to just use GPU acceleration for improving individual
> > queries, I would think that Robert's work around backend workers would
> > be a more appropriate way to go, with the ability to move a working set
> > of data from shared buffers and the on-disk representation of a
> > relation over to the GPU's memory, perform the operation, and then copy
> > the results back.
> >
> The approach is similar to Robert's work, except for adopting GPUs
> instead of multicore CPUs. So, I tried to review his work to apply those
> facilities to my extension also.

Good, I'd be very curious to hear how that might solve the issue for you,
instead of using the CustomScan approach..

> > "regular" PG tables, just to point out one issue, can be locked on a
> > row-by-row basis, and we know exactly where in shared buffers to go
> > hunt down the rows. How is that going to work here, if this is both a
> > "regular" table and stored off in a GPU's memory across subsequent
> > queries or even transactions?
> >
> It shall be handled on a "case-by-case" basis, I think. If a row-level
> lock is required over the table scan, the custom-scan node shall return a
> tuple located in the shared buffer, instead of the cached tuples. Of
> course, it is an option for the custom-scan node to evaluate qualifiers
> by GPU with the cached data and return tuples identified by the ctid of
> the cached tuples. Anyway, it is not a significant problem.

I think you're being a bit too hand-wavey here, but if we're talking about
pre-scanning the data using PG before sending it to the GPU and then only
performing a single statement on the GPU, we should be able to deal with
it. I'm worried about your ideas to try and cache things on the GPU
though, if you're not prepared to deal with locks happening in shared
memory on the rows you've got cached out on the GPU, or hint bits, or the
visibility map being updated, etc...

> OK, I'll move the portion that will be needed commonly for other FDWs
> into the backend code.

Alright- but realize that there may be objections there on the basis that
the code/structures which you're exposing aren't, and will not be, stable.
I'll have to go back and look at them myself, certainly, and their
history.

> Yes. According to the previous discussion around postgres_fdw getting
> merged, all we can trust on the remote side are built-in data types,
> functions, operators, and other built-in stuff only.

Well, we're going to need to expand that a bit for aggregates, I'm afraid,
but we should be able to define the API for those aggregates very tightly
based on what PG does today and require that any FDW purporting to provide
those aggregates do it the way PG does. Note that this doesn't solve all
the problems- we've got other issues with regard to pushing aggregates
down into FDWs that need to be solved.

> The custom-scan node is intended to perform on regular relations, not
> only foreign tables. That means a special feature (like GPU acceleration)
> can perform transparently for most existing applications. Usually, they
> define regular tables for their work at installation, not foreign tables.
> That is the biggest concern for me.

The line between a foreign table and a local one is becoming blurred
already, but still, if this is the goal then I really think the background
worker is where you should be focused, not on this Custom Scan API.
Consider that, once we've got proper background workers, we're going to
need new nodes which operate in parallel (or some other rejiggering of the
nodes- I don't pretend to know exactly what Robert is thinking here, and
I've apparently forgotten it if he's posted it somewhere) and those
interfaces may drive changes which would impact the Custom Scan API- or
worse, make us deprecate or regret having added it because now we'll need
to break backwards compatibility to add in the parallel node capability to
satisfy the more general non-GPU case.

> I might have used the wrong wording. Anyway, I want plan nodes that
> enable extensions to define their behavior; it is similar to ForeignScan,
> but allowed to perform on regular relations. Also, not only custom-scan
> and foreign-scan but any plan node works according to an interface to
> co-work with other nodes, so it is not strange that both interfaces are
> similar.

It sounds a lot like you're trying to define, external to PG, what Robert
is already trying to get going *internal* to PG, and I really don't want
to end up in a situation where we've got a solution for the uncommon case
but aren't able to address the common case due to risk of breaking
backwards compatibility...

Thanks,

Stephen
* Shigeru Hanada (shigeru.hanada@gmail.com) wrote: > Perhaps he meant to separate the patches based on a feature-based rule. IMO, > if exposing utilities is essential for the Custom Scan API in a practical > sense, IOW to implement and maintain an extension which implements the > Custom Scan API, they should go into the first patch. IIUC the two > contrib modules are also PoCs for the API, so the part-2/3 patches should > contain only changes against contrib and its documentation. That's what I was getting at, yes. > Besides that, some typo fixes are mixed into the part-2 patch. They should > go into the part-1 patch, where the typos were introduced. Agreed. Thanks, Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > > This regular one means usual tables. Even though custom implementation > > may reference self-managed in-memory cache instead of raw heap, the > > table pointed in user's query shall be a usual table. > > In the past, Hanada-san had proposed an enhancement of FDW to support > > remote-join but eventually rejected. > > I'm not aware of the specifics around that proposal but I don't believe > we, as a community, have decided to reject the idea in general. > IIUC, his approach was integration of join-pushdown within the FDW APIs; however, it does not mean the idea of remote join was rejected. I believe it is still one of our killer features if we can revise the implementation. Hanada-san, could you remind us why your proposition was rejected before? > > I thought these functions were useful to have in the backend commonly, > > but is not a fundamental functionality lacks of the custom-scan interface. > > Then perhaps they should be exposed more directly? I can understand > generally useful functionality being exposed in a way that anyone can use > it, but we need to avoid interfaces which can't be stable due to normal > / ongoing changes to the backend code. > The functions my patches want to expose are: - get_restriction_qual_cost() - fix_expr_common() And the functions my patches newly want are: - bms_to_string() - bms_from_string() The former two are defined as static functions because cost estimation is done at costsize.c and set-reference is done at setrefs.c; however, custom-scan breaks this assumption, so I moved them into public functions. They are useful to everyone, but each currently lives in a particular file. > > I can also understand the usefulness of join or aggregation into the > > remote side in case of foreign table reference. In similar way, it is > > also useful if we can push these CPU intensive operations into > > co-processors on regular table references. > > That's fine, if we can get data to and from those co-processors efficiently > enough that it's worth doing so. If moving the data to the GPU's memory > will take longer than running the actual aggregation, then it doesn't make > any sense for regular tables because then we'd have to cache the data in > the GPU's memory in some way across multiple queries, which isn't something > we're set up to do. > When I made a prototype implementation on top of FDW, using CUDA, it enabled sequential scan to run 10 times faster than SeqScan on regular tables, if the qualifiers are complex enough. The libraries to communicate with the GPU (OpenCL/CUDA) have an asynchronous data transfer mode using hardware DMA. It allows the cost of data transfer to be hidden by pipelining, if there are enough records to be transferred. Also, the recent trend in semiconductor devices is GPU integration with the CPU, sharing a common memory space. See Haswell from Intel, Kaveri from AMD, or Tegra K1 from NVIDIA. All of them share the same memory, so there is no need to transfer the data to be calculated. This trend is driven by physical law, because of the energy consumption of semiconductors. So, I'm optimistic about my idea. > > As I mentioned above, the backend changes by the part-2/-3 patches are > > just minor stuff, and I thought it should not be implemented by > > contrib module locally. > > Fine- then propose them as generally useful additions, not as patches which > are supposed to just be for contrib modules using an already defined > interface. If you can make a case for that then perhaps this is more > practical.
> The usage was found by the contrib module that wants to call static functions, or by the feature to translate an existing data structure to/from a cstring. But, anyway, does a separate patch make sense? > > No. What I want to implement is, read the regular table and transfer > > the contents into GPU's local memory for calculation, then receives > > its calculation result. The in-memory cache (also I'm working on) is > > supplemental stuff because disk access is much slower and row-oriented > > data structure is not suitable for SIMD style instructions. > > Is that actually performant? Is it actually faster than processing the > data directly? The discussions that I've had with folks have cast a great > deal of doubt in my mind about just how well that kind of quick turn-around > to the GPU's memory actually works. > See above. > > > This really strikes me as the wrong approach for an FDW > > > join-pushdown API, which should be geared around giving the remote > > > side an opportunity on a case-by-case basis to cost out joins using > > > whatever methods it has available to implement them. I've outlined > > > above the reasons I don't agree with just making the entire > planner/optimizer pluggable. > > > > > I'm also inclined to have arguments that will provide enough > > information for extensions to determine the best path for them. > > For join push-down, I proposed above that we have an interface to the FDW > which allows us to ask it how much each join of the tables which are on > a given FDW's server would cost if the FDW did it vs. pulling it back and > doing it locally. We could also pass all of the relations to the FDW with > the various join-quals and try to get an answer to everything, but I'm afraid > that'd simply end up duplicating the logic of the optimizer into every FDW, > which would be counter-productive. > Hmm... It seems to me we should follow the existing manner of constructing join paths, rather than special handling. Even if a query contains three or more foreign tables managed by the same server, they shall be consolidated into one remote join as long as its cost is less than the local ones. So, I'd like to bet on using the new add_join_path_hook to compute possible join paths (a sketch of such a callback follows this message). If a remote join implemented by custom-scan is cheaper than a local join, it shall be chosen; then the optimizer will try joining other foreign tables with this custom-scan node. If the remote join is still cheap, it shall be consolidated again. > Admittedly, getting the costing right isn't easy either, but it's not clear > to me how it'd make sense for the local server to be doing costing for remote > servers. > Right now, I ignore the cost to run the remote server and focus on the cost to transfer via the network. It might be an idea to discount the CPU cost of remote execution. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
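To make the add_join_path_hook discussion above concrete, here is a rough sketch of the shape such a callback might take. It is only a sketch under assumptions drawn from this thread: the hook signature, the CustomPath node, and the helpers same_foreign_server() and estimate_remote_join_cost() are imagined around the proposed patch, not stock PostgreSQL; only makeNode() and add_path() exist in the backend today.

    /*
     * Hedged sketch of an add_join_path_hook callback, per this thread.
     * The hook signature, CustomPath and the two helper functions are
     * assumptions based on the patch, not stock PostgreSQL APIs.
     */
    #include "postgres.h"
    #include "nodes/relation.h"
    #include "optimizer/pathnode.h"

    static void
    pgfdw_add_join_path(PlannerInfo *root, RelOptInfo *joinrel,
                        RelOptInfo *outerrel, RelOptInfo *innerrel,
                        JoinType jointype, List *restrictlist)
    {
        CustomPath *cpath;

        /* Only offer a remote join when both sides live on one server. */
        if (!same_foreign_server(root, outerrel, innerrel))
            return;

        cpath = makeNode(CustomPath);
        cpath->path.pathtype = T_CustomScan;
        cpath->path.parent = joinrel;

        /* Mostly network-transfer cost for now, per the discussion. */
        estimate_remote_join_cost(root, cpath, outerrel, innerrel,
                                  jointype, restrictlist);

        /* Compete with the local join paths; the cheapest path wins. */
        add_path(joinrel, &cpath->path);
    }

Because add_path() competes purely on cost, the consolidation KaiGai describes falls out naturally: once a two-way remote-join path wins for a joinrel, the optimizer considers joining it to further foreign tables through the same hook.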
> > > If you're looking to just use GPU acceleration for improving > > > individual queries, I would think that Robert's work around backend > > > workers would be a more appropriate way to go, with the ability to > > > move a working set of data from shared buffers and on-disk > > > representation of a relation over to the GPU's memory, perform the > operation, and then copy the results back. > > > > > The approach is similar to the Robert's work except for GPU adoption, > > instead of multicore CPUs. So, I tried to review his work to apply the > > facilities on my extension also. > > Good, I'd be very curious to hear how that might solve the issue for you, > instead of using the CustomScan approach.. > I (plan to) use custom-scan, of course. Once a relation is referenced and the optimizer decides GPU acceleration is cheaper, the associated custom-scan node reads the data from the underlying relation (or the in-memory cache if it exists), then moves it to the shared memory buffer to deliver it to the GPU management background worker, which launches asynchronous DMAs one by one. After that, the custom-scan node receives the filtered records via the shared-memory buffer, so it can construct tuples to be returned to the upper node. > > > "regular" PG tables, just to point out one issue, can be locked on a > > > row-by-row basis, and we know exactly where in shared buffers to go > > > hunt down the rows. How is that going to work here, if this is both > a "regular" > > > table and stored off in a GPU's memory across subsequent queries or > > > even transactions? > > > > > It shall be handled "case-by-case" basis, I think. If row-level lock > > is required over the table scan, custom-scan node shall return a tuple > > being located on the shared buffer, instead of the cached tuples. Of > > course, it is an option for custom-scan node to calculate qualifiers > > by GPU with cached data and returns tuples identified by ctid of the cached > tuples. > > Anyway, it is not a significant problem. > > I think you're being a bit too hand-wavey here, but if we're talking about > pre-scanning the data using PG before sending it to the GPU and then only > performing a single statement on the GPU, we should be able to deal with > it. It's what I want to implement. > I'm worried about your ideas to try and cache things on the GPU though, > if you're not prepared to deal with locks happening in shared memory on > the rows you've got cached out on the GPU, or hint bits, or the visibility > map being updated, etc... > It does not retain any state/information on the GPU side. Things related to PG internals are the CPU's job. > > OK, I'll move the portion that will be needed commonly for other FDWs > > into the backend code. > > Alright- but realize that there may be objections there on the basis that > the code/structures which you're exposing aren't, and will not be, stable. > I'll have to go back and look at them myself, certainly, and their history. > I see, but that is part of the process of getting code merged. > > Yes. According to the previous discussion around postgres_fdw getting > > merged, all we can trust on the remote side are built-in data types, > > functions, operators or other stuffs only. > > Well, we're going to need to expand that a bit for aggregates, I'm afraid, > but we should be able to define the API for those aggregates very tightly > based on what PG does today and require that any FDW purporting to provide > those aggregates do it the way PG does.
Note that this doesn't solve all > the problems- we've got other issues with regard to pushing aggregates down > into FDWs that need to be solved. > I see. It probably needs more detailed investigation. > > The custom-scan node is intended to perform on regular relations, not > > only foreign tables. It means a special feature (like GPU > > acceleration) can perform transparently for most of existing > > applications. Usually, it defines regular tables for their work on > > installation, not foreign tables. It is the biggest concern for me. > > The line between a foreign table and a local one is becoming blurred already, > but still, if this is the goal then I really think the background worker > is where you should be focused, not on this Custom Scan API. Consider that, > once we've got proper background workers, we're going to need new nodes > which operate in parallel (or some other rejiggering of the nodes- I don't > pretend to know exactly what Robert is thinking here, and I've apparently > forgotten it if he's posted it > somewhere) and those interfaces may drive changes which would impact the > Custom Scan API- or worse, make us deprecate or regret having added it > because now we'll need to break backwards compatibility to add in the > parallel node capability to satisfy the more general non-GPU case. > The custom-scan API is a thin abstraction over the plan node interface, not tightly coupled with a particular use case, like GPU, remote join and so on. So, I'm quite optimistic about its future maintainability. Also, please recall the discussion at the last developer meeting. The purpose of custom-scan (we didn't name it at that time) is to avoid unnecessary project forks by people who want to implement their own special features when no facilities to enhance the optimizer/executor are available. Even once we have an in-core parallel execution feature on CPUs, it still makes sense to allow some unique implementation that may be suitable for a specific domain. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
2014-02-26 17:31 GMT+09:00 Kouhei Kaigai <kaigai@ak.jp.nec.com>: > IIUC, his approach was integration of join-pushdown within the FDW APIs; > however, it does not mean the idea of remote join was rejected. > I believe it is still one of our killer features if we can revise the > implementation. > > Hanada-san, could you remind us why your proposition was rejected > before? IIUC it was not rejected, just returned-with-feedback. We could not reach consensus on how join-push-down should work. A discussion point was multiple paths for a joinrel, but it was not so serious a point. Here is the tail of the thread. http://www.postgresql.org/message-id/4F058241.2000606@enterprisedb.com >> Heikki Linnakangas<heikki.linnakangas@enterprisedb.com> writes: >>> >>> Hmm, so you're saying that the FDW function needs to be able to return >>> multiple paths for a single joinrel. Fair enough, and that's not >>> specific to remote joins. Even a single-table foreign scan could be >>> implemented differently depending on whether you prefer fast-start or >>> cheapest total. >> >> >> ... or ordered vs unordered, etc. Yeah, good point, we already got this >> wrong with the PlanForeignScan API. Good thing we didn't promise that >> would be stable. > > > This discussion withered down here... > > I think the advice to Shigeru-san is to work on the API. We didn't reach a > consensus on what exactly it should look like, but at least you need to be > able to return multiple paths for a single joinrel, and should look at > fixing the PlanForeignScan API to allow that too. And I gave up for lack of time, IOW, in order to finish more fundamental portions of the FDW API. http://www.postgresql.org/message-id/4F39FC1A.7090202@gmail.com -- Shigeru HANADA
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > IIUC, his approach was integration of join-pushdown within FDW APIs, > however, it does not mean the idea of remote-join is rejected. For my part, trying to consider doing remote joins *without* going through FDWs is just nonsensical. What are you joining remotely if not two foreign tables? With regard to the GPU approach, if that model works whereby the normal PG tuples are read off disk, fed over to the GPU, processed, then returned back to the user through PG, then I wouldn't consider it really a 'remote' join but rather simply a new execution node inside of PG which is planned and costed just like the others. We've been over the discussion already about trying to make that a pluggable system but the, very reasonable, push-back on that has been if it's really possible and really makes sense to be pluggable. It certainly doesn't *have* to be- PostgreSQL is written in C, as we all know, and plenty of C code talks to GPUs and shuffles memory around- and that's almost exactly what Robert is working on supporting with regular CPUs and PG backends already. In many ways, trying to conflate this idea of using-GPUs-to-do-work with the idea of remote-FDW-joins has really disillusioned me with regard to the CustomScan approach. > > Then perhaps they should be exposed more directly? I can understand > > generally useful functionality being exposed in a way that anyone can use > > it, but we need to avoid interfaces which can't be stable due to normal > > / ongoing changes to the backend code. > > > The functions my patches want to expose are: > - get_restriction_qual_cost() > - fix_expr_common() I'll try and find time to go look at these in more detail later this week. I have reservations about exposing the current estimates on costs as we may want to adjust them in the future- but such adjustments may need to be made in balance with other changes throughout the system and an external module which depends on one result from the qual costing might end up having problems with the costing changes because the extension author wasn't aware of the other changes happening in other areas of the costing. I'm talking about this from a "beyond-just-the-GUCs" point of view, I realize that the extension author could go look at the GUC settings, but it's entirely reasonable to believe we'll make changes to the default GUC settings along with how they're used in the future. > And, the functions my patches newly want are: > - bms_to_string() > - bms_from_string() Offhand, these look fine, if there's really an external use for them. Will try to look at them in more detail later. > > That's fine, if we can get data to and from those co-processors efficiently > > enough that it's worth doing so. If moving the data to the GPU's memory > > will take longer than running the actual aggregation, then it doesn't make > > any sense for regular tables because then we'd have to cache the data in > > the GPU's memory in some way across multiple queries, which isn't something > > we're set up to do. > > > When I made a prototype implementation on top of FDW, using CUDA, it enabled > to run sequential scan 10 times faster than SeqScan on regular tables, if > qualifiers are enough complex. > Library to communicate GPU (OpenCL/CUDA) has asynchronous data transfer > mode using hardware DMA. It allows to hide the cost of data transfer by > pipelining, if here is enough number of records to be transferred. 
That sounds very interesting and certainly figuring out the costing to support that model will be tricky. Also, shuffling the data around in that way will also be interesting. It strikes me that it'll be made more difficult if we're trying to do it through the limitations of a pre-defined API between the core code and an extension. > Also, the recent trend of semiconductor device is GPU integration with CPU, > that shares a common memory space. See, Haswell of Intel, Kaveri of AMD, or > Tegra K1 of nvidia. All of them shares same memory, so no need to transfer > the data to be calculated. This trend is dominated by physical law because > of energy consumption by semiconductor. So, I'm optimistic for my idea. And this just makes me wonder why the focus isn't on the background worker approach instead of trying to do this all in an extension. > The usage was found by the contrib module that wants to call static > functions, or feature to translate existing data structure to/from > cstring. But, anyway, does separated patch make sense? I haven't had a chance to go back and look into the functions in detail, but offhand I'd say the bms ones are probably fine while the others would need more research as to if they make sense to expose to an extension. > Hmm... It seems to me we should follow the existing manner to construct > join path, rather than special handling. Even if a query contains three or > more foreign tables managed by same server, it shall be consolidated into > one remote join as long as its cost is less than local ones. I'm not convinced that it's going to be that simple, but I'm certainly interested in the general idea. > So, I'd like to bed using the new add_join_path_hook to compute possible > join path. If remote join implemented by custom-scan is cheaper than local > join, it shall be chosen, then optimizer will try joining with other foreign > tables with this custom-scan node. If remote-join is still cheap, then it > shall be consolidated again. And I'm still unconvinced that trying to make this a hook and implemented by an extension makes sense. > > Admittedly, getting the costing right isn't easy either, but it's not clear > > to me how it'd make sense for the local server to be doing costing for remote > > servers. > > > Right now, I ignored the cost to run remote-server, focused on the cost to > transfer via network. It might be an idea to discount the CPU cost of remote > execution. Pretty sure we're going to need to consider the remote processing cost of the join as well.. Thanks, Stephen
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > I (plan to) use custom-scan of course. Once a relation is referenced > and optimizer decided GPU acceleration is cheaper, associated custom- > scan node read the data from underlying relation (or in-memory cache > if exists) then move to the shared memory buffer to deliver GPU > management background worker that launches asynchronous DMA one by one. > After that, custom-scan node receives filtered records via shared- > memory buffer, so it can construct tuples to be returned to the upper > node. Alright- but have you discussed this with Robert? We're going to be whacking things around for parallel support with new nodes and more built-in helper functionality for doing this work and I'm not anxious to have CustomScan end up being a legacy interface that we're required to pull forward because we accepted it before things had settled. > > I'm worried about your ideas to try and cache things on the GPU though, > > if you're not prepared to deal with locks happening in shared memory on > > the rows you've got cached out on the GPU, or hint bits, or the visibility > > map being updated, etc... > > > It does not remain any state/information on the GPU side. Things related > to PG internal stuff is job of CPU. Right, good, I'm glad to hear that this approach is for doing things at only an individual statement level and it's good to know that it can be performant at that level now. > > Well, we're going to need to expand that a bit for aggregates, I'm afraid, > > but we should be able to define the API for those aggregates very tightly > > based on what PG does today and require that any FDW purporting to provide > > those aggregates do it the way PG does. Note that this doesn't solve all > > the problems- we've got other issues with regard to pushing aggregates down > > into FDWs that need to be solved. > > > I see. It probably needs more detailed investigation. These issues will hopefully not be a problem (or at least, one that can be worked around) for non-FDW implementations which are part of core and implemented in a similar way to the existing aggregates.. Where the scan node could continue to be a simple SeqScan as it is today. > The custom-scan API is thin abstraction towards the plan node interface, > not tightly convinced with a particular use case, like GPU, remote-join > and so on. So, I'm quite optimistic for the future maintainability. I don't see how you can be when there hasn't been any discussion that I've seen about how parallel query execution is going to change things for us. > Also, please remind the discussion at the last developer meeting. > The purpose of custom-scan (we didn't name it at that time) is to avoid > unnecessary project branch for people who want to implement their own > special feature but no facilities to enhance optimizer/executor are > supported. > Even though we have in-core parallel execution feature by CPU, it also > makes sense to provide some unique implementation that may be suitable > for a specific region. The issue here is that we're going to be expected to maintain an interface once we provide it and so that isn't something we should be doing lightly. Particularly when it's as involved as this kind of change is with what's going on in the backend where we are nearly 100% sure to be changing things in the next release or two. Thanks, Stephen
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > > IIUC, his approach was integration of join-pushdown within FDW APIs, > > however, it does not mean the idea of remote-join is rejected. > > For my part, trying to consider doing remote joins *without* going through > FDWs is just nonsensical. What are you joining remotely if not two foreign > tables? > It is the case where two foreign tables would otherwise be joined locally. If a query has two foreign tables managed by the same server, this pair shall be found while the optimizer tries various possible combinations. > With regard to the GPU approach, if that model works whereby the > normal PG tuples are read off disk, fed over to the GPU, processed, then > returned back to the user through PG, then I wouldn't consider it really > a 'remote' join but rather simply a new execution node inside of PG which > is planned and costed just like the others. We've been over the discussion > already about trying to make that a pluggable system but the, very reasonable, > push-back on that has been if it's really possible and really makes sense > to be pluggable. It certainly doesn't *have* to be- PostgreSQL is written > in C, as we all know, and plenty of C code talks to GPUs and shuffles memory > around- and that's almost exactly what Robert is working on supporting with > regular CPUs and PG backends already. > > In many ways, trying to conflate this idea of using-GPUs-to-do-work with > the idea of remote-FDW-joins has really disillusioned me with regard to > the CustomScan approach. > Are you suggesting I focus on the GPU stuff, rather than killing two birds with one stone? That may be an approach; however, these have a common part, because the plan node for a remote join pops tuples up towards its upper node. From the viewpoint of the upper node, it looks like a black box that returns tuples joining two underlying relations. On the other hand, here is another black box that returns tuples scanning or joining underlying relations with GPU assistance. Neither implementation detail is visible to the upper node, but their external interface is common. The custom-scan node can provide a pluggable way for both use cases. Anyway, I'm not motivated by the remote-join feature any more than by the GPU-acceleration stuff. If it is better to drop the FDW remote-join stuff from the custom-scan scope, I won't insist on it. > > > Then perhaps they should be exposed more directly? I can understand > > > generally useful functionality being exposed in a way that anyone > > > can use it, but we need to avoid interfaces which can't be stable > > > due to normal / ongoing changes to the backend code. > > > > > The functions my patches want to expose are: > > - get_restriction_qual_cost() > > - fix_expr_common() > > I'll try and find time to go look at these in more detail later this week. > I have reservations about exposing the current estimates on costs as we > may want to adjust them in the future- but such adjustments may need to > be made in balance with other changes throughout the system and an external > module which depends on one result from the qual costing might end up having > problems with the costing changes because the extension author wasn't aware > of the other changes happening in other areas of the costing. > That is also my point. If the cost estimation logic is revised in the future, it causes a problem if an extension has cut and pasted the code.
> I'm talking about this from a "beyond-just-the-GUCs" point of view, I > realize that the extension author could go look at the GUC settings, but > it's entirely reasonable to believe we'll make changes to the default GUC > settings along with how they're used in the future. > Is the GUC something like a Boolean that shows whether the new costing model is applied or not? If so, an extension needs to keep two cost estimation logics within its code, doesn't it? If the GUC shows something like a weight, I also think it makes sense. > > And, the functions my patches newly want are: > > - bms_to_string() > > - bms_from_string() > > Offhand, these look fine, if there's really an external use for them. > Will try to look at them in more detail later. > At least, it makes sense to carry a bitmap data structure in the private field of custom-scan, because every plan node has to be safe under copyObject() (a minimal sketch of this follows this message). > > > That's fine, if we can get data to and from those co-processors > > > efficiently enough that it's worth doing so. If moving the data to > > > the GPU's memory will take longer than running the actual > > > aggregation, then it doesn't make any sense for regular tables > > > because then we'd have to cache the data in the GPU's memory in some > > > way across multiple queries, which isn't something we're set up to do. > > > > > When I made a prototype implementation on top of FDW, using CUDA, it > > enabled to run sequential scan 10 times faster than SeqScan on regular > > tables, if qualifiers are enough complex. > > Library to communicate GPU (OpenCL/CUDA) has asynchronous data > > transfer mode using hardware DMA. It allows to hide the cost of data > > transfer by pipelining, if here is enough number of records to be > transferred. > > That sounds very interesting and certainly figuring out the costing to > support that model will be tricky. Also, shuffling the data around in that > way will also be interesting. It strikes me that it'll be made more > difficult if we're trying to do it through the limitations of a pre-defined > API between the core code and an extension. > This data shuffling is done on the extension side, so it looks like core PG just picks up tuples from a box that handles the underlying table scan in some way. > > Also, the recent trend of semiconductor device is GPU integration with > > CPU, that shares a common memory space. See, Haswell of Intel, Kaveri > > of AMD, or Tegra K1 of nvidia. All of them shares same memory, so no > > need to transfer the data to be calculated. This trend is dominated by > > physical law because of energy consumption by semiconductor. So, I'm > optimistic for my idea. > > And this just makes me wonder why the focus isn't on the background worker > approach instead of trying to do this all in an extension. > The GPU portion of the above processors has a different instruction set from the CPU, so we cannot utilize its parallel execution capability even if we launch tons of background workers that run existing CPU instructions. > > Hmm... It seems to me we should follow the existing manner to > > construct join path, rather than special handling. Even if a query > > contains three or more foreign tables managed by same server, it shall > > be consolidated into one remote join as long as its cost is less than > local ones. > > I'm not convinced that it's going to be that simple, but I'm certainly > interested in the general idea.
> That is implemented in my part-3 patch: the add_join_path hook adds a custom-scan path that joins two foreign tables, a foreign table and a custom-scan, or two custom-scans, if all of them are managed by the same foreign server. As long as its execution cost is reasonable, it allows running a remote join that contains three or more relations. > > So, I'd like to bet on using the new add_join_path_hook to compute > > possible join paths. If remote join implemented by custom-scan is > > cheaper than local join, it shall be chosen, then optimizer will try > > joining with other foreign tables with this custom-scan node. If > > remote-join is still cheap, then it shall be consolidated again. > > And I'm still unconvinced that trying to make this a hook and implemented > by an extension makes sense. > The postgresAddJoinPaths() in my part-3 patch is doing that. Of course, some portion of its code might have been supported by the backend code. However, I don't think the overall design is less reasonable than special handling. > > > Admittedly, getting the costing right isn't easy either, but it's > > > not clear to me how it'd make sense for the local server to be doing > > > costing for remote servers. > > > > > Right now, I ignore the cost to run the remote server and focus on the > > cost to transfer via the network. It might be an idea to discount the CPU > > cost of remote execution. > > Pretty sure we're going to need to consider the remote processing cost of > the join as well.. > I also think so, even though it is not done yet. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
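To illustrate the copyObject() point above, here is a minimal sketch of how a custom-scan provider might carry a Bitmapset in its private field. It assumes the bms_to_string()/bms_from_string() helpers proposed in this thread (they are not stock PostgreSQL) and an assumed custom_private list field; lappend(), makeString() and strVal() are existing backend facilities.

    /*
     * Minimal sketch: keep a Bitmapset in a custom-scan's private list
     * in copyObject()-safe form, via the string helpers proposed in
     * this thread.  "custom_private" is an assumed field name.
     */
    #include "postgres.h"
    #include "nodes/bitmapset.h"
    #include "nodes/pg_list.h"
    #include "nodes/value.h"

    /* At plan-creation time: stash the relids as a copyable String node. */
    static List *
    stash_relids(List *custom_private, Bitmapset *relids)
    {
        return lappend(custom_private, makeString(bms_to_string(relids)));
    }

    /* At executor-startup time: rebuild the Bitmapset from the copied plan. */
    static Bitmapset *
    fetch_relids(List *custom_private)
    {
        return bms_from_string(strVal(linitial(custom_private)));
    }

Because the set travels as an ordinary String node, the plan tree survives copyObject() without any node-specific copy support.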
> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > > I (plan to) use custom-scan of course. Once a relation is referenced > > and optimizer decided GPU acceleration is cheaper, associated custom- > > scan node read the data from underlying relation (or in-memory cache > > if exists) then move to the shared memory buffer to deliver GPU > > management background worker that launches asynchronous DMA one by one. > > After that, custom-scan node receives filtered records via shared- > > memory buffer, so it can construct tuples to be returned to the upper > > node. > > Alright- but have you discussed this with Robert? We're going to be > whacking things around for parallel support with new nodes and more built-in > helper functionality for doing this work and I'm not anxious to have > CustomScan end up being a legacy interface that we're required to pull > forward because we accepted it before things had settled. > I briefly introduced my idea of using GPUs to him at Ottawa last year, though I'm not certain he remembers it. At least, the idea of the custom-scan node came from the discussion at that time. > > The custom-scan API is thin abstraction towards the plan node > > interface, not tightly convinced with a particular use case, like GPU, > > remote-join and so on. So, I'm quite optimistic for the future > maintainability. > > I don't see how you can be when there hasn't been any discussion that I've > seen about how parallel query execution is going to change things for us. > If parallel query execution changes the whole structure of plan nodes, it will also affect the interface of custom-scan, because it is a thin abstraction of a plan node. However, if the parallel execution feature is implemented as a new plan node in addition to the existing ones, I cannot imagine a scenario that affects the structure of another plan node. > > Also, please remind the discussion at the last developer meeting. > > The purpose of custom-scan (we didn't name it at that time) is to > > avoid unnecessary project branch for people who want to implement > > their own special feature but no facilities to enhance > > optimizer/executor are supported. > > Even though we have in-core parallel execution feature by CPU, it also > > makes sense to provide some unique implementation that may be suitable > > for a specific region. > > The issue here is that we're going to be expected to maintain an interface > once we provide it and so that isn't something we should be doing lightly. > Particularly when it's as involved as this kind of change is with what's > going on in the backend where we are nearly 100% sure to be changing things > in the next release or two. > FDW APIs have also been revised several times in recent releases. If we could design a "perfect" interface from the beginning, that would be best, but it is usually hard to do. Also, the custom-scan interface is almost symmetric with the existing plan node structures, so its stability is relatively high, I think. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > * Stephen Frost (sfrost@snowman.net) wrote: > > I don't see how you can be when there hasn't been any discussion that I've > > seen about how parallel query execution is going to change things for us. > > > If parallel query execution changes whole of the structure of plan nodes, > it will also affect to the interface of custom-scan because it is a thin- > abstraction of plan-node. However, if parallel execution feature is > implemented as one of new plan node in addition to existing one, I cannot > imagine a scenario that affects to the structure of another plan node. Let's just say that I have doubts that we'll be able to implement parallel execution *without* changing the plan node interface in some way which will require, hopefully minor, changes to all of the nodes. The issue is that even a minor change would break the custom-scan API and we'd immediately be in the boat of dealing with complaints regarding backwards compatibility. Perhaps we can hand-wave that, and we've had some success changing hook APIs between major releases, but such changes may also be in ways which wouldn't break in obvious ways or even possibly be changes which have to be introduced into back-branches. Parallel query is going to be brand-new real soon and it's reasonable to think we'll need to make bug-fix changes to it after it's out which might even involve changes to the API which is developed for it. > > The issue here is that we're going to be expected to maintain an interface > > once we provide it and so that isn't something we should be doing lightly. > > Particularly when it's as involved as this kind of change is with what's > > going on in the backend where we are nearly 100% sure to be changing things > > in the next release or two. > > > FDW APIs are also revised several times in the recent releases. If we can > design "perfect" interface from the beginning, it's best but usually hard > to design. Sure, but FDWs also have a *much* broader set of use-cases, in my view, which is also why I was pushing to work on join-push-down to happen there instead of having this kind of a hook interface, which I don't think we'd want to directly expose as part of the 'official FDW API' as it ends up putting all the work on the FDW with little aid, making it terribly likely to end up with a bunch of duplicated code in the FDWs from the backend to deal with it, particularly for individuals writing FDWs who aren't familiar with what PG already has. > Also, custom-scan interface is almost symmetric with existing plan node > structures, so its stability is relatively high, I think. Perhaps it will come to pass that parallel query execution doesn't require any changes to the plan node structure, but that's not the horse that I'd bet on at this point. Thanks, Stephen
2014-03-01 0:36 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >> * Stephen Frost (sfrost@snowman.net) wrote: >> > I don't see how you can be when there hasn't been any discussion that I've >> > seen about how parallel query execution is going to change things for us. >> > >> If parallel query execution changes whole of the structure of plan nodes, >> it will also affect to the interface of custom-scan because it is a thin- >> abstraction of plan-node. However, if parallel execution feature is >> implemented as one of new plan node in addition to existing one, I cannot >> imagine a scenario that affects to the structure of another plan node. > > Let's just say that I have doubts that we'll be able to implement > parallel execution *without* changing the plan node interface in some > way which will require, hopefully minor, changes to all of the nodes. > The issue is that even a minor change would break the custom-scan API > and we'd immediately be in the boat of dealing with complaints regarding > backwards compatibility. Perhaps we can hand-wave that, and we've had > some success changing hook APIs between major releases, but such changes > may also be in ways which wouldn't break in obvious ways or even > possibly be changes which have to be introduced into back-branches. > Parallel query is going to be brand-new real soon and it's reasonable to > think we'll need to make bug-fix changes to it after it's out which > might even involve changes to the API which is developed for it. > Even if we change the plan-node interface in a future release, it should not be a change that makes the existing stuff impossible to implement. The custom-scan API is designed to provide an alternative way to scan or join relations, in addition to the existing logic like SeqScan or NestLoop. If such a change broke the plan-node interfaces so that the existing stuff could not be implemented, it would be problematic for everything, not only the custom-scan node. I don't think a change that makes it impossible to implement the existing logic would be merged. Likely, the new parallel execution feature can work together with the existing sequential logic and the custom-scan interface. BTW, this kind of discussion looks like a talk with a ghost, because we cannot see the new interface for parallel execution right now, so we cannot tangibly investigate whether it becomes a really serious backward incompatibility or not. My bet is on a minor one. I cannot imagine a plan-node interface that does not support the existing non-parallel SeqScan or NestLoop and so on. >> > The issue here is that we're going to be expected to maintain an interface >> > once we provide it and so that isn't something we should be doing lightly. >> > Particularly when it's as involved as this kind of change is with what's >> > going on in the backend where we are nearly 100% sure to be changing things >> > in the next release or two. >> > >> FDW APIs are also revised several times in the recent releases. If we can >> design "perfect" interface from the beginning, it's best but usually hard >> to design.
> > Sure, but FDWs also have a *much* broader set of use-cases, in my view, > which is also why I was pushing to work on join-push-down to happen > there instead of having this kind of a hook interface, which I don't > think we'd want to directly expose as part of the 'official FDW API' as > it ends up putting all the work on the FDW with little aid, making it > terribly likely to end up with a bunch of duplicated code in the FDWs > from the backend to deal with it, particularly for individuals writing > FDWs who aren't familiar with what PG already has. > It might not be a good idea to use postgres_fdw as the basis of a proof-of-concept to demonstrate that custom-scan can effectively host an alternative way to join, one that fetches the result set of a remote join as if it were a relation scan, even though it demonstrated that this is possible. So, I don't mind the part-3 portion (remote join of postgres_fdw on top of custom-scan) being dropped from the submission. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
KaiGai, * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: > BTW, this kind of discussion looks like a talk with a ghost because > we cannot see the new interface according to the parallel execution > right now, so we cannot have tangible investigation whether it becomes > really serious backward incompatibility, or not. Yeah, it would certainly be nice if we had all of the answers up-front. What I keep hoping for is that someone who has been working on this area (eg: Robert) would speak up... > My bet is minor one. I cannot imagine plan-node interface that does > not support existing non-parallel SeqScan or NestLoop and so on. Sure you can- because once we change the interface, we're probably going to go through and make everything use the new one rather than have to special-case things. That's more-or-less exactly my point here because having an external hook like CustomScan would make that kind of wholesale change more difficult. That does *not* mean I'm against using GPUs and GPU optimizations. What it means is that I'd rather see that done in core, which would allow us to simply change that interface along with the rest when doing wholesale changes and not have to worry about backwards compatibility and breaking other people's code. Thanks, Stephen
2014-03-01 22:38 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > KaiGai, > > * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: >> BTW, this kind of discussion looks like a talk with a ghost because >> we cannot see the new interface according to the parallel execution >> right now, so we cannot have tangible investigation whether it becomes >> really serious backward incompatibility, or not. > > Yeah, it would certainly be nice if we had all of the answers up-front. > What I keep hoping for is that someone who has been working on this area > (eg: Robert) would speak up... > I'd also like to see his opinion. >> My bet is minor one. I cannot imagine plan-node interface that does >> not support existing non-parallel SeqScan or NestLoop and so on. > > Sure you can- because once we change the interface, we're probably going > to go through and make everything use the new one rather than have to > special-case things. That's more-or-less exactly my point here because > having an external hook like CustomScan would make that kind of > wholesale change more difficult. > I think we should follow the general rule in the case of custom-scan also. In other words, it is the responsibility of the extension's author to keep up with the latest specification of the interfaces. For example, I have an extension module that is unable to work on the latest PG code because of interface changes to ProcessUtility_hook (see the example after this message). Is it a matter of backward incompatibility? Definitely not; it should be my job. > That does *not* mean I'm against using GPUs and GPU optimizations. What > it means is that I'd rather see that done in core, which would allow us > to simply change that interface along with the rest when doing wholesale > changes and not have to worry about backwards compatibility and breaking > other people's code. > I also have to introduce some previous background discussion. Now we have two options for GPU programming: CUDA or OpenCL. Both the libraries and the drivers are provided under proprietary licenses, so they do not fit the core implementation of PostgreSQL, but rather extensions that are installed on the user's responsibility. Because of this background, I brought up a discussion about pluggable planner/executor nodes (as a basis of GPU acceleration) at the last developer meeting, and then worked on the custom-scan node interface. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
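For context, the follow-up burden KaiGai mentions looks like this in practice. Below is the usual idiom for chaining ProcessUtility_hook, written against the signature it has around PostgreSQL 9.3/9.4; earlier releases passed a bool isTopLevel instead of the ProcessUtilityContext argument, which is exactly the kind of interface change an extension author must track. The function name is illustrative.

    /*
     * Standard ProcessUtility_hook chaining idiom (signature as of
     * roughly PostgreSQL 9.3/9.4).  my_ProcessUtility is an
     * illustrative name, not from any actual patch.
     */
    #include "postgres.h"
    #include "fmgr.h"
    #include "tcop/utility.h"

    PG_MODULE_MAGIC;

    static ProcessUtility_hook_type prev_ProcessUtility = NULL;

    static void
    my_ProcessUtility(Node *parsetree, const char *queryString,
                      ProcessUtilityContext context, ParamListInfo params,
                      DestReceiver *dest, char *completionTag)
    {
        /* extension-specific handling of the utility statement goes here */

        if (prev_ProcessUtility)
            prev_ProcessUtility(parsetree, queryString, context,
                                params, dest, completionTag);
        else
            standard_ProcessUtility(parsetree, queryString, context,
                                    params, dest, completionTag);
    }

    void
    _PG_init(void)
    {
        prev_ProcessUtility = ProcessUtility_hook;
        ProcessUtility_hook = my_ProcessUtility;
    }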
KaiGai, * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: > Now we have two options for GPU programming: CUDA or OpenCL. > Both of libraries and drivers are provided under the proprietary license, > so it does not fit for the core implementation of PostgreSQL, but > extensions that shall be installed on user's responsibility. Being able to work with libraries which are not BSD licensed doesn't change the license under which PostgreSQL code is released. Nor does it require PostgreSQL to be licensed in any different way from how it is today. Where it would get a bit ugly, I agree, is for the packagers who have to decide if they want to build against those libraries or not. We might be able to make things a bit easier by having a startup-time determination of whether these nodes can be used or not (sketched after this message). This isn't unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible, a situation which causes a great deal of difficulty already due to the whole readline nonsense- but it's difficulty for the packagers, not for the PostgreSQL project, per se. > Because of the story, I brought up a discussion about pluggable > planner/executor node (as a basis of GPU acceleration) in the last > developer meeting, then has worked for custom-scan node interface. And I'm still unconvinced of this approach and worry that it's going to break more often than it works. That's my 2c on it, but I won't get in the way if someone else wants to step up and support it. As I mentioned up-thread, I'd really like to see FDW join push-down, FDW aggregate push-down, parallel query execution, and parallel remote-FDW execution and I don't see this CustomScan approach as the right answer to any of those. Thanks, Stephen
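A minimal sketch of the startup-time determination Stephen suggests, assuming the extension probes for the proprietary driver with dlopen() before registering any GPU paths; the library name and the gpu_available flag are illustrative assumptions, not taken from any actual patch.

    /*
     * Minimal sketch of a startup-time capability check: probe for the
     * proprietary GPU driver with dlopen() and only enable the GPU code
     * paths when it is present.  Library name and flag are assumptions.
     */
    #include <dlfcn.h>
    #include <stdbool.h>

    static bool gpu_available = false;

    static void
    probe_gpu_driver(void)
    {
        void *handle = dlopen("libcuda.so", RTLD_NOW);   /* assumed name */

        if (handle != NULL)
        {
            dlclose(handle);
            gpu_available = true;
        }
        /* otherwise: register no GPU paths; built-in scans still work */
    }

With a check like this, a packager can ship one binary: on systems without the driver, the extension simply never offers its custom paths.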
2014-03-02 9:51 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > KaiGai, > > * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: >> Now we have two options for GPU programming: CUDA or OpenCL. >> Both of libraries and drivers are provided under the proprietary license, >> so it does not fit for the core implementation of PostgreSQL, but >> extensions that shall be installed on user's responsibility. > > Being able to work with libraries which are not BSD licensed doesn't > change the license under which PostgreSQL code is released. Nor does it > require PostgreSQL to be licensed in any different way from how it is > today. Where it would get a bit ugly, I agree, is for the packagers who > have to decide if they want to build against those libraries or not. We > might be able to make things a bit easier by having a startup-time > determination of whether these nodes can be used or not. This isn't > unlike OpenSSL which certainly isn't BSD nor is it even GPL-compatible, > a situation which causes a great deal of difficulty already due to the > whole readline nonsense- but it's difficulty for the packagers, not for > the PostgreSQL project, per se. > As you mentioned, it is a headache for packagers, and it does not make sense for us if packagers disable the features that require proprietary drivers. In fact, Fedora / RHEL does not permit distributing software under a non-open-source license. Obviously, NVIDIA's CUDA is a library distributed under a proprietary license, thus out of scope for the Linux distributors. It also leads them to turn off features that must be linked with proprietary drivers. All we can do is implement these features as extensions, then offer an option for users to use them or not. >> Because of the story, I brought up a discussion about pluggable >> planner/executor node (as a basis of GPU acceleration) in the last >> developer meeting, then has worked for custom-scan node interface. > > And I'm still unconvinced of this approach and worry that it's going to > break more often than it works. That's my 2c on it, but I won't get in > the way if someone else wants to step up and support it. As I mentioned > up-thread, I'd really like to see FDW join push-down, FDW aggregate > push-down, parallel query execution, and parallel remote-FDW execution > and I don't see this CustomScan approach as the right answer to any of > those. > It's the right approach for FDW functionality enhancement; I never opposed it. What I'd like to implement is GPU acceleration that can perform on regular tables, not only foreign tables. Also, according to the backlog from the developer meeting, a pluggable executor node is also a feature required by the PG-XC folks to make their project work with upstream. I think the custom-scan feature is capable of hosting these requirements, and it does not prevent enhancing the FDW features. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: > As you mentioned, it is a headache for packagers, and does not make > sense for us if packager disabled the feature that requires proprietary > drivers. No, I disagree with that. I don't expect this use-case to be very common to begin with and telling individuals that they have to compile it themselves is certainly not out of the question. > In fact, Fedora / RHEL does not admit to distribute software > under the none open source software license. I'm pretty confident you can get RPMs for those distributions. > Obviously, nvidia's cuda > is a library being distributed under the proprietary license, thus out of > the scope for the Linux distributors. This also doesn't make any sense to me- certainly the CUDA libraries are available under Debian derivatives, along with open-source wrappers for them like pycuda. > It also leads them to turn off the > feature that shall be linked with proprietary drivers. > All we can do is to implement these features as extension, then offer > an option for users to use or not to use. No, we can tell individuals who want it that they're going to need to build with support for it. We don't offer OpenSSL as an extension (I certainly wish we did- and had a way to replace it w/ GNUTLS or one of the better licensed options). > What I'd like to implement is GPU acceleration that can perform on > regular tables, not only foreign tables. Also, regarding to the backlog > in the developer meeting, pluggable executor node is also required > feature by PG-XC folks to work their project with upstream. > I think custom-scan feature is capable to host these requirement, > and does not prevent enhancement FDW features. I think you're conflating things again- while it might be possible to use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown, *I* don't think it's the right approach. Regarding the PG-XC requirement, I expect they're looking for FDW join/aggregate-pushdown and also see that it *could* be done w/ CustomScan. We could punt on the whole thing and drop in hooks which could be used to replace everything done from the planner through to the executor and then anyone *could* implement any of the above, and parallel query too. That doesn't make it the right approach. Thanks, Stephen
On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost <sfrost@snowman.net> wrote: >> The custom-scan node is intended to perform on regular relations, not >> only foreign tables. It means a special feature (like GPU acceleration) >> can perform transparently for most of existing applications. Usually, >> it defines regular tables for their work on installation, not foreign >> tables. It is the biggest concern for me. > > The line between a foreign table and a local one is becoming blurred > already, but still, if this is the goal then I really think the > background worker is where you should be focused, not on this Custom > Scan API. Consider that, once we've got proper background workers, > we're going to need new nodes which operate in parallel (or some other > rejiggering of the nodes- I don't pretend to know exactly what Robert is > thinking here, and I've apparently forgotten it if he's posted it > somewhere) and those interfaces may drive changes which would impact the > Custom Scan API- or worse, make us deprecate or regret having added it > because now we'll need to break backwards compatibility to add in the > parallel node capability to satisfy the more general non-GPU case. This critique seems pretty odd to me. I haven't had the time to look at this patch set, but I don't see why anyone would want to use the background worker facility for GPU acceleration, which is what KaiGai's trying to accomplish here. Surely you want, if possible, to copy the data directly from the user backend into the GPU's working memory. What would the use of a background worker be? We definitely need background workers to make use of additional *CPUs*, but I don't see what good they are in leveraging *GPUs*. I seriously doubt there's any real conflict with parallelism here. Parallelism may indeed add more ways to scan a relation (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we shouldn't have a Custom Scan node too. Indeed, my principal concern about this patch set isn't that it's too specialized, as you seem to be worrying about, but that it's aiming to satisfy *too many* use cases. I think FDW join pushdown is a fairly different problem from adding a custom scan type, and I doubt one patch should try to solve both problems. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >> IIUC, his approach was integration of join-pushdown within FDW APIs, >> however, it does not mean the idea of remote-join is rejected. > > For my part, trying to consider doing remote joins *without* going > through FDWs is just nonsensical. That is, of course, true by definition, but I think it's putting the focus in the wrong place. It's possible that there are other cases when a scan might be a plausible path for a joinrel even if there are no foreign tables in play. For example, you could cache the joinrel output and then inject a cache scan as a path for the joinrel. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Wed, Feb 26, 2014 at 3:02 AM, Stephen Frost <sfrost@snowman.net> wrote: > > The line between a foreign table and a local one is becoming blurred > > already, but still, if this is the goal then I really think the > > background worker is where you should be focused, not on this Custom > > Scan API. Consider that, once we've got proper background workers, > > we're going to need new nodes which operate in parallel (or some other > > rejiggering of the nodes- I don't pretend to know exactly what Robert is > > thinking here, and I've apparently forgotten it if he's posted it > > somewhere) and those interfaces may drive changes which would impact the > > Custom Scan API- or worse, make us deprecate or regret having added it > > because now we'll need to break backwards compatibility to add in the > > parallel node capability to satisfy the more general non-GPU case. > > This critique seems pretty odd to me. I haven't had the time to look > at this patch set, but I don't see why anyone would want to use the > background worker facility for GPU acceleration, which is what > KaiGai's trying to accomplish here. Eh, that didn't come out quite right. I had intended it to be more along the lines of "look at what Robert's doing". I was trying to point out that parallel query execution is coming soon thanks to the work on background worker and that parallel query execution might drive changes to the way nodes interact in the executor driving a need to change the API. In other words, CustomScan could easily end up being broken by that change and I'd rather we not have to worry about such breakage. > I seriously doubt there's any real conflict with parallelism here. > Parallelism may indeed add more ways to scan a relation > (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we > shouldn't have a Custom Scan node too. What about parallel execution through the tree itself, rather than just at specific end nodes like SeqScan and IndexScan? Or parallel aggregates? I agree that simple parallel SeqScan/IndexScan isn't going to change any of the interfaces, but surely we're going for more than that. Indeed, I'm wishing that I had found more time to spend on just a simple select-based Append node which could parallelize I/O across tablespaces and FDWs underneath the Append. > Indeed, my principle concern > about this patch set isn't that it's too specialized, as you seem to > be worrying about, but that it's aiming to satisfy *too many* use > cases. I think FDW join pushdown is a fairly different problem from > adding a custom scan type, and I doubt one patch should try to solve > both problems. Yeah, I've voiced those same concerns later in the thread also, specifically that this punts on nearly everything and expects the implementor to figure it all out. We should be able to do better wrt FDW join-pushdown, etc. Thanks, Stephen
On Fri, Feb 28, 2014 at 10:36 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >> * Stephen Frost (sfrost@snowman.net) wrote: >> > I don't see how you can be when there hasn't been any discussion that I've >> > seen about how parallel query execution is going to change things for us. >> > >> If parallel query execution changes whole of the structure of plan nodes, >> it will also affect to the interface of custom-scan because it is a thin- >> abstraction of plan-node. However, if parallel execution feature is >> implemented as one of new plan node in addition to existing one, I cannot >> imagine a scenario that affects to the structure of another plan node. > > Let's just say that I have doubts that we'll be able to implement > parallel execution *without* changing the plan node interface in some > way which will require, hopefully minor, changes to all of the nodes. > The issue is that even a minor change would break the custom-scan API > and we'd immediately be in the boat of dealing with complaints regarding > backwards compatibility. Perhaps we can hand-wave that, and we've had > some success changing hook APIs between major releases, but such changes > may also be in ways which wouldn't break in obvious ways or even > possibly be changes which have to be introduced into back-branches. > Parallel query is going to be brand-new real soon and it's reasonable to > think we'll need to make bug-fix changes to it after it's out which > might even involve changes to the API which is developed for it. For what it's worth, and I can't claim to have all the answers here, this doesn't match my expectation. I think we'll do two kinds of parallelism. One will be parallelism within nodes, like parallel sort or parallel seqscan. Any node we parallelize this way is likely to be heavily rewritten, or else to get a sister that looks quite different from the original. The other kind of parallelism will involve pushing a whole subtree of the plan into a different node. In this case we'll need to pass data between nodes in some different way (this was one of the major reasons I designed the shm_mq stuff) but the nodes themselves should change little if at all. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Mar 1, 2014 at 8:49 PM, Stephen Frost <sfrost@snowman.net> wrote: >> This critique seems pretty odd to me. I haven't had the time to look >> at this patch set, but I don't see why anyone would want to use the >> background worker facility for GPU acceleration, which is what >> KaiGai's trying to accomplish here. > > Eh, that didn't come out quite right. I had intended it to be more > along the lines of "look at what Robert's doing". > > I was trying to point out that parallel query execution is coming soon > thanks to the work on background worker and that parallel query > execution might drive changes to the way nodes interact in the executor > driving a need to change the API. In other words, CustomScan could > easily end up being broken by that change and I'd rather we not have to > worry about such breakage. I think the relation is pretty tangential. I'm worried about the possibility that the Custom Scan API is broken *ab initio*, but I'm not worried about a conflict with parallel query. >> I seriously doubt there's any real conflict with parallelism here. >> Parallelism may indeed add more ways to scan a relation >> (ParallelSeqScan, ParallelIndexScan?) but that doesn't mean that we >> shouldn't have a Custom Scan node too. > > What about parallel execution through the tree itself, rather than just > at specific end nodes like SeqScan and IndexScan? Or parallel > aggregates? I agree that simple parallel SeqScan/IndexScan isn't going > to change any of the interfaces, but surely we're going for more than > that. Indeed, I'm wishing that I had found more time to spend on just > a simple select-based Append node which could parallelize I/O across > tablespaces and FDWs underneath the Append. Well, as I said in another recent reply that you probably got after sending this, when you split between nodes, that mostly just has to do with how you funnel the tuples from one node to another. The nodes themselves probably don't even need to know. Or at least that's what I'd hope. I don't see that parallelizing Append is any easier than any other problem in this space. There's no parallel I/O facility, so you need a background worker per append branch to wait on I/O. And you have all the problems of making sure that the workers have the same snapshot, making sure they don't self-deadlock, etc. that you have for any other case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
2014-03-02 10:29 GMT+09:00 Stephen Frost <sfrost@snowman.net>: > * Kohei KaiGai (kaigai@kaigai.gr.jp) wrote: >> As you mentioned, it is a headache for packagers, and does not make >> sense for us if packager disabled the feature that requires proprietary >> drivers. > > No, I disagree with that. I don't expect this use-case to be very > common to begin with and telling individuals that they have to compile > it themselves is certainly not out of the question. > >> In fact, Fedora / RHEL does not admit to distribute software >> under the none open source software license. > > I'm pretty confident you can get RPMs for those distributions. > >> Obviously, nvidia's cuda >> is a library being distributed under the proprietary license, thus out of >> the scope for the Linux distributors. > > This also doesn't make any sense to me- certainly the CUDA libraries are > available under Debian derivatives, along with open-source wrappers for > them like pycuda. > >> It also leads them to turn off the >> feature that shall be linked with proprietary drivers. >> All we can do is to implement these features as extension, then offer >> an option for users to use or not to use. > > No, we can tell individuals who want it that they're going to need to > build with support for it. We don't offer OpenSSL as an extension (I > certainly wish we did- and had a way to replace it w/ GNUTLS or one of > the better licensed options). > I know there is some alternative ways. However, it requires users to take additional knowledge and setting up efforts, also loses the benefit to use distributor's Linux if alternative RPMs are required. I don't want to recommend users such a complicated setting up procedure. >> What I'd like to implement is GPU acceleration that can perform on >> regular tables, not only foreign tables. Also, regarding to the backlog >> in the developer meeting, pluggable executor node is also required >> feature by PG-XC folks to work their project with upstream. >> I think custom-scan feature is capable to host these requirement, >> and does not prevent enhancement FDW features. > > I think you're conflating things again- while it might be possible to > use CustomScan to implement FDW join-pushdown or FDW aggregate-pushdown, > *I* don't think it's the right approach. Regarding the PG-XC > requirement, I expect they're looking for FDW join/aggregate-pushdown > and also see that it *could* be done w/ CustomScan. > The reason why I submitted the part-3 patch (that enhances postgres_fdw for remote-join using custom-scan) is easy to demonstrate the usage of join-replacement by a special scan, with minimum scale of the patch to be reviewed. If we have another idea to demonstrate it, I don't stick on the remot- join feature on foreign tables. Regarding to the PG-XC, I didn't know their exact needs because I didn't attend the cluster meeting, but the someone mentioned about pluggable plan/exec node in this context. > We could punt on the whole thing and drop in hooks which could be used > to replace everything done from the planner through to the executor and > then anyone *could* implement any of the above, and parallel query too. > That doesn't make it the right approach. > That is a problem I pointed out in the last developer meeting. Because we have no way to enhance a part of plan / exec logic by extension, extension has to replace whole of the planner / executor using hooks. It is painful for authors of extensions. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Robert Haas (robertmhaas@gmail.com) wrote: > I don't see that parallelizing Append is any easier than any other > problem in this space. There's no parallel I/O facility, so you need > a background worker per append branch to wait on I/O. And you have > all the problems of making sure that the workers have the same > snapshot, making sure they don't self-deadlock, etc. that you have for > any other case. Erm, my thought was to use a select() loop which sends out I/O requests and then loops around waiting to see who finishes it. It doesn't parallelize the CPU cost of getting the rows back to the caller, but it'd at least parallelize the I/O, and if what's underneath is actually a remote FDW running a complex query (because the other side is actually a view), it would be a massive win to have all the remote FDWs executing concurrently instead of serially as we have today. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > For what it's worth, and I can't claim to have all the answers here, > this doesn't match my expectation. I think we'll do two kinds of > parallelism. One will be parallelism within nodes, like parallel sort > or parallel seqscan. Any node we parallelize this way is likely to be > heavily rewritten, or else to get a sister that looks quite different > from the original. Sure. > The other kind of parallelism will involve pushing > a whole subtree of the plan into a different node. In this case we'll > need to pass data between nodes in some different way (this was one of > the major reasons I designed the shm_mq stuff) but the nodes > themselves should change little if at all. It's that "some different way" of passing data between the nodes that makes me worry, but I hope you're right and we won't actually need to change the interfaces or the nodes very much. Thanks, Stephen
2014-03-02 10:38 GMT+09:00 Robert Haas <robertmhaas@gmail.com>: > On Wed, Feb 26, 2014 at 10:23 AM, Stephen Frost <sfrost@snowman.net> wrote: >> * Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: >>> IIUC, his approach was integration of join-pushdown within FDW APIs, >>> however, it does not mean the idea of remote-join is rejected. >> >> For my part, trying to consider doing remote joins *without* going >> through FDWs is just nonsensical. > > That is, of course, true by definition, but I think it's putting the > focus in the wrong place. It's possible that there are other cases > when a scan might a plausible path for a joinrel even if there are no > foreign tables in play. For example, you could cache the joinrel > output and then inject a cache scan as a path for the joinrel. > That might be an idea to demonstrate usage of custom-scan node, rather than the (ad-hoc) enhancement of postgres_fdw. As I have discussed in another thread, it is available to switch heap reference by cache reference on the fly, it shall be a possible use- case for custom-scan node. So, I'm inclined to drop the portion for postgres_fdw in my submission to focus on custom-scan capability. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> I don't see that parallelizing Append is any easier than any other >> problem in this space. There's no parallel I/O facility, so you need >> a background worker per append branch to wait on I/O. And you have >> all the problems of making sure that the workers have the same >> snapshot, making sure they don't self-deadlock, etc. that you have for >> any other case. > > Erm, my thought was to use a select() loop which sends out I/O requests > and then loops around waiting to see who finishes it. It doesn't > parallelize the CPU cost of getting the rows back to the caller, but > it'd at least parallelize the I/O, and if what's underneath is actually > a remote FDW running a complex query (because the other side is actually > a view), it would be a massive win to have all the remote FDWs executing > concurrently instead of serially as we have today. I can't really make sense of this. In general, what's under each branch of an append node is an arbitrary plan tree, and the only operation you can count on being able to do for each is "get me the next tuple" (i.e. ExecProcNode). Append has no idea whether that involves disk I/O or for what blocks. But even if it did, there's no standard API for issuing an asynchronous read(), which is how we get blocks into shared buffers. We do have an API for requesting the prefetch of a block on platforms with posix_fadvise(), but can't find out whether it's completed using select(), and even if you could you still have to do the actual read() afterwards. For FDWs, one idea might be to kick off the remote query at ExecInitNode() time rather than ExecProcNode() time, at least if the remote query doesn't depend on parameters that aren't available until run time. That actually would allow multiple remote queries to run simultaneously or in parallel with local work. It would also run them in cases where the relevant plan node is never executed, which would be bad but perhaps rare enough not to worry about. Or we could add a new API like ExecPrefetchNode() that tells nodes to prepare to have tuples pulled, and they can do things like kick off asynchronous queries. But I still don't see any clean way for the Append node to find out which one is ready to return results first. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Erm, my thought was to use a select() loop which sends out I/O requests > > and then loops around waiting to see who finishes it. It doesn't > > parallelize the CPU cost of getting the rows back to the caller, but > > it'd at least parallelize the I/O, and if what's underneath is actually > > a remote FDW running a complex query (because the other side is actually > > a view), it would be a massive win to have all the remote FDWs executing > > concurrently instead of serially as we have today. > > I can't really make sense of this. Sorry, that was a bit hand-wavey since I had posted about it previously here: http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net It'd clearly be more involved than "just build a select() loop" and would require adding an async mechanism. I had been thinking about this primairly with the idea of FDWs and you're right that it'd require more thought to deal with getting data into/through shared_buffers. Still, we seqscan into a ring buffer, I'd think we could make it work but it would require additional work. > For FDWs, one idea might be to kick off the remote query at > ExecInitNode() time rather than ExecProcNode() time, at least if the > remote query doesn't depend on parameters that aren't available until > run time. Right, I had speculated about that also (option #2 in my earlier email). > That actually would allow multiple remote queries to run > simultaneously or in parallel with local work. It would also run them > in cases where the relevant plan node is never executed, which would > be bad but perhaps rare enough not to worry about. This was my primary concern, along with the fact that we explicitly says "don't do that" in the docs for the FDW API. > Or we could add a > new API like ExecPrefetchNode() that tells nodes to prepare to have > tuples pulled, and they can do things like kick off asynchronous > queries. But I still don't see any clean way for the Append node to > find out which one is ready to return results first. Yeah, that's tricky. Thanks, Stephen
On Mon, Mar 3, 2014 at 10:43 AM, Stephen Frost <sfrost@snowman.net> wrote: > * Robert Haas (robertmhaas@gmail.com) wrote: >> On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote: >> > Erm, my thought was to use a select() loop which sends out I/O requests >> > and then loops around waiting to see who finishes it. It doesn't >> > parallelize the CPU cost of getting the rows back to the caller, but >> > it'd at least parallelize the I/O, and if what's underneath is actually >> > a remote FDW running a complex query (because the other side is actually >> > a view), it would be a massive win to have all the remote FDWs executing >> > concurrently instead of serially as we have today. >> >> I can't really make sense of this. > > Sorry, that was a bit hand-wavey since I had posted about it previously > here: > http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net Huh, somehow I can't remember reading that... but I didn't think I had missed any posts, either. Evidently I did. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > > http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net > > Huh, somehow I can't remember reading that... but I didn't think I had > missed any posts, either. Evidently I did. You and everyone else- you'll note it got exactly zero responses.. Ah well. :) Thanks, Stephen
KaiGai, * Stephen Frost (sfrost@snowman.net) wrote: > And I'm still unconvinced of this approach and worry that it's going to > break more often than it works. That's my 2c on it, but I won't get in > the way if someone else wants to step up and support it. Alright, having heard from Robert (thanks!) regarding his thoughts (which are pretty similar to my own, though he doesn't anticipate issues with API changes), I'm going to step back a bit form the above position. > As I mentioned > up-thread, I'd really like to see FDW join push-down, FDW aggregate > push-down, parallel query execution, and parallel remote-FDW execution > and I don't see this CustomScan approach as the right answer to any of > those. In accordance with the above, what I'd like to see with this patch is removal of the postgres_fdw changes and any changes which were for that support. In addition, I'd like to understand why 'ctidscan' makes any sense to have as an example of what to use this for- if that's valuable, why wouldn't we simply implement that in core? I do want an example in contrib of how to properly use this capability, but I don't think that's it. For one thing, an example where you could have this CustomScan node calling other nodes underneath would be interesting. I realize the CTID scan can't do that directly but I would think your GPU-based system could; after all, if you're running a join or an aggregate with the GPU, the rows could come from nearly anything. Have you considered that, or is the expectation that users will just go off and access the heap and/or whatever indexes directly, like ctidscan does? How would such a requirement be handled? Thanks, Stephen
> > As I mentioned > > up-thread, I'd really like to see FDW join push-down, FDW aggregate > > push-down, parallel query execution, and parallel remote-FDW execution > > and I don't see this CustomScan approach as the right answer to any of > > those. > > In accordance with the above, what I'd like to see with this patch is removal > of the postgres_fdw changes and any changes which were for that support. > I don't argue this approach. It might be useful to *demonstrate* how custom- scan node works as replacement of join, however, > In addition, I'd like to understand why 'ctidscan' makes any sense to have > as an example of what to use this for- if that's valuable, why wouldn't > we simply implement that in core? I do want an example in contrib of how > to properly use this capability, but I don't think that's it. > Do you think it makes sense if my submission was only interface portion without working example? The purpose of ctidscan module is, similar to postgres_fdw, to demonstrate the usage of custom-scan interface with enough small code scale. If tons of code example were attached, nobody will want to review the patch. The "cache_scan" module that I and Haribabu are discussing in another thread also might be a good demonstration for custom-scan interface, however, its code scale is a bit larger than ctidscan. > For one thing, an example where you could have this CustomScan node calling > other nodes underneath would be interesting. I realize the CTID scan can't > do that directly but I would think your GPU-based system could; after all, > if you're running a join or an aggregate with the GPU, the rows could come > from nearly anything. Have you considered that, or is the expectation that > users will just go off and access the heap and/or whatever indexes directly, > like ctidscan does? How would such a requirement be handled? > In case when custom-scan node has underlying nodes, it shall be invoked using ExecProcNode as built-in node doing, then it will be able to fetch tuples come from underlying nodes. Of course, custom-scan provider can perform the tuples come from somewhere as if it came from underlying relation. It is responsibility of extension module. In some cases, it shall be required to return junk system attribute, like ctid, for row-level locks or table updating. It is also responsibility of the extension module (or, should not add custom- path if this custom-scan provider cannot perform as required). Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com>
On Mon, Mar 3, 2014 at 9:13 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Robert Haas (robertmhaas@gmail.com) wrote:
> On Sat, Mar 1, 2014 at 9:04 PM, Stephen Frost <sfrost@snowman.net> wrote:> > Erm, my thought was to use a select() loop which sends out I/O requestsSorry, that was a bit hand-wavey since I had posted about it previously
> > and then loops around waiting to see who finishes it. It doesn't
> > parallelize the CPU cost of getting the rows back to the caller, but
> > it'd at least parallelize the I/O, and if what's underneath is actually
> > a remote FDW running a complex query (because the other side is actually
> > a view), it would be a massive win to have all the remote FDWs executing
> > concurrently instead of serially as we have today.
>
> I can't really make sense of this.
here:
http://www.postgresql.org/message-id/20131104032604.GB2706@tamriel.snowman.net
It'd clearly be more involved than "just build a select() loop" and
would require adding an async mechanism. I had been thinking about this
primairly with the idea of FDWs and you're right that it'd require more
thought to deal with getting data into/through shared_buffers. Still,
we seqscan into a ring buffer, I'd think we could make it work but it
would require additional work.Right, I had speculated about that also (option #2 in my earlier email).
> For FDWs, one idea might be to kick off the remote query at
> ExecInitNode() time rather than ExecProcNode() time, at least if the
> remote query doesn't depend on parameters that aren't available until
> run time.
During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries to foreign servers, those would be fired while EXPLAINing a query as well. We want to avoid that. Instead, we can run EXPLAIN on that query at foreign server. But again, not all foreign servers would be able to EXPLAIN the query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), if it's for EXPLAIN (except for ANALYSE may be).
> That actually would allow multiple remote queries to runThis was my primary concern, along with the fact that we explicitly says
> simultaneously or in parallel with local work. It would also run them
> in cases where the relevant plan node is never executed, which would
> be bad but perhaps rare enough not to worry about.
"don't do that" in the docs for the FDW API.Yeah, that's tricky.
> Or we could add a
> new API like ExecPrefetchNode() that tells nodes to prepare to have
> tuples pulled, and they can do things like kick off asynchronous
> queries. But I still don't see any clean way for the Append node to
> find out which one is ready to return results first.
Thanks,
Stephen
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
According to the suggestion, I dropped the enhancement of postgres_fdw on top of custom-scan interface from my submission, and also move the supplemental functions into the part-1 portion (implementation of custom-scan interface). Even though the ctidscan module is under discussion, I still include this module because of its usefulness as demonstration / example of custom-scan node. Thanks, -- NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei <kaigai@ak.jp.nec.com> > -----Original Message----- > From: Kaigai Kouhei(海外 浩平) > Sent: Tuesday, March 04, 2014 8:26 AM > To: 'Stephen Frost'; Kohei KaiGai > Cc: Shigeru Hanada; Jim Mlodgenski; Robert Haas; Tom Lane; PgHacker; Peter > Eisentraut > Subject: RE: Custom Scan APIs (Re: [HACKERS] Custom Plan node) > > > > As I mentioned > > > up-thread, I'd really like to see FDW join push-down, FDW aggregate > > > push-down, parallel query execution, and parallel remote-FDW > > > execution and I don't see this CustomScan approach as the right > > > answer to any of those. > > > > In accordance with the above, what I'd like to see with this patch is > > removal of the postgres_fdw changes and any changes which were for that > support. > > > I don't argue this approach. It might be useful to *demonstrate* how custom- > scan node works as replacement of join, however, > > > In addition, I'd like to understand why 'ctidscan' makes any sense to > > have as an example of what to use this for- if that's valuable, why > > wouldn't we simply implement that in core? I do want an example in > > contrib of how to properly use this capability, but I don't think that's > it. > > > Do you think it makes sense if my submission was only interface portion > without working example? The purpose of ctidscan module is, similar to > postgres_fdw, to demonstrate the usage of custom-scan interface with enough > small code scale. If tons of code example were attached, nobody will want > to review the patch. > The "cache_scan" module that I and Haribabu are discussing in another thread > also might be a good demonstration for custom-scan interface, however, its > code scale is a bit larger than ctidscan. > > > For one thing, an example where you could have this CustomScan node > > calling other nodes underneath would be interesting. I realize the > > CTID scan can't do that directly but I would think your GPU-based > > system could; after all, if you're running a join or an aggregate with > > the GPU, the rows could come from nearly anything. Have you > > considered that, or is the expectation that users will just go off and > > access the heap and/or whatever indexes directly, like ctidscan does? > How would such a requirement be handled? > > > In case when custom-scan node has underlying nodes, it shall be invoked > using ExecProcNode as built-in node doing, then it will be able to fetch > tuples come from underlying nodes. Of course, custom-scan provider can > perform the tuples come from somewhere as if it came from underlying relation. > It is responsibility of extension module. In some cases, it shall be required > to return junk system attribute, like ctid, for row-level locks or table > updating. > It is also responsibility of the extension module (or, should not add custom- > path if this custom-scan provider cannot perform as required). > > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > <kaigai@ak.jp.nec.com>
Attachment
On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote: >> As I mentioned >> up-thread, I'd really like to see FDW join push-down, FDW aggregate >> push-down, parallel query execution, and parallel remote-FDW execution >> and I don't see this CustomScan approach as the right answer to any of >> those. > > In accordance with the above, what I'd like to see with this patch is > removal of the postgres_fdw changes and any changes which were for that > support. In addition, I'd like to understand why 'ctidscan' makes any > sense to have as an example of what to use this for- if that's valuable, > why wouldn't we simply implement that in core? I do want an example in > contrib of how to properly use this capability, but I don't think that's > it. I suggested that example to KaiGai at last year's PGCon. It may indeed be something we want to have in core, but right now we don't. More generally, I think this discussion is focusing on the wrong set of issues. The threshold issue for this patch is whether there is a set of hook points that enable a workable custom-scan functionality, and whether KaiGai has correctly identified them. In other words, I think we should be worrying about whether KaiGai's found all of the places that need to be modified to support a custom scan, and whether the modifications he's made to each of those places are correct and adequate. Whether he's picked the best possible example does not strike me as a matter of principal concern, and it's far too late to tell him he's got to go pick a different one at this point anyway. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Kouhei Kaigai (kaigai@ak.jp.nec.com) wrote: > Do you think it makes sense if my submission was only interface portion > without working example? No, we're pretty strongly against putting in interfaces which don't have working examples in contrib- for one thing, we want to know when we break it. > The purpose of ctidscan module is, similar to > postgres_fdw, to demonstrate the usage of custom-scan interface with > enough small code scale. If tons of code example were attached, nobody > will want to review the patch. I gathered that's why it was included. Is the plan to eventually submit something larger to go into -contrib which will use this interface? Or will it always be external? > The "cache_scan" module that I and Haribabu are discussing in another > thread also might be a good demonstration for custom-scan interface, > however, its code scale is a bit larger than ctidscan. That does sound interesting though I'm curious about the specifics... > > For one thing, an example where you could have this CustomScan node calling > > other nodes underneath would be interesting. I realize the CTID scan can't > > do that directly but I would think your GPU-based system could; after all, > > if you're running a join or an aggregate with the GPU, the rows could come > > from nearly anything. Have you considered that, or is the expectation that > > users will just go off and access the heap and/or whatever indexes directly, > > like ctidscan does? How would such a requirement be handled? > > > In case when custom-scan node has underlying nodes, it shall be invoked using > ExecProcNode as built-in node doing, then it will be able to fetch tuples > come from underlying nodes. Of course, custom-scan provider can perform the > tuples come from somewhere as if it came from underlying relation. It is > responsibility of extension module. In some cases, it shall be required to > return junk system attribute, like ctid, for row-level locks or table updating. > It is also responsibility of the extension module (or, should not add custom- > path if this custom-scan provider cannot perform as required). Right, tons of work to do to make it all fit together and play nice- what I was trying to get at is: has this actually been done? Is the GPU extension that you're talking about as the use-case for this been written? How does it handle all of the above? Or are we going through all these gyrations in vain hope that it'll actually all work when someone tries to use it for something real? Thanks, Stephen
On Tue, Mar 4, 2014 at 7:39 PM, Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote:I suggested that example to KaiGai at last year's PGCon. It may
>> As I mentioned
>> up-thread, I'd really like to see FDW join push-down, FDW aggregate
>> push-down, parallel query execution, and parallel remote-FDW execution
>> and I don't see this CustomScan approach as the right answer to any of
>> those.
>
> In accordance with the above, what I'd like to see with this patch is
> removal of the postgres_fdw changes and any changes which were for that
> support. In addition, I'd like to understand why 'ctidscan' makes any
> sense to have as an example of what to use this for- if that's valuable,
> why wouldn't we simply implement that in core? I do want an example in
> contrib of how to properly use this capability, but I don't think that's
> it.
indeed be something we want to have in core, but right now we don't.
More generally, I think this discussion is focusing on the wrong set
of issues. The threshold issue for this patch is whether there is a
set of hook points that enable a workable custom-scan functionality,
and whether KaiGai has correctly identified them. In other words, I
think we should be worrying about whether KaiGai's found all of the
places that need to be modified to support a custom scan, and whether
the modifications he's made to each of those places are correct and
adequate. Whether he's picked the best possible example does not
strike me as a matter of principal concern, and it's far too late to
tell him he's got to go pick a different one at this point anyway.
There are so many places in the planner and optimizer code, where we create various types of paths and the number of such paths is again significant, if not large. If we want the custom scan contrib module to work in all those cases (which seems to be the intention here), then we have to expose so many hooks. I don't think all of those hooks have been identified. Second problem is, the functions which create those paths have signatures difficult enough to be exposed as hooks. Take example of the join hook that was exposed. These function signatures do get changed from time to time and thus corresponding hooks need to be changed to. This is going to be a maintenance burden.
So, unless we have some way of exposing these hooks such that the definitions of the hooks are independent of the internal function signatures, supporting custom scan looks difficult.
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company
* Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries > to foreign servers, those would be fired while EXPLAINing a query as well. > We want to avoid that. Instead, we can run EXPLAIN on that query at foreign > server. But again, not all foreign servers would be able to EXPLAIN the > query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), > if it's for EXPLAIN (except for ANALYSE may be). Agreed that we wouldn't want to actually run a query when it's just being explain'd. If the FDW can't tell the difference then we'd need to address that, of course. A similar issue would, presumably, be around prepare/execute, though I haven't looked yet. These kinds of issues are why it was option '#2' instead of '#1'. :) I'm not sure they're able to be addressed. :/ Thanks, Stephen
2014-03-04 23:09 GMT+09:00 Robert Haas <robertmhaas@gmail.com>: > On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote: >>> As I mentioned >>> up-thread, I'd really like to see FDW join push-down, FDW aggregate >>> push-down, parallel query execution, and parallel remote-FDW execution >>> and I don't see this CustomScan approach as the right answer to any of >>> those. >> >> In accordance with the above, what I'd like to see with this patch is >> removal of the postgres_fdw changes and any changes which were for that >> support. In addition, I'd like to understand why 'ctidscan' makes any >> sense to have as an example of what to use this for- if that's valuable, >> why wouldn't we simply implement that in core? I do want an example in >> contrib of how to properly use this capability, but I don't think that's >> it. > > I suggested that example to KaiGai at last year's PGCon. It may > indeed be something we want to have in core, but right now we don't. > > More generally, I think this discussion is focusing on the wrong set > of issues. The threshold issue for this patch is whether there is a > set of hook points that enable a workable custom-scan functionality, > and whether KaiGai has correctly identified them. In other words, I > think we should be worrying about whether KaiGai's found all of the > places that need to be modified to support a custom scan, and whether > the modifications he's made to each of those places are correct and > adequate. Whether he's picked the best possible example does not > strike me as a matter of principal concern, and it's far too late to > tell him he's got to go pick a different one at this point anyway. > That is definitely the point to be discussed here. Even though I *believe* I could put the callbacks needed to implement alternative join / scan, it may lead different conclusion from other person's viewpoint. At least, I could implement a custom-scan as an alternative of join using postgres_fdw, however, it's uncertain whether I could cover all the possible case we should care about. So, I'd like to see comments from others. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Robert Haas (robertmhaas@gmail.com) wrote: > More generally, I think this discussion is focusing on the wrong set > of issues. The threshold issue for this patch is whether there is a > set of hook points that enable a workable custom-scan functionality, > and whether KaiGai has correctly identified them. Right- I was trying to hit on that in my email this morning. Thanks, Stephen
Stephen Frost <sfrost@snowman.net> writes: > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: >> During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries >> to foreign servers, those would be fired while EXPLAINing a query as well. >> We want to avoid that. Instead, we can run EXPLAIN on that query at foreign >> server. But again, not all foreign servers would be able to EXPLAIN the >> query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), >> if it's for EXPLAIN (except for ANALYSE may be). > Agreed that we wouldn't want to actually run a query when it's just > being explain'd. If the FDW can't tell the difference then we'd need to > address that, of course. EXEC_FLAG_EXPLAIN_ONLY ... regards, tom lane
2014-03-04 23:10 GMT+09:00 Stephen Frost <sfrost@snowman.net>: >> The "cache_scan" module that I and Haribabu are discussing in another >> thread also might be a good demonstration for custom-scan interface, >> however, its code scale is a bit larger than ctidscan. > > That does sound interesting though I'm curious about the specifics... > This module caches a part of columns, but not all, thus allows to hold much larger number of records for a particular amount of RAM than the standard buffer cache. It is constructed on top of custom-scan node, and also performs a new hook for a callback on page vacuuming to invalidate its cache entry. (I originally designed this module for demonstration of on-vacuum hook because I already made ctidscan and postgres_fdw enhancement for custom-scan node, by the way.) >> > For one thing, an example where you could have this CustomScan node calling >> > other nodes underneath would be interesting. I realize the CTID scan can't >> > do that directly but I would think your GPU-based system could; after all, >> > if you're running a join or an aggregate with the GPU, the rows could come >> > from nearly anything. Have you considered that, or is the expectation that >> > users will just go off and access the heap and/or whatever indexes directly, >> > like ctidscan does? How would such a requirement be handled? >> > >> In case when custom-scan node has underlying nodes, it shall be invoked using >> ExecProcNode as built-in node doing, then it will be able to fetch tuples >> come from underlying nodes. Of course, custom-scan provider can perform the >> tuples come from somewhere as if it came from underlying relation. It is >> responsibility of extension module. In some cases, it shall be required to >> return junk system attribute, like ctid, for row-level locks or table updating. >> It is also responsibility of the extension module (or, should not add custom- >> path if this custom-scan provider cannot perform as required). > > Right, tons of work to do to make it all fit together and play nice- > what I was trying to get at is: has this actually been done? Is the GPU > extension that you're talking about as the use-case for this been > written? > Its chicken-and-egg problem, because implementation of the extension module fully depends on the interface from the backend. Unlike commit-fest, here is no deadline for my extension module, so I put higher priority on the submission of custom-scan node, than the extension. However, GPU extension is not fully theoretical stuff. I had implemented a prototype using FDW APIs, and it allowed to accelerate sequential scan if query has enough complicated qualifiers. See the movie (from 2:45). The table t1 is a regular table, and t2 is a foreign table. Both of them has same contents, however, response time of the query is much faster, if GPU acceleration is working. http://www.youtube.com/watch?v=xrUBffs9aJ0 So, I'm confident that GPU acceleration will have performance gain once it can run regular tables, not only foreign tables. > How does it handle all of the above? Or are we going through > all these gyrations in vain hope that it'll actually all work when > someone tries to use it for something real? > I don't talk something difficult. If junk attribute requires to return "ctid" of the tuple, custom-scan provider reads a tuple of underlying relation then includes a correct item pointer. 
If this custom-scan is designed to run on the cache, all it needs to do is reconstruct a tuple with correct item-pointer (thus this cache needs to have ctid also). It's all I did in the cache_scan module. Thanks, -- KaiGai Kohei <kaigai@kaigai.gr.jp>
* Tom Lane (tgl@sss.pgh.pa.us) wrote: > Stephen Frost <sfrost@snowman.net> writes: > > * Ashutosh Bapat (ashutosh.bapat@enterprisedb.com) wrote: > >> During EXPLAIN, ExecInitNode() is called. If ExecInitNode() fires queries > >> to foreign servers, those would be fired while EXPLAINing a query as well. > >> We want to avoid that. Instead, we can run EXPLAIN on that query at foreign > >> server. But again, not all foreign servers would be able to EXPLAIN the > >> query e.g. file_fdw. OR totally avoid firing query during ExecInitNode(), > >> if it's for EXPLAIN (except for ANALYSE may be). > > > Agreed that we wouldn't want to actually run a query when it's just > > being explain'd. If the FDW can't tell the difference then we'd need to > > address that, of course. > > EXEC_FLAG_EXPLAIN_ONLY ... Yeah, figured there should be a way. Still not sure that kicking the query off from ExecInitNode() is a good idea though. Perhaps it could be optional somehow. I really like the idea of being able to make Append work in an async mode where it's pulling data from multiple sources at the same time, but it's a fair bit of work. Thanks, Stephen
* Robert Haas (robertmhaas@gmail.com) wrote: > On Mon, Mar 3, 2014 at 5:15 PM, Stephen Frost <sfrost@snowman.net> wrote: > > In accordance with the above, what I'd like to see with this patch is > > removal of the postgres_fdw changes and any changes which were for that > > support. In addition, I'd like to understand why 'ctidscan' makes any > > sense to have as an example of what to use this for- if that's valuable, > > why wouldn't we simply implement that in core? I do want an example in > > contrib of how to properly use this capability, but I don't think that's > > it. > > I suggested that example to KaiGai at last year's PGCon. It may > indeed be something we want to have in core, but right now we don't. Alright- so do you feel that the simple ctidscan use-case is a sufficient justification and example of how this can be generally useful that we should be adding these hooks to core..? I'm willing to work through the patch and clean it up this weekend if we agree that it's useful and unlikely to immediately be broken by expected changes.. Thanks, Stephen
I apologize for not having paid much attention to this thread so far. It kept getting stuck on my "to look at later" queue. Anyway, I've taken a preliminary look at the v7 patch now. While the patch seems roughly along the lines of what we talked about last PGCon, I share Stephen's unease about a lot of the details. It's not entirely clear that these hooks are really good for anything, and it's even less clear what APIs the hook functions should be expected to depend on. I really do not like the approach embodied in the later patches of "oh, we'll just expose whatever static planner functions seem convenient". That's not an approach that's available when doing actual external development of an extension, and even if it were that doesn't make it a good idea. The larger the exposed surface of functions the harder it is to know what's safe to change. Anyway, on to specifics: * Please drop the whole register_custom_provider/get_custom_provider API. There is no reason other than debugging for a provider to have a name at all, and if we expect providers to have unique names then that creates a collision risk for independently-created extensions. AFAICS, it's sufficient for a function hooked into one of the add-a-path hooks to include a pointer to a struct-of-function-pointers in the Path object it returns, and similarly the CustomScan Plan object can contain a pointer inserted when it's created. I don't object to having a name field in the function pointer structs for debugging reasons, but I don't want any lookups being done on it. * The function-struct pointers should be marked const in the referencing nodes, to indicate that the core code won't be modifying or copying them. In practice they'd probably be statically allocated constants in the extensions anyway. * The patch does lots of violence to the separation between planner and executor, starting with the decision to include nodes/relation.h in executor.h. That will not do at all. I see that you did that because you wanted to make ExecSupportsMarkRestore take a Path, but we need some other answer. One slightly grotty answer is to invent two different customscan Plan types, one that supports mark/restore and one that doesn't, so that ExecSupportsMarkRestore could still just look at the Plan type tag. (BTW, path->pathtype is supposed to contain the node tag of the Plan node that the path would produce. Putting T_CustomPath in it is entirely tone-deaf.) Another way would be to remove ExecSupportsMarkRestore in favor of some new function in the planner; but it's good to keep it in execAmi.c since that has other knowledge of which plan types support mark/restore. * More generally, I'm not convinced that exactly one Path type and exactly one Plan type is going to get us very far. It seems rather ugly to use the same Plan type for both scan and join nodes, and what will happen if somebody wants to build a custom Append node, or something else that has neither zero nor two subplans? * nodeCustom.h is being completely abused. That should only export the functions in nodeCustom.c, which are going to be pretty much one-liners anyway. The right place to put the function pointer struct definitions is someplace else. I'd be inclined to start by separating the function pointers into two structs, one for functions needed for a Path and one for functions needed for a Plan, so that you don't have this problem of having to import everything the planner knows into an executor header or vice versa. 
Most likely you could just put the Path function pointer struct declaration next to CustomPath in relation.h, and the one for Plans next to CustomPlan (or the variants thereof) in plannodes.h. * The set of fields provided in CustomScan seems nonsensical. I'm not even sure that it should be derived from Scan; that's okay for plan types that actually are scans of a base relation, but it's confusing overhead for anything that's say a join, or a custom sort node, or anything like that. Maybe one argument for multiple plan node types is that one would be derived from Scan and one directly from Plan. * More generally, what this definition for CustomScan exposes is that we have no idea whatever what fields a custom plan node might need. I'm inclined to think that what we should be assuming is that any custom path or plan node is really an object of a struct type known only to its providing extension, whose first field is the CustomPath or CustomPlan struct known to the core backend. (Think C++ subclassing.) This would imply that copyfuncs/equalfuncs/outfuncs support would have to be provided by the extension, which is in principle possible if we add function pointers for those operations to the struct linked to from the path/plan object. (Notationally this might be a bit of a pain, since the macros that we use in the functions in copyfuncs.c etc aren't public. Not sure if it's worth exposing those somewhere, or if people should just copy/paste them.) This approach would probably also avoid the need for the klugy bitmapset representation you propose in patch 3. * This also implies that create_customscan_plan is completely bogus. A custom plan provider *must* provide a callback function for that, because only it will know how big a node to palloc. There are far too many assumptions in create_customscan_plan anyway; I think there is probably nothing at all in that function that shouldn't be getting done by the custom provider instead. * Likewise, there is way too much hard-wired stuff in explain.c. It should not be optional whether a custom plan provider provides an explain support function, and that function should be doing pretty much everything involved in printing the node. * I don't believe in the hard-wired implementation in setrefs.c either. * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc). That's at best badly thought out, and at worst a source of new bugs. Under what circumstances is a custom plan node going to contain any Vars that don't reduce to either scan or input-plan variables? None, because that would imply that it was doing something unrelated to the requested query. * The API for add_join_path_hook seems overcomplex, as well as too full of implementation details that should remain private to joinpath.c. I particularly object to passing the mergeclause lists, which seem unlikely to be of interest for non-mergejoin plan types anyway. More generally, it seems likely that this hook is at the wrong level of abstraction; it forces the hook function to concern itself with a lot of stuff about join legality and parameterization (which I note your patch3 code fails to do; but that's not an optional thing). * After a bit of reflection I think the best thing to do might be to ditch add_join_path_hook for 9.4, on these grounds: 1. You've got enough to do to get the rest of the patch committable. 2. Like Stephen, I feel that the proposed usage as a substitute for FDW-based foreign join planning is not the direction we want to travel. 3. 
Without that, the use case for new join nodes seems pretty marginal, as opposed to say alternative sort-node implementations (a case not supported by this patch, except to the extent that you could use them in explicit-sort mergejoins if you duplicated large parts of joinpath.c). * Getting nitpicky ... what is the rationale for the doubled underscore in the CUSTOM__ flag names? That's just a typo waiting to happen IMO. * Why is there more than one call site for add_scan_path_hook? I don't see the need for the calling macro with the randomly inconsistent name, either. * The test arrangement for contrib/ctidscan is needlessly complex, and testing it in the core tests is a bogus idea anyway. Why not just let it contain its own test script like most other contrib modules? That's all I've got for now. I've not really looked at the extension code in either patch2 or patch3, just at the changes in the core code. regards, tom lane
On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote: > Alright- so do you feel that the simple ctidscan use-case is a > sufficient justification and example of how this can be generally > useful that we should be adding these hooks to core..? I'm willing to > work through the patch and clean it up this weekend if we agree that > it's useful and unlikely to immediately be broken by expected changes.. Yeah, I think it's useful. But based on Tom's concurrently-posted review, I think there's probably a good deal of work left here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
* Robert Haas (robertmhaas@gmail.com) wrote: > On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote: > > Alright- so do you feel that the simple ctidscan use-case is a > > sufficient justification and example of how this can be generally > > useful that we should be adding these hooks to core..? I'm willing to > > work through the patch and clean it up this weekend if we agree that > > it's useful and unlikely to immediately be broken by expected changes.. > > Yeah, I think it's useful. But based on Tom's concurrently-posted > review, I think there's probably a good deal of work left here. Yeah, it certainly looks like it. KaiGai- will you have time to go over and address Tom's concerns..? Thanks! Stephen
2014-03-05 5:52 GMT+09:00 Stephen Frost <sfrost@snowman.net>:
> * Robert Haas (robertmhaas@gmail.com) wrote:
>> On Tue, Mar 4, 2014 at 2:34 PM, Stephen Frost <sfrost@snowman.net> wrote:
>> > Alright- so do you feel that the simple ctidscan use-case is a
>> > sufficient justification and example of how this can be generally
>> > useful that we should be adding these hooks to core..? I'm willing to
>> > work through the patch and clean it up this weekend if we agree that
>> > it's useful and unlikely to immediately be broken by expected changes..
>>
>> Yeah, I think it's useful. But based on Tom's concurrently-posted
>> review, I think there's probably a good deal of work left here.
>
> Yeah, it certainly looks like it.
>
> KaiGai- will you have time to go over and address Tom's concerns..?
>
Yes, I will. Let me work through it during the latter half of this week and
the weekend; I'd like to submit a revised version by next Monday.

Thanks,
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
Tom, thanks for your detailed comments.

> I apologize for not having paid much attention to this thread so far.
> It kept getting stuck on my "to look at later" queue. Anyway, I've taken
> a preliminary look at the v7 patch now.
>
> While the patch seems roughly along the lines of what we talked about last
> PGCon, I share Stephen's unease about a lot of the details. It's not
> entirely clear that these hooks are really good for anything, and it's even
> less clear what APIs the hook functions should be expected to depend on.
> I really do not like the approach embodied in the later patches of "oh,
> we'll just expose whatever static planner functions seem convenient".
> That's not an approach that's available when doing actual external
> development of an extension, and even if it were that doesn't make it a
> good idea. The larger the exposed surface of functions the harder it is
> to know what's safe to change.
>
Hmm. Exposing a function needs a clear rationale, then, not merely
convenience.

> Anyway, on to specifics:
>
> * Please drop the whole register_custom_provider/get_custom_provider API.
> There is no reason other than debugging for a provider to have a name at
> all, and if we expect providers to have unique names then that creates a
> collision risk for independently-created extensions. AFAICS, it's
> sufficient for a function hooked into one of the add-a-path hooks to include
> a pointer to a struct-of-function-pointers in the Path object it returns,
> and similarly the CustomScan Plan object can contain a pointer inserted
> when it's created. I don't object to having a name field in the function
> pointer structs for debugging reasons, but I don't want any lookups being
> done on it.
>
One thing I was worrying about was how copyObject() and nodeToString()
would support the set of function-pointer tables around the custom-scan
node; however, you have suggested changing that assumption here, so I
think these functions become unnecessary.

> * The function-struct pointers should be marked const in the referencing
> nodes, to indicate that the core code won't be modifying or copying them.
> In practice they'd probably be statically allocated constants in the
> extensions anyway.
>
OK.

> * The patch does lots of violence to the separation between planner and
> executor, starting with the decision to include nodes/relation.h in
> executor.h. That will not do at all. I see that you did that because you
> wanted to make ExecSupportsMarkRestore take a Path, but we need some other
> answer. One slightly grotty answer is to invent two different customscan
> Plan types, one that supports mark/restore and one that doesn't, so that
> ExecSupportsMarkRestore could still just look at the Plan type tag.
> (BTW, path->pathtype is supposed to contain the node tag of the Plan node
> that the path would produce. Putting T_CustomPath in it is entirely
> tone-deaf.) Another way would be to remove ExecSupportsMarkRestore in
> favor of some new function in the planner; but it's good to keep it in
> execAmi.c since that has other knowledge of which plan types support
> mark/restore.
>
OK, I'll add a derivative plan node type, CustomScanMarkRestore for
instance. Probably it should be populated in create_customscan_plan()
according to a flag set on the CustomPath.

> * More generally, I'm not convinced that exactly one Path type and exactly
> one Plan type is going to get us very far.
> It seems rather ugly to use
> the same Plan type for both scan and join nodes, and what will happen if
> somebody wants to build a custom Append node, or something else that has
> neither zero nor two subplans?
>
In the previous discussion, a CustomJoin node was considered pointless
because we know only a limited number of join algorithms (nest-loop,
hash-join and merge-join), unlike the wide variety of logic for scanning
relations. Also, IIUC, people did not want to add custom-something node
types for each built-in type. So we concluded at that time to use a
CustomScan node to replace built-in joins and scans.
Regarding the Append case, either CustomScan needs to be enhanced to hold
a list of subplans, or we add an individual CustomAppend node, or an
opaque "CustomPlan" may be sufficient if it handles EXPLAIN by itself.

> * nodeCustom.h is being completely abused. That should only export the
> functions in nodeCustom.c, which are going to be pretty much one-liners
> anyway. The right place to put the function pointer struct definitions
> is someplace else. I'd be inclined to start by separating the function
> pointers into two structs, one for functions needed for a Path and one for
> functions needed for a Plan, so that you don't have this problem of having
> to import everything the planner knows into an executor header or vice versa.
> Most likely you could just put the Path function pointer struct declaration
> next to CustomPath in relation.h, and the one for Plans next to CustomPlan
> (or the variants thereof) in plannodes.h.
>
Yes. I didn't have a clear idea of where we should put the interface
definitions. Probably InitCustomScanPlan (or rather CreateCustomScanPlan)
should move to relation.h, and the rest of the callbacks to plannodes.h.

> * The set of fields provided in CustomScan seems nonsensical. I'm not even
> sure that it should be derived from Scan; that's okay for plan types that
> actually are scans of a base relation, but it's confusing overhead for
> anything that's say a join, or a custom sort node, or anything like that.
> Maybe one argument for multiple plan node types is that one would be derived
> from Scan and one directly from Plan.
>
The reason why CustomScan is derived from Scan is that some backend code
wants to know the rtindex of the relation referenced by the CustomScan.
The scanrelid of Scan is used in three places: nodeCustom.c, setrefs.c and
explain.c. The usage in nodeCustom.c is just for service routines, and the
usage in setrefs.c can be moved to the extension per your suggestion.
We need to look at the usage in explain.c: ExplainPreScanNode() walks the
nodes to collect the relids referenced in the query. If we don't want a
callback for that specific usage, it is a reasonable choice to show the
backend the scanrelid associated with the CustomScan.
Would it be a confusing rule if the extension had to set scanrelid to 0
when the CustomScan does not scan any particular relation?

> * More generally, what this definition for CustomScan exposes is that we
> have no idea whatever what fields a custom plan node might need. I'm
> inclined to think that what we should be assuming is that any custom path
> or plan node is really an object of a struct type known only to its providing
> extension, whose first field is the CustomPath or CustomPlan struct known
> to the core backend. (Think C++ subclassing.)
> This would imply that
> copyfuncs/equalfuncs/outfuncs support would have to be provided by the
> extension, which is in principle possible if we add function pointers for
> those operations to the struct linked to from the path/plan object.
> (Notationally this might be a bit of a pain, since the macros that we use
> in the functions in copyfuncs.c etc aren't public. Not sure if it's worth
> exposing those somewhere, or if people should just copy/paste them.) This
> approach would probably also avoid the need for the klugy bitmapset
> representation you propose in patch 3.
>
That's a breakthrough! Probably the Path node needs a callback for
outfuncs.c, and the Plan node needs callbacks for copyfuncs.c and
outfuncs.c. I think the prototypes of these callback functions are not
specific to the custom-scan node, so they should be declared in
nodes/nodes.h.

> * This also implies that create_customscan_plan is completely bogus.
> A custom plan provider *must* provide a callback function for that, because
> only it will know how big a node to palloc. There are far too many
> assumptions in create_customscan_plan anyway; I think there is probably
> nothing at all in that function that shouldn't be getting done by the custom
> provider instead.
>
OK. InitCustomScanPlan will probably become CreateCustomScanPlan,
returning an object palloc()'ed with an arbitrary size.

> * Likewise, there is way too much hard-wired stuff in explain.c. It should
> not be optional whether a custom plan provider provides an explain support
> function, and that function should be doing pretty much everything involved
> in printing the node.
>
Probably the hunk around the show_customscan_info() call can be moved
entirely to the extension side. If so, I'd like ExplainNode() to become an
extern function, because that would allow extensions to print their
underlying plan nodes.

> * I don't believe in the hard-wired implementation in setrefs.c either.
>
Are you saying the hard-wired portion in setrefs.c can be moved to the
extension side? If fix_scan_expr() became an extern function, I think that
would be feasible.

> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
> That's at best badly thought out, and at worst a source of new bugs.
> Under what circumstances is a custom plan node going to contain any Vars
> that don't reduce to either scan or input-plan variables? None, because
> that would imply that it was doing something unrelated to the requested
> query.
>
I'd also like to get rid of it, if we have an alternative way to solve the
problem. The varno of a Var node originally holds the rt-index of the
relation being referenced, and setrefs.c then adjusts it according to the
offset and the type of relation referenced. A Var node that references a
joined relation is replaced with either INNER_VAR or OUTER_VAR according
to the location of the underlying relation. That eventually makes
ExecEvalScalarVar() reference either ecxt_innertuple or ecxt_outertuple,
which is problematic if we have already consolidated the tuples from both
sides into one. For example, the enhanced postgres_fdw fetches the result
set of a remote join query, so a single tuple contains fields from both
sides. In this case, what varno would be suitable?

> * The API for add_join_path_hook seems overcomplex, as well as too full
> of implementation details that should remain private to joinpath.c.
> I particularly object to passing the mergeclause lists, which seem unlikely
> to be of interest for non-mergejoin plan types anyway.
> More generally, it seems likely that this hook is at the wrong level of
> abstraction; it forces the hook function to concern itself with a lot of
> stuff about join legality and parameterization (which I note your patch3
> code fails to do; but that's not an optional thing).
>
I'd like to hear your idea of where the hook should be located, and what
kind of abstraction is suitable.

> * After a bit of reflection I think the best thing to do might be to ditch
> add_join_path_hook for 9.4, on these grounds:
> 1. You've got enough to do to get the rest of the patch committable.
> 2. Like Stephen, I feel that the proposed usage as a substitute for FDW-based
> foreign join planning is not the direction we want to travel.
> 3. Without that, the use case for new join nodes seems pretty marginal,
> as opposed to say alternative sort-node implementations (a case not
> supported by this patch, except to the extent that you could use them in
> explicit-sort mergejoins if you duplicated large parts of joinpath.c).
>
You are suggesting an alternative merge-join path that uses a sort node on
top of a custom-scan node, aren't you? Probably such a path would be
constructed using the existing MergeJoin path with one or two CustomScan
nodes doing the sorting, or the scanning and sorting. Do I understand your
suggestion correctly?

> * Getting nitpicky ... what is the rationale for the doubled underscore
> in the CUSTOM__ flag names? That's just a typo waiting to happen IMO.
>
Sorry, I intended the double underscore as a separator after the prefix
portion of the label.

> * Why is there more than one call site for add_scan_path_hook? I don't
> see the need for the calling macro with the randomly inconsistent name,
> either.
>
Where would be the best place to call it? Even though I cannot imagine
extensions implementing sub-query or CTE scans, their paths are
constructed during set_rel_size(), so I had to put the hook into each of
the set_xxxx_pathlist() functions.

> * The test arrangement for contrib/ctidscan is needlessly complex, and
> testing it in the core tests is a bogus idea anyway. Why not just let it
> contain its own test script like most other contrib modules?
>
I thought a regression test of the CustomScan interface would be useful in
the core tests. However, implementing it as a usual contrib regression
test is a reasonable suggestion, so I'll adjust it.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
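To illustrate the direction discussed above, here is a rough sketch of how
such a CreateCustomScanPlan callback might look, reusing the hypothetical
MyCustomPlan shape from the earlier sketch; the function name and the tag
T_CustomPlan are illustrative, not the patch's actual code:

    static Plan *
    my_create_custom_plan(PlannerInfo *root, CustomPath *best_path)
    {
        /* the provider, not the core, decides how big the node is */
        MyCustomPlan *cplan = (MyCustomPlan *) palloc0(sizeof(MyCustomPlan));

        NodeSetTag(cplan, T_CustomPlan);
        cplan->cplan.plan.startup_cost = best_path->path.startup_cost;
        cplan->cplan.plan.total_cost = best_path->path.total_cost;
        cplan->cplan.plan.plan_rows = best_path->path.rows;

        /*
         * Building the targetlist and quals is the provider's job too; a
         * scan-like provider would typically derive them from
         * best_path->path.parent->reltargetlist and the relation's
         * baserestrictinfo list.
         */

        return &cplan->cplan.plan;
    }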
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
>> * Please drop the whole register_custom_provider/get_custom_provider API.

> One thing I was worrying about is how copyObject() and nodeToString() support
> set of function pointer tables around custom-scan node, however, you suggested
> to change the assumption here. So, I think these functions become unnecessary.

If we allow the extension to control copyObject behavior, it can do what
it likes with the function-struct pointer. I think the typical case would
be that it's a simple pointer to a never-copied static constant. But you
could imagine that it's a pointer to a struct embedded further down in the
custom path or plan node, if the extension wants different functions for
different plans.

If we had to support stringToNode() for paths or plans, things would get
much more complicated, but we don't (and there are already lots of other
things that would be difficult for that).

> The reason why CustomScan is derived from Scan is, some of backend code
> wants to know rtindex of the relation to be referenced by this CustomScan.
> The scanrelid of Scan is used in three points: nodeCustom.c, setrefs.c and
> explain.c. The usage in nodeCustom.c is just for service routines, and the
> usage in setrefs.c can be moved to the extension according to your suggestion.
> We need to investigate the usage in explain.c; ExplainPreScanNode() walks
> around the nodes to collect relids referenced in this query. If we don't
> want to put a callback for this specific usage, it is a reasonable choice
> to show the backend the associated scanrelid of CustomScan.

I think we have to add another callback for that :-(. It's a pain since
it's such a trivial point; but the existing code cannot support a custom
node referencing more than one RTE, which seems possible for join or
append type cases.

> Probably, the hunk around show_customscan_info() call can be entirely moved
> to the extension side. If so, I want ExplainNode() being an extern function,
> because it allows extensions to print underlying plan-nodes.

I haven't looked at what explain.c would have to expose to make this
workable, but yeah, we will probably have to export a few things.

> Are you saying the hard-wired portion in setrefs.c can be moved to the
> extension side? If fix_scan_expr() become extern function, I think it
> is feasible.

My recollection is that fix_scan_expr() is a bit specialized. Not sure
exactly what we'd have to export there --- but we'd have to do it anyway.
What you had in the patch was a hook that could be called, but no way for
it to do what it would likely need to do.

>> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).

> In case of Var-node that references joined-relations, it shall be replaced to
> either INNER_VAR or OUTER_VAR according the location of underlying
> relations. It eventually makes ExecEvalScalarVar() to reference either
> ecxt_innertuple or ecxt_outertuple, however, it is problematic if we already
> consolidated tuples come from the both side into one.

So? If you did that, then you wouldn't have renumbered the Vars as
INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
is needed, then there would also be a need for an additional tuple slot
in executor contexts, which you haven't provided.

> For example, the enhanced postgres_fdw fetches the result set of remote
> join query, thus a tuple contains the fields come from both side.
> In this case, what varno shall be suitable to put?

That would be a scan situation, and the vars could reference the scan
tuple slot. Which in fact was the implementation you were using, so how
is CUSTOM_VAR adding anything?

>> * Why is there more than one call site for add_scan_path_hook? I don't
>> see the need for the calling macro with the randomly inconsistent name,
>> either.

> Where is the best place to do? Even though I cannot imagine the situation
> to run sub-query or cte by extensions, its path is constructed during
> set_rel_size(), so I had to put the hook for each set_xxxx_pathlist()
> functions.

Hm. We could still call the hook in set_rel_pathlist, if we were to get
rid of the individual calls to set_cheapest and do that in one spot at the
bottom of set_rel_pathlist (after the hook call). Calling set_cheapest in
one place seems more consistent anyway.

regards, tom lane
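A sketch of that consolidated arrangement, against 9.4-era allpaths.c (the
hook signature is approximated from the discussion; this is an
illustration, not the committed code):

    static void
    set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
                     Index rti, RangeTblEntry *rte)
    {
        /*
         * ... dispatch to set_plain_rel_pathlist(), set_subquery_pathlist(),
         * set_function_pathlist() and friends exactly as today, except that
         * none of them call set_cheapest() anymore ...
         */

        /* one hook invocation covers every kind of base relation */
        if (add_scan_path_hook)
            (*add_scan_path_hook) (root, rel, rte);

        /* and the cheapest path is selected in exactly one place */
        set_cheapest(rel);
    }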
> Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> >> * Please drop the whole register_custom_provider/get_custom_provider API.
>
> > One thing I was worrying about is how copyObject() and nodeToString()
> > support set of function pointer tables around custom-scan node,
> > however, you suggested to change the assumption here. So, I think these
> > functions become unnecessary.
>
> If we allow the extension to control copyObject behavior, it can do what
> it likes with the function-struct pointer. I think the typical case would
> be that it's a simple pointer to a never-copied static constant. But you
> could imagine that it's a pointer to a struct embedded further down in the
> custom path or plan node, if the extension wants different functions for
> different plans.
>
> If we had to support stringToNode() for paths or plans, things would get
> much more complicated, but we don't (and there are already lots of other
> things that would be difficult for that).
>
I expected to include simple function pointers for copying and text-output
as follows:

    typedef struct {
        Plan plan;
          :
        NodeCopy_function    node_copy;
        NodeTextOut_function node_textout;
    } Custom;

Then, the sub-routine in copyfuncs.c would be:

    static Custom *
    _copyCustom(const Custom *from)
    {
        return from->node_copy(from);
    }

Do we share the same image here? It allows the Custom node to have
polymorphism over the node it extends. Sorry, I got a little confused: is
the function-struct pointer you mentioned something different from a usual
function pointer?

> > The reason why CustomScan is derived from Scan is, some of backend
> > code wants to know rtindex of the relation to be referenced by this
> > CustomScan.
> > The scanrelid of Scan is used in three points: nodeCustom.c, setrefs.c
> > and explain.c. The usage in nodeCustom.c is just for service routines,
> > and the usage in setrefs.c can be moved to the extension according to
> > your suggestion.
> > We need to investigate the usage in explain.c; ExplainPreScanNode()
> > walks around the nodes to collect relids referenced in this query. If
> > we don't want to put a callback for this specific usage, it is a
> > reasonable choice to show the backend the associated scanrelid of
> > CustomScan.
>
> I think we have to add another callback for that :-(. It's a pain since
> it's such a trivial point; but the existing code cannot support a custom
> node referencing more than one RTE, which seems possible for join or append
> type cases.
>
This is a more generic approach; I like it. It can probably remove the
Scan characteristics of CustomScan from the core backend's point of view.
It would then perform as an opaque Plan node that may scan, join, sort or
do something else, so a more appropriate name may be CustomPlan, or simply
Custom.

> > Probably, the hunk around show_customscan_info() call can be entirely
> > moved to the extension side. If so, I want ExplainNode() being an
> > extern function, because it allows extensions to print underlying
> > plan-nodes.
>
> I haven't looked at what explain.c would have to expose to make this workable,
> but yeah, we will probably have to export a few things.
>
OK.

> > Are you saying the hard-wired portion in setrefs.c can be moved to the
> > extension side? If fix_scan_expr() become extern function, I think it
> > is feasible.
>
> My recollection is that fix_scan_expr() is a bit specialized. Not sure
> exactly what we'd have to export there --- but we'd have to do it anyway.
> What you had in the patch was a hook that could be called, but no way for
> it to do what it would likely need to do.
>
It probably needs to be exported. It walks over the supplied node tree and
eventually calls record_plan_function_dependency() for each function it
finds; that should not be reinvented on the extension side.
Anyway, my rework of the patch will make clear which static functions need
to be exposed. Please wait a while.

> >> * Get rid of the CUSTOM_VAR business, too (including custom_tupdesc).
>
> > In case of Var-node that references joined-relations, it shall be
> > replaced to either INNER_VAR or OUTER_VAR according the location of
> > underlying relations. It eventually makes ExecEvalScalarVar() to
> > reference either ecxt_innertuple or ecxt_outertuple, however, it is
> > problematic if we already consolidated tuples come from the both side
> > into one.
>
> So? If you did that, then you wouldn't have renumbered the Vars as
> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
> is needed, then there would also be a need for an additional tuple slot
> in executor contexts, which you haven't provided.
>
> > For example, the enhanced postgres_fdw fetches the result set of
> > remote join query, thus a tuple contains the fields come from both side.
> > In this case, what varno shall be suitable to put?
>
> That would be a scan situation, and the vars could reference the scan tuple
> slot. Which in fact was the implementation you were using, so how is
> CUSTOM_VAR adding anything?
>
Let me sort out the points. If the custom node performs as a join node
with two underlying relations, and thus can retrieve two tuples, there is
no problem, because INNER_VAR/OUTER_VAR can reference the individual tuple
slots. Likewise, if the custom node performs as a scan node with one
underlying relation, there is no problem, because all the Var nodes in the
target list reference attributes of that particular relation.
The confusing scenario is a custom node that performs as a scan node as an
alternative to a built-in join: the Var nodes in its target list may
reference multiple relations, yet it has only one tuple slot, as the
remote join in postgres_fdw does. In this case, we may be able to use the
existing INNER/OUTER/INDEX varnos if we renumber the varattnos and use an
appropriate slot, instead of adding a special varno.
I'd like to investigate a little more, but I'm inclined to conclude that
CUSTOM_VAR might not be necessary, as you suggested.

> >> * Why is there more than one call site for add_scan_path_hook? I
> >> don't see the need for the calling macro with the randomly
> >> inconsistent name, either.
>
> > Where is the best place to do? Even though I cannot imagine the
> > situation to run sub-query or cte by extensions, its path is
> > constructed during set_rel_size(), so I had to put the hook for each
> > set_xxxx_pathlist() functions.
>
> Hm. We could still call the hook in set_rel_pathlist, if we were to get
> rid of the individual calls to set_cheapest and do that in one spot at the
> bottom of set_rel_pathlist (after the hook call). Calling set_cheapest
> in one place seems more consistent anyway.
>
OK, I'll try to move the set_cheapest() calls, currently distributed
across several functions, into set_rel_pathlist().

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
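If fix_scan_expr() were exported as discussed, the extension-side setrefs
work for a scan-like custom node might reduce to something like the
following (the callback shape is hypothetical; fix_scan_expr's signature is
as in 9.4's setrefs.c):

    static void
    my_set_custom_plan_refs(PlannerInfo *root, Plan *plan, int rtoffset)
    {
        MyCustomPlan *cplan = (MyCustomPlan *) plan;

        /* adjust the Vars in the standard Plan fields */
        plan->targetlist = (List *)
            fix_scan_expr(root, (Node *) plan->targetlist, rtoffset);
        plan->qual = (List *)
            fix_scan_expr(root, (Node *) plan->qual, rtoffset);

        /*
         * ... and in any private expression trees; fix_scan_expr() also
         * calls record_plan_function_dependency() for each function it
         * finds, so the extension does not have to reinvent that.
         */
        cplan->my_private = (List *)
            fix_scan_expr(root, (Node *) cplan->my_private, rtoffset);
    }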
Kouhei Kaigai <kaigai@ak.jp.nec.com> writes:
> I expected to include simple function pointers for copying and text-output
> as follows:

> typedef struct {
>     Plan plan;
>       :
>     NodeCopy_function    node_copy;
>     NodeTextOut_function node_textout;
> } Custom;

I was thinking more like

typedef struct CustomPathFuncs {
    const char *name;       /* used for debugging purposes only */
    NodeCopy_function node_copy;
    NodeTextOut_function node_textout;
    ... etc etc etc ...
} CustomPathFuncs;

typedef struct CustomPath {
    Path path;
    const CustomPathFuncs *funcs;
    ... maybe a few more fields here, but not too darn many ...
} CustomPath;

and similarly for CustomPlan.

The advantage of this way is it's very cheap for (what I expect will be)
the common case where an extension has a fixed set of support functions
for its custom paths and plans. It just declares a static constant
CustomPathFuncs struct, and puts a pointer to that into its paths.

If an extension really needs to set the support functions on a per-object
basis, it can do this:

typedef struct MyCustomPath {
    CustomPath cpath;
    CustomPathFuncs funcs;
    ... more fields ...
} MyCustomPath;

and then initialization of a MyCustomPath would include

    mypath->cpath.funcs = &mypath->funcs;
    mypath->funcs.node_copy = MyCustomPathCopy;
    ... etc etc ...

In this case we're arguably wasting one pointer worth of space in the
path, but considering the number of function pointers such a path will be
carrying, I don't think that's much of an objection.

>> So? If you did that, then you wouldn't have renumbered the Vars as
>> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if it
>> is needed, then there would also be a need for an additional tuple slot
>> in executor contexts, which you haven't provided.

> For example, the enhanced postgres_fdw fetches the result set of
> remote join query, thus a tuple contains the fields come from both side.
> In this case, what varno shall be suitable to put?

Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
solution. Consider for example a join where both tables supply columns
named "id" --- if you put them both in one tupledesc then there's no
non-kluge way to identify them.

Possibly the route to a solution involves adding another plan-node
callback function that ruleutils.c would use for printing Vars in custom
join nodes. Or maybe we could let the Vars keep their original RTE
numbers, though that would complicate life at execution time.

Anyway, if we're going to punt on add_join_path_hook for the time being,
this problem can probably be left to solve later. It won't arise for
simple table-scan cases, nor for single-input plan nodes such as sorts.

regards, tom lane
> I was thinking more like
>
> typedef struct CustomPathFuncs {
>     const char *name;   /* used for debugging purposes only */
>     NodeCopy_function node_copy;
>     NodeTextOut_function node_textout;
>     ... etc etc etc ...
> } CustomPathFuncs;
>
> typedef struct CustomPath {
>     Path path;
>     const CustomPathFuncs *funcs;
>     ... maybe a few more fields here, but not too darn many ...
> } CustomPath;
>
> and similarly for CustomPlan.
>
> The advantage of this way is it's very cheap for (what I expect will be)
> the common case where an extension has a fixed set of support functions
> for its custom paths and plans. It just declares a static constant
> CustomPathFuncs struct, and puts a pointer to that into its paths.
>
> If an extension really needs to set the support functions on a per-object
> basis, it can do this:
>
> typedef struct MyCustomPath {
>     CustomPath cpath;
>     CustomPathFuncs funcs;
>     ... more fields ...
> } MyCustomPath;
>
> and then initialization of a MyCustomPath would include
>
>     mypath->cpath.funcs = &mypath->funcs;
>     mypath->funcs.node_copy = MyCustomPathCopy;
>     ... etc etc ...
>
> In this case we're arguably wasting one pointer worth of space in the path,
> but considering the number of function pointers such a path will be carrying,
> I don't think that's much of an objection.
>
That is exactly what I expected, and I have no objection here. Thanks for
the clarification.

> >> So? If you did that, then you wouldn't have renumbered the Vars as
> >> INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at all; if
> >> it is needed, then there would also be a need for an additional tuple
> >> slot in executor contexts, which you haven't provided.
>
> > For example, the enhanced postgres_fdw fetches the result set of
> > remote join query, thus a tuple contains the fields come from both side.
> > In this case, what varno shall be suitable to put?
>
> Not sure what we'd do for the general case, but CUSTOM_VAR isn't the solution.
> Consider for example a join where both tables supply columns named "id"
> --- if you put them both in one tupledesc then there's no non-kluge way
> to identify them.
>
> Possibly the route to a solution involves adding another plan-node callback
> function that ruleutils.c would use for printing Vars in custom join nodes.
> Or maybe we could let the Vars keep their original RTE numbers, though that
> would complicate life at execution time.
>
My preference is the former, because complications at execution time could
cause performance degradation. Once two tuples get joined in the custom
node, only the extension knows which relation is the origin of a
particular attribute in the unified tuple. So it seems reasonable that the
extension has to provide a hint to resolve the Var names.
Probably this would be another callback that provides a translation table
for Var nodes that reference the custom plan but originate from either
subtree (much like translated_vars in prepunion.c).
For example, assume a Var node with INDEX_VAR in the tlist of the custom
plan. At execution time it references ecxt_scantuple, and this tuple slot
keeps a joined tuple originating from two relations. If its varattno=9
came from the column varno=1/varattno=3, we would like to print its
original name. If we have a translation table like translated_vars, it
allows us to translate an attribute number on the custom plan into the
original one. Even though it might be an abuse of INDEX_VAR, it seems to
me a good idea.
Also, I don't like to redefine the meaning of INNER_VAR/OUTER_VAR, because
a custom plan may have both left and right subtrees, so it makes sense to
keep supporting the case where both tuple slots are available.

> Anyway, if we're going to punt on add_join_path_hook for the time being,
> this problem can probably be left to solve later. It won't arise for simple
> table-scan cases, nor for single-input plan nodes such as sorts.
>
Yes, it is only a problem if the number of input plans is larger than 1.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
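To make the translation-table idea concrete, here is a small sketch
(entirely hypothetical names) of tracing varattno=9 of the custom node
back to varno=1/varattno=3, as in the example above:

    /* built at plan-creation time: one entry per output column of the
     * custom node, each entry being the original Var it was computed from */
    static List *
    my_build_translated_vars(void)
    {
        List   *vars = NIL;

        /* ... for each output column, append its origin; e.g. the entry
         * for a column that came from varno 1 / varattno 3 (an int4): */
        vars = lappend(vars, makeVar(1, 3, INT4OID, -1, InvalidOid, 0));
        return vars;
    }

    /* resolver: given the attno of an INDEX_VAR on the custom node, return
     * the original Var so that EXPLAIN / ruleutils.c can print its name */
    static Var *
    my_resolve_custom_var(List *translated_vars, AttrNumber attno)
    {
        return (Var *) list_nth(translated_vars, attno - 1);
    }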
Hello,

The attached two patches are the revised custom-plan interface and an
example usage that implements the existing MergeJoin on top of this
interface.

According to the discussion last week, I revised the portions where the
custom node was expected to perform a particular kind of task, like
scanning a relation, by introducing polymorphism through a set of
callbacks set by the custom-plan provider. The core backend can now handle
the custom-plan node as just an abstract plan node, with no built-in
assumptions. Even though the subject of this message says "custom-scan",
I'd like to name the interface "custom-plan" instead, because it is now
entirely up to the extension whether it scans a particular relation.

The definitions of the CustomXXXX data types were simplified:

    typedef struct CustomPath
    {
        Path        path;
        const struct CustomPathMethods *methods;
    } CustomPath;

    typedef struct CustomPlan
    {
        Plan        plan;
        const struct CustomPlanMethods *methods;
    } CustomPlan;

    typedef struct CustomPlanState
    {
        PlanState   ps;
        const CustomPlanMethods *methods;
    } CustomPlanState;

Each type has a base class and a set of function pointers that
characterize the behavior of the custom-plan node. In usual use cases, the
extension is expected to extend these classes to keep the private data
fields needed to implement its own functionality.

Most of the methods are designed to work as a thin layer over existing
planner / executor functions, so the custom-plan provider is responsible
for implementing its methods to communicate with the core backend the same
way the built-in nodes do.

Regarding the topics we discussed last week:

* CUSTOM_VAR has gone.

The reason why CUSTOM_VAR was needed is that we have to handle EXPLAIN
output (including the column names being referenced) even if a custom-plan
node replaced a join but has no underlying subplans on its left/right
subtrees. A typical situation is the remote-join implementation that I
tried as an extension of postgres_fdw on top of the previous interface: it
retrieves a flat result set of the remote join execution, and thus has no
local subplan. On the other hand, EXPLAIN tries to find the "actual" Var
node in the underlying subplan if a Var node has a special varno
(INNER/OUTER/INDEX).

I added a special method to solve this. The GetSpecialCustomVar method is
called if a Var node of the custom plan has a special varno; the
custom-plan provider can then inform the core backend of the expression
node referenced by this Var node. This allows resolving the column name
without walking the subtrees recursively, which enables a custom-plan node
to replace a part of the plan tree. The method is optional, so the
existing behavior applies if the custom-plan provider does not do anything
special.

* Functions to be exposed from static declarations

Right now, static functions are exposed somewhat arbitrarily, on demand,
so we need more investigation into which functions are needed and which
are not. In my trial, the part-2 patch (MergeJoin on top of the
custom-plan interface) needed the class of functions that recursively walk
a subplan tree to be exposed, like ExplainPreScanNode,
create_plan_recurse, set_plan_refs, fix_expr_common or finalize_plan.
When a custom plan performs like the built-in Append node, it keeps a list
of subplans in its private field, so the core backend cannot know of their
existence; it is then unable to initialize the subplans, produce EXPLAIN
output, and so on. It does not make sense to reimplement all of that on
the extension side.
Also, createplan.c has many useful functions to construct plan nodes;
however, most of them are static, because all the built-in plan nodes are
constructed by the routines in this file and there was no need to expose
them. I think the functions in createplan.c that are called by the
create_xxxx_plan() functions to construct plan nodes should be exposed for
the convenience of extensions.

* Definition of add_join_path_hook

I did not find a way to improve the definition and location of this hook,
so it still sits at the tail of add_paths_to_joinrel(). Its definition was
adjusted a bit according to the feedback on pgsql-hackers: I omitted the
"mergeclause_list" and "semifactors" from the argument list. Indeed, these
are specific to the built-in MergeJoin logic and easy to reproduce.

* Hook location of add_scan_path_hook

I moved the add_scan_path_hook and set_cheapest() calls into
set_base_rel_pathlists() from the various caller locations, typically the
set_xxxx_pathlist() functions. This consolidates the place where custom
paths are added for base relations.

* CustomMergeJoin as a proof of concept

The contrib module in the part-2 portion is a merge-join implementation on
top of the custom-plan interface, even though 99% of its implementation is
identical to the built-in one. Its purpose is to demonstrate that custom
join logic can be implemented using the custom-plan interface, even when
the custom-plan node has underlying subplans, unlike my previous examples.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
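A rough sketch of how a provider might implement the GetSpecialCustomVar
method just described; the signature is approximated from the description
above, not copied from the patch, and pseudo_tlist is a hypothetical
private field holding one expression per output column:

    static Node *
    my_get_special_custom_var(CustomPlanState *cps, Var *varnode)
    {
        MyCustomPlanState *mps = (MyCustomPlanState *) cps;

        Assert(IS_SPECIAL_VARNO(varnode->varno));

        /* hand back the expression this output column was computed from,
         * so EXPLAIN can resolve column names without walking a subplan */
        return (Node *) list_nth(mps->pseudo_tlist, varnode->varattno - 1);
    }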
Hello,

I adjusted the custom-plan interface patch a little for the cache-only
scan patch, which is a demonstration module for the vacuum-page hook on
top of the custom-plan interface.

fix_scan_expr() looks useful to me for custom-plan providers that want to
implement their own relation-scan logic, even though they could implement
it themselves using the already-exposed fix_expr_common().

Also, I removed the hardcoded portion from nodeCustom.c, although it may
make sense to provide a few template functions to be called by custom-plan
providers, performing the usual common jobs like construction of the
expression context, assignment of the result slot, opening relations, and
so on. I thought of this idea while implementing the BeginCustomPlan
handler. (These template functions are not in the attached patch yet.)
What is your opinion?

The major portion of this patch is not changed from v10.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
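For illustration, the common jobs mentioned above, written out the way a
provider's BeginCustomPlan handler might perform them using the ordinary
executor helpers. MyCustomPlanState, my_plan_methods and my_table_oid are
hypothetical; the callback signature is a sketch, not the patch's API:

    typedef struct MyCustomPlanState
    {
        CustomPlanState cps;
        Relation        scan_rel;   /* extension-private state */
    } MyCustomPlanState;

    static CustomPlanState *
    my_begin_custom_plan(CustomPlan *node, EState *estate, int eflags)
    {
        MyCustomPlanState *mps = palloc0(sizeof(MyCustomPlanState));

        mps->cps.ps.plan = (Plan *) node;
        mps->cps.ps.state = estate;
        mps->cps.methods = &my_plan_methods;

        /* construct an expression context, as ExecInitNode does for built-ins */
        ExecAssignExprContext(estate, &mps->cps.ps);

        /* initialize the result slot and its tuple descriptor */
        ExecInitResultTupleSlot(estate, &mps->cps.ps);
        ExecAssignResultTypeFromTL(&mps->cps.ps);

        /* initialize child expressions (targetlist / quals) */
        mps->cps.ps.targetlist = (List *)
            ExecInitExpr((Expr *) node->plan.targetlist, &mps->cps.ps);
        mps->cps.ps.qual = (List *)
            ExecInitExpr((Expr *) node->plan.qual, &mps->cps.ps);

        /* open the underlying relation, if this node scans one */
        mps->scan_rel = heap_open(my_table_oid, AccessShareLock);

        return &mps->cps;
    }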
Hello,

> > * Definition of add_join_path_hook
> >
> > I didn't have idea to improve the definition and location of this
> > hook, so it is still on the tail of the add_paths_to_joinrel().
> > Its definition was a bit adjusted according to the feedback on the
> > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors"
> > from the argument list. Indeed, these are specific to the built-in
> > MergeJoin logic and easy to reproduce.
> >
After the submission, I'm still investigating a better way to put a hook
for adding custom join paths.

Regarding the comment from Tom:

| * The API for add_join_path_hook seems overcomplex, as well as too full
| of implementation details that should remain private to joinpath.c.
| I particularly object to passing the mergeclause lists, which seem unlikely
| to be of interest for non-mergejoin plan types anyway.
| More generally, it seems likely that this hook is at the wrong level of
| abstraction; it forces the hook function to concern itself with a lot of
| stuff about join legality and parameterization (which I note your patch3
| code fails to do; but that's not an optional thing).
|
The earlier half is already done; my trouble is the latter portion.
The overall jobs around add_join_path_hook, at the tail of
add_paths_to_joinrel(), are (see the sketch after this message):

1. Construction of parameterized-path information, saved in
   param_source_rel and extra_lateral_rels.
2. Trying to add mergejoin paths with an underlying Sort node.
3. Trying to add mergejoin/nestloop paths without an underlying Sort node.
4. Trying to add hashjoin paths.

It seems to me the checks for join legality and parameterization are built
into the individual routines for each join algorithm. (What does the "join
legality check" actually mean here?) It probably makes sense to provide a
common utility function to be called back from the extension, if we can
find a portion common to all the join logics; however, I don't have a
clear idea of how to carve out that common portion. Which jobs can be done
independently of the join algorithm?

I'd like to see ideas on this issue. Of course, I also think it is still
an option to leave this to the extension in the initial version.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai@ak.jp.nec.com>
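For reference, the hook as it stands in v11 has roughly the following
shape; the exact argument list is reconstructed from the description
above, so treat it as an approximation, and note that the final `if`
fragment sits inside add_paths_to_joinrel():

    typedef void (*add_join_path_hook_type) (PlannerInfo *root,
                                             RelOptInfo *joinrel,
                                             RelOptInfo *outerrel,
                                             RelOptInfo *innerrel,
                                             JoinType jointype,
                                             SpecialJoinInfo *sjinfo,
                                             List *restrictlist,
                                             Relids param_source_rel,
                                             Relids extra_lateral_rels);
    extern PGDLLIMPORT add_join_path_hook_type add_join_path_hook;

    /* called once, at the tail of add_paths_to_joinrel(), after the
     * built-in merge/nestloop/hash logic (jobs 2..4 above) has run */
    if (add_join_path_hook)
        (*add_join_path_hook) (root, joinrel, outerrel, innerrel,
                               jointype, sjinfo, restrictlist,
                               param_source_rel, extra_lateral_rels);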
> > Thanks, > -- > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > <kaigai@ak.jp.nec.com> > > > > -----Original Message----- > > From: pgsql-hackers-owner@postgresql.org > > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai > > Sent: Wednesday, March 12, 2014 1:55 PM > > To: Tom Lane > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; > > Robert Haas; PgHacker; Peter Eisentraut > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node) > > > > Hello, > > > > The attached two patches are the revised custom-plan interface and > > example usage that implements existing MergeJoin on top of this interface. > > > > According to the discussion last week, I revised the portion where > > custom-node is expected to perform a particular kind of task, like > > scanning a relation, by putting polymorphism with a set of callbacks > > set by custom-plan provider. > > So, the core backend can handle this custom-plan node just an > > abstracted plan-node with no anticipation. > > Even though the subject of this message says "custom-scan", I'd like > > to name the interface "custom-plan" instead, because it became fully > > arbitrary of extension whether it scan on a particular relation. > > > > Definition of CustomXXXX data types were simplified: > > > > typedef struct CustomPath > > { > > Path path; > > const struct CustomPathMethods *methods; > > } CustomPath; > > > > typedef struct CustomPlan > > { > > Plan plan; > > const struct CustomPlanMethods *methods; > > } CustomPlan; > > > > typedef struct CustomPlanState > > { > > PlanState ps; > > const CustomPlanMethods *methods; > > } CustomPlanState; > > > > Each types have a base class and a set of function pointers that > > characterize the behavior of this custom-plan node. > > In usual use-cases, extension is expected to extend these classes to > > keep their private data fields needed to implement its own > functionalities. > > > > Most of the methods are designed to work as a thin layer towards > > existing planner / executor functions, so custom-plan provides has to > > be responsible to implement its method to communicate with core > > backend as built-in ones doing. > > > > Regarding to the topic we discussed last week, > > > > * CUSTOM_VAR has gone. > > The reason why CUSTOM_VAR was needed is, we have to handle EXPLAIN > > command output (including column names being referenced) even if a > > custom-plan node replaced a join but has no underlying subplans on > left/right subtrees. > > A typical situation like this is a remote-join implementation that I > > tried to extend postgres_fdw on top of the previous interface. > > It retrieves a flat result set of the remote join execution, thus has > > no subplan locally. On the other hand, EXPLAIN tries to find out > > "actual" Var node from the underlying subplan if a Var node has > > special varno (INNER/OUTER/INDEX). > > I put a special method to solve the problem. GetSpecialCustomVar > > method is called if a certain Var node of custom-plan has a special > > varno, then custom-plan provider can inform the core backend an > > expression node to be referenced by this Var node. > > It allows to solve the column name without recursive walking on the > > subtrees, so it enables a custom-plan node that replaces a part of > plan-tree. > > This method is optional, so available to adopt existing way if > > custom-plan provider does not do anything special. 
> > > > > > * Functions to be exposed, from static declaration > > > > Right now, static functions are randomly exposed on demand. > > So, we need more investigation which functions are needed, and which > > others are not. > > According to my trial, the part-2 patch that is MergeJoin on top of > > the custom-plan interface, class of functions that recursively walk on > > subplan tree have to be exposed. Like, ExplainPreScanNode, > > create_plan_recurse, set_plan_refs, fix_expr_common or finalize_plan. > > In case when custom-plan performs like built-in Append node, it keeps > > a list of sub-plans in its private field, so the core backend cannot > > know existence of sub-plans, thus its unavailable to make subplan, > > unavailable to output EXPLAIN and so on. > > It does not make sense to reworking on the extension side again. > > Also, createplan.c has many useful functions to construct plan-node, > > however, most of them are static because all the built-in plan-node > > are constructed by the routines in this file, we didn't need to expose > > them to others. I think, functions in createplan.c being called by > > create_xxxx_plan() functions to construct plan-node should be exposed > > for extension's convenient. > > > > > > * Definition of add_join_path_hook > > > > I didn't have idea to improve the definition and location of this > > hook, so it is still on the tail of the add_paths_to_joinrel(). > > Its definition was a bit adjusted according to the feedback on the > > pgsql-hackers. I omitted the "mergeclause_list" and " semifactors" > > from the argument list. Indeed, these are specific to the built-in > > MergeJoin logic and easy to reproduce. > > > > > > * Hook location of add_scan_path_hook > > > > I moved the add_scan_path_hook and set_cheapest() into > > set_base_rel_pathlists() from various caller locations; > > set_xxxx_pathlist() functions typically. > > It enabled to consolidate the location to add custom-path for base > > relations. > > > > > > * CustomMergeJoin as a proof-of-concept > > > > The contrib module in the part-2 portion is, a merge-join > > implementation on top of custom-plan interface, even though 99% of its > > implementation is identical with built-in ones. > > Its purpose is to demonstrate a custom join logic can be implemented > > using custom-plan interface, even if custom-plan node has underlying > > sub-plans unlike previous my examples. > > > > Thanks, > > -- > > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei > > <kaigai@ak.jp.nec.com> > > > > > > > -----Original Message----- > > > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > > Sent: Friday, March 07, 2014 3:09 AM > > > To: Kaigai Kouhei(海外 浩平) > > > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; > > > Robert Haas; PgHacker; Peter Eisentraut > > > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node) > > > > > > Kouhei Kaigai <kaigai@ak.jp.nec.com> writes: > > > > I expected to include simple function pointers for copying and > > > > text-output as follows: > > > > > > > typedef struct { > > > > Plan plan; > > > > : > > > > NodeCopy_function node_copy; > > > > NodeTextOut_function node_textout; > > > > } Custom; > > > > > > I was thinking more like > > > > > > typedef struct CustomPathFuncs { > > > const char *name; /* used for debugging purposes only */ > > > NodeCopy_function node_copy; > > > NodeTextOut_function node_textout; > > > ... etc etc etc ... 
> > >   } CustomPathFuncs;
> > >
> > >   typedef struct CustomPath {
> > >       Path path;
> > >       const CustomPathFuncs *funcs;
> > >       ... maybe a few more fields here, but not too darn many ...
> > >   } CustomPath;
> > >
> > > and similarly for CustomPlan.
> > >
> > > The advantage of this way is it's very cheap for (what I expect will
> > > be) the common case where an extension has a fixed set of support
> > > functions for its custom paths and plans. It just declares a static
> > > constant CustomPathFuncs struct, and puts a pointer to that into its
> > > paths.
> > >
> > > If an extension really needs to set the support functions on a
> > > per-object basis, it can do this:
> > >
> > >   typedef struct MyCustomPath {
> > >       CustomPath cpath;
> > >       CustomPathFuncs funcs;
> > >       ... more fields ...
> > >   } MyCustomPath;
> > >
> > > and then initialization of a MyCustomPath would include
> > >
> > >   mypath->cpath.funcs = &mypath->funcs;
> > >   mypath->funcs.node_copy = MyCustomPathCopy;
> > >   ... etc etc ...
> > >
> > > In this case we're arguably wasting one pointer worth of space in
> > > the path, but considering the number of function pointers such a
> > > path will be carrying, I don't think that's much of an objection.
> > >
> > > >> So? If you did that, then you wouldn't have renumbered the Vars
> > > >> as INNER/OUTER. I don't believe that CUSTOM_VAR is necessary at
> > > >> all; if it is needed, then there would also be a need for an
> > > >> additional tuple slot in executor contexts, which you haven't
> > > >> provided.
> > >
> > > > For example, the enhanced postgres_fdw fetches the result set of a
> > > > remote join query, so a tuple contains fields that come from both
> > > > sides. In this case, what varno would be suitable to put there?
> > >
> > > Not sure what we'd do for the general case, but CUSTOM_VAR isn't the
> > > solution.
> > > Consider for example a join where both tables supply columns named
> > > "id" --- if you put them both in one tupledesc then there's no
> > > non-kluge way to identify them.
> > >
> > > Possibly the route to a solution involves adding another plan-node
> > > callback function that ruleutils.c would use for printing Vars in
> > > custom join nodes.
> > > Or maybe we could let the Vars keep their original RTE numbers,
> > > though that would complicate life at execution time.
> > >
> > > Anyway, if we're going to punt on add_join_path_hook for the time
> > > being, this problem can probably be left to solve later. It won't
> > > arise for simple table-scan cases, nor for single-input plan nodes
> > > such as sorts.
> > >
> > > 			regards, tom lane
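For concreteness, here is a minimal extension-side sketch of the pattern
discussed above: a path struct embedding CustomPath as its first member
plus private fields, pointing at one statically allocated methods table.
The names MyScanPath, my_textout and the CustomPathMethods field order are
illustrative assumptions, not the patch's actual definitions.

    #include "postgres.h"
    #include "nodes/relation.h"     /* Path, RelOptInfo */

    /* Sketch only: the layout of CustomPathMethods and the callback
     * signatures below are hypothetical, not the patch's real API. */
    typedef struct MyScanPath
    {
        CustomPath  cpath;          /* base "class" must come first */
        List       *ctid_quals;     /* private state of this provider */
    } MyScanPath;

    static void my_textout(StringInfo str, const CustomPath *path);

    static const CustomPathMethods myscan_path_methods = {
        "myscan",                   /* name, used for debugging */
        my_textout,                 /* node_textout; further callback
                                     * slots omitted in this sketch */
    };

    static MyScanPath *
    create_myscan_path(PlannerInfo *root, RelOptInfo *rel)
    {
        /* palloc0 zeroes the struct, so unset fields stay NULL/0 */
        MyScanPath *mypath = (MyScanPath *) palloc0(sizeof(MyScanPath));

        mypath->cpath.path.parent = rel;
        mypath->cpath.methods = &myscan_path_methods;
        /* cost fields and private state would be filled in here */
        return mypath;
    }

Because the methods table is a static constant shared by every path the
provider creates, each path pays only one pointer of overhead, which is
the cheapness Tom describes above.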
Hello,

Because of a patch conflict with the latest master branch, I rebased the custom-plan interface patches; there is no functional difference from v11.

A brief summary of the current approach, as revised from my original submission through the discussion on pgsql-hackers:

The plan node was renamed CustomPlan, instead of CustomScan, because it dropped all the hardcoded portions that assumed the custom node would act as an alternative scan or join method; that assumption would prevent the custom node from acting as anything else, such as a sort or append.
Following Tom's suggestion, I put a structure containing several function pointers on the new CustomPlan node, and an extension allocates an object that extends the CustomPlan node.
It looks like polymorphism in an object-oriented programming language: the core backend knows only the abstract set of methods defined in the tables of function pointers, and the extension implements its own logic in the callbacks, using private state in the extended object.

Some issues are still under discussion:

* Design of add_join_path_hook

Tom suggested that the core backend could support checking join legality and parameterization; however, it looks to me as though the existing code performs its join-legality checks within the function for each individual join method, so I am not certain we can have a common legality check for all the (potential) alternative join implementations.
The part-2 patch is a demonstration module that implements the existing merge-join logic, but on top of this interface.

* Functions to be exposed to extensions

Some utility functions useful for implementing plan nodes are declared static, because most of the machinery is implemented within createplan.c, setrefs.c and so on, and there was no need to expose it elsewhere. With the custom-plan interface, however, an extension becomes a caller of these functions even though it is implemented outside the core.
In my investigation, the class of functions that recursively walk a subplan tree has to be exposed: ExplainPreScanNode, create_plan_recurse, set_plan_refs, fix_expr_common and finalize_plan, for example. Do we have other criteria for deciding which functions should be exposed?

Any help and feedback are welcome.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>

> -----Original Message-----
> From: Kaigai Kouhei(海外 浩平)
> Sent: Thursday, March 20, 2014 10:46 AM
> To: Kaigai Kouhei(海外 浩平); Tom Lane
> Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski; Robert
> Haas; PgHacker; Peter Eisentraut
> Subject: RE: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
>
> Hello,
>
> > > * Definition of add_join_path_hook
> > >
> > > I had no idea how to improve the definition and location of this
> > > hook, so it is still at the tail of add_paths_to_joinrel(). Its
> > > definition was adjusted a bit according to the feedback on
> > > pgsql-hackers: I omitted "mergeclause_list" and "semifactors" from
> > > the argument list. Indeed, these are specific to the built-in
> > > MergeJoin logic and easy to reproduce.
>
> After the submission, I'm still investigating a better way to put a
> hook for adding custom join paths.
> Regarding the comment from Tom:
> | * The API for add_join_path_hook seems overcomplex, as well as too
> | full of implementation details that should remain private to
> | joinpath.c. I particularly object to passing the mergeclause lists,
> | which seem unlikely to be of interest for non-mergejoin plan types
> | anyway.
> | More generally, it seems likely that this hook is at the wrong level
> | of abstraction; it forces the hook function to concern itself with a
> | lot of stuff about join legality and parameterization (which I note
> | your patch3 code fails to do; but that's not an optional thing).
> |
> The earlier half was already done; my trouble is the latter portion.
>
> The overall jobs of add_join_path_hook are:
> 1. Construction of parameterized-path information, saved in
>    param_source_rel and extra_lateral_rels.
> 2. Trying to add mergejoin paths with an underlying Sort node.
> 3. Trying to add mergejoin/nestloop paths without an underlying Sort
>    node.
> 4. Trying to add hashjoin paths.
>
> It seems to me the checks for join legality and parameterization are
> built into the individual routines for each join algorithm.
> (What does the "join legality check" actually mean?)
>
> It probably makes sense to provide a common utility function to be
> called back from the extension if we can find a common part shared by
> all the join methods; however, I don't have a clear idea of how to cut
> out that common portion. What jobs can be done independently of the
> join algorithm?
>
> I'd like to see ideas around this issue. Of course, I also think it is
> still an option to leave this to the extension in the initial version.
>
> Thanks,
> --
> NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> <kaigai@ak.jp.nec.com>
>
> > -----Original Message-----
> > From: pgsql-hackers-owner@postgresql.org
> > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Kouhei Kaigai
> > Sent: Monday, March 17, 2014 9:29 AM
> > To: Kaigai Kouhei(海外 浩平); Tom Lane
> > Cc: Kohei KaiGai; Stephen Frost; Shigeru Hanada; Jim Mlodgenski;
> > Robert Haas; PgHacker; Peter Eisentraut
> > Subject: Re: Custom Scan APIs (Re: [HACKERS] Custom Plan node)
> >
> > Hello,
> >
> > I adjusted the custom-plan interface patch a little for the
> > cache-only scan patch, which is a demonstration module for the
> > vacuum-page hook on top of the custom-plan interface.
> >
> > fix_scan_expr() looks useful to me for custom-plan providers that
> > want to implement their own relation-scan logic, even though they
> > could implement it using the already-exposed fix_expr_common().
> >
> > Also, I removed the hardcoded portion from nodeCustom.c, although it
> > may make sense to provide a few template functions to be called by
> > custom-plan providers, performing the usual common jobs such as
> > constructing the expression context, assigning the result slot, and
> > opening relations. I had this idea while implementing the
> > BeginCustomPlan handler. (These template functions are not in the
> > attached patch yet.) What is your opinion?
> >
> > The major portion of this patch is not changed from v10.
> >
> > Thanks,
> > --
> > NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
> > <kaigai@ak.jp.nec.com>
> >
> > > [snip: re-quote of the March 12 and March 7 messages shown in full
> > > earlier in the thread]
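To make the hook placement concrete, here is a sketch of a provider
offering an alternative path for a base relation, as described above. The
hook's exact signature and the cost numbers are assumptions for
illustration; add_path() is the existing optimizer entry point, and
MyScanPath/myscan_path_methods refer to the earlier sketch.

    #include "postgres.h"
    #include "nodes/relation.h"         /* RelOptInfo, Path */
    #include "optimizer/pathnode.h"     /* add_path() */

    /* Sketch only: the hook signature is assumed from the patch
     * description, not copied from it. */
    static void
    myscan_add_scan_path(PlannerInfo *root, RelOptInfo *rel,
                         RangeTblEntry *rte)
    {
        MyScanPath *mypath;

        if (rte->rtekind != RTE_RELATION)
            return;                     /* plain relations only */

        mypath = create_myscan_path(root, rel);     /* earlier sketch */
        mypath->cpath.path.startup_cost = 0.0;
        mypath->cpath.path.total_cost = rel->rows * 0.01;  /* toy cost */

        /* add_path() lets the optimizer compare this path against the
         * built-in ones and keep whichever is cheapest */
        add_path(rel, (Path *) mypath);
    }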
On 24 March 2014 10:25, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:

> A brief summary of the current approach, as revised from my original
> submission through the discussion on pgsql-hackers:
>
> The plan node was renamed CustomPlan, instead of CustomScan, because it
> dropped all the hardcoded portions that assumed the custom node would
> act as an alternative scan or join method; that assumption would
> prevent the custom node from acting as anything else, such as a sort or
> append.
> Following Tom's suggestion, I put a structure containing several
> function pointers on the new CustomPlan node, and an extension
> allocates an object that extends the CustomPlan node.
> It looks like polymorphism in an object-oriented programming language:
> the core backend knows only the abstract set of methods defined in the
> tables of function pointers, and the extension implements its own logic
> in the callbacks, using private state in the extended object.

I just wanted to add some review comments here. I also apologise for not reviewing this earlier; I had misunderstood the maturity of the patch and had assumed it was a request for comments/WIP.

Overall, I very much support the concept of providing for alternate scans. I like the placement of calls in the optimizer and we'll be able to do much with that. Other comments follow, in the order I consider them important.

* There is no declarative structure for this at all. I was expecting to see a way to declare that a specific table might have an alternate scan path, but we just call the plugin always and it has to separately make a cache lookup to see if anything extra is needed. The Index AM allows us to perform scans, yet indexes are very clearly declared and easily and clearly identifiable. We need the same thing for alternate plans.

* There is no mechanism at all for maintaining other data structures. Are we supposed to use the Index AM? Triggers? Or? The lack of clarity there is disturbing, though I could be simply missing something big and obvious.

* There is no catalog support. Complex APIs in Postgres typically have a structure like pg_am which allows the features to be clearly identified. I'd be worried about our ability to keep track of so many calls in such pivotal places without that. I want to be able to know what a plugin is doing, especially when it will likely come in binary form. I don't see an easy way to have plugins partially override each other or work together. What happens when I want to use Mr. X's clever new join plugin at the same time as Mr. Y's GPU accelerator?

* How do we control security? What stops the Custom Scan API from overriding privileges? Shouldn't the alternate data structures be recognised as objects so we can grant privileges? Or do we simply say that if an alternate data structure is linked to a heap then it has implied privileges? It would be a shame to implement better security in one patch and then ignore it in another (from the same author).

All of the above I can let pass in this release, but in the longer term we need to look for more structure around these ideas so we can manage and control what happens. The way this is now is quite raw - suitable for R&D but not for longer-term production usage by a wider audience, IMHO. I wouldn't like to make commitments about the longevity of this API either; if we accept it, it should have a big "may change radically" sign hanging on it.
Having said that, I am interested in progress here and I accept that things will look like this at this stage of the ideas process, so these are not things to cause delay.

Some things I would like to see changed in this release are...

* It's not clear to me when we load/create the alternate data structures. That can't happen at _init time. I was expecting this to look like an infrastructure for unlogged indexes, but it doesn't look like that either.

* The name Custom makes me nervous. It sounds too generic, as if the design or objectives for this are a little unclear. AlternateScan sounds like a better name, since it's clearer that we are scanning an alternate data structure rather than the main heap.

* The prune hook makes me feel very uneasy. It seems a weirdly specific implementation detail, made stranger by the otherwise lack of data-maintenance API calls. Calling that for every dirty page sounds like an issue, and my patch-rejection indicator is flashing red around that.

Two additional use cases I will be looking to explore will be:

* Code to make Mat Views recognised as alternate scan targets
* Code to allow queries access to sampled data rather than the fully detailed data, if the result would be within acceptable tolerance for the user

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Simon Riggs <simon@2ndQuadrant.com> writes:
> [ assorted comments about custom-scan patch, but particularly ]

> * The prune hook makes me feel very uneasy. It seems a weirdly specific
> implementation detail, made stranger by the otherwise lack of
> data-maintenance API calls. Calling that for every dirty page sounds
> like an issue, and my patch-rejection indicator is flashing red around
> that.

Yeah. After a fast review of the custom-scan and cache-scan patches, it seems to me that my original fears are largely confirmed: the custom scan patch is not going to be sufficient to allow development of any truly new plan type. Yeah, you can plug in some new execution node types, but actually doing anything interesting is going to require patching other parts of the system. Are we going to say to all comers, "sure, we'll put a hook call anywhere you like, just ask"? I can't see this as being the way to go.

Another way of describing the problem is that it's not clear where the API boundaries are for potential users of a custom-scan feature. (Simon said several things that are closely related to this point.) One thing I don't like at all about the patch is its willingness to turn anything whatsoever into a publicly exported function, which basically says that the design attitude is there *are* no boundaries. But that's not going to lead to anything maintainable. We're certainly not going to want to guarantee that these suddenly-exported functions will all now have stable APIs forevermore.

Overall I concur with Simon's conclusion that this might be of interest for R&D purposes, but it's hard to see anyone wanting to support a production feature built on this. It would be only marginally less painful than supporting a patch that just adds the equivalent code to the backend in the traditional way. So I'm feeling that this was kind of a dead end. It was worth doing the legwork to see if this sort of approach could be useful, but the answer seems like "no".

			regards, tom lane
> On 24 March 2014 10:25, Kouhei Kaigai <kaigai@ak.jp.nec.com> wrote:
>
> > A brief summary of the current approach, as revised from my original
> > submission through the discussion on pgsql-hackers:
> >
> > The plan node was renamed CustomPlan, instead of CustomScan, because
> > it dropped all the hardcoded portions that assumed the custom node
> > would act as an alternative scan or join method; that assumption
> > would prevent the custom node from acting as anything else, such as a
> > sort or append.
> > Following Tom's suggestion, I put a structure containing several
> > function pointers on the new CustomPlan node, and an extension
> > allocates an object that extends the CustomPlan node.
> > It looks like polymorphism in an object-oriented programming
> > language: the core backend knows only the abstract set of methods
> > defined in the tables of function pointers, and the extension
> > implements its own logic in the callbacks, using private state in the
> > extended object.
>
> I just wanted to add some review comments here. I also apologise for
> not reviewing this earlier; I had misunderstood the maturity of the
> patch and had assumed it was a request for comments/WIP.
>
Thanks for your interest and the many comments.

> Overall, I very much support the concept of providing for alternate
> scans. I like the placement of calls in the optimizer and we'll be able
> to do much with that. Other comments follow, in the order I consider
> them important.
>
> * There is no declarative structure for this at all. I was expecting
> to see a way to declare that a specific table might have an alternate
> scan path, but we just call the plugin always and it has to separately
> make a cache lookup to see if anything extra is needed. The Index AM
> allows us to perform scans, yet indexes are very clearly declared and
> easily and clearly identifiable. We need the same thing for alternate
> plans.
>
> * There is no mechanism at all for maintaining other data structures.
> Are we supposed to use the Index AM? Triggers? Or? The lack of clarity
> there is disturbing, though I could be simply missing something big and
> obvious.
>
> * There is no catalog support. Complex APIs in Postgres typically have
> a structure like pg_am which allows the features to be clearly
> identified. I'd be worried about our ability to keep track of so many
> calls in such pivotal places without that. I want to be able to know
> what a plugin is doing, especially when it will likely come in binary
> form. I don't see an easy way to have plugins partially override each
> other or work together. What happens when I want to use Mr. X's clever
> new join plugin at the same time as Mr. Y's GPU accelerator?
>
It was an implementation choice. I just followed the usual PG hook convention, which expects a loaded extension to save the original function pointer and call it in turn from its own hook routine (see the sketch after this message). The optimizer therefore walks the chain of extensions when multiple custom providers are loaded.

Even though I initially chose this design, it is an option to add catalog support that tracks the registered custom-scan providers and their metadata: which function generates paths, which flags are set, what kinds of relations are supported, and so on. The optimizer could then skip extensions that obviously do not support the target relation.

> * How do we control security? What stops the Custom Scan API from
> overriding privileges? Shouldn't the alternate data structures be
> recognised as objects so we can grant privileges?
> Or do we simply say that if an alternate data structure is linked to a
> heap then it has implied privileges? It would be a shame to implement
> better security in one patch and then ignore it in another (from the
> same author).
>
In general, we have no mechanism to prevent a C-binary extension from overriding the privilege mechanism. An extension can override (existing) hooks and modify the requiredPerms bits of a RangeTblEntry, which would eventually cause a privilege bypass. But that is neutral with respect to the custom-scan API itself. Even when we implement alternative scan logic on this API, the core backend still checks the required privileges in ExecCheckRTPerms, called at the head of the executor (unless the extension author does something strange).

> All of the above I can let pass in this release, but in the longer term
> we need to look for more structure around these ideas so we can manage
> and control what happens. The way this is now is quite raw - suitable
> for R&D but not for longer-term production usage by a wider audience,
> IMHO. I wouldn't like to make commitments about the longevity of this
> API either; if we accept it, it should have a big "may change
> radically" sign hanging on it. Having said that, I am interested in
> progress here and I accept that things will look like this at this
> stage of the ideas process, so these are not things to cause delay.
>
> Some things I would like to see changed in this release are...
>
> * It's not clear to me when we load/create the alternate data
> structures. That can't happen at _init time. I was expecting this to
> look like an infrastructure for unlogged indexes, but it doesn't look
> like that either.
>
I expected the *_preload_libraries GUCs to load the extensions. If we had catalog support, an extension would be loaded before the first invocation, when the optimizer asks the registered provider whether it can offer an alternative scan. I love the idea.

> * The name Custom makes me nervous. It sounds too generic, as if the
> design or objectives for this are a little unclear. AlternateScan
> sounds like a better name, since it's clearer that we are scanning an
> alternate data structure rather than the main heap.
>
I have no particular preference about the name.

> * The prune hook makes me feel very uneasy. It seems a weirdly
> specific implementation detail, made stranger by the otherwise lack of
> data-maintenance API calls. Calling that for every dirty page sounds
> like an issue, and my patch-rejection indicator is flashing red around
> that.
>
All I want is cache invalidation at the time vacuum runs, and the proposed prune hook may not be the only answer. When an extension manages its own cached data structure, what means do we have to invalidate it? I am not wedded to the existing proposal.

> Two additional use cases I will be looking to explore will be:
>
> * Code to make Mat Views recognised as alternate scan targets
> * Code to allow queries access to sampled data rather than the fully
> detailed data, if the result would be within acceptable tolerance for
> the user
>
Let me investigate how to implement these. The idea around materialized views is probably the simpler one.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
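As referenced in the reply above, the usual hook-chaining convention looks
roughly like this: save the previous pointer at load time, then call it
before (or after) your own logic. The hook variable name and signature
here are assumed from the patch description; the _PG_init entry point and
PG_MODULE_MAGIC are the standard extension machinery.

    #include "postgres.h"
    #include "fmgr.h"

    PG_MODULE_MAGIC;

    /* add_scan_path_hook_type and the hook signature are assumed from
     * the patch, not taken verbatim. */
    static add_scan_path_hook_type prev_add_scan_path = NULL;

    static void
    my_add_scan_path(PlannerInfo *root, RelOptInfo *rel,
                     RangeTblEntry *rte)
    {
        /* call whoever was registered before us, so providers stack */
        if (prev_add_scan_path)
            prev_add_scan_path(root, rel, rte);

        /* ... then consider adding our own CustomPath here ... */
    }

    void
    _PG_init(void)
    {
        prev_add_scan_path = add_scan_path_hook;
        add_scan_path_hook = my_add_scan_path;
    }

This is why the optimizer effectively walks a chain of providers: each
extension is responsible for forwarding the call to the one it displaced.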
> Simon Riggs <simon@2ndQuadrant.com> writes:
> > [ assorted comments about custom-scan patch, but particularly ]
>
> > * The prune hook makes me feel very uneasy. It seems a weirdly
> > specific implementation detail, made stranger by the otherwise lack
> > of data-maintenance API calls. Calling that for every dirty page
> > sounds like an issue, and my patch-rejection indicator is flashing
> > red around that.
>
> Yeah. After a fast review of the custom-scan and cache-scan patches, it
> seems to me that my original fears are largely confirmed: the custom
> scan patch is not going to be sufficient to allow development of any
> truly new plan type. Yeah, you can plug in some new execution node
> types, but actually doing anything interesting is going to require
> patching other parts of the system. Are we going to say to all comers,
> "sure, we'll put a hook call anywhere you like, just ask"? I can't see
> this as being the way to go.
>
There are two different points to be discussed here; one is generic to the custom-plan API, and the other is specific to my cache-only scan implementation.

Because the existing plan/exec nodes are all built-in, and some of the functionality is consolidated into particular source files (like createplan.c and setrefs.c), it causes no problem that commonly called functions are declared static. The custom-plan API changes this assumption: it allows some portion of the jobs of createplan.c or setrefs.c to be performed externally, so the commonly used functions need to become external. Because of trial and error during development, I could not list all the functions that need to be public at once. However, this is not a fundamental problem; it should be solvable through the discussion on pgsql-hackers.

Regarding the portion specific to the cache-only scan: the same issue would arise for any extension that wants to track vacuuming, independently of custom scans. An extension usually uses multiple hooks and interfaces to implement the feature it wants. In the case of the cache-only scan, unfortunately, PG lacks a way to track heap vacuuming, even though that is needed to invalidate the cached data. This is an issue unrelated to the custom-scan API; we would see the same problem in an extension that merely counts the number of records being vacuumed.

> Another way of describing the problem is that it's not clear where the
> API boundaries are for potential users of a custom-scan feature. (Simon
> said several things that are closely related to this point.) One thing
> I don't like at all about the patch is its willingness to turn anything
> whatsoever into a publicly exported function, which basically says that
> the design attitude is there *are* no boundaries. But that's not going
> to lead to anything maintainable. We're certainly not going to want to
> guarantee that these suddenly-exported functions will all now have
> stable APIs forevermore.
>
I'd like to have *several* existing static functions as (almost) stable APIs, but not all of them. Indeed, my patch may have picked static functions somewhat at random to redefine as external; however, that does not mean the custom-plan feature ultimately requires all of them to be external. According to my investigation, there are two classes of functions to be exposed:
- functions that recursively walk a plan/exec node tree
  (e.g. create_plan_recurse), and
- functions that adjust the internal state of the core backend
  (e.g. fix_expr_common).
At least, such functions are not the majority. I don't think this should be a strong blocker for the new feature.
(I may have oversights, of course; please point them out.)

> Overall I concur with Simon's conclusion that this might be of interest
> for R&D purposes, but it's hard to see anyone wanting to support a
> production feature built on this. It would be only marginally less
> painful than supporting a patch that just adds the equivalent code to
> the backend in the traditional way.
>
As we adjusted the FDW APIs through their first several releases, any kind of interface takes time to stabilize. Even if this one *initially* sticks to R&D purposes (I don't deny it), it can be brushed up to production quality. I think a feature that starts out serving R&D purposes is a good starting point.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> [ assorted comments about custom-scan patch, but particularly ]
>
>> * The prune hook makes me feel very uneasy. It seems a weirdly
>> specific implementation detail, made stranger by the otherwise lack of
>> data-maintenance API calls. Calling that for every dirty page sounds
>> like an issue, and my patch-rejection indicator is flashing red around
>> that.
>
> Yeah. After a fast review of the custom-scan and cache-scan patches, it
> seems to me that my original fears are largely confirmed: the custom
> scan patch is not going to be sufficient to allow development of any
> truly new plan type. Yeah, you can plug in some new execution node
> types, but actually doing anything interesting is going to require
> patching other parts of the system. Are we going to say to all comers,
> "sure, we'll put a hook call anywhere you like, just ask"? I can't see
> this as being the way to go.

Without prejudice to the rest of what you said, this argument doesn't hold much water with me. I mean, anything that our extensibility mechanism doesn't support today will require new hooks, but does that mean we're never going to add any more hooks? I sure hope not.

When hooks are proposed here, we evaluate them on their merits and attempt to judge the likelihood that a hook in a particular place will be useful, but generally we're not averse to adding them, and as long as the paths aren't too performance-critical, I don't think we should be averse to adding them. We have a great system today for letting people add new data types and things of that sort, but anything that penetrates more deeply into the heart of the system pretty much can't be done; this is why various companies, such as our respective employers, have developed and maintained forks of the PostgreSQL code base instead of just hooking into the existing code. We probably can't solve that problem completely, but that doesn't mean we should throw in the towel.

And in particular, I think it's pretty normal that a new facility like custom scans might create additional demand for new hooks. If something was completely impossible before, and the new facility makes it almost-possible, then why shouldn't someone ask for a hook there? A prune hook probably has no business in the custom scan patch proper, but whether it's a good idea or a bad one should be decided on the merits.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Yeah. After a fast review of the custom-scan and cache-scan patches,
>> it seems to me that my original fears are largely confirmed: the
>> custom scan patch is not going to be sufficient to allow development
>> of any truly new plan type. Yeah, you can plug in some new execution
>> node types, but actually doing anything interesting is going to
>> require patching other parts of the system.

> Without prejudice to the rest of what you said, this argument doesn't
> hold much water with me. I mean, anything that our extensibility
> mechanism doesn't support today will require new hooks, but does that
> mean we're never going to add any more hooks? I sure hope not.

No, that's not what I said. ISTM that the argument for the custom-scan API is that it allows interesting new things to be done *without further modifying the core code*. But the example application (cache_scan) fails to demonstrate that, and indeed seems to be a counterexample. Whether we'd accept cache_scan on its own merits is a separate question. The problem for me is that custom-scan isn't showing that it can support what was claimed without doing serious damage to modularity and maintainability of the core code.

What this may mean is that we need more attention to refactoring of the core code. But just removing "static" from any function that looks like it might be handy isn't my idea of well-considered refactoring. More the opposite in fact: if those things turn into APIs that we have to support, it's going to kill any ability to do such refactoring.

A concrete example here is setrefs.c, whose responsibilities tend to change from release to release. I think if we committed custom-scan as is, we'd have great difficulty changing setrefs.c's transformations ever again, at least if we hoped to not break users of the custom-scan API. I'm not sure what the solution is --- but turning setrefs into a white box instead of a black box isn't it.

			regards, tom lane
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> A concrete example here is setrefs.c, whose responsibilities tend to
> change from release to release. I think if we committed custom-scan
> as is, we'd have great difficulty changing setrefs.c's transformations
> ever again, at least if we hoped to not break users of the custom-scan
> API. I'm not sure what the solution is --- but turning setrefs into
> a white box instead of a black box isn't it.

Yeah, this was my (general) complaint as well, and the answer that I kept getting back is "well, it's ok, you can still break it between major releases and the custom scan users will just have to deal with it". I'm a bit on the fence about that, itself, but the other half of that coin is that we could end up with parts of the *core* code that think it's ok to go pulling in these functions, once they're exposed, and that could end up making things quite ugly and difficult to maintain going forward.

Thanks,

Stephen
On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Yeah. After a fast review of the custom-scan and cache-scan patches,
>>> it seems to me that my original fears are largely confirmed: the
>>> custom scan patch is not going to be sufficient to allow development
>>> of any truly new plan type. Yeah, you can plug in some new execution
>>> node types, but actually doing anything interesting is going to
>>> require patching other parts of the system.
>
>> Without prejudice to the rest of what you said, this argument doesn't
>> hold much water with me. I mean, anything that our extensibility
>> mechanism doesn't support today will require new hooks, but does that
>> mean we're never going to add any more hooks? I sure hope not.
>
> No, that's not what I said. ISTM that the argument for the custom-scan
> API is that it allows interesting new things to be done *without
> further modifying the core code*. But the example application
> (cache_scan) fails to demonstrate that, and indeed seems to be a
> counterexample. Whether we'd accept cache_scan on its own merits is a
> separate question. The problem for me is that custom-scan isn't showing
> that it can support what was claimed without doing serious damage to
> modularity and maintainability of the core code.

I think there's two separate things in there, one of which I agree with and one of which I disagree with. I agree that we must avoid damaging the modularity and maintainability of the core code; I don't agree that custom-scan needs to be able to do interesting things with zero additional changes to the core code. If we come up with three interesting applications for custom scan that require 5 new hooks between them, I'll consider that a major success - assuming those hooks don't unduly limit future changes we may wish to make in the core code. I think your concern about exposing APIs that may not be terribly stable is well-founded, but I don't think that means we shouldn't expose *anything*.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Hi,

On 2014-04-15 11:07:11 -0400, Robert Haas wrote:
> On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> [ discussion ]

What I think this discussion shows is that this patch isn't ready for 9.4. The first iteration of the patch came in on 2013-11-06; imo that's pretty damn late for a relatively complex patch. And obviously we don't have agreement on the course forward.

I don't think we need to stop discussing, but I think it's pretty clear that this isn't 9.4 material. And that it's far from "Ready for Committer".

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Andres Freund <andres@2ndquadrant.com> writes:
> What I think this discussion shows is that this patch isn't ready for
> 9.4. The first iteration of the patch came in on 2013-11-06; imo that's
> pretty damn late for a relatively complex patch. And obviously we don't
> have agreement on the course forward.

> I don't think we need to stop discussing, but I think it's pretty clear
> that this isn't 9.4 material. And that it's far from "Ready for
> Committer".

Yeah. I'm still not exactly convinced that custom-scan will ever allow independent development of new plan types (which, with all due respect to Robert, is what it was being sold as last year in Ottawa). But I'm not opposed in principle to committing it, if we can find a way to have a cleaner API for things like setrefs.c. It seems like late-stage planner processing in general is an issue for this patch (createplan.c and subselect.c are also looking messy). EXPLAIN isn't too great either.

I'm not sure exactly what to do about those cases, but I wonder whether things would get better if we had the equivalent of expression_tree_walker/mutator capability for plan nodes. The state of affairs in setrefs and subselect, at least, is a bit reminiscent of the bad old days when we had lots of different bespoke code for traversing expression trees.

			regards, tom lane
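For reference, the expression-level idiom Tom alludes to looks like this
today; expression_tree_walker is the existing function declared in
nodes/nodeFuncs.h, while count_vars_walker is a made-up example walker
that counts Var nodes. A plan-node equivalent would presumably follow the
same shape.

    #include "postgres.h"
    #include "nodes/nodeFuncs.h"    /* expression_tree_walker() */

    /* Hypothetical walker: count the Var nodes in an expression tree. */
    static bool
    count_vars_walker(Node *node, int *count)
    {
        if (node == NULL)
            return false;           /* nothing to do here */
        if (IsA(node, Var))
            (*count)++;
        /* recurse into all known substructure of this node type; the
         * walker itself never needs to know the per-node details */
        return expression_tree_walker(node, count_vars_walker,
                                      (void *) count);
    }

The appeal of this pattern is that per-node traversal knowledge lives in
one core routine, so callers (including extensions) stay insulated from
node-layout changes between releases.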
> On Tue, Apr 15, 2014 at 10:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Robert Haas <robertmhaas@gmail.com> writes:
> >> On Mon, Apr 14, 2014 at 4:43 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >>> Yeah. After a fast review of the custom-scan and cache-scan
> >>> patches, it seems to me that my original fears are largely
> >>> confirmed: the custom scan patch is not going to be sufficient to
> >>> allow development of any truly new plan type. Yeah, you can plug in
> >>> some new execution node types, but actually doing anything
> >>> interesting is going to require patching other parts of the system.
> >
> >> Without prejudice to the rest of what you said, this argument
> >> doesn't hold much water with me. I mean, anything that our
> >> extensibility mechanism doesn't support today will require new
> >> hooks, but does that mean we're never going to add any more hooks? I
> >> sure hope not.
> >
> > No, that's not what I said. ISTM that the argument for the
> > custom-scan API is that it allows interesting new things to be done
> > *without further modifying the core code*. But the example
> > application (cache_scan) fails to demonstrate that, and indeed seems
> > to be a counterexample. Whether we'd accept cache_scan on its own
> > merits is a separate question. The problem for me is that custom-scan
> > isn't showing that it can support what was claimed without doing
> > serious damage to modularity and maintainability of the core code.
>
> I think there's two separate things in there, one of which I agree with
> and one of which I disagree with. I agree that we must avoid damaging
> the modularity and maintainability of the core code; I don't agree that
> custom-scan needs to be able to do interesting things with zero
> additional changes to the core code. If we come up with three
> interesting applications for custom scan that require 5 new hooks
> between them, I'll consider that a major success - assuming those hooks
> don't unduly limit future changes we may wish to make in the core code.
> I think your concern about exposing APIs that may not be terribly
> stable is well-founded, but I don't think that means we shouldn't
> expose *anything*.
>
I agree 100%. We usually change hook definitions release by release, and it is the author's responsibility to follow the newer interface if he continues to maintain his extension on the newer release. It is probably a gray area, neither black nor white: a perfectly designed interface might be nice, but it would allow no further evolution. Of course, that does not justify a poorly designed interface; the important thing is to find the best way available at this moment. That may take core refactoring, not just exposing static functions; what I tried to implement was the only way I could find.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
> Andres Freund <andres@2ndquadrant.com> writes:
> > What I think this discussion shows is that this patch isn't ready for
> > 9.4. The first iteration of the patch came in on 2013-11-06; imo
> > that's pretty damn late for a relatively complex patch. And obviously
> > we don't have agreement on the course forward.
> > I don't think we need to stop discussing, but I think it's pretty
> > clear that this isn't 9.4 material. And that it's far from "Ready for
> > Committer".
>
Yep, today is the expected feature-freeze date for v9.4. Unfortunately, it is a little too late to get this into v9.4.

> Yeah. I'm still not exactly convinced that custom-scan will ever allow
> independent development of new plan types (which, with all due respect
> to Robert, is what it was being sold as last year in Ottawa). But I'm
> not opposed in principle to committing it, if we can find a way to have
> a cleaner API for things like setrefs.c. It seems like late-stage
> planner processing in general is an issue for this patch (createplan.c
> and subselect.c are also looking messy). EXPLAIN isn't too great
> either.
>
> I'm not sure exactly what to do about those cases, but I wonder whether
> things would get better if we had the equivalent of
> expression_tree_walker/mutator capability for plan nodes. The state of
> affairs in setrefs and subselect, at least, is a bit reminiscent of the
> bad old days when we had lots of different bespoke code for traversing
> expression trees.
>
Hmm. If we had something like expression_tree_walker/mutator for plan nodes, we could pass a walker/mutator function pointer instead of exposing the static functions that do the recursive jobs. If a custom-plan provider (one that has sub-plans) received a callback with the walker/mutator pointer, all it would have to do for its sub-plans is call this new plan-tree walking support routine with the supplied walker/mutator. That seems a simpler design than what I did.

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project KaiGai Kohei
<kaigai@ak.jp.nec.com>
>> Yeah. I'm still not exactly convinced that custom-scan will ever allow
>> independent development of new plan types (which, with all due respect
>> to Robert, is what it was being sold as last year in Ottawa). But I'm
>> not opposed in principle to committing it, if we can find a way to
>> have a cleaner API for things like setrefs.c. It seems like late-stage
>> planner processing in general is an issue for this patch
>> (createplan.c and subselect.c are also looking messy). EXPLAIN isn't
>> too great either.
>>
>> I'm not sure exactly what to do about those cases, but I wonder
>> whether things would get better if we had the equivalent of
>> expression_tree_walker/mutator capability for plan nodes. The state of
>> affairs in setrefs and subselect, at least, is a bit reminiscent of
>> the bad old days when we had lots of different bespoke code for
>> traversing expression trees.
>>
> Hmm. If we had something like expression_tree_walker/mutator for plan
> nodes, we could pass a walker/mutator function pointer instead of
> exposing the static functions that do the recursive jobs.
> If a custom-plan provider (one that has sub-plans) received a callback
> with the walker/mutator pointer, all it would have to do for its
> sub-plans is call this new plan-tree walking support routine with the
> supplied walker/mutator. That seems a simpler design than what I did.
>
I tried to write similar walker/mutator functions for the plan-node tree; however, I could not make these routines simple enough, because the jobs of the walker/mutator functions are not uniform, so the caller side must also carry large switch-case branches.

I picked setrefs.c for my investigation. set_plan_refs() applies fix_scan_list() to the expression trees appearing in a plan node if the node is derived from Scan; however, it applies set_join_references() to subclasses of Join, and set_dummy_tlist_references() to some other plan nodes. This implies that a walker/mutator for Plan nodes has to apply a different operation according to the type of the plan node, and I am not certain how many different forms are needed. (In addition, set_plan_refs() usually behaves like a walker, but often behaves as a mutator in the case of a trivial subquery.)

I'm expecting a function like the one below. It calls the plan_walker function for each plan node, and also calls the expr_walker function for each expression node on those plan nodes:

  bool
  plan_tree_walker(Plan *plan,
                   bool (*plan_walker) (),
                   bool (*expr_walker) (),
                   void *context)

I'd like to hear whether some other form would be better for this routine.

One alternative idea, to give a custom-plan provider a chance to handle its subplans, is to hand the provider function pointers (1) to handle the recursion over the plan tree and (2) to set up the backend's internal state. In the case of setrefs.c, set_plan_refs() and fix_expr_common() are the minimum necessities for extensions. That would also remove the need to export static functions.

How about your thoughts?
--
KaiGai Kohei <kaigai@kaigai.gr.jp>
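To illustrate the non-uniformity described above, here is a rough sketch
of the shape such a plan_tree_walker might take; every detail beyond the
Plan fields (targetlist, qual, lefttree, righttree) is illustrative, and
the per-node switch is exactly the branching the message complains about.

    #include "postgres.h"
    #include "nodes/plannodes.h"    /* Plan, Append, nodeTag() */

    /* Sketch only: names and per-node handling are illustrative. */
    static bool
    plan_tree_walker(Plan *plan,
                     bool (*plan_walker) (Plan *plan, void *context),
                     bool (*expr_walker) (Node *node, void *context),
                     void *context)
    {
        if (plan == NULL)
            return false;

        /* expression trees present on every plan node */
        if (expr_walker((Node *) plan->targetlist, context) ||
            expr_walker((Node *) plan->qual, context))
            return true;

        /* node-type-specific substructure needs its own handling,
         * which is what makes the routine non-uniform */
        switch (nodeTag(plan))
        {
            case T_Append:
                /* would have to walk ((Append *) plan)->appendplans */
                break;
            default:
                break;
        }

        /* common left/right children */
        if (plan->lefttree && plan_walker(plan->lefttree, context))
            return true;
        if (plan->righttree && plan_walker(plan->righttree, context))
            return true;
        return false;
    }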