WITH RECURSIVE patch V0.1 - Mailing list pgsql-patches
From | Tatsuo Ishii |
---|---|
Subject | WITH RECURSIVE patch V0.1 |
Date | |
Msg-id | 20080518.205129.86993930.t-ishii@sraoss.co.jp Whole thread Raw |
Responses |
Re: WITH RECURSIVE patch V0.1
Updated patch (Re: WITH RECURSIVE patch V0.1) |
List | pgsql-patches |
WITH RECURSIVE patch V0.1 Here are patches to implement WITH RECURSIVE clause. There are some limitiations and TODO items(see the "Current limitations" section below). Comments are welcome. 1. Credit These patches were developed by Yoshiyuki Asaba (y-asab@sraoss.co.jp) with some discussions with Tatsuo Ishii (ishii@sraoss.co.jp). - prior patches: http://archives.postgresql.org/pgsql-patches/2008-03/msg00327.php The patches include some extentions to the patches above. 2. Design proposales See: http://archives.postgresql.org/pgsql-hackers/2008-02/msg00642.php 3. Implementation details - Parsing recursive queries We convert WITH clause to following structure in analyzer. /* * WithClause - * reprensentation of WITH clause */ typedef struct WithClause { NodeTag type; List *subquery; /* query list */ bool recursive; /* true = WITH RECURSIVE */ } WithClause; If recursive is false, WITH clause is converted to a subquery. Next we call transformWithClause() to check if the query defined in WITH RECURSIVE clause is a liner recursive query. transformWithClause() does followings: 1) Detect query which is not allowed in SQL standard 2) Create a name dependency graphs transoform* functions checks if target tables actually exist and throw an error if tables do not exist. For example in following query: WITH RECURSIVE x AS (SELECT * from y), y AS (SELECT * FROM pg_class) SELECT * FROM x; x's definition referes to y and y is defined after x's definition. If we evaluate x and y in the order of above, an error will occur while evaluating x since y is not defined yet. To avoid the problem we create new function: void checkWellFormedCte(ParseState *pstate, WithClause *withClause); which performs topological sort on graphs. Non recursive term will be the start of the graph, checks type checking one by one. names appear in WITH RECURSIVE clause form a name dependency graph. The nodes in the graph is represented by CteNode type: typedef struct CteNode; struct CteNode { Alias *alias; CteList *dependency; /* if NULL the node does not call other node */ bool self_reference; bool non_recursive; } CteNode; If there is loop in the dependency graph, a matual recursion exists and an error occurs (error code 0A00). * Note that this part is not implemented in the V1.0 patches yet. Tables defined in WITH RECURSIVE clause is identified as RTE_RECURSIVE. A name space is added to p_recursive_namespace in ParseState structure. The added item's structure is following: typedef struct RangeRecursive { NodeTag type; Node *subquery; /* whole subquery */ Alias *alias; /* table alias & optional column aliases */ List *coldeflist; /* list of ColumnDef nodes to describe result * of function returning RECORD */ Node *non_recursive_term; /* anaylze result of non recursive term */ bool recursive; /* reserved */ } RangeRecursive; non_recursive_term keeps the result of analyze on non recursive term. Using this, the type of recursive query is inferenced. - Generating a plan New plan node "Recursion" and "Recursive scan" is added. Recursion scan is a plan which returns WT. Recursion is a plan which execute recursive query. The subplan of Recursion is always "Append". Below is an example EXPLAIN output. EXPLAIN WITH RECURSIVE subdepartment AS ( -- non recursive term SELECT * FROM department WHERE name = 'A' UNION ALL -- recursive term SELECT d.* FROM department AS d, subdepartment WHERE d.parent_department = subdepartment.id ) SELECT * FROM subdepartment; QUERY PLAN ------------------------------------------------------------------------------------------- Recursion on subdepartment (cost=0.00..50.76 rows=12 width=40) -> Append (cost=0.00..50.64 rows=12 width=40) -> Seq Scan on department (cost=0.00..24.50 rows=6 width=40) <-- non recusrive term Filter: (name = 'A'::text) -> Hash Join (cost=0.01..26.02 rows=6 width=40) <-- recursive term Hash Cond: (d.parent_department = subdepartment.id) -> Seq Scan on department d (cost=0.00..21.60 rows=1160 width=40) -> Hash (cost=0.00..0.00 rows=1 width=4) -> Recursive Scan on subdepartment (cost=0.00..0.00 rows=1 width=4) - Executor new files nodeRecursion.c and nodeRecursivescan.c are added. The structure which represents the Recursion plan is as follows: typedef struct Recursion { Scan scan; Plan *subplan; List *subrtable; /* temporary workspace for planner */ } Recursion; typedef struct RecursionState { ScanState ss; /* its first field is NodeTag */ PlanState *subplan; bool recursive_empty; /* if recursive term does exist, true */ Tuplestorestate *intermediate_tuplestorestate; /* intermediate table */ bool init_done; /* initialization flag */ } RecursiveState; Following function are defined. TupleTableSlot * ExecRecursion(RecursionState *) { ... if (non recursive term) { slot = ExecProcNode(non recurisve term) tuplestore_puttuple(WT, slot); return slot; } /* calculate recursive term */ slot = ExecProcNode(recursive term) if (!TupIsNull(slot) { slot = ExecProcNode(Nested Loop); if (slot is empty) swap(intermediate_tuplestorestate, WT) else tuplestore_puttuple(intermediate_tuplestorestate, slot) } ... } - Executing RecursiveScan Returns tuples store in a work table. Work table is stored in Estate structure. typedef struct EState { ... Tuplestorestate *es_tuplestorestate; /* Stuff used for recursive query */ ... } EState; RecursiveScan and RecursiveScanState structures. typedef struct RecursiveScan { Plan *plan; Plan *subplan; } RecursiveScan; typedef struct RecursiveScanState { ScanState ss; TupleDesc tupdesc; bool done_init; } RecursiveScanState; RecursiveScanState initialized RecursionState the initialize the type information. Since Recursive plan is an upper plan of RecursiveScan, initialization should be delayed. For this purpose we use done_init flag. Scan functions: static TupleTableSlot * RecursivescanNext(RecursiveScanState *node) { ... if (tuplestore_gettupleslot(node->ss.ps->state->es_tuplestorestate, true, slot)) return slot; ... } TupleTableSlot * ExecRecursiveScan(RecursiveScanState *) { /* * Delay initializing type info because type inference did not do * in ExecInitRecursiveScan. */ if (!node->init_done) { ExecAssignScanType(&node->ss, node->ss.ps.state->es_rscan_tupledesc); /* * Initialize result tuple type and projection info. */ ExecAssignResultTypeFromTL(&node->ss.ps); ExecAssignScanProjectionInfo(&node->ss); node->init_done = true; } return ExecScan(&node->ss, (ExecScanAccessMtd) RecursivescanNext); } 4. New source codes: - src/backend/parser/parse_cte.c parse CTE (not completed) - src/backend/executor/nodeRecursion.c - src/include/nodeRecursion.h Execute Recursion plan - src/backend/executor/nodeRecursivescan.c - src/include/nodeRecursivescan.h Execute Recursive scan plan 5. Limitations and TODO 1) Only the last SELECT of UNION ALL can include self recursion name. 2) Cost of Recursion and Recursivescan plan are always 0. 3) Some recursion type checking is not complete. 4) Outer join for recursive name and tables does not work. 5) Need regression test cases. 6) Documentation needed. 6. Executing WITH RECURSIVE cluase examples DROP TABLE IF EXISTS department; DROP TABLE CREATE TABLE department ( id INT PRIMARY KEY, -- department ID parent_department INT REFERENCES department, -- upper department ID name TEXT -- department name ); psql:recursive-e.sql:6: NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "department_pkey" for table "department" CREATE TABLE INSERT INTO department VALUES (0, NULL, 'ROOT'); INSERT 0 1 INSERT INTO department VALUES (1, 0, 'A'); INSERT 0 1 INSERT INTO department VALUES (2, 1, 'B'); INSERT 0 1 INSERT INTO department VALUES (3, 2, 'C'); INSERT 0 1 INSERT INTO department VALUES (4, 2, 'D'); INSERT 0 1 INSERT INTO department VALUES (5, 0, 'E'); INSERT 0 1 INSERT INTO department VALUES (6, 4, 'F'); INSERT 0 1 INSERT INTO department VALUES (7, 5, 'G'); INSERT 0 1 -- department structure represented here is as follows: -- -- ROOT-+->A-+->B-+->C -- | | -- | +->D-+->F -- +->E-+->G EXPLAIN WITH RECURSIVE subdepartment AS ( -- non recursive term SELECT * FROM department WHERE name = 'A' UNION ALL -- recursive term SELECT d.* FROM department AS d, subdepartment WHERE d.parent_department = subdepartment.id ) SELECT * FROM subdepartment; QUERY PLAN ------------------------------------------------------------------------------------------- Recursion on subdepartment (cost=0.00..50.76 rows=12 width=40) -> Append (cost=0.00..50.64 rows=12 width=40) -> Seq Scan on department (cost=0.00..24.50 rows=6 width=40) Filter: (name = 'A'::text) -> Hash Join (cost=0.01..26.02 rows=6 width=40) Hash Cond: (d.parent_department = subdepartment.id) -> Seq Scan on department d (cost=0.00..21.60 rows=1160 width=40) -> Hash (cost=0.00..0.00 rows=1 width=4) -> Recursive Scan on subdepartment (cost=0.00..0.00 rows=1 width=4) (9 rows) -- extract all departments under 'A'. Result should be A, B, C, D and F WITH RECURSIVE subdepartment AS ( -- non recursive term SELECT * FROM department WHERE name = 'A' UNION ALL -- recursive term SELECT d.* FROM department AS d, subdepartment WHERE d.parent_department = subdepartment.id ) SELECT * FROM subdepartment ORDER BY name; id | parent_department | name ----+-------------------+------ 1 | 0 | A 2 | 1 | B 3 | 2 | C 4 | 2 | D 6 | 4 | F (5 rows) -- extract all departments under 'E'. Result should be E and G. WITH RECURSIVE subdepartment AS ( -- non recursive term SELECT * FROM department WHERE name = 'E' UNION ALL -- recursive term SELECT d.* FROM department AS d, subdepartment WHERE d.parent_department = subdepartment.id ) SELECT * FROM subdepartment ORDER BY name; id | parent_department | name ----+-------------------+------ 5 | 0 | E 7 | 5 | G (2 rows) -- Tatsuo Ishii SRA OSS, Inc. Japan
Attachment
pgsql-patches by date: