Re: Row pattern recognition - Mailing list pgsql-hackers
| From | Tatsuo Ishii |
|---|---|
| Subject | Re: Row pattern recognition |
| Date | |
| Msg-id | 20251127.101642.1466569288708851703.ishii@postgresql.org Whole thread Raw |
| In response to | Re: Row pattern recognition (Chao Li <li.evan.chao@gmail.com>) |
| Responses |
Re: Row pattern recognition
|
| List | pgsql-hackers |
> I just finished reviewing 0007 and 0008. This patch set really demonstrates the full lifecycle of adding a SQL
feature,from changing the syntax in gram.y all the way down to the executor, including tests and docs. I learned a lot
fromit. Thanks!
You are welcome!
> 23 - 0007
>
> As you mentioned earlier, pattern regular expression support *, + and ?, but I don’t see ? is tested.
Good catch. I will add the test case. In the mean time I found a bug
with gram.y part which handles '?' quantifier. gram.y can be
successfully compiled but there's no way to put '?' quantifier in a
SQL statement. We cannot write directly "ColId '?'" like other
quantifier '*' and '+' in the grammer because '?' is not a "self"
token. So we let '?' matches Op first then check if it's '?' or
not.
| ColId Op
{
if (strcmp("?", $2))
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
errmsg("unsupported quantifier"),
parser_errposition(@2));
$$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
}
Another bug was with executor (nodeWindowAgg.c). The code to check
greedy quantifiers missed the case '?'.
> 24 - 0007
>
> I don’t see negative tests for unsupported quantifiers, like PATTERN (A+?).
I will add such test cases.
> 25 - 0007
> ```
> +-- basic test with none greedy pattern
> ```
>
> Typo: “none greedy” -> non-greedy
Will fix.
> No comment for 0008.
Thanks again for the review. I will post v35 patch set soon. Attached
is the diff from v34.
Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
diff --git a/doc/src/sgml/advanced.sgml b/doc/src/sgml/advanced.sgml
index 7514f2a3848..eec2a0a9346 100644
--- a/doc/src/sgml/advanced.sgml
+++ b/doc/src/sgml/advanced.sgml
@@ -559,13 +559,14 @@ DEFINE
DOWN AS price < PREV(price)
</programlisting>
- Note that <function>PREV</function> returns the price column in the
- previous row if it's called in a context of row pattern recognition. Thus
- in the second line the definition variable "UP" is <literal>TRUE</literal>
- when the price column in the current row is greater than the price column
- in the previous row. Likewise, "DOWN" is <literal>TRUE</literal> when the
- price column in the current row is lower than the price column in the
- previous row.
+ Note that <function>PREV</function> returns the <literal>price</literal>
+ column in the previous row if it's called in a context of row pattern
+ recognition. Thus in the second line the definition variable "UP"
+ is <literal>TRUE</literal> when the price column in the current row is
+ greater than the price column in the previous row. Likewise, "DOWN"
+ is <literal>TRUE</literal> when the
+ <literal>price</literal> column in the current row is lower than
+ the <literal>price</literal> column in the previous row.
</para>
<para>
Once <literal>DEFINE</literal> exists, <literal>PATTERN</literal> can be
@@ -624,10 +625,10 @@ FROM stock
<para>
When a query involves multiple window functions, it is possible to write
out each one with a separate <literal>OVER</literal> clause, but this is
- duplicative and error-prone if the same windowing behavior is wanted
- for several functions. Instead, each windowing behavior can be named
- in a <literal>WINDOW</literal> clause and then referenced in <literal>OVER</literal>.
- For example:
+ duplicative and error-prone if the same windowing behavior is wanted for
+ several functions. Instead, each windowing behavior can be named in
+ a <literal>WINDOW</literal> clause and then referenced
+ in <literal>OVER</literal>. For example:
<programlisting>
SELECT sum(salary) OVER w, avg(salary) OVER w
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 428bd4f7372..aebc797545e 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -937,7 +937,6 @@ WINDOW <replaceable class="parameter">window_name</replaceable> AS ( <replaceabl
[ PARTITION BY <replaceable class="parameter">expression</replaceable> [, ...] ]
[ ORDER BY <replaceable class="parameter">expression</replaceable> [ ASC | DESC | USING <replaceable
class="parameter">operator</replaceable>] [ NULLS { FIRST | LAST } ] [, ...] ]
[ <replaceable class="parameter">frame_clause</replaceable> ]
-[ <replaceable class="parameter">row_pattern_common_syntax</replaceable> ]
</synopsis>
</para>
@@ -1097,8 +1096,9 @@ EXCLUDE NO OTHERS
includes following subclauses.
<synopsis>
-[ AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW ]
-PATTERN <replaceable class="parameter">pattern_variable_name</replaceable>[+] [, ...]
+[ { AFTER MATCH SKIP PAST LAST ROW | AFTER MATCH SKIP TO NEXT ROW } ]
+[ INITIAL | SEEK ]
+PATTERN ( <replaceable class="parameter">pattern_variable_name</replaceable>[*|+|?] [, ...] )
DEFINE <replaceable class="parameter">definition_varible_name</replaceable> AS <replaceable
class="parameter">expression</replaceable>[, ...]
</synopsis>
<literal>AFTER MATCH SKIP PAST LAST ROW</literal> or <literal>AFTER MATCH
@@ -1107,9 +1107,16 @@ DEFINE <replaceable class="parameter">definition_varible_name</replaceable> AS <
ROW</literal> (the default) next row position is next to the last row of
previous match. On the other hand, with <literal>AFTER MATCH SKIP TO NEXT
ROW</literal> next row position is always next to the last row of previous
- match. <literal>DEFINE</literal> defines definition variables along with a
- boolean expression. <literal>PATTERN</literal> defines a sequence of rows
- that satisfies certain conditions using variables defined
+ match. <literal>INITIAL</literal> or <literal>SEEK</literal> defines how a
+ successful pattern matching starts from which row in a
+ frame. If <literal>INITIAL</literal> is specified, the match must start
+ from the first row in the frame. If <literal>SEEK</literal> is specified,
+ the set of matching rows do not necessarily start from the first row. The
+ default is <literal>INITIAL</literal>. Currently
+ only <literal>INITIAL</literal> is supported. <literal>DEFINE</literal>
+ defines definition variables along with a boolean
+ expression. <literal>PATTERN</literal> defines a sequence of rows that
+ satisfies certain conditions using variables defined
in <literal>DEFINE</literal> clause. If the variable is not defined in
the <literal>DEFINE</literal> clause, it is implicitly assumed following
is defined in the <literal>DEFINE</literal> clause.
diff --git a/src/backend/executor/nodeWindowAgg.c b/src/backend/executor/nodeWindowAgg.c
index 9ce072434b4..d230f43e607 100644
--- a/src/backend/executor/nodeWindowAgg.c
+++ b/src/backend/executor/nodeWindowAgg.c
@@ -4754,7 +4754,7 @@ update_reduced_frame(WindowObject winobj, int64 pos)
{
char *quantifier = strVal(lfirst(lc1));
- if (*quantifier == '+' || *quantifier == '*')
+ if (*quantifier == '+' || *quantifier == '*' || *quantifier == '?')
{
greedy = true;
break;
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index fccc26964a0..fdfe14d54e5 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -16825,7 +16825,21 @@ row_pattern_term:
ColId { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "", (Node *)makeString($1), NULL, @1); }
| ColId '*' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "*", (Node *)makeString($1), NULL, @1); }
| ColId '+' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", (Node *)makeString($1), NULL, @1); }
- | ColId '?' { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1); }
+/*
+ * '?' quantifier. We cannot write this directly "ColId '?'" because '?' is
+ * not a "self" token. So we let '?' matches Op first then check if it's '?'
+ * or not.
+ */
+ | ColId Op
+ {
+ if (strcmp("?", $2))
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unsupported quantifier"),
+ parser_errposition(@2));
+
+ $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "?", (Node *)makeString($1), NULL, @1);
+ }
;
row_pattern_definition_list:
@@ -17982,6 +17996,7 @@ unreserved_keyword:
| DECLARE
| DEFAULTS
| DEFERRED
+ | DEFINE
| DEFINER
| DELETE_P
| DELIMITER
@@ -18402,7 +18417,6 @@ reserved_keyword:
| CURRENT_USER
| DEFAULT
| DEFERRABLE
- | DEFINE
| DESC
| DISTINCT
| DO
diff --git a/src/backend/parser/parse_clause.c b/src/backend/parser/parse_clause.c
index 3521d2780e2..8e4dc732861 100644
--- a/src/backend/parser/parse_clause.c
+++ b/src/backend/parser/parse_clause.c
@@ -3920,7 +3920,7 @@ transformRPR(ParseState *pstate, WindowClause *wc, WindowDef *windef,
if ((wc->frameOptions & FRAMEOPTION_START_CURRENT_ROW) == 0)
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
- errmsg("FRAME must start at current row when row patttern recognition is used")));
+ errmsg("FRAME must start at current row when row pattern recognition is used")));
/* Transform AFTER MACH SKIP TO clause */
wc->rpSkipTo = windef->rpCommonSyntax->rpSkipTo;
@@ -3949,7 +3949,7 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
List **targetlist)
{
/* DEFINE variable name initials */
- static char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz";
+ static const char *defineVariableInitials = "abcdefghijklmnopqrstuvwxyz";
ListCell *lc,
*l;
@@ -3959,7 +3959,7 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
List *defineClause;
char *name;
int initialLen;
- int i;
+ int numinitials;
/*
* If Row Definition Common Syntax exists, DEFINE clause must exist. (the
@@ -4030,8 +4030,12 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
* equivalent.
*/
restargets = NIL;
+ numinitials = 0;
+ initialLen = strlen(defineVariableInitials);
foreach(lc, windef->rpCommonSyntax->rpDefs)
{
+ char initial[2];
+
restarget = (ResTarget *) lfirst(lc);
name = restarget->name;
@@ -4071,26 +4075,12 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
name),
parser_errposition(pstate, exprLocation((Node *) r))));
}
- restargets = lappend(restargets, restarget);
- }
- list_free(restargets);
-
- /*
- * Create list of row pattern DEFINE variable name's initial. We assign
- * [a-z] to them (up to 26 variable names are allowed).
- */
- restargets = NIL;
- i = 0;
- initialLen = strlen(defineVariableInitials);
-
- foreach(lc, windef->rpCommonSyntax->rpDefs)
- {
- char initial[2];
- restarget = (ResTarget *) lfirst(lc);
- name = restarget->name;
-
- if (i >= initialLen)
+ /*
+ * Create list of row pattern DEFINE variable name's initial. We
+ * assign [a-z] to them (up to 26 variable names are allowed).
+ */
+ if (numinitials >= initialLen)
{
ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
@@ -4099,12 +4089,16 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
parser_errposition(pstate,
exprLocation((Node *) restarget))));
}
- initial[0] = defineVariableInitials[i++];
+ initial[0] = defineVariableInitials[numinitials++];
initial[1] = '\0';
wc->defineInitial = lappend(wc->defineInitial,
makeString(pstrdup(initial)));
+
+ restargets = lappend(restargets, restarget);
}
+ list_free(restargets);
+ /* turns a list of ResTarget's into a list of TargetEntry's */
defineClause = transformTargetList(pstate, windef->rpCommonSyntax->rpDefs,
EXPR_KIND_RPR_DEFINE);
@@ -4120,6 +4114,8 @@ transformDefineClause(ParseState *pstate, WindowClause *wc, WindowDef *windef,
/*
* transformPatternClause
* Process PATTERN clause and return PATTERN clause in the raw parse tree
+ *
+ * windef->rpCommonSyntax must exist.
*/
static void
transformPatternClause(ParseState *pstate, WindowClause *wc,
@@ -4127,11 +4123,7 @@ transformPatternClause(ParseState *pstate, WindowClause *wc,
{
ListCell *lc;
- /*
- * Row Pattern Common Syntax clause exists?
- */
- if (windef->rpCommonSyntax == NULL)
- return;
+ Assert(windef->rpCommonSyntax != NULL);
wc->patternVariable = NIL;
wc->patternRegexp = NIL;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 9e17abbaec5..3cc0ce720b2 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -1621,7 +1621,7 @@ typedef struct WindowClause
Index winref; /* ID referenced by window functions */
/* did we copy orderClause from refname? */
bool copiedOrder pg_node_attr(query_jumble_ignore);
- /* Row Pattern AFTER MACH SKIP clause */
+ /* Row Pattern AFTER MATCH SKIP clause */
RPSkipTo rpSkipTo; /* Row Pattern Skip To type */
bool initial; /* true if <row pattern initial or seek> is
* initial */
diff --git a/src/include/parser/kwlist.h b/src/include/parser/kwlist.h
index 7c60b9b44a8..89dc2a4b95a 100644
--- a/src/include/parser/kwlist.h
+++ b/src/include/parser/kwlist.h
@@ -129,7 +129,7 @@ PG_KEYWORD("default", DEFAULT, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("defaults", DEFAULTS, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("deferrable", DEFERRABLE, RESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("deferred", DEFERRED, UNRESERVED_KEYWORD, BARE_LABEL)
-PG_KEYWORD("define", DEFINE, RESERVED_KEYWORD, BARE_LABEL)
+PG_KEYWORD("define", DEFINE, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("definer", DEFINER, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("delete", DELETE_P, UNRESERVED_KEYWORD, BARE_LABEL)
PG_KEYWORD("delimiter", DELIMITER, UNRESERVED_KEYWORD, BARE_LABEL)
diff --git a/src/test/regress/expected/rpr.out b/src/test/regress/expected/rpr.out
index 59cfed180e7..312b62fb489 100644
--- a/src/test/regress/expected/rpr.out
+++ b/src/test/regress/expected/rpr.out
@@ -165,7 +165,45 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
company2 | 07-10-2023 | 1300 | | |
(20 rows)
--- basic test with none greedy pattern
+-- basic test using PREV. Use '?'
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP? DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+ company | tdate | price | first_value | last_value | nth_second
+----------+------------+-------+-------------+------------+------------
+ company1 | 07-01-2023 | 100 | 100 | 140 | 07-02-2023
+ company1 | 07-02-2023 | 200 | | |
+ company1 | 07-03-2023 | 150 | | |
+ company1 | 07-04-2023 | 140 | | |
+ company1 | 07-05-2023 | 150 | 150 | 90 | 07-06-2023
+ company1 | 07-06-2023 | 90 | | |
+ company1 | 07-07-2023 | 110 | 110 | 120 | 07-08-2023
+ company1 | 07-08-2023 | 130 | | |
+ company1 | 07-09-2023 | 120 | | |
+ company1 | 07-10-2023 | 130 | | |
+ company2 | 07-01-2023 | 50 | 50 | 1400 | 07-02-2023
+ company2 | 07-02-2023 | 2000 | | |
+ company2 | 07-03-2023 | 1500 | | |
+ company2 | 07-04-2023 | 1400 | | |
+ company2 | 07-05-2023 | 1500 | 1500 | 60 | 07-06-2023
+ company2 | 07-06-2023 | 60 | | |
+ company2 | 07-07-2023 | 1100 | 1100 | 1200 | 07-08-2023
+ company2 | 07-08-2023 | 1300 | | |
+ company2 | 07-09-2023 | 1200 | | |
+ company2 | 07-10-2023 | 1300 | | |
+(20 rows)
+
+-- basic test with none-greedy pattern
SELECT company, tdate, price, count(*) OVER w
FROM stock
WINDOW w AS (
@@ -927,7 +965,7 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
ERROR: aggregate functions are not allowed in DEFINE
LINE 11: LOWPRICE AS price < count(*)
^
--- FRAME must start at current row when row patttern recognition is used
+-- FRAME must start at current row when row pattern recognition is used
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
FROM stock
WINDOW w AS (
@@ -941,7 +979,7 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
-ERROR: FRAME must start at current row when row patttern recognition is used
+ERROR: FRAME must start at current row when row pattern recognition is used
-- SEEK is not supported
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
FROM stock
@@ -977,3 +1015,38 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
DOWN AS price < PREV(1)
);
ERROR: row pattern navigation operation's argument must include at least one column reference
+-- Unsupported quantifier
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP~ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(1),
+ DOWN AS price < PREV(1)
+);
+ERROR: unsupported quantifier
+LINE 9: PATTERN (START UP~ DOWN+)
+ ^
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP+? DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(1),
+ DOWN AS price < PREV(1)
+);
+ERROR: unsupported quantifier
+LINE 9: PATTERN (START UP+? DOWN+)
+ ^
diff --git a/src/test/regress/sql/rpr.sql b/src/test/regress/sql/rpr.sql
index 47e67334994..ffcf5476198 100644
--- a/src/test/regress/sql/rpr.sql
+++ b/src/test/regress/sql/rpr.sql
@@ -75,7 +75,22 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
DOWN AS price < PREV(price)
);
--- basic test with none greedy pattern
+-- basic test using PREV. Use '?'
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w,
+ nth_value(tdate, 2) OVER w AS nth_second
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ INITIAL
+ PATTERN (START UP? DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(price),
+ DOWN AS price < PREV(price)
+);
+
+-- basic test with none-greedy pattern
SELECT company, tdate, price, count(*) OVER w
FROM stock
WINDOW w AS (
@@ -438,7 +453,7 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
LOWPRICE AS price < count(*)
);
--- FRAME must start at current row when row patttern recognition is used
+-- FRAME must start at current row when row pattern recognition is used
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
FROM stock
WINDOW w AS (
@@ -484,3 +499,34 @@ SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER
UP AS price > PREV(1),
DOWN AS price < PREV(1)
);
+
+-- Unsupported quantifier
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP~ DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(1),
+ DOWN AS price < PREV(1)
+);
+
+SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
+ FROM stock
+ WINDOW w AS (
+ PARTITION BY company
+ ORDER BY tdate
+ ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
+ AFTER MATCH SKIP TO NEXT ROW
+ INITIAL
+ PATTERN (START UP+? DOWN+)
+ DEFINE
+ START AS TRUE,
+ UP AS price > PREV(1),
+ DOWN AS price < PREV(1)
+);
pgsql-hackers by date: