Thread: Fix for FETCH FIRST syntax problems
Per bug #15200, our support for sql2008 FETCH FIRST syntax is incomplete to the extent that it should be regarded as a bug; the spec quite clearly allows parameters/host variables in addition to constants, but we don't. Attached is a draft patch for fixing that, which additionally fixes the ugly wart that OFFSET <x> ROW/ROWS and FETCH FIRST [<x>] ROW/ROWS ONLY had very different productions for <x>; both now accept c_expr there. Shift/reduce conflict is avoided by taking advantage of the fact that ONLY is a fully reserved word. I've checked that this change would not make it any harder to add (post-2008 features) WITH TIES or PERCENT in the event that someone wants to do that. I think a case can be made that this should be backpatched; thoughts? (While I can't find any visible change for existing working queries, one change that does occur is that FETCH FIRST -1 ROWS ONLY now returns a different error message; but that was already inconsistent with the error from OFFSET -1 ROWS.) -- Andrew (irc:RhodiumToad) diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index b5d3d3a071..330adb8c37 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -1399,10 +1399,11 @@ OFFSET <replaceable class="parameter">start</replaceable> OFFSET <replaceable class="parameter">start</replaceable> { ROW | ROWS } FETCH { FIRST | NEXT } [ <replaceable class="parameter">count</replaceable> ] { ROW | ROWS } ONLY </synopsis> - In this syntax, to write anything except a simple integer constant for + In this syntax, to write anything except a simple integer constant, + parameter, or variable name for <replaceable class="parameter">start</replaceable> or <replaceable - class="parameter">count</replaceable>, you must write parentheses - around it. + class="parameter">count</replaceable>, you should write parentheses + around it (this is a <productname>PostgreSQL</productname> extension). If <replaceable class="parameter">count</replaceable> is omitted in a <literal>FETCH</literal> clause, it defaults to 1. <literal>ROW</literal> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index babb62dae1..e441c555a4 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -451,7 +451,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <node> fetch_args limit_clause select_limit_value offset_clause select_offset_value - select_offset_value2 opt_select_fetch_first_value + select_fetch_first_value %type <ival> row_or_rows first_or_next %type <list> OptSeqOptList SeqOptList OptParenthesizedSeqOptList @@ -11570,15 +11570,23 @@ limit_clause: parser_errposition(@1))); } /* SQL:2008 syntax */ - | FETCH first_or_next opt_select_fetch_first_value row_or_rows ONLY + /* to avoid shift/reduce conflicts, handle the optional value with + * a separate production rather than an opt_ expression. The fact + * that ONLY is fully reserved means that this way, we defer any + * decision about what rule reduces ROW or ROWS to the point where + * we can see the ONLY token in the lookahead slot. + */ + | FETCH first_or_next select_fetch_first_value row_or_rows ONLY { $$ = $3; } + | FETCH first_or_next row_or_rows ONLY + { $$ = makeIntConst(1, -1); } ; offset_clause: OFFSET select_offset_value { $$ = $2; } /* SQL:2008 syntax */ - | OFFSET select_offset_value2 row_or_rows + | OFFSET select_fetch_first_value row_or_rows { $$ = $2; } ; @@ -11597,21 +11605,16 @@ select_offset_value: /* * Allowing full expressions without parentheses causes various parsing - * problems with the trailing ROW/ROWS key words. SQL only calls for - * constants, so we allow the rest only with parentheses. If omitted, - * default to 1. - */ -opt_select_fetch_first_value: - SignedIconst { $$ = makeIntConst($1, @1); } - | '(' a_expr ')' { $$ = $2; } - | /*EMPTY*/ { $$ = makeIntConst(1, -1); } - ; - -/* - * Again, the trailing ROW/ROWS in this case prevent the full expression - * syntax. c_expr is the best we can do. + * problems with the trailing ROW/ROWS key words. SQL spec only calls for + * <simple value specification>, which is either a literal or a parameter (but + * an <SQL parameter reference> could be an identifier, bringing up conflicts + * with ROW/ROWS). We solve this by leveraging the presense of ONLY (see above) + * to determine whether the expression is missing rather than trying to make it + * optional in this rule. + * + * c_expr covers all the spec-required cases (and more). */ -select_offset_value2: +select_fetch_first_value: c_expr { $$ = $1; } ;
Andrew Gierth <andrew@tao11.riddles.org.uk> writes: > Attached is a draft patch for fixing that, which additionally fixes the > ugly wart that OFFSET <x> ROW/ROWS and FETCH FIRST [<x>] ROW/ROWS ONLY > had very different productions for <x>; both now accept c_expr there. LGTM, except please s/presense/presence/ in grammar comment. Also, why'd you back off "must write" to "should write" in the docs? This doesn't make the parens any more optional than before. > I think a case can be made that this should be backpatched; thoughts? Meh, -0.5. This is not really a bug, because it's operating as designed. You've found a reasonably painless way to improve the grammar, which is great, but it seems more like a feature addition than a bug fix. I'd be fine with sneaking this into v11, though. regards, tom lane
On 20/05/18 00:57, Tom Lane wrote: > Andrew Gierth <andrew@tao11.riddles.org.uk> writes: >> Attached is a draft patch for fixing that, which additionally fixes the >> ugly wart that OFFSET <x> ROW/ROWS and FETCH FIRST [<x>] ROW/ROWS ONLY >> had very different productions for <x>; both now accept c_expr there. > > LGTM, except please s/presense/presence/ in grammar comment. > Also, why'd you back off "must write" to "should write" in the docs? > This doesn't make the parens any more optional than before. It certainly does. It allows this now: PREPARE foo AS TABLE bar FETCH FIRST $1 ROWS ONLY; >> I think a case can be made that this should be backpatched; thoughts? > > Meh, -0.5. This is not really a bug, because it's operating as designed. > You've found a reasonably painless way to improve the grammar, which is > great, but it seems more like a feature addition than a bug fix. > > I'd be fine with sneaking this into v11, though. I'm +1 for backpatching it. It may be operating as designed by PeterE ten years ago, but it's not operating as designed by the SQL standard. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 20/05/18 01:27, Vik Fearing wrote: >> Also, why'd you back off "must write" to "should write" in the docs? >> This doesn't make the parens any more optional than before. >> It certainly does. It allows this now: > > PREPARE foo AS TABLE bar FETCH FIRST $1 ROWS ONLY; Please disregard this comment. My +1 for backpatching still stands. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Vik Fearing <vik.fearing@2ndquadrant.com> writes: > On 20/05/18 00:57, Tom Lane wrote: >> Also, why'd you back off "must write" to "should write" in the docs? >> This doesn't make the parens any more optional than before. > It certainly does. It allows this now: > PREPARE foo AS TABLE bar FETCH FIRST $1 ROWS ONLY; No, it makes the parens omittable in the cases specified in the previous sentence. The sentence I'm complaining about is describing other cases, in which they're still required. > I'm +1 for backpatching it. It may be operating as designed by PeterE > ten years ago, but it's not operating as designed by the SQL standard. By that argument, *anyplace* where we're missing a SQL-spec feature is a back-patchable bug. I don't buy it. It may be that this fix is simple and safe enough that the risk/reward tradeoff favors back-patching, but I think you have to argue it as a favorable tradeoff rather than just saying "this isn't per standard". Consider: if Andrew had completely rewritten gram.y to get the same visible effect, would you think that was back-patchable? regards, tom lane
On Sat, May 19, 2018 at 07:41:10PM -0400, Tom Lane wrote: > Vik Fearing <vik.fearing@2ndquadrant.com> writes: >> I'm +1 for backpatching it. It may be operating as designed by PeterE >> ten years ago, but it's not operating as designed by the SQL standard. > > By that argument, *anyplace* where we're missing a SQL-spec feature > is a back-patchable bug. I don't buy it. +1. -- Michael
Attachment
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: Tom> Also, why'd you back off "must write" to "should write" in the docs? Tom> This doesn't make the parens any more optional than before. Currently, the requirement for parens is inconsistent - FETCH FIRST wants them for absolutely anything that's not a literal constant, but OFFSET x ROWS allows any c_expr (which covers a whole lot of ground in addition to the spec's requirements). The docs don't distinguish these two and just say "must write" parens even though some cases clearly work without. (There's also the slight wart that OFFSET -1 ROWS is a syntax error, whereas the spec defines it as valid syntax but a semantic error. Do we care?) With the patch, c_exprs would be accepted without parens, so saying "must write" parens in the docs clearly isn't entirely accurate. I'm open to better phrasing. I found some more warts: OFFSET +1 ROWS isn't accepted (but should be), and FETCH FIRST 10000000000 ROWS ONLY fails on 32bit and Windows builds. The patch as posted fixes the second of those but makes FETCH FIRST +1 fail instead; I'm working on that. -- Andrew (irc:RhodiumToad)
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: >> I'm +1 for backpatching it. It may be operating as designed by >> PeterE ten years ago, but it's not operating as designed by the SQL >> standard. Tom> By that argument, *anyplace* where we're missing a SQL-spec Tom> feature is a back-patchable bug. I don't buy it. But this is a feature we already claimed to actually support (it's listed in sql_features with a bunch of unqualified YES entries), but in fact doesn't work properly. -- Andrew (irc:RhodiumToad)
Updated patch. This one explicitly accepts +/- ICONST/FCONST in addition to c_expr for both offset and limit, removing the inconsistencies mentioned previously. I changed the wording of the docs part a bit; does that look better? or worse? Old behavior: select 1 offset +1 rows; -- ERROR: syntax error at or near "rows" select 1 fetch first +1 rows only; -- works select 1 offset -1 rows; -- ERROR: syntax error at or near "rows" select 1 fetch first -1 rows only; -- ERROR: LIMIT must not be negative New behavior: select 1 offset +1 rows; -- works select 1 fetch first +1 rows only; -- works select 1 offset -1 rows; -- ERROR: OFFSET must not be negative select 1 fetch first -1 rows only; -- ERROR: LIMIT must not be negative -- Andrew (irc:RhodiumToad) diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index b5d3d3a071..3d59b0c3e5 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -1399,10 +1399,12 @@ OFFSET <replaceable class="parameter">start</replaceable> OFFSET <replaceable class="parameter">start</replaceable> { ROW | ROWS } FETCH { FIRST | NEXT } [ <replaceable class="parameter">count</replaceable> ] { ROW | ROWS } ONLY </synopsis> - In this syntax, to write anything except a simple integer constant for - <replaceable class="parameter">start</replaceable> or <replaceable - class="parameter">count</replaceable>, you must write parentheses - around it. + In this syntax, the <replaceable class="parameter">start</replaceable> + or <replaceable class="parameter">count</replaceable> value is required by + the standard to be a literal constant, a parameter, or a variable name; + as a <productname>PostgreSQL</productname> extension, other expressions + are allowed, but will generally need to be enclosed in parentheses to avoid + ambiguity. If <replaceable class="parameter">count</replaceable> is omitted in a <literal>FETCH</literal> clause, it defaults to 1. <literal>ROW</literal> diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index babb62dae1..2ef3bdecc7 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -451,7 +451,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); %type <node> fetch_args limit_clause select_limit_value offset_clause select_offset_value - select_offset_value2 opt_select_fetch_first_value + select_fetch_first_value I_or_F_const %type <ival> row_or_rows first_or_next %type <list> OptSeqOptList SeqOptList OptParenthesizedSeqOptList @@ -11570,15 +11570,23 @@ limit_clause: parser_errposition(@1))); } /* SQL:2008 syntax */ - | FETCH first_or_next opt_select_fetch_first_value row_or_rows ONLY + /* to avoid shift/reduce conflicts, handle the optional value with + * a separate production rather than an opt_ expression. The fact + * that ONLY is fully reserved means that this way, we defer any + * decision about what rule reduces ROW or ROWS to the point where + * we can see the ONLY token in the lookahead slot. + */ + | FETCH first_or_next select_fetch_first_value row_or_rows ONLY { $$ = $3; } + | FETCH first_or_next row_or_rows ONLY + { $$ = makeIntConst(1, -1); } ; offset_clause: OFFSET select_offset_value { $$ = $2; } /* SQL:2008 syntax */ - | OFFSET select_offset_value2 row_or_rows + | OFFSET select_fetch_first_value row_or_rows { $$ = $2; } ; @@ -11597,22 +11605,28 @@ select_offset_value: /* * Allowing full expressions without parentheses causes various parsing - * problems with the trailing ROW/ROWS key words. SQL only calls for - * constants, so we allow the rest only with parentheses. If omitted, - * default to 1. + * problems with the trailing ROW/ROWS key words. SQL spec only calls for + * <simple value specification>, which is either a literal or a parameter (but + * an <SQL parameter reference> could be an identifier, bringing up conflicts + * with ROW/ROWS). We solve this by leveraging the presence of ONLY (see above) + * to determine whether the expression is missing rather than trying to make it + * optional in this rule. + * + * c_expr covers almost all the spec-required cases (and more), but it doesn't + * cover signed numeric literals, which are allowed by the spec. So we include + * those here explicitly. */ -opt_select_fetch_first_value: - SignedIconst { $$ = makeIntConst($1, @1); } - | '(' a_expr ')' { $$ = $2; } - | /*EMPTY*/ { $$ = makeIntConst(1, -1); } +select_fetch_first_value: + c_expr { $$ = $1; } + | '+' I_or_F_const + { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", NULL, $2, @1); } + | '-' I_or_F_const + { $$ = doNegate($2, @1); } ; -/* - * Again, the trailing ROW/ROWS in this case prevent the full expression - * syntax. c_expr is the best we can do. - */ -select_offset_value2: - c_expr { $$ = $1; } +I_or_F_const: + Iconst { $$ = makeIntConst($1,@1); } + | FCONST { $$ = makeFloatConst($1,@1); } ; /* noise words */
On 20/05/18 01:41, Tom Lane wrote: > Vik Fearing <vik.fearing@2ndquadrant.com> writes: >> On 20/05/18 00:57, Tom Lane wrote: >> I'm +1 for backpatching it. It may be operating as designed by PeterE >> ten years ago, but it's not operating as designed by the SQL standard. > > By that argument, *anyplace* where we're missing a SQL-spec feature > is a back-patchable bug. I don't buy it. Only features we claim to support. I obviously wouldn't consider backpatching ASSERTIONs, for example. > It may be that this fix is simple and safe enough that the risk/reward > tradeoff favors back-patching, but I think you have to argue it as a > favorable tradeoff rather than just saying "this isn't per standard". > Consider: if Andrew had completely rewritten gram.y to get the same > visible effect, would you think that was back-patchable? Is the decision to backpatch based on behavior, or code churn? -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 20/05/18 05:25, Andrew Gierth wrote: > +select_fetch_first_value: > + c_expr { $$ = $1; } > + | '+' I_or_F_const > + { $$ = (Node *) makeSimpleA_Expr(AEXPR_OP, "+", NULL, $2, @1); } > + | '-' I_or_F_const > + { $$ = doNegate($2, @1); } I think there should be a comment for why you're accepting FCONST when the value has to be integral. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Sun, May 20, 2018 at 01:39:27AM +0100, Andrew Gierth wrote: > >>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes: > > >> I'm +1 for backpatching it. It may be operating as designed by > >> PeterE ten years ago, but it's not operating as designed by the SQL > >> standard. > > Tom> By that argument, *anyplace* where we're missing a SQL-spec > Tom> feature is a back-patchable bug. I don't buy it. > > But this is a feature we already claimed to actually support (it's > listed in sql_features with a bunch of unqualified YES entries), but in > fact doesn't work properly. This looks like a bug fix to me, for what it's worth. Best, David. -- David Fetter <david(at)fetter(dot)org> http://fetter.org/ Phone: +1 415 235 3778 Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Sat, May 19, 2018 at 4:41 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It may be that this fix is simple and safe enough that the risk/reward > tradeoff favors back-patching, but I think you have to argue it as a > favorable tradeoff rather than just saying "this isn't per standard". > Consider: if Andrew had completely rewritten gram.y to get the same > visible effect, would you think that was back-patchable? I strongly agree with the general principle that back-patching a bug fix needs to have a benefit that outweighs its cost. There have been cases where we chose to not back-patch an unambiguous bug fix even though it was clear that incorrect user-visible behavior remained. Conversely, there have been cases where we back-patched a commit that was originally introduced as a performance feature. Whether or not Andrew's patch is formally classified as a bug fix is subjective. I'm inclined to accept it as a bug fix, but I also think that it shouldn't matter very much. The practical implication is that I don't think it's completely out of the question to back-patch, but AFAICT nobody else thinks it's out of the question anyway. Why bother debating something that's inconsequential? FWIW, I am neutral on the important question of whether or not this patch should actually be back-patched. -- Peter Geoghegan
There have been
cases where we chose to not back-patch an unambiguous bug fix even
though it was clear that incorrect user-visible behavior remained.
The risk here is significantly reduced since the existing user-visible behavior is an error which presumably no one is relying upon. Between that and being able to conform to the standard syntax for a long-standing feature I would say the benefit outweighs the cost and risk.
+0.5 to back-patching
David J.
"David G. Johnston" <david.g.johnston@gmail.com> writes: > The risk here is significantly reduced since the existing user-visible > behavior is an error which presumably no one is relying upon. Between that > and being able to conform to the standard syntax for a long-standing > feature I would say the benefit outweighs the cost and risk. The risk you're ignoring is that this patch will break something that *did* work before. Given that the first version did exactly that, I do not think that risk should be considered negligible. I'm going to change my vote for back-patching from -0.5 to -1. regards, tom lane
Greetings, * Peter Geoghegan (pg@bowt.ie) wrote: > Whether or not Andrew's patch is formally classified as a bug fix is > subjective. I'm inclined to accept it as a bug fix, but I also think > that it shouldn't matter very much. The practical implication is that > I don't think it's completely out of the question to back-patch, but > AFAICT nobody else thinks it's out of the question anyway. Why bother > debating something that's inconsequential? Just to be clear, based on what I saw on IRC, this specifically came out of someone complaining that it didn't work and caused difficulty for them. As such, my inclination on this would be to back-patch it, but we'd need to be sure to test it and be confident that it won't break things which worked before. Thanks! Stephen
Attachment
>>>>> "Stephen" == Stephen Frost <sfrost@snowman.net> writes: Stephen> Just to be clear, based on what I saw on IRC, this Stephen> specifically came out of someone complaining that it didn't Stephen> work and caused difficulty for them. Specifically, as I said at the start, it's from bug #15200, see http://postgr.es/m/152647780335.27204.16895288237122418685@wrigleys.postgresql.org Stephen> As such, my inclination on this would be to back-patch it, but Stephen> we'd need to be sure to test it and be confident that it won't Stephen> break things which worked before. Well, that's kind of what I have been doing (and why I posted a second version of the patch). So let's go over the full detail. The old syntax is: OFFSET select_offset_value2 row_or_rows FETCH first_or_next opt_select_fetch_first_value row_or_rows ONLY opt_select_fetch_first_value: SignedIconst | '(' a_expr ')' | /*EMPTY*/; select_offset_value2: c_expr; SignedIconst: Iconst | '+' Iconst | '-' Iconst; The new syntax is: OFFSET select_fetch_first_value row_or_rows FETCH first_or_next select_fetch_first_value row_or_rows ONLY FETCH first_or_next row_or_rows ONLY select_fetch_first_value: c_expr | '+' I_or_F_const | '-' I_or_F_const; In both cases note that the sequence '(' a_expr ')' is a valid c_expr. The old syntax for OFFSET x ROWS clearly simplifies to OFFSET c_expr [ROW|ROWS] and the new syntax clearly accepts a strict superset of that, since it just adds +/- I_or_F_const as an alternative to c_expr, fixing the (probably inconsequential) bug that OFFSET +1 ROWS didn't work despite being legal in the spec. The old syntax for FETCH FIRST expands to: 1) FETCH first_or_next Iconst row_or_rows ONLY 2) FETCH first_or_next '+' Iconst row_or_rows ONLY 3) FETCH first_or_next '-' Iconst row_or_rows ONLY 4) FETCH first_or_next '(' a_expr ')' row_or_rows ONLY 5) FETCH first_or_next row_or_rows ONLY 5) is explicitly matched in the new syntax. 1) and 4) both match via the fact that Iconst and '(' a_expr ')' are valid for c_expr. 2) and 3) are matched in the new syntax with the addition of accepting FCONST as well as Iconst. So all input that satisfied the old syntax will be accepted by the new one. Of course, I have done actual tests comparing the old and new versions, with results consistent with the above. I do note that there seems to be no test coverage at all - none was added in commit 361bfc357 which created the feature, and nobody seems to have cared about it since other than some doc tweaks. I've also checked (by splitting into separate ROW and ROWS alternatives) that there aren't any grammar conflicts being "hidden" inappropriately by precedence (ROWS has a precedence assigned, but ROW does not). Inspection of the bison verbose output confirms this. Normally when messing with the grammar one would also have to consider the effect on dump/restore. But in this case, we never actually generate this syntax when deparsing - offset/limit clauses are always deparsed as the OFFSET x LIMIT y syntax instead. So we don't have to worry about, for example, dumping from a newer point release to an older one; the only time we see this syntax is if the client generates it. -- Andrew (irc:RhodiumToad)
On Sun, May 20, 2018 at 5:16 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "David G. Johnston" <david.g.johnston@gmail.com> writes: >> The risk here is significantly reduced since the existing user-visible >> behavior is an error which presumably no one is relying upon. Between that >> and being able to conform to the standard syntax for a long-standing >> feature I would say the benefit outweighs the cost and risk. > > The risk you're ignoring is that this patch will break something that > *did* work before. Given that the first version did exactly that, > I do not think that risk should be considered negligible. I'm going > to change my vote for back-patching from -0.5 to -1. I'm also -1 for back-patching, although it seems that the ship has already sailed. I don't think that the failure of something to work that could have been made to work if the original feature author had tried harder rises to the level of a bug. If we start routinely back-patching things that fall into that category, we will certainly manage to destabilize older releases on a regular basis. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
If we start routinely
back-patching things that fall into that category, we will certainly
manage to destabilize older releases on a regular basis.
Just because something is bad if done in excess doesn't mean specific moderate partaking is bad too.
We actually did backpatch the NaN stuff and reverted that because, for me, it was a silent change of functioning behavior. I find the decision to back-patch this syntax oversight considerably more obvious than that one was.
David J.