Thread: Add hint for function named "is"
CREATE FUNCTION pg_temp.is() RETURNS text LANGUAGE sql AS $$SELECT 'x'::text$$; SELECT 'x'||is(); ERROR: syntax error at or near "(" LINE 1: SELECT 'x'||is(); I was finally able to figure out this was because "is" needs to be quoted; is there a way this could be hinted? FWIW, the real-world case here comes from using pgTap, which has an is() function. I've used that countless times by itself without quoting, so it never occurred to me that the syntax error was due to lack of quotes. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On Aug 11, 2016, at 2:11 PM, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote: > CREATE FUNCTION pg_temp.is() RETURNS text LANGUAGE sql AS $$SELECT 'x'::text$$; > SELECT 'x'||is(); > ERROR: syntax error at or near "(" > LINE 1: SELECT 'x'||is(); > > I was finally able to figure out this was because "is" needs to be quoted; is there a way this could be hinted? > > FWIW, the real-world case here comes from using pgTap, which has an is() function. I've used that countless times by itselfwithout quoting, so it never occurred to me that the syntax error was due to lack of quotes. Why does it need quotation marks in this case? D
"David E. Wheeler" <david@justatheory.com> writes: > On Aug 11, 2016, at 2:11 PM, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote: >> SELECT 'x'||is(); >> ERROR: syntax error at or near "(" > Why does it need quotation marks in this case? It doesn't, if you do something like regression=# select is(); ERROR: function is() does not exist LINE 1: select is(); ^ which probably contributes to Jim's confusion. I think what is happening in the trouble case is that since IS has lower precedence than Op, the grammar decides it ought to resolve || as a postfix operator, and then it effectively has('x' ||) IS ... which leaves noplace to go except IS NULL and other IS-something syntaxes. You'd likely have similar problems with any other keyword that has lower precedence than Op; but a large fraction of those are fully-reserved words and so no one would have had any expectation of being able to leave them unquoted anyway. I'm not sure there's much we can do about this. Even if we had control of what Bison prints for syntax errors, which we don't really, it's hard to see what condition we could trigger the hint on that wouldn't result in false positives at least as often as something helpful. (Note that the grammar's behavior can't really depend on whether a function named is() actually exists in the catalogs.) regards, tom lane
On 8/11/16 4:54 PM, Tom Lane wrote: > which probably contributes to Jim's confusion. I think what is happening > in the trouble case is that since IS has lower precedence than Op, the > grammar decides it ought to resolve || as a postfix operator, and then > it effectively has > ('x' ||) IS ... FWIW, is() || is() blows up in the same way. > I'm not sure there's much we can do about this. Even if we had control of > what Bison prints for syntax errors, which we don't really, it's hard to > see what condition we could trigger the hint on that wouldn't result in > false positives at least as often as something helpful. (Note that the > grammar's behavior can't really depend on whether a function named is() > actually exists in the catalogs.) Is there a place in the error reporting path where we'd still have access to the 'is' token, and have enough control to look for a relevant function? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I think what is happening > in the trouble case is that since IS has lower precedence than Op, the > grammar decides it ought to resolve || as a postfix operator, and then > it effectively has > ('x' ||) IS ... > which leaves noplace to go except IS NULL and other IS-something syntaxes. I wonder whether it's really worth keeping postfix operators. They seem to keep causing these kinds of headaches and I wonder how much the grammar tables would be simplified by removing them. -- greg
Jim Nasby <Jim.Nasby@BlueTreble.com> writes: > Is there a place in the error reporting path where we'd still have > access to the 'is' token, and have enough control to look for a relevant > function? No. The grammar can't assume that it's being run inside a transaction (consider parsing START TRANSACTION, or ROLLBACK after a failure). So catalog access is out, full stop. regards, tom lane
On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark <stark@mit.edu> wrote: > On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I think what is happening >> in the trouble case is that since IS has lower precedence than Op, the >> grammar decides it ought to resolve || as a postfix operator, and then >> it effectively has >> ('x' ||) IS ... >> which leaves noplace to go except IS NULL and other IS-something syntaxes. > > I wonder whether it's really worth keeping postfix operators. They > seem to keep causing these kinds of headaches and I wonder how much > the grammar tables would be simplified by removing them. I've wondered the same thing at other times. The only postfix operator we ship in core is numeric_fac, and frankly it doesn't seem worth it just for that. If we decided that factorial(n) was adequate notation for that case, and that we didn't care about any hypothetical user-defined postfix operators either, I think we simplify or improve a number of things. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark <stark@mit.edu> wrote: >> I wonder whether it's really worth keeping postfix operators. They >> seem to keep causing these kinds of headaches and I wonder how much >> the grammar tables would be simplified by removing them. > I've wondered the same thing at other times. The only postfix > operator we ship in core is numeric_fac, and frankly it doesn't seem > worth it just for that. If we decided that factorial(n) was adequate > notation for that case, and that we didn't care about any hypothetical > user-defined postfix operators either, I think we simplify or improve > a number of things. [ quick experiment... ] Simply removing the two postfix-operator productions produces no meaningful savings (~0.5%), which is unsurprising because after all they're just two more productions to Bison. It's possible we could save more by simplifying the existing hacks elsewhere in the grammar that were needed to work around ambiguities induced by postfix operators. But that would take quite a bit of actual work, and I suspect we'd end up with a similar result that the tables don't actually get much smaller. You could argue for this on the grounds of some reduced intellectual complexity in gram.y, and more forcefully on the grounds of removing user surprise in cases like Jim's (especially if you can find some other cases where it matters). But I doubt that we'd get any kind of noticeable run-time or code-size win. regards, tom lane
On Fri, Aug 12, 2016 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark <stark@mit.edu> wrote: >>> I wonder whether it's really worth keeping postfix operators. They >>> seem to keep causing these kinds of headaches and I wonder how much >>> the grammar tables would be simplified by removing them. > >> I've wondered the same thing at other times. The only postfix >> operator we ship in core is numeric_fac, and frankly it doesn't seem >> worth it just for that. If we decided that factorial(n) was adequate >> notation for that case, and that we didn't care about any hypothetical >> user-defined postfix operators either, I think we simplify or improve >> a number of things. > > [ quick experiment... ] Simply removing the two postfix-operator > productions produces no meaningful savings (~0.5%), which is unsurprising > because after all they're just two more productions to Bison. It's > possible we could save more by simplifying the existing hacks elsewhere > in the grammar that were needed to work around ambiguities induced by > postfix operators. But that would take quite a bit of actual work, > and I suspect we'd end up with a similar result that the tables don't > actually get much smaller. You could argue for this on the grounds of > some reduced intellectual complexity in gram.y, and more forcefully on > the grounds of removing user surprise in cases like Jim's (especially if > you can find some other cases where it matters). But I doubt that we'd > get any kind of noticeable run-time or code-size win. Half a percent for two productions is not bad, but I think the real win would be in removing ambiguity from the grammar. We get periodic complaints about the fact that things like "SELECT 3 cache" don't work because cache is an unreserved keyword, and postfix operators are one of the reasons why we can't do better: /* * We support omitting AS only for column labels that aren't * any known keyword. Thereis an ambiguity against postfix * operators: is "a ! b" an infix expression, or a postfix * expressionand a column label? We prefer to resolve this * as an infix expression, which we accomplish by assigning * IDENT a precedence higher than POSTFIXOP. */ I think I experimented with this a while ago and found that even after removing postfix operators there was at least one other grammar problem that prevented us from accepting ColLabel there. I gave up and didn't dig further, but maybe we should. I sort of like the elegance of supporting user-defied prefix and postfix operators, but in practice this is a not-especially-infrequent problem for people migrating from other databases, a consideration that might be judged to outweigh that elegance. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > Half a percent for two productions is not bad, but I think the real > win would be in removing ambiguity from the grammar. We get periodic > complaints about the fact that things like "SELECT 3 cache" don't work > because cache is an unreserved keyword, and postfix operators are one > of the reasons why we can't do better: Agreed, if postfix operators were the only thing standing between us and fixing that, it would be a pretty strong argument for removing them. > I think I experimented with this a while ago and found that even after > removing postfix operators there was at least one other grammar > problem that prevented us from accepting ColLabel there. I gave up > and didn't dig further, but maybe we should. Yes, it would be good to find that out. I think there's a whole bunch of intertwined issues there, though; this isn't likely to be an easy change. The comment at gram.y lines 679ff lists several things that are relevant, and might or might not be simplifiable without postfix ops. regards, tom lane
I wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I think I experimented with this a while ago and found that even after >> removing postfix operators there was at least one other grammar >> problem that prevented us from accepting ColLabel there. I gave up >> and didn't dig further, but maybe we should. > Yes, it would be good to find that out. I poked at that a little bit, by dint of changing - | a_expr IDENT + | a_expr ColLabel in the target_el production and then seeing what Bison complained about. The majority of the problems boil down to variants of this: state 997 1691 character: CHARACTER . opt_varying VARYING shift, and go to state 1597 VARYING [reduce using rule 1698 (opt_varying)] $default reduce using rule 1698 (opt_varying) opt_varying go to state 1600 What this is telling us is that given input like, say, SELECT 'foo'::character varying Bison is no longer sure whether "varying" is meant as a type name modifier or a ColLabel. And indeed there is *no* principled answer to that that doesn't involve giving up the ability for "varying" to be a ColLabel. Just promoting it to a fully reserved word (which it is not today) wouldn't be enough, because right now even fully reserved words can be ColLabels. Another problem is here: state 1846 1762 a_expr: a_expr ISNULL . 2418 type_func_name_keyword: ISNULL . $end reduce using rule 1762 (a_expr) $end [reduce using rule 2418 (type_func_name_keyword)] pointing out that "SELECT 42 ISNULL" is now ambiguous. Since ISNULL is nonstandard, maybe dropping support for it would be feasible. There are some other problems that look probably fixable with refactoring, but AFAICS the ones above are fundamental. So we basically can't have "you can use anything at all as a ColLabel without AS". We could probably somewhat reduce the set of words you're not allowed to use that way, but we could not even promise that all words that are currently unreserved would work. It's likely that by rejiggering the precedence of productions, we could resolve these ambiguities in favor of "it's not a ColLabel if it appears in a context like this". And probably that wouldn't be too surprising most of the time. But depending on precedence to resolve fundamentally ambiguous grammar is always gonna bite you sometimes. People understand it (... usually ...) in the context of parsing multi-operator expressions, but for other things it's not a great tool. regards, tom lane
On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > pointing out that "SELECT 42 ISNULL" is now ambiguous. Since ISNULL > is nonstandard, maybe dropping support for it would be feasible. Isn't ISNULL one of the lexical tricks we used to effectively give bison two token lookahead? -- greg
Greg Stark <stark@mit.edu> writes: > On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> pointing out that "SELECT 42 ISNULL" is now ambiguous. Since ISNULL >> is nonstandard, maybe dropping support for it would be feasible. > Isn't ISNULL one of the lexical tricks we used to effectively give > bison two token lookahead? No, it's a SQL-exposed feature, though I think it's just there for compatibility with some random other DBMS. See also NOTNULL. regards, tom lane
On 8/12/16 1:40 PM, Tom Lane wrote: > What this is telling us is that given input like, say, > > SELECT 'foo'::character varying > > Bison is no longer sure whether "varying" is meant as a type name modifier > or a ColLabel. And indeed there is *no* principled answer to that that > doesn't involve giving up the ability for "varying" to be a ColLabel. > Just promoting it to a fully reserved word (which it is not today) > wouldn't be enough, because right now even fully reserved words can be > ColLabels. FWIW, I've always disliked how some types could contains spaces without being quoted. AFAIK nothing else in the system allows that, and I don't see why character varying and timestamp with* should get a special pass. I doubt we could get rid of this in CREATE TABLE, but I wonder how many people actually cast using the unquoted form. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
Jim Nasby <Jim.Nasby@bluetreble.com> writes: > FWIW, I've always disliked how some types could contains spaces without > being quoted. AFAIK nothing else in the system allows that, and I don't > see why character varying and timestamp with* should get a special pass. Because the SQL standard says so. If we were going to start ignoring SQL-standard spellings, this area would not be very high on my list, actually. Their insistence on inventing crazy new special syntaxes for things that could be normal function calls is what bugs me ... regards, tom lane
On Fri, Aug 12, 2016 at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > It's likely that by rejiggering the precedence of productions, we could > resolve these ambiguities in favor of "it's not a ColLabel if it appears > in a context like this". And probably that wouldn't be too surprising > most of the time. But depending on precedence to resolve fundamentally > ambiguous grammar is always gonna bite you sometimes. People understand > it (... usually ...) in the context of parsing multi-operator expressions, > but for other things it's not a great tool. I poked at this a little more today. It might be a little easier to shoot for changing "a_expr IDENT" to "a_expr ColId" than to shoot for allowing "a_expr ColLabel", or even just allowing both "a_expr IDENT" and "a_expr unreserved_keyword". With the rule for postfix operators is removed, following unreserved keywords are problematic: DAY FILTER HOUR MINUTE MONTH OVER SECOND VARYING WITHIN WITHOUT YEAR. I think allowing ColId would create further problems with PRECISION, CHAR, and CHARACTER as well. But they are all shift/reduce conflicts, which somehow scare me quite a bit less than the reduce/reduce conflicts one gets when trying to allow ColLabel. I think there would be a lot of value in coming up with some kind of incremental improvement here; this is a common annoyance for users migrating from other database systems (and one in particular). We currently have 440 keywords of which 290 are unreserved, 50 are column-name keywords, 23 are type-func-name keywords, and 77 are fully reserved. For more than 300 of those key words, the postfix operator rule is the only thing preventing us from allowing it as a column label without "AS". Eliminating just those -- or even a large subset of those -- would make things noticeably better, IMHO. Technically, that doesn't look hard to do: (1) remove the rule that allows postfix ops, or restrict it to operators beginning with ! or where OPERATOR() notation is used, or whatever; (2) add a new production target_el_keyword that includes some or all of the keywords that don't cause grammar conflicts, (3) add a rule that target_el can be "a expr target_el_keyword", (4) profit. Or, since that would make maintaining target_el_keyword a nuisance, split unreserved_keyword into two new categories, unreserved_keyword and very_slightly_reserved_keyword, and update elsewhere accordingly. However, I foresee that Tom will object to the idea of creating a new category of keywords, and I'm happy to do something else if we can figure out what that other thing is. I'm not immediately sure how to use operator precedence to resolve these ambiguities; I think that for that to work we'd have assign a precedence to every keyword that we want to allow here, just as we currently do for IDENT. And that seems like it could have unforeseen side effects. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I think there would be a lot of value in coming up with some kind of > incremental improvement here; this is a common annoyance for users > migrating from other database systems (and one in particular). Agreed, but ... > Technically, that doesn't look hard to do: (1) remove the rule that > allows postfix ops, or restrict it to operators beginning with ! or > where OPERATOR() notation is used, or whatever; (2) add a new > production target_el_keyword that includes some or all of the keywords > that don't cause grammar conflicts, (3) add a rule that target_el can > be "a expr target_el_keyword", (4) profit. Or, since that would make > maintaining target_el_keyword a nuisance, split unreserved_keyword > into two new categories, unreserved_keyword and > very_slightly_reserved_keyword, and update elsewhere accordingly. > However, I foresee that Tom will object to the idea of creating a new > category of keywords, and I'm happy to do something else if we can > figure out what that other thing is. Not *nearly* as much as I'd object to mostly-breaking postfix operators. Now admittedly, if the Berkeley guys had never put those in, nobody would miss them. But they're there, and I'm afraid that users are likely depending on them. Neither of your suggestions above would be any less damaging to such users than removing the feature altogether. > I'm not immediately sure how to > use operator precedence to resolve these ambiguities; Unfortunately, this thread got swapped out of my brain long ago, so I'm not sure what I was talking about either. I could take another look sometime. regards, tom lane
On Mon, Apr 30, 2018 at 2:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Not *nearly* as much as I'd object to mostly-breaking postfix operators. > Now admittedly, if the Berkeley guys had never put those in, nobody > would miss them. But they're there, and I'm afraid that users are > likely depending on them. Neither of your suggestions above would be > any less damaging to such users than removing the feature altogether. Well, that sounds different than what you said upthread in August of 2016: tgl> Agreed, if postfix operators were the only thing standing between us and tgl> fixing that, it would be a pretty strong argument for removing them. I agree that there could be some people using numeric_fac, the only postfix operator we ship. Probably not many, since it's an inefficient implementation of an operation that overflows for all but a few tens of thousands of possible input values, but some. But we could keep that operator working without supporting the facility in general. Do we have any evidence at all that anybody has got a user-defined postfix operator out there? I'm reluctant to keep a feature that's blocking such an important improvement unless we can come up with some real examples of it being used. I've never run across it, and a Google search didn't turn up any examples of anybody else using it, either. The basic problem here, for those following along from home, is that if we allow "SELECT h hour FROM tab" to mean select column h with column alias hour, then "SELECT hour + 24 * day" becomes ambiguous in the face of postfix operators: it could either mean what it obviously is intended to mean, or it could mean that you should look for a postfix * operator, apply that to 24, add hour to the result, and then call the resulting column "day". This is less obviously insane when you use a differently-named operator and columns -- who knows which way "SELECT thunk + squidge!!! snarkke FROM frobnitz" is intended to be interpreted. But I'm just a bit doubtful that anyone is going to stop using PostgreSQL because we take away their ability to make !!! a postfix operator, whereas I think the inability to alias columns without using AS when the column label happens to be a keyword IS a problem for adoption. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company