Thread: Add hint for function named "is"

Add hint for function named "is"

From

Jim Nasby

Date:

11 August 2016, 21:11:15

CREATE FUNCTION pg_temp.is() RETURNS text LANGUAGE sql AS $$SELECT 
'x'::text$$;
SELECT 'x'||is();
ERROR:  syntax error at or near "("
LINE 1: SELECT 'x'||is();

I was finally able to figure out this was because "is" needs to be 
quoted; is there a way this could be hinted?

FWIW, the real-world case here comes from using pgTap, which has an is() 
function. I've used that countless times by itself without quoting, so 
it never occurred to me that the syntax error was due to lack of quotes.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Add hint for function named "is"

From

"David E. Wheeler"

Date:

11 August 2016, 21:32:42

On Aug 11, 2016, at 2:11 PM, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:

> CREATE FUNCTION pg_temp.is() RETURNS text LANGUAGE sql AS $$SELECT 'x'::text$$;
> SELECT 'x'||is();
> ERROR:  syntax error at or near "("
> LINE 1: SELECT 'x'||is();
>
> I was finally able to figure out this was because "is" needs to be quoted; is there a way this could be hinted?
>
> FWIW, the real-world case here comes from using pgTap, which has an is() function. I've used that countless times by
itselfwithout quoting, so it never occurred to me that the syntax error was due to lack of quotes. 

Why does it need quotation marks in this case?

D

Re: Add hint for function named "is"

From

Tom Lane

Date:

11 August 2016, 21:55:04

"David E. Wheeler" <david@justatheory.com> writes:
> On Aug 11, 2016, at 2:11 PM, Jim Nasby <Jim.Nasby@BlueTreble.com> wrote:
>> SELECT 'x'||is();
>> ERROR:  syntax error at or near "("

> Why does it need quotation marks in this case?

It doesn't, if you do something like

regression=# select is();
ERROR:  function is() does not exist
LINE 1: select is();              ^

which probably contributes to Jim's confusion.  I think what is happening
in the trouble case is that since IS has lower precedence than Op, the
grammar decides it ought to resolve || as a postfix operator, and then
it effectively has('x' ||) IS ...
which leaves noplace to go except IS NULL and other IS-something syntaxes.
You'd likely have similar problems with any other keyword that has lower
precedence than Op; but a large fraction of those are fully-reserved words
and so no one would have had any expectation of being able to leave them
unquoted anyway.

I'm not sure there's much we can do about this.  Even if we had control of
what Bison prints for syntax errors, which we don't really, it's hard to
see what condition we could trigger the hint on that wouldn't result in
false positives at least as often as something helpful.  (Note that the
grammar's behavior can't really depend on whether a function named is()
actually exists in the catalogs.)
        regards, tom lane

Re: Add hint for function named "is"

From

Jim Nasby

Date:

12 August 2016, 03:30:49

On 8/11/16 4:54 PM, Tom Lane wrote:
> which probably contributes to Jim's confusion.  I think what is happening
> in the trouble case is that since IS has lower precedence than Op, the
> grammar decides it ought to resolve || as a postfix operator, and then
> it effectively has
>     ('x' ||) IS ...

FWIW, is() || is() blows up in the same way.

> I'm not sure there's much we can do about this.  Even if we had control of
> what Bison prints for syntax errors, which we don't really, it's hard to
> see what condition we could trigger the hint on that wouldn't result in
> false positives at least as often as something helpful.  (Note that the
> grammar's behavior can't really depend on whether a function named is()
> actually exists in the catalogs.)

Is there a place in the error reporting path where we'd still have 
access to the 'is' token, and have enough control to look for a relevant 
function?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Add hint for function named "is"

From

Greg Stark

Date:

12 August 2016, 13:41:32

On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> I think what is happening
> in the trouble case is that since IS has lower precedence than Op, the
> grammar decides it ought to resolve || as a postfix operator, and then
> it effectively has
>         ('x' ||) IS ...
> which leaves noplace to go except IS NULL and other IS-something syntaxes.

I wonder whether it's really worth keeping postfix operators. They
seem to keep causing these kinds of headaches and I wonder how much
the grammar tables would be simplified by removing them.

-- 
greg

Re: Add hint for function named "is"

From

Tom Lane

Date:

12 August 2016, 15:00:14

Jim Nasby <Jim.Nasby@BlueTreble.com> writes:
> Is there a place in the error reporting path where we'd still have 
> access to the 'is' token, and have enough control to look for a relevant 
> function?

No.  The grammar can't assume that it's being run inside a transaction
(consider parsing START TRANSACTION, or ROLLBACK after a failure).
So catalog access is out, full stop.
        regards, tom lane

Re: Add hint for function named "is"

From

Robert Haas

Date:

12 August 2016, 16:42:02

On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark <stark@mit.edu> wrote:
> On Thu, Aug 11, 2016 at 10:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> I think what is happening
>> in the trouble case is that since IS has lower precedence than Op, the
>> grammar decides it ought to resolve || as a postfix operator, and then
>> it effectively has
>>         ('x' ||) IS ...
>> which leaves noplace to go except IS NULL and other IS-something syntaxes.
>
> I wonder whether it's really worth keeping postfix operators. They
> seem to keep causing these kinds of headaches and I wonder how much
> the grammar tables would be simplified by removing them.

I've wondered the same thing at other times.  The only postfix
operator we ship in core is numeric_fac, and frankly it doesn't seem
worth it just for that.  If we decided that factorial(n) was adequate
notation for that case, and that we didn't care about any hypothetical
user-defined postfix operators either, I think we simplify or improve
a number of things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add hint for function named "is"

From

Tom Lane

Date:

12 August 2016, 16:57:51

Robert Haas <robertmhaas@gmail.com> writes:
> On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark <stark@mit.edu> wrote:
>> I wonder whether it's really worth keeping postfix operators. They
>> seem to keep causing these kinds of headaches and I wonder how much
>> the grammar tables would be simplified by removing them.

> I've wondered the same thing at other times.  The only postfix
> operator we ship in core is numeric_fac, and frankly it doesn't seem
> worth it just for that.  If we decided that factorial(n) was adequate
> notation for that case, and that we didn't care about any hypothetical
> user-defined postfix operators either, I think we simplify or improve
> a number of things.

[ quick experiment... ]  Simply removing the two postfix-operator
productions produces no meaningful savings (~0.5%), which is unsurprising
because after all they're just two more productions to Bison.  It's
possible we could save more by simplifying the existing hacks elsewhere
in the grammar that were needed to work around ambiguities induced by
postfix operators.  But that would take quite a bit of actual work,
and I suspect we'd end up with a similar result that the tables don't
actually get much smaller.  You could argue for this on the grounds of
some reduced intellectual complexity in gram.y, and more forcefully on
the grounds of removing user surprise in cases like Jim's (especially if
you can find some other cases where it matters).  But I doubt that we'd
get any kind of noticeable run-time or code-size win.
        regards, tom lane

Re: Add hint for function named "is"

From

Robert Haas

Date:

12 August 2016, 17:58:23

On Fri, Aug 12, 2016 at 12:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Fri, Aug 12, 2016 at 9:40 AM, Greg Stark <stark@mit.edu> wrote:
>>> I wonder whether it's really worth keeping postfix operators. They
>>> seem to keep causing these kinds of headaches and I wonder how much
>>> the grammar tables would be simplified by removing them.
>
>> I've wondered the same thing at other times.  The only postfix
>> operator we ship in core is numeric_fac, and frankly it doesn't seem
>> worth it just for that.  If we decided that factorial(n) was adequate
>> notation for that case, and that we didn't care about any hypothetical
>> user-defined postfix operators either, I think we simplify or improve
>> a number of things.
>
> [ quick experiment... ]  Simply removing the two postfix-operator
> productions produces no meaningful savings (~0.5%), which is unsurprising
> because after all they're just two more productions to Bison.  It's
> possible we could save more by simplifying the existing hacks elsewhere
> in the grammar that were needed to work around ambiguities induced by
> postfix operators.  But that would take quite a bit of actual work,
> and I suspect we'd end up with a similar result that the tables don't
> actually get much smaller.  You could argue for this on the grounds of
> some reduced intellectual complexity in gram.y, and more forcefully on
> the grounds of removing user surprise in cases like Jim's (especially if
> you can find some other cases where it matters).  But I doubt that we'd
> get any kind of noticeable run-time or code-size win.

Half a percent for two productions is not bad, but I think the real
win would be in removing ambiguity from the grammar.  We get periodic
complaints about the fact that things like "SELECT 3 cache" don't work
because cache is an unreserved keyword, and postfix operators are one
of the reasons why we can't do better:
           /*            * We support omitting AS only for column labels that aren't            * any known keyword.
Thereis an ambiguity against postfix            * operators: is "a ! b" an infix expression, or a postfix            *
expressionand a column label?  We prefer to resolve this            * as an infix expression, which we accomplish by
assigning           * IDENT a precedence higher than POSTFIXOP.            */
 

I think I experimented with this a while ago and found that even after
removing postfix operators there was at least one other grammar
problem that prevented us from accepting ColLabel there.  I gave up
and didn't dig further, but maybe we should.  I sort of like the
elegance of supporting user-defied prefix and postfix operators, but
in practice this is a not-especially-infrequent problem for people
migrating from other databases, a consideration that might be judged
to outweigh that elegance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Add hint for function named "is"

From

Tom Lane

Date:

12 August 2016, 18:06:03

Robert Haas <robertmhaas@gmail.com> writes:
> Half a percent for two productions is not bad, but I think the real
> win would be in removing ambiguity from the grammar.  We get periodic
> complaints about the fact that things like "SELECT 3 cache" don't work
> because cache is an unreserved keyword, and postfix operators are one
> of the reasons why we can't do better:

Agreed, if postfix operators were the only thing standing between us and
fixing that, it would be a pretty strong argument for removing them.

> I think I experimented with this a while ago and found that even after
> removing postfix operators there was at least one other grammar
> problem that prevented us from accepting ColLabel there.  I gave up
> and didn't dig further, but maybe we should.

Yes, it would be good to find that out.  I think there's a whole bunch of
intertwined issues there, though; this isn't likely to be an easy change.
The comment at gram.y lines 679ff lists several things that are relevant,
and might or might not be simplifiable without postfix ops.
        regards, tom lane

Re: Add hint for function named "is"

From

Tom Lane

Date:

12 August 2016, 18:40:19

I wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> I think I experimented with this a while ago and found that even after
>> removing postfix operators there was at least one other grammar
>> problem that prevented us from accepting ColLabel there.  I gave up
>> and didn't dig further, but maybe we should.

> Yes, it would be good to find that out.

I poked at that a little bit, by dint of changing

-            | a_expr IDENT
+            | a_expr ColLabel

in the target_el production and then seeing what Bison complained about.
The majority of the problems boil down to variants of this:

state 997
 1691 character: CHARACTER . opt_varying
   VARYING  shift, and go to state 1597
   VARYING   [reduce using rule 1698 (opt_varying)]   $default  reduce using rule 1698 (opt_varying)
   opt_varying  go to state 1600

What this is telling us is that given input like, say,
SELECT 'foo'::character varying

Bison is no longer sure whether "varying" is meant as a type name modifier
or a ColLabel.  And indeed there is *no* principled answer to that that
doesn't involve giving up the ability for "varying" to be a ColLabel.
Just promoting it to a fully reserved word (which it is not today)
wouldn't be enough, because right now even fully reserved words can be
ColLabels.

Another problem is here:

state 1846
 1762 a_expr: a_expr ISNULL . 2418 type_func_name_keyword: ISNULL .
   $end       reduce using rule 1762 (a_expr)   $end       [reduce using rule 2418 (type_func_name_keyword)]

pointing out that "SELECT 42 ISNULL" is now ambiguous.  Since ISNULL
is nonstandard, maybe dropping support for it would be feasible.

There are some other problems that look probably fixable with
refactoring, but AFAICS the ones above are fundamental.

So we basically can't have "you can use anything at all as a ColLabel
without AS".  We could probably somewhat reduce the set of words you're
not allowed to use that way, but we could not even promise that all
words that are currently unreserved would work.

It's likely that by rejiggering the precedence of productions, we could
resolve these ambiguities in favor of "it's not a ColLabel if it appears
in a context like this".  And probably that wouldn't be too surprising
most of the time.  But depending on precedence to resolve fundamentally
ambiguous grammar is always gonna bite you sometimes.  People understand
it (... usually ...) in the context of parsing multi-operator expressions,
but for other things it's not a great tool.
        regards, tom lane

Re: Add hint for function named "is"

From

Greg Stark

Date:

13 August 2016, 00:00:42

On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> pointing out that "SELECT 42 ISNULL" is now ambiguous.  Since ISNULL
> is nonstandard, maybe dropping support for it would be feasible.

Isn't ISNULL one of the lexical tricks we used to effectively give
bison two token lookahead?

-- 
greg

Re: Add hint for function named "is"

From

Tom Lane

Date:

13 August 2016, 00:54:30

Greg Stark <stark@mit.edu> writes:
> On Fri, Aug 12, 2016 at 7:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> pointing out that "SELECT 42 ISNULL" is now ambiguous.  Since ISNULL
>> is nonstandard, maybe dropping support for it would be feasible.

> Isn't ISNULL one of the lexical tricks we used to effectively give
> bison two token lookahead?

No, it's a SQL-exposed feature, though I think it's just there for
compatibility with some random other DBMS.  See also NOTNULL.
        regards, tom lane

Re: Add hint for function named "is"

From

Jim Nasby

Date:

13 August 2016, 19:16:28

On 8/12/16 1:40 PM, Tom Lane wrote:
> What this is telling us is that given input like, say,
>
>     SELECT 'foo'::character varying
>
> Bison is no longer sure whether "varying" is meant as a type name modifier
> or a ColLabel.  And indeed there is *no* principled answer to that that
> doesn't involve giving up the ability for "varying" to be a ColLabel.
> Just promoting it to a fully reserved word (which it is not today)
> wouldn't be enough, because right now even fully reserved words can be
> ColLabels.

FWIW, I've always disliked how some types could contains spaces without 
being quoted. AFAIK nothing else in the system allows that, and I don't 
see why character varying and timestamp with* should get a special pass.

I doubt we could get rid of this in CREATE TABLE, but I wonder how many 
people actually cast using the unquoted form.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Add hint for function named "is"

From

Tom Lane

Date:

13 August 2016, 20:02:56

Jim Nasby <Jim.Nasby@bluetreble.com> writes:
> FWIW, I've always disliked how some types could contains spaces without 
> being quoted. AFAIK nothing else in the system allows that, and I don't 
> see why character varying and timestamp with* should get a special pass.

Because the SQL standard says so.  If we were going to start ignoring
SQL-standard spellings, this area would not be very high on my list,
actually.  Their insistence on inventing crazy new special syntaxes
for things that could be normal function calls is what bugs me ...
        regards, tom lane

Re: [HACKERS] Add hint for function named "is"

From

Robert Haas

Date:

30 April 2018, 21:00:23

On Fri, Aug 12, 2016 at 2:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> It's likely that by rejiggering the precedence of productions, we could
> resolve these ambiguities in favor of "it's not a ColLabel if it appears
> in a context like this".  And probably that wouldn't be too surprising
> most of the time.  But depending on precedence to resolve fundamentally
> ambiguous grammar is always gonna bite you sometimes.  People understand
> it (... usually ...) in the context of parsing multi-operator expressions,
> but for other things it's not a great tool.

I poked at this a little more today.  It might be a little easier to
shoot for changing "a_expr IDENT" to "a_expr ColId" than to shoot for
allowing "a_expr ColLabel", or even just allowing both "a_expr IDENT"
and "a_expr unreserved_keyword".  With the rule for postfix operators
is removed, following unreserved keywords are problematic: DAY FILTER
HOUR MINUTE MONTH OVER SECOND VARYING WITHIN WITHOUT YEAR.  I think
allowing ColId would create further problems with PRECISION, CHAR, and
CHARACTER as well.  But they are all shift/reduce conflicts, which
somehow scare me quite a bit less than the reduce/reduce conflicts one
gets when trying to allow ColLabel.

I think there would be a lot of value in coming up with some kind of
incremental improvement here; this is a common annoyance for users
migrating from other database systems (and one in particular).  We
currently have 440 keywords of which 290 are unreserved, 50 are
column-name keywords, 23 are type-func-name keywords, and 77 are fully
reserved.  For more than 300 of those key words, the postfix operator
rule is the only thing preventing us from allowing it as a column
label without "AS".  Eliminating just those -- or even a large subset
of those -- would make things noticeably better, IMHO.

Technically, that doesn't look hard to do: (1) remove the rule that
allows postfix ops, or restrict it to operators beginning with ! or
where OPERATOR() notation is used, or whatever; (2) add a new
production target_el_keyword that includes some or all of the keywords
that don't cause grammar conflicts, (3) add a rule that target_el can
be "a expr target_el_keyword", (4) profit.  Or, since that would make
maintaining target_el_keyword a nuisance, split unreserved_keyword
into two new categories, unreserved_keyword and
very_slightly_reserved_keyword, and update elsewhere accordingly.
However, I foresee that Tom will object to the idea of creating a new
category of keywords, and I'm happy to do something else if we can
figure out what that other thing is.  I'm not immediately sure how to
use operator precedence to resolve these ambiguities; I think that for
that to work we'd have assign a precedence to every keyword that we
want to allow here, just as we currently do for IDENT.  And that seems
like it could have unforeseen side effects.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Add hint for function named "is"

From

Tom Lane

Date:

30 April 2018, 21:14:32

Robert Haas <robertmhaas@gmail.com> writes:
> I think there would be a lot of value in coming up with some kind of
> incremental improvement here; this is a common annoyance for users
> migrating from other database systems (and one in particular).

Agreed, but ...

> Technically, that doesn't look hard to do: (1) remove the rule that
> allows postfix ops, or restrict it to operators beginning with ! or
> where OPERATOR() notation is used, or whatever; (2) add a new
> production target_el_keyword that includes some or all of the keywords
> that don't cause grammar conflicts, (3) add a rule that target_el can
> be "a expr target_el_keyword", (4) profit.  Or, since that would make
> maintaining target_el_keyword a nuisance, split unreserved_keyword
> into two new categories, unreserved_keyword and
> very_slightly_reserved_keyword, and update elsewhere accordingly.
> However, I foresee that Tom will object to the idea of creating a new
> category of keywords, and I'm happy to do something else if we can
> figure out what that other thing is.

Not *nearly* as much as I'd object to mostly-breaking postfix operators.
Now admittedly, if the Berkeley guys had never put those in, nobody
would miss them.  But they're there, and I'm afraid that users are
likely depending on them.  Neither of your suggestions above would be
any less damaging to such users than removing the feature altogether.

> I'm not immediately sure how to
> use operator precedence to resolve these ambiguities;

Unfortunately, this thread got swapped out of my brain long ago,
so I'm not sure what I was talking about either.  I could take
another look sometime.

            regards, tom lane

Re: [HACKERS] Add hint for function named "is"

From

Robert Haas

Date:

30 April 2018, 22:02:11

On Mon, Apr 30, 2018 at 2:14 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Not *nearly* as much as I'd object to mostly-breaking postfix operators.
> Now admittedly, if the Berkeley guys had never put those in, nobody
> would miss them.  But they're there, and I'm afraid that users are
> likely depending on them.  Neither of your suggestions above would be
> any less damaging to such users than removing the feature altogether.

Well, that sounds different than what you said upthread in August of 2016:

tgl> Agreed, if postfix operators were the only thing standing between us and
tgl> fixing that, it would be a pretty strong argument for removing them.

I agree that there could be some people using numeric_fac, the only
postfix operator we ship.  Probably not many, since it's an
inefficient implementation of an operation that overflows for all but
a few tens of thousands of possible input values, but some.  But we
could keep that operator working without supporting the facility in
general.  Do we have any evidence at all that anybody has got a
user-defined postfix operator out there?  I'm reluctant to keep a
feature that's blocking such an important improvement unless we can
come up with some real examples of it being used.  I've never run
across it, and a Google search didn't turn up any examples of anybody
else using it, either.

The basic problem here, for those following along from home, is that
if we allow "SELECT h hour FROM tab" to mean select column h with
column alias hour, then "SELECT hour + 24 * day" becomes ambiguous in
the face of postfix operators: it could either mean what it obviously
is intended to mean, or it could mean that you should look for a
postfix * operator, apply that to 24, add hour to the result, and then
call the resulting column "day".  This is less obviously insane when
you use a differently-named operator and columns -- who knows which
way "SELECT thunk + squidge!!! snarkke FROM frobnitz" is intended to
be interpreted.  But I'm just a bit doubtful that anyone is going to
stop using PostgreSQL because we take away their ability to make !!! a
postfix operator, whereas I think the inability to alias columns
without using AS when the column label happens to be a keyword IS a
problem for adoption.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company