Thread: BUG #5732: parsing of: "WHERE mycol=123AND ..."

BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

"Josh Kupershmidt"

Date:

28 October 2010, 20:46:47

The following bug has been logged online:

Bug reference:      5732
Logged by:          Josh Kupershmidt
Email address:      schmiddy@gmail.com
PostgreSQL version: 8.3 and HEAD
Operating system:   Linux and OS X
Description:        parsing of: "WHERE mycol=123AND ..."
Details:

I noticed that Postgres in many cases will happily tokenize WHERE clauses
having no space between a condition and "AND" or "OR".

For example:
  CREATE TABLE mytab (mycol int);
  INSERT INTO mytab (mycol) VALUES (1), (2);

  SELECT * FROM mytab WHERE mycol = 1AND true;
  SELECT * FROM mytab WHERE mycol = 2OR true;

although some cases produce an error, as I would expect, such as:
  SELECT * FROM mytab WHERE mycol = 2::intOR true;

I think it would be more consistent to raise syntax errors in all these
cases.

Josh

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Tom Lane

Date:

28 October 2010, 21:01:18

"Josh Kupershmidt" <schmiddy@gmail.com> writes:
> I noticed that Postgres in many cases will happily tokenize WHERE clauses
> having no space between a condition and "AND" or "OR".

This has nothing to do with AND or OR.  Any situation where you have
some digits followed by something that can't be part of a number will
be lexed as two separate tokens.  I'm not sure we could change that,
because it's a pretty basic aspect of a flex lexer; and even if we
could I'm not eager to face the wrath of all the users whose queries
would break.  Leaving out "unnecessary" spaces is a pretty common habit.

            regards, tom lane

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Jeff Davis

Date:

28 October 2010, 21:03:34

On Thu, 2010-10-28 at 23:46 +0000, Josh Kupershmidt wrote:
>   SELECT * FROM mytab WHERE mycol = 2OR true;

Is that inconsistent with the standard?

Other languages seem to allow similar things, such as ruby and perl. For
instance, in ruby:

  puts 1if(true)

seems to be acceptable.

> although some cases produce an error, as I would expect, such as:
>   SELECT * FROM mytab WHERE mycol = 2::intOR true;

That's not the same. In that example, there's no hope of distinguishing
the identifier "int" from the keyword "OR".

> I think it would be more consistent to raise syntax errors in all these
> cases.

I don't really see a "bug" here. Is this causing you some kind of
problem?

Regards,
    Jeff Davis

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Tom Lane

Date:

28 October 2010, 21:20:28

Jeff Davis <pgsql@j-davis.com> writes:
> On Thu, 2010-10-28 at 23:46 +0000, Josh Kupershmidt wrote:
>> SELECT * FROM mytab WHERE mycol = 2OR true;

> Is that inconsistent with the standard?

I was just looking at that.  The spec lumps both <unsigned numeric
literal> and <keyword> under <nondelimiter token>, and says that there
must be a <separator> (ie, whitespace or comment) between adjacent
<nondelimiter token>s.  However, I would tend to read that as
instructing users how to write portable SQL, not as instructing
implementations that they must throw an error when they find two tokens
that aren't separated by whitespace.  The actual behavior of an
implementation in such a case could be regarded as a spec extension.

I experimented a bit with mysql's behavior, and it seems that (at least
in 5.1.51) what they do is treat "1and" or "2or" as if it were an
identifier.  They're definitely not throwing an error, at least not on
that token --- they will of course spit up later if the remainder of
the input is inconsistent with the assumption that that part is an
identifier.  I don't have any other SQL DBMSes handy to experiment with,
but I wouldn't be surprised to find multiple behaviors out there.

            regards, tom lane

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Josh Kupershmidt

Date:

28 October 2010, 21:58:04

On Thu, Oct 28, 2010 at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Josh Kupershmidt" <schmiddy@gmail.com> writes:
>> I noticed that Postgres in many cases will happily tokenize WHERE clauses
>> having no space between a condition and "AND" or "OR".
>
> This has nothing to do with AND or OR. =A0Any situation where you have
> some digits followed by something that can't be part of a number will
> be lexed as two separate tokens.

Yeah, I hadn't tried to pinpoint how widespread this feature/bug is in
the syntax. Though note, you can see this with e.g. text columns as
well, such as in:
SELECT * FROM mytab WHERE mycol =3D 'abc def'AND true;

Josh

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Josh Kupershmidt

Date:

28 October 2010, 22:29:45

On Thu, Oct 28, 2010 at 8:03 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> I don't really see a "bug" here. Is this causing you some kind of
> problem?

I happened to notice it while fixing up some code using multi-line
strings which had forgotten to put spaces in the SQL across lines. I
was just surprised Postgres didn't throw an error.

The only mild concern I have is if this could possibly lead to
ambiguous parsing in some situations, though I've played with some
examples and I haven't seen any yet. It would be nice to have this
behavior documented somewhere though.

Josh

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Tom Lane

Date:

28 October 2010, 22:52:35

Josh Kupershmidt <schmiddy@gmail.com> writes:
> The only mild concern I have is if this could possibly lead to
> ambiguous parsing in some situations, though I've played with some
> examples and I haven't seen any yet. It would be nice to have this
> behavior documented somewhere though.

The fine manual currently says (at the head of section 4.1):

A token can be a key word, an identifier, a quoted identifier, a literal
(or constant), or a special character symbol. Tokens are normally
separated by whitespace (space, tab, newline), but need not be if there
is no ambiguity (which is generally only the case if a special character
is adjacent to some other token type).

The parenthetical remark at the end fails to point out the special case
of number-followed-by-identifier-that-doesn't-look-like-an-exponent.
But I'm not sure that it's reasonable to try to shoehorn in a mention
of the case.  Might be a good idea to change "generally" to "usually",
though, since "generally" might be read as implying that that's the
exact and only rule.

            regards, tom lane

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Greg Stark

Date:

30 October 2010, 15:14:40

On Thu, Oct 28, 2010 at 5:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I experimented a bit with mysql's behavior, and it seems that (at least
> in 5.1.51) what they do is treat "1and" or "2or" as if it were an
> identifier. =A0They're definitely not throwing an error, at least not on
>

I guess the eleant question is what the lexical elements section of
the standard says about identifiers. It pretty clearly declares that
they can't start with digits:

<identifier body>    ::=3D  <identifier start> [ <identifier part>... ]
<identifier part>      ::=3D   <identifier start>  | <identifier extend>
<identifier start>     ::=3D !! See the Syntax Rules
<identifier extend> ::=3D !! See the Syntax Rules

1) An <identifier start> is any character in the Unicode General
Category classes =93Lu=94, =93Ll=94, =93Lt=94, =93Lm=94, =93Lo=94, or =93Nl=
=94.
NOTE 70 =97 The Unicode General Category classes =93Lu=94, =93Ll=94, =93Lt=
=94, =93Lm=94,
=93Lo=94, and =93Nl=94 are assigned to Unicode characters that are,
respectively, upper-case letters, lower-case letters, title-case
letters, modifier letters, other letters, and letter numbers.

2) An <identifier extend> is U+00B7, =93Middle Dot=94, or any character in
the Unicode General Category classes =93Mn=94, =93Mc=94, =93Nd=94, =93Pc=94=
, or =93Cf=94.
NOTE 71 =97 The Unicode General Category classes =93Mn=94, =93Mc=94, =93Nd=
=94, =93Pc=94,
and =93Cf=94 are assigned to Unicode characters that are, respectively,
nonspacing marks, spacing combining marks, decimal numbers, connector
punctuations, and formatting codes.

--=20
greg

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Tom Lane

Date:

30 October 2010, 15:24:01

Greg Stark <gsstark@mit.edu> writes:
> On Thu, Oct 28, 2010 at 5:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I experimented a bit with mysql's behavior, and it seems that (at least
>> in 5.1.51) what they do is treat "1and" or "2or" as if it were an
>> identifier.  They're definitely not throwing an error, at least not on

> I guess the eleant question is what the lexical elements section of
> the standard says about identifiers. It pretty clearly declares that
> they can't start with digits:

Yeah.  The key point IMO is that this *input* is not spec-compliant.
So implementations can either throw an error, or define their own
spec extension as to how to interpret it.  I find mysql's behavior
interesting mostly because it shows that throwing an error isn't
necessarily common practice.  Anybody want to try Oracle, DB2, etc?

            regards, tom lane

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Gary Doades

Date:

30 October 2010, 15:46:44

MS SQL server 2008 has no problem with this:

select * from client where CLIENT_ID = 12AND SNAME='Smith'

Returns the expected row.

PostgreSQL 9.0 has no problem with it either, again throwing no error
and returning the expected result.

Regards,
Gary.

On 30/10/2010 7:23 PM, Tom Lane wrote:
> Greg Stark<gsstark@mit.edu>  writes:
>> On Thu, Oct 28, 2010 at 5:20 PM, Tom Lane<tgl@sss.pgh.pa.us>  wrote:
>>> I experimented a bit with mysql's behavior, and it seems that (at least
>>> in 5.1.51) what they do is treat "1and" or "2or" as if it were an
>>> identifier.  They're definitely not throwing an error, at least not on
>> I guess the eleant question is what the lexical elements section of
>> the standard says about identifiers. It pretty clearly declares that
>> they can't start with digits:
> Yeah.  The key point IMO is that this *input* is not spec-compliant.
> So implementations can either throw an error, or define their own
> spec extension as to how to interpret it.  I find mysql's behavior
> interesting mostly because it shows that throwing an error isn't
> necessarily common practice.  Anybody want to try Oracle, DB2, etc?
>
>             regards, tom lane
>

Re: BUG #5732: parsing of: "WHERE mycol=123AND ..."

From

Greg Stark

Date:

30 October 2010, 16:20:01

On Sat, Oct 30, 2010 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Anybody want to try Oracle, DB2, etc?

Oracle seems to behave like us:

SQL> select * from (select 1 as x from dual) where 1=1and x=1;

         X
----------
         1



--
greg