Thread: BUG #5732: parsing of: "WHERE mycol=123AND ..."
The following bug has been logged online: Bug reference: 5732 Logged by: Josh Kupershmidt Email address: schmiddy@gmail.com PostgreSQL version: 8.3 and HEAD Operating system: Linux and OS X Description: parsing of: "WHERE mycol=123AND ..." Details: I noticed that Postgres in many cases will happily tokenize WHERE clauses having no space between a condition and "AND" or "OR". For example: CREATE TABLE mytab (mycol int); INSERT INTO mytab (mycol) VALUES (1), (2); SELECT * FROM mytab WHERE mycol = 1AND true; SELECT * FROM mytab WHERE mycol = 2OR true; although some cases produce an error, as I would expect, such as: SELECT * FROM mytab WHERE mycol = 2::intOR true; I think it would be more consistent to raise syntax errors in all these cases. Josh
"Josh Kupershmidt" <schmiddy@gmail.com> writes: > I noticed that Postgres in many cases will happily tokenize WHERE clauses > having no space between a condition and "AND" or "OR". This has nothing to do with AND or OR. Any situation where you have some digits followed by something that can't be part of a number will be lexed as two separate tokens. I'm not sure we could change that, because it's a pretty basic aspect of a flex lexer; and even if we could I'm not eager to face the wrath of all the users whose queries would break. Leaving out "unnecessary" spaces is a pretty common habit. regards, tom lane
On Thu, 2010-10-28 at 23:46 +0000, Josh Kupershmidt wrote: > SELECT * FROM mytab WHERE mycol = 2OR true; Is that inconsistent with the standard? Other languages seem to allow similar things, such as ruby and perl. For instance, in ruby: puts 1if(true) seems to be acceptable. > although some cases produce an error, as I would expect, such as: > SELECT * FROM mytab WHERE mycol = 2::intOR true; That's not the same. In that example, there's no hope of distinguishing the identifier "int" from the keyword "OR". > I think it would be more consistent to raise syntax errors in all these > cases. I don't really see a "bug" here. Is this causing you some kind of problem? Regards, Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > On Thu, 2010-10-28 at 23:46 +0000, Josh Kupershmidt wrote: >> SELECT * FROM mytab WHERE mycol = 2OR true; > Is that inconsistent with the standard? I was just looking at that. The spec lumps both <unsigned numeric literal> and <keyword> under <nondelimiter token>, and says that there must be a <separator> (ie, whitespace or comment) between adjacent <nondelimiter token>s. However, I would tend to read that as instructing users how to write portable SQL, not as instructing implementations that they must throw an error when they find two tokens that aren't separated by whitespace. The actual behavior of an implementation in such a case could be regarded as a spec extension. I experimented a bit with mysql's behavior, and it seems that (at least in 5.1.51) what they do is treat "1and" or "2or" as if it were an identifier. They're definitely not throwing an error, at least not on that token --- they will of course spit up later if the remainder of the input is inconsistent with the assumption that that part is an identifier. I don't have any other SQL DBMSes handy to experiment with, but I wouldn't be surprised to find multiple behaviors out there. regards, tom lane
On Thu, Oct 28, 2010 at 8:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > "Josh Kupershmidt" <schmiddy@gmail.com> writes: >> I noticed that Postgres in many cases will happily tokenize WHERE clauses >> having no space between a condition and "AND" or "OR". > > This has nothing to do with AND or OR. =A0Any situation where you have > some digits followed by something that can't be part of a number will > be lexed as two separate tokens. Yeah, I hadn't tried to pinpoint how widespread this feature/bug is in the syntax. Though note, you can see this with e.g. text columns as well, such as in: SELECT * FROM mytab WHERE mycol =3D 'abc def'AND true; Josh
On Thu, Oct 28, 2010 at 8:03 PM, Jeff Davis <pgsql@j-davis.com> wrote: > I don't really see a "bug" here. Is this causing you some kind of > problem? I happened to notice it while fixing up some code using multi-line strings which had forgotten to put spaces in the SQL across lines. I was just surprised Postgres didn't throw an error. The only mild concern I have is if this could possibly lead to ambiguous parsing in some situations, though I've played with some examples and I haven't seen any yet. It would be nice to have this behavior documented somewhere though. Josh
Josh Kupershmidt <schmiddy@gmail.com> writes: > The only mild concern I have is if this could possibly lead to > ambiguous parsing in some situations, though I've played with some > examples and I haven't seen any yet. It would be nice to have this > behavior documented somewhere though. The fine manual currently says (at the head of section 4.1): A token can be a key word, an identifier, a quoted identifier, a literal (or constant), or a special character symbol. Tokens are normally separated by whitespace (space, tab, newline), but need not be if there is no ambiguity (which is generally only the case if a special character is adjacent to some other token type). The parenthetical remark at the end fails to point out the special case of number-followed-by-identifier-that-doesn't-look-like-an-exponent. But I'm not sure that it's reasonable to try to shoehorn in a mention of the case. Might be a good idea to change "generally" to "usually", though, since "generally" might be read as implying that that's the exact and only rule. regards, tom lane
On Thu, Oct 28, 2010 at 5:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I experimented a bit with mysql's behavior, and it seems that (at least > in 5.1.51) what they do is treat "1and" or "2or" as if it were an > identifier. =A0They're definitely not throwing an error, at least not on > I guess the eleant question is what the lexical elements section of the standard says about identifiers. It pretty clearly declares that they can't start with digits: <identifier body> ::=3D <identifier start> [ <identifier part>... ] <identifier part> ::=3D <identifier start> | <identifier extend> <identifier start> ::=3D !! See the Syntax Rules <identifier extend> ::=3D !! See the Syntax Rules 1) An <identifier start> is any character in the Unicode General Category classes =93Lu=94, =93Ll=94, =93Lt=94, =93Lm=94, =93Lo=94, or =93Nl= =94. NOTE 70 =97 The Unicode General Category classes =93Lu=94, =93Ll=94, =93Lt= =94, =93Lm=94, =93Lo=94, and =93Nl=94 are assigned to Unicode characters that are, respectively, upper-case letters, lower-case letters, title-case letters, modifier letters, other letters, and letter numbers. 2) An <identifier extend> is U+00B7, =93Middle Dot=94, or any character in the Unicode General Category classes =93Mn=94, =93Mc=94, =93Nd=94, =93Pc=94= , or =93Cf=94. NOTE 71 =97 The Unicode General Category classes =93Mn=94, =93Mc=94, =93Nd= =94, =93Pc=94, and =93Cf=94 are assigned to Unicode characters that are, respectively, nonspacing marks, spacing combining marks, decimal numbers, connector punctuations, and formatting codes. --=20 greg
Greg Stark <gsstark@mit.edu> writes: > On Thu, Oct 28, 2010 at 5:20 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I experimented a bit with mysql's behavior, and it seems that (at least >> in 5.1.51) what they do is treat "1and" or "2or" as if it were an >> identifier. They're definitely not throwing an error, at least not on > I guess the eleant question is what the lexical elements section of > the standard says about identifiers. It pretty clearly declares that > they can't start with digits: Yeah. The key point IMO is that this *input* is not spec-compliant. So implementations can either throw an error, or define their own spec extension as to how to interpret it. I find mysql's behavior interesting mostly because it shows that throwing an error isn't necessarily common practice. Anybody want to try Oracle, DB2, etc? regards, tom lane
MS SQL server 2008 has no problem with this: select * from client where CLIENT_ID = 12AND SNAME='Smith' Returns the expected row. PostgreSQL 9.0 has no problem with it either, again throwing no error and returning the expected result. Regards, Gary. On 30/10/2010 7:23 PM, Tom Lane wrote: > Greg Stark<gsstark@mit.edu> writes: >> On Thu, Oct 28, 2010 at 5:20 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>> I experimented a bit with mysql's behavior, and it seems that (at least >>> in 5.1.51) what they do is treat "1and" or "2or" as if it were an >>> identifier. They're definitely not throwing an error, at least not on >> I guess the eleant question is what the lexical elements section of >> the standard says about identifiers. It pretty clearly declares that >> they can't start with digits: > Yeah. The key point IMO is that this *input* is not spec-compliant. > So implementations can either throw an error, or define their own > spec extension as to how to interpret it. I find mysql's behavior > interesting mostly because it shows that throwing an error isn't > necessarily common practice. Anybody want to try Oracle, DB2, etc? > > regards, tom lane >
On Sat, Oct 30, 2010 at 11:23 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Anybody want to try Oracle, DB2, etc? Oracle seems to behave like us: SQL> select * from (select 1 as x from dual) where 1=1and x=1; X ---------- 1 -- greg