Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? |
Date | |
Msg-id | 5348.951068504@sss.pgh.pa.us |
In response to | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? (Hannu Krosing <hannu@tm.ee>) |
Responses | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? |
List | pgsql-hackers |

Hannu Krosing <hannu@tm.ee> writes:
> Could you test with some other frontend (python, perl, tcl, C) ?

Yup, psql is untrustworthy as a means of testing the backend's comment handling ;-).

I committed lexer changes on Friday evening that I believe fix all of the backend's problems with \r versus \n. The issue with unterminated -- comments, which was Hannu's original complaint, was fixed awhile ago; but we still had problems with comments terminated with \r instead of \n, as well as some non-SQL-compliant behavior for -- comments between the segments of a multiline literal, etc etc.

While fixing this I realized that there are some fundamental discrepancies between the way the backend recognizes comments and the way that psql does. These arise from the fact that the comment introducer sequences /* and -- are also legal as parts of operator names, and since the backend is based on lex, which uses greedy longest-available-match rules, you get things like this:

    select *-- 123
    ERROR:  Can't find left op '*--' for type 23

(Parsing '*--' as an operator name wins over parsing just '*' as an operator name, so that '--' would be recognized on the next call.) More subtly,

    select /**/- 22
    ERROR:  parser: parse error at or near ""

which is the backend's rather lame excuse for an "unterminated comment" error. What happens here is that the sequence /**/- is bit off as a single lexer token, then tested in this order to see if it is
    (a) a complete "/* ... */" comment (nope),
    (b) the start of a comment, "/* anything" (yup), or
    (c) an operator (which would succeed if it got the chance).
There does not seem to be any way to persuade lex to stop at the "*/" if it has a chance to recognize a longer token by applying the operator rule.

Both of these problems are easily avoided by inserting some whitespace, but I wonder whether we ought to try to fix them for real.
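
To make the longest-match behavior concrete, here is a minimal sketch of a lex-style scanner in Python. It is not the backend's scan.l: the rule set, the names RULES and tokenize, and the token labels are all assumptions made up for illustration. It only shows why an operator rule that accepts runs of operator characters swallows a comment introducer that touches it. In this simplified model /**/- simply comes out as one operator; the real lexer's rule ordering turned the same over-long match into the parse error quoted above.

```python
import re

# Hypothetical, simplified rule set. The real backend lexer differs.
RULES = [
    ("space", re.compile(r"\s+")),
    ("comment", re.compile(r"--[^\r\n]*")),                # -- to end of line
    ("comment", re.compile(r"/\*([^*]|\*+[^*/])*\*+/")),   # /* ... */
    ("operator", re.compile(r"[-+*/<>=~!@#%^&|?]+")),      # run of operator chars
    ("integer", re.compile(r"\d+")),
    ("word", re.compile(r"[A-Za-z_]\w*")),
]


def tokenize(text):
    """Lex-like scan: the longest match wins; on a tie the earliest rule wins."""
    pos, tokens = 0, []
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Strict '>' keeps the earlier rule when two matches tie in length.
            if m and m.end() - pos > best_len:
                best_kind, best_len = kind, m.end() - pos
        if best_kind is None:
            raise ValueError(f"no rule matches at offset {pos}")
        if best_kind != "space":
            tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens


if __name__ == "__main__":
    for query in ("select *-- 123", "select * -- 123", "select /**/- 22"):
        print(repr(query), "->", tokenize(query))
```

With whitespace between * and --, the comment rule gets its chance at the position of the first dash and wins there because its match is longer than the two-character operator. Without the whitespace, the scanner never starts a match at that position at all.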

One way that this could be done would be to alter the lexer rules so that operators are lexed a single character at a time, which'd eliminate lex's tendency to recognize a long operator name in place of a comment. Then we'd need a post-pass to recombine adjacent operator characters into a single token. (This would forever prevent anyone from using operator names that include '--' or '/*', but I'm not sure that's a bad thing.) The post-pass would also be a mighty convenient place to fix the NOT NULL problem that's giving us trouble in another thread: the post-pass would need one-token lookahead anyway, so it could very easily convert NOT followed by NULL into a single special token.

Meanwhile, psql is using some ad-hoc code to recognize comments, rather than a lexer, and it thinks both of these sequences are indeed comments. I also find that it strips out the -- flavor of comment, but sends the /* */ flavor on through, which is just plain inconsistent. I suggest we change psql to not strip -- comments either. The only reason for psql to be in the comment-recognition business at all is so that it can determine whether a semicolon is end-of-query or just a character in a comment.

Another thing I'd like to fix here is to get the backend to produce a more useful error message than 'parse error at or near ""' when it's presented with an unterminated comment or unterminated literal. The flex manual recommends coding like

    <quote><<EOF>>  {
             error( "unterminated quote" );
             yyterminate();
             }

but <<EOF>> is a flex-ism not supported by regular lex. We already tell people they have to use flex (though I'm not sure that's *really* necessary at present); do we want to set that requirement in stone? Or does anyone know another way to get this effect?

			regards, tom lane
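
For illustration, the post-pass idea can be sketched roughly as follows. This is Python rather than lex/C, and every name in it (TOKEN_RE, lex, recombine, the "opchar" and "NOT_NULL" labels) is hypothetical, not anything in the PostgreSQL tree. The scanner emits operator characters one at a time, so a comment introducer always gets recognized first. The post-pass then glues together operator characters that sit directly next to each other and uses one-token lookahead to fold NOT followed by NULL into a single token.

```python
import re

# Hypothetical one-character-operator scanner. Comments are tried before
# single operator characters, so '--' and '/*' can never be swallowed.
TOKEN_RE = re.compile("|".join([
    r"(?P<space>\s+)",
    r"(?P<comment>--[^\r\n]*|/\*.*?\*/)",
    r"(?P<opchar>[-+*/<>=~!@#%^&|?])",
    r"(?P<integer>\d+)",
    r"(?P<word>[A-Za-z_]\w*)",
]), re.DOTALL)


def lex(text):
    """Return (kind, text, offset) tokens, dropping whitespace and comments."""
    pos, tokens = 0, []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup not in ("space", "comment"):
            tokens.append((m.lastgroup, m.group(), m.start()))
        pos = m.end()
    return tokens


def recombine(tokens):
    """Post-pass: merge touching operator characters and NOT + NULL."""
    out, i = [], 0
    while i < len(tokens):
        kind, text, offset = tokens[i]
        if kind == "opchar":
            # Glue on following operator characters with no gap in between.
            j = i + 1
            while (j < len(tokens) and tokens[j][0] == "opchar"
                   and tokens[j][2] == offset + len(text)):
                text += tokens[j][1]
                j += 1
            out.append(("operator", text, offset))
            i = j
        elif (kind == "word" and text.upper() == "NOT"
              and i + 1 < len(tokens)
              and tokens[i + 1][0] == "word"
              and tokens[i + 1][1].upper() == "NULL"):
            # One-token lookahead: NOT followed by NULL becomes one token.
            out.append(("NOT_NULL", "NOT NULL", offset))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


if __name__ == "__main__":
    for query in ("select *-- 123", "select /**/- 22", "a <= b", "x IS NOT NULL"):
        print(repr(query), "->", recombine(lex(query)))
```

The offsets are carried along only so the post-pass can tell "<=" from "< =". A real implementation would presumably need some equivalent adjacency flag from the lexer.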