Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? - Mailing list pgsql-hackers
| From | Tom Lane |
|---|---|
| Subject | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? |
| Date | |
| Msg-id | 5348.951068504@sss.pgh.pa.us |
| In response to | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? (Hannu Krosing <hannu@tm.ee>) |
| Responses | Re: [HACKERS] Re: SQL compliance - why -- comments only at psql level? |
| List | pgsql-hackers |
Hannu Krosing <hannu@tm.ee> writes:
> Could you test with some other frontend (python, perl, tcl, C) ?
Yup, psql is untrustworthy as a means of testing the backend's comment
handling ;-).
I committed lexer changes on Friday evening that I believe fix all of
the backend's problems with \r versus \n. The issue with unterminated
-- comments, which was Hannu's original complaint, was fixed a while ago;
but we still had problems with comments terminated with \r instead of
\n, as well as some non-SQL-compliant behavior for -- comments between
the segments of a multiline literal, etc etc.
While fixing this I realized that there are some fundamental
discrepancies between the way the backend recognizes comments and the
way that psql does. These arise from the fact that the comment
introducer sequences /* and -- are also legal as parts of operator
names, and since the backend is based on lex which uses greedy longest-
available-match rules, you get things like this:
    select *-- 123
    ERROR: Can't find left op '*--' for type 23
(Parsing '*--' as an operator name wins over parsing just '*' as an
operator name, so that '--' would be recognized on the next call.)
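The longest-match behavior can be reproduced outside lex. What follows is only a minimal sketch in Python, not the backend's scanner: the rule set, the operator character class, and the name `tokenize` are all assumptions made for illustration. It tries every rule at the current position and keeps the longest match, with ties going to the earlier rule, which is the policy lex applies.

```python
import re

# Hypothetical, much-simplified rules; the real backend scanner differs.
RULES = [
    ("comment", re.compile(r"--[^\r\n]*")),
    ("operator", re.compile(r"[-+*/<>=~!@#%^&|?]+")),
    ("number", re.compile(r"\d+")),
    ("word", re.compile(r"[A-Za-z_]\w*")),
    ("space", re.compile(r"\s+")),
]


def tokenize(text):
    """Return (kind, text) pairs using lex-style longest-match selection."""
    pos, tokens = 0, []
    while pos < len(text):
        best_kind, best_len = None, 0
        for kind, pattern in RULES:
            m = pattern.match(text, pos)
            # Strictly longer wins, so on a tie the earlier rule is kept.
            if m and len(m.group()) > best_len:
                best_kind, best_len = kind, len(m.group())
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if best_kind != "space":
            tokens.append((best_kind, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(tokenize("select *-- 123"))   # '*--' comes out as one operator token
print(tokenize("select * -- 123"))  # with a space, '-- 123' is a comment
```

At the `*`, the comment rule cannot match at all, while the operator rule matches three characters, so the comment never gets a chance. With whitespace inserted, both rules match at `--` and the comment rule wins because its match is longer.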
More subtly,
    select /**/- 22
    ERROR: parser: parse error at or near ""
which is the backend's rather lame excuse for an "unterminated comment"
error. What happens here is that the sequence /**/- is bit off as a
single lexer token, then tested in this order to see if it is
(a) a complete "/* ... */" comment (nope),
(b) the start of a comment, "/* anything" (yup), or
(c) an operator (which would succeed if it got the chance).
There does not seem to be any way to persuade lex to stop at the "*/"
if it has a chance to recognize a longer token by applying the operator
rule.
Both of these problems are easily avoided by inserting some whitespace,
but I wonder whether we ought to try to fix them for real. One way
that this could be done would be to alter the lexer rules so that
operators are lexed a single character at a time, which'd eliminate
lex's tendency to recognize a long operator name in place of a comment.
Then we'd need a post-pass to recombine adjacent operator characters into
a single token. (This would forever prevent anyone from using operator
names that include '--' or '/*', but I'm not sure that's a bad thing.)
The post-pass would also be a mighty convenient place to fix the NOT NULL
problem that's giving us trouble in another thread: the post-pass would
need one-token lookahead anyway, so it could very easily convert NOT
followed by NULL into a single special token.
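To make the post-pass idea concrete, here is a rough sketch in Python rather than in the backend's C. Everything in it is an assumption for illustration: the token shape `(kind, text, start)`, the kind names `opchar`, `operator`, `word` and `NOT_NULL`, and the function name `recombine`. It assumes a lexer that has already dropped comments and whitespace and emits each operator character as its own `opchar` token.

```python
def recombine(tokens):
    """Post-pass over (kind, text, start) tuples.

    Merges operator characters that touch in the source into one
    operator token, and folds NOT followed by NULL into a single token
    using one token of lookahead.
    """
    out = []
    i = 0
    while i < len(tokens):
        kind, text, start = tokens[i]
        if kind == "opchar":
            end = start + len(text)
            j = i + 1
            # Only glue characters with nothing between them in the source.
            while (j < len(tokens) and tokens[j][0] == "opchar"
                   and tokens[j][2] == end):
                text += tokens[j][1]
                end += len(tokens[j][1])
                j += 1
            out.append(("operator", text, start))
            i = j
        elif (kind == "word" and text.upper() == "NOT"
              and i + 1 < len(tokens)
              and tokens[i + 1][0] == "word"
              and tokens[i + 1][1].upper() == "NULL"):
            out.append(("NOT_NULL", "NOT NULL", start))
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


# "a <= b" lexed one operator character at a time:
print(recombine([("word", "a", 0), ("opchar", "<", 2),
                 ("opchar", "=", 3), ("word", "b", 5)]))
```

Because comments are recognized before any operator characters are merged, a sequence like `*--` would reach this pass as a lone `*`, which is the point of the scheme. Whether folding every NOT NULL pair is safe for the grammar is a separate question this sketch does not address.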
Meanwhile, psql is using some ad-hoc code to recognize comments,
rather than a lexer, and it thinks both of these sequences are indeed
comments. I also find that it strips out the -- flavor of comment,
but sends the /* */ flavor on through, which is just plain inconsistent.
I suggest we change psql to not strip -- comments either. The only
reason for psql to be in the comment-recognition business at all is
so that it can determine whether a semicolon is end-of-query or just
a character in a comment.
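The one job psql does need, finding the semicolon that really ends the query, might look roughly like the sketch below. This is Python for illustration, not psql's actual code, and the name `find_query_end` is made up. It assumes non-nested /* */ comments, -- comments ended by \n or \r, and single-quoted literals with doubled quotes; it ignores backslash escapes and psql's backslash commands. Note that it only skips comments while scanning and never removes them from the text.

```python
def find_query_end(text):
    """Return the index of the first ';' outside comments and
    single-quoted literals, or -1 if the query is not complete yet."""
    i, n = 0, len(text)
    while i < n:
        two = text[i:i + 2]
        if two == "--":
            # Line comment: runs to the next \n or \r, or to end of input.
            while i < n and text[i] not in "\r\n":
                i += 1
        elif two == "/*":
            close = text.find("*/", i + 2)
            if close == -1:
                return -1  # unterminated comment, keep reading input
            i = close + 2
        elif text[i] == "'":
            i += 1
            while i < n:
                if text[i] == "'":
                    if text[i + 1:i + 2] == "'":
                        i += 2  # doubled quote stays inside the literal
                        continue
                    break
                i += 1
            if i >= n:
                return -1  # unterminated literal
            i += 1  # step past the closing quote
        elif text[i] == ";":
            return i
        else:
            i += 1
    return -1


print(find_query_end("select 1 -- a; b\n;"))
print(find_query_end("select /* ; */ 1;"))
```

Like psql's ad-hoc code, this treats -- and /* as comment starts wherever they appear, so it would still disagree with the backend on inputs such as `*--`.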
Another thing I'd like to fix here is to get the backend to produce
a more useful error message than 'parse error at or near ""' when it's
presented with an unterminated comment or unterminated literal.
The flex manual recommends coding like
<quote><<EOF>> { error( "unterminated quote" ); yyterminate(); }
but <<EOF>> is a flex-ism not supported by regular lex. We already
tell people they have to use flex (though I'm not sure that's *really*
necessary at present); do we want to set that requirement in stone?
Or does anyone know another way to get this effect?
regards, tom lane