Re: [PATCHES] Patch for more readable parse error messages - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [PATCHES] Patch for more readable parse error messages |
Date | |
Msg-id | 16589.951095020@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: [PATCHES] Patch for more readable parse error messages
Re: [HACKERS] Re: [PATCHES] Patch for more readable parse error messages Re: Re: [PATCHES] Patch for more readable parse error messages |
List | pgsql-hackers |
Jeroen van Vianen <jeroen@design.nl> writes: >> Does this work with a non-bison parser? It looks mighty >> bison-dependent to me... > I'm not sure, but it probably is flex dependent (but Postgres always needed > flex anyway). I'm not aware of any yacc / byacc / bison dependencies. Don't > know if anybody has been successful building Postgres with another parser > generator. Um, you're right of course --- those are lexer not parser datastructures you're poking into. Sorry for my confusion. We do in fact work with non-bison parser generators, or did last time I tried it (around 6.5 release). I would not like us to stop working with non-bison yaccs, since bison's output depends on alloca() which is not available everywhere. I'm not sure about the situation with lexers. We have been saying for a long time that flex was required, but since we got rid of the scanner's use of trailing context ('/' rules) I think there is a better chance that it would work with vanilla lex. Anyone want to try that with current sources? > BTW, as we ship flex's output lex.yy.c (as scan.c) and bison's output > (gram.c) in the distribution, any user would be able to compile the > sources, but if they want to start hacking the .l or .y files, they'll > need appropriate tools. Right. I am not aware of any portability problems with flex's output as there are with bison's, so it may be that the concern is moot. We may just be able to say "use the prebuilt scan.c or get flex; we don't care about supporting vendor lexes anymore". I do see a potential problem with this patch that's not related to portability questions; it is that you're assuming that the lexer's furthest penetration into the source text is a good place to point at for parser errors. That may not be true always. In particular, I've been advocating solving some other problems by inserting a one-token lookahead buffer between the parser and the lexer. If that happens then you'd be off by (at least) one token in some cases. I think the way that this sort of thing is customarily handled in "real" compilers is that each token carries along an indication of just where it was found in the source, and then error messages can finger the right place without making assumptions about synchronization between different phases of the scanning/parsing process. That might be more work than we can justify for SQL queries; not sure. BTW, I think that the immediate problem of generating a good error message for unterminated comments and literals could be solved in other ways. This patch or something like it might be cool anyway, but you should bear in mind that printing out a query and then a marker that's supposed to line up with something in the query doesn't always work all that well. Consider a query that's several dozen lines long, such as a large table definition. If we had more control over the user interface and could highlight the offending token directly, I'd be more excited about doing something like this. (Actually, you could partially address that problem by only printing one line's worth of query text leading up to the error marker point. It would still be tricky to get it right in the presence of newlines, tabs, etc.) regards, tom lane
pgsql-hackers by date: