Re: Sync ECPG scanner with core - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Sync ECPG scanner with core |
Date | |
Msg-id | 17698.1542133289@sss.pgh.pa.us Whole thread Raw |
In response to | Sync ECPG scanner with core (John Naylor <jcnaylor@gmail.com>) |
Responses |
Re: Sync ECPG scanner with core
Re: Sync ECPG scanner with core |
List | pgsql-hackers |
John Naylor <jcnaylor@gmail.com> writes: > Commit 21c8ee7946 was one of several that sync'd the backend scanner > with the psql scanner, with the commit message stating "Someday it'd > be nice to make ecpg's pgc.l more easily diff'able too, but today is > not that day." Attached is a set of patches to make progress towards > that. The ECPG tests pass. There's probably more that could be done, > but this seems like a good base to start from. Looks like good stuff, pushed with a small additional amount of work. > -base_yyerror() is used once, but every other call site uses > mmerror(), so use that instead. Yeah, I think this is a bug, though minor. The other places that deal with unexpected EOF use mmfatal(PARSE_ERROR, ...) though, so I did it like that. > In the course of doing this, I noticed a few other possible ECPG > scanner cleanups to consider, although the payoff might not justify > the work involved: > -Copy process_integer_literal() from the core scanner (maybe only if > decimal_fail is also brought over). I went ahead and did both parts of that, mainly because as things stood the ecpg lexer would produce a different token sequence for "1..10" than the core would. I don't think this really matters right at the moment, but if for instance ecpg decided to insert spaces where it thought the token boundaries were, it would matter. > -Match core's handling of unicode escapes, even though pgc.l is not backup-free. Yeah, the amount of remaining diffs due to the omission of the anti-backup rules is a bit annoying and confusing. More generally, I wonder if it'd be practical to make pgc.l backup-free altogether. I'm not sure that the resulting gain in lexing speed would really be of interest to anyone, but just in terms of using uniform lexer design rules, it might be worthwhile. I tried a run with -b just to see, and indeed there are a bunch of backup cases in the ECPG-specific rules, so it'd take some work. > -Does ECPG still need its own functions for mm_alloc() and > mm_strdup(), if we have equivalents in src/common? The error handling isn't the same, so I don't think we can just replace them with the src/common functions. It is, however, potentially worth our trouble to fix things so that within pgc.l, palloc() and pstrdup() are macros for mm_alloc/mm_strdup, thereby further reducing the diffs to the core lexer. I'd be pretty tempted to also change all the hard references to base_yylval to go through a pointer variable, so that diffs like this go away: - yylval->str = pstrdup(yytext); + base_yylval.str = mm_strdup(yytext); I believe that that'd result in more compact code, too --- on most machines, references to global variables are more expensive in code bytes than indirecting through a local pointer variable. > -Use reentrant APIs? Hm. There's no functional value in that for ecpg, but maybe worth doing anyway to get rid of diffs like - addlit(yytext, yyleng, yyscanner); + addlit(yytext, yyleng); Guess it depends on how nitpicky you're feeling. In principle, reducing these remaining diffs would ease future maintenance of the lexers, but they're not something we change often. regards, tom lane
pgsql-hackers by date: