Thread: A couple of issues with psql variable substitution
On my way to do something else entirely, I came across a couple of things that are not very nice about psql's lexer: 1. Somebody broke the no-backtracking property back in 9.0 while adding quoted variable substitution. According to the flex manual, use of backtracking creates a performance penalty. We once measured the backend's lexer as being about a third faster with backtrack avoidance, and presumably it's about the same for psql's. This is not hard to fix, but should I consider it a bug fix and back-patch? We've not had complaints about psql getting slower as of 9.0. 2. The lexer rules associated with variable substitution think that variable names can consist only of ASCII letters and digits (and underscores). The psql manual is noncommittal about whether non-ASCII characters are allowed, but a reasonable person would think that the rules ought to be the same as the backend's idea of what an identifier is. Does anybody have a problem with improving that? (I'm not proposing this part as a bug fix, because it does look a little bit more invasive to fix, because of the way psql deals with unsafe multibyte encodings.) regards, tom lane
On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > On my way to do something else entirely, I came across a couple of > things that are not very nice about psql's lexer: > > 1. Somebody broke the no-backtracking property back in 9.0 while adding > quoted variable substitution. According to the flex manual, use of > backtracking creates a performance penalty. We once measured the > backend's lexer as being about a third faster with backtrack avoidance, > and presumably it's about the same for psql's. This is not hard to fix, > but should I consider it a bug fix and back-patch? We've not had > complaints about psql getting slower as of 9.0. That may well have been me. How would I have known that I broke it? Also, how invasive is the fix? > 2. The lexer rules associated with variable substitution think that > variable names can consist only of ASCII letters and digits (and > underscores). The psql manual is noncommittal about whether non-ASCII > characters are allowed, but a reasonable person would think that the > rules ought to be the same as the backend's idea of what an identifier > is. Does anybody have a problem with improving that? Nope. Or at least, I don't. > (I'm not > proposing this part as a bug fix, because it does look a little bit > more invasive to fix, because of the way psql deals with unsafe > multibyte encodings.) +1 for not back-patching this part. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> 1. Somebody broke the no-backtracking property back in 9.0 while adding >> quoted variable substitution. According to the flex manual, use of >> backtracking creates a performance penalty. We once measured the >> backend's lexer as being about a third faster with backtrack avoidance, >> and presumably it's about the same for psql's. This is not hard to fix, >> but should I consider it a bug fix and back-patch? We've not had >> complaints about psql getting slower as of 9.0. > That may well have been me. [ checks "git blame" ] Well, you commmitted the patch anyway: d0cfc018. > How would I have known that I broke it? Per the header comments in the backend lexer, you should run flex with "-b" switch and verify that the resulting lex.backup file says "no backing up". I've occasionally thought about automating that, but I'm not sure if the output is entirely locale- and flex-version-independent. > Also, how invasive is the fix? We need to add a couple more rules that will match an unterminated quoted variable and do something reasonable (probably just throw back everything but the colon with yyless). I've not coded it but I think it can't be more than a dozen lines or so. regards, tom lane
Excerpts from Tom Lane's message of jue ago 25 14:00:57 -0300 2011: > Robert Haas <robertmhaas@gmail.com> writes: > > On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> 1. Somebody broke the no-backtracking property back in 9.0 while adding > >> quoted variable substitution. According to the flex manual, use of > >> backtracking creates a performance penalty. We once measured the > >> backend's lexer as being about a third faster with backtrack avoidance, > >> and presumably it's about the same for psql's. This is not hard to fix, > >> but should I consider it a bug fix and back-patch? We've not had > >> complaints about psql getting slower as of 9.0. > > > That may well have been me. > > [ checks "git blame" ] Well, you commmitted the patch anyway: d0cfc018. > > > How would I have known that I broke it? > > Per the header comments in the backend lexer, you should run flex with > "-b" switch and verify that the resulting lex.backup file says "no > backing up". I've occasionally thought about automating that, but I'm > not sure if the output is entirely locale- and flex-version-independent. It is locale dependent, though of course for the automated check you could just run flex under LC_ALL=C. $ /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l $ cat lex.backup Sin retroceso. $ LC_ALL=C /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l $ cat lex.backup No backing up. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 08/25/2011 01:16 PM, Alvaro Herrera wrote: > Excerpts from Tom Lane's message of jue ago 25 14:00:57 -0300 2011: >> Robert Haas<robertmhaas@gmail.com> writes: >>> On Thu, Aug 25, 2011 at 12:47 PM, Tom Lane<tgl@sss.pgh.pa.us> wrote: >>>> 1. Somebody broke the no-backtracking property back in 9.0 while adding >>>> quoted variable substitution. According to the flex manual, use of >>>> backtracking creates a performance penalty. We once measured the >>>> backend's lexer as being about a third faster with backtrack avoidance, >>>> and presumably it's about the same for psql's. This is not hard to fix, >>>> but should I consider it a bug fix and back-patch? We've not had >>>> complaints about psql getting slower as of 9.0. >>> That may well have been me. >> [ checks "git blame" ] Well, you commmitted the patch anyway: d0cfc018. >> >>> How would I have known that I broke it? >> Per the header comments in the backend lexer, you should run flex with >> "-b" switch and verify that the resulting lex.backup file says "no >> backing up". I've occasionally thought about automating that, but I'm >> not sure if the output is entirely locale- and flex-version-independent. > It is locale dependent, though of course for the automated check you > could just run flex under LC_ALL=C. > > $ /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l > $ cat lex.backup > Sin retroceso. > $ LC_ALL=C /usr/bin/flex -Cfe -b /pgsql/source/REL8_4_STABLE/src/bin/psql/psqlscan.l > $ cat lex.backup > No backing up. > We could just add -b unconditionally to the flex flags and then count the number of lines in lex.backup. If it's greater that 1 whine loudly, or even fail, otherwise remove lex.backup. Would that avoid locale dependencies? cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > We could just add -b unconditionally to the flex flags and then count > the number of lines in lex.backup. If it's greater that 1 whine loudly, > or even fail, otherwise remove lex.backup. Would that avoid locale > dependencies? Hm, yeah, seems like that ought to work. I'm tempted to add "-p -p" also, even though that only results in some whinging on stderr. It would still probably get noticed by anyone who was changing the lexer. regards, tom lane
I wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> We could just add -b unconditionally to the flex flags and then count >> the number of lines in lex.backup. If it's greater that 1 whine loudly, >> or even fail, otherwise remove lex.backup. Would that avoid locale >> dependencies? > Hm, yeah, seems like that ought to work. Done in HEAD, but only for the Makefile-based build mechanism. Anybody want to add the comparable logic to the MSVC scripts? regards, tom lane
While I'm looking at this ... the current implementation has got a number of very inconsistent behaviors with respect to when it will expand a variable reference within a psql meta-command argument. Observe: regression=# \set foo 'value of foo' regression=# \set bar 'value of bar' regression=# \echo :foo value of foo regression=# \echo :foo@bar value of foo @bar (there shouldn't be a space before the @, IMO --- there is because this gets treated as two separate arguments, which seems bizarre) regression=# \echo :foo:bar value of foo value of bar (again, why is this two arguments not one?) regression=# \echo :foo@:bar value of foo @:bar (why isn't :bar expanded here, when it is in the previous case?) regression=# \echo foo:foo@:bar foo:foo@:bar (and now neither one gets expanded) ISTM the general rule ought to be that we attempt to substitute for a colon-construct regardless of where it appears within an argument, as long as it's not within quotes. Thoughts? regards, tom lane
On Thu, Aug 25, 2011 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > While I'm looking at this ... the current implementation has got a > number of very inconsistent behaviors with respect to when it will > expand a variable reference within a psql meta-command argument. > Observe: > > regression=# \set foo 'value of foo' > regression=# \set bar 'value of bar' > regression=# \echo :foo > value of foo > regression=# \echo :foo@bar > value of foo @bar > > (there shouldn't be a space before the @, IMO --- there is because this > gets treated as two separate arguments, which seems bizarre) > > regression=# \echo :foo:bar > value of foo value of bar > > (again, why is this two arguments not one?) > > regression=# \echo :foo@:bar > value of foo @:bar > > (why isn't :bar expanded here, when it is in the previous case?) > > regression=# \echo foo:foo@:bar > foo:foo@:bar > > (and now neither one gets expanded) > > ISTM the general rule ought to be that we attempt to substitute for a > colon-construct regardless of where it appears within an argument, as > long as it's not within quotes. > > Thoughts? My main thought is that I remember this code being pretty awful - especially with respect to error handling - when I looked at it. A lot of dubious behaviors were more or less compelled by the impossibility of bailing out at an arbitrary point a la ereport(). At least, barring a drastic refactoring of the code, which might not be a bad idea either. No objection if you want to clean some of it up, but you may find it's a larger sinkhole than you anticipate. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 08/25/2011 02:45 PM, Tom Lane wrote: > I wrote: >> Andrew Dunstan<andrew@dunslane.net> writes: >>> We could just add -b unconditionally to the flex flags and then count >>> the number of lines in lex.backup. If it's greater that 1 whine loudly, >>> or even fail, otherwise remove lex.backup. Would that avoid locale >>> dependencies? >> Hm, yeah, seems like that ought to work. > Done in HEAD, but only for the Makefile-based build mechanism. Anybody > want to add the comparable logic to the MSVC scripts? > > Done. cheers andrew
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, Aug 25, 2011 at 5:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ISTM the general rule ought to be that we attempt to substitute for a >> colon-construct regardless of where it appears within an argument, as >> long as it's not within quotes. > My main thought is that I remember this code being pretty awful - > especially with respect to error handling - when I looked at it. A > lot of dubious behaviors were more or less compelled by the > impossibility of bailing out at an arbitrary point a la ereport(). At > least, barring a drastic refactoring of the code, which might not be a > bad idea either. What I had in mind to do was just to rearrange the flex rules --- the issues that I called out have to do with dubious choices about when to transition between different lexer states. I agree that the error handling isn't terribly friendly in unexpected cases like there not being a connection available to determine the literal-quoting rules, but that's not what I'm on about here. I'm just after consistent variable-expansion behavior in normal operation. regards, tom lane