Backslash handling in strings - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Backslash handling in strings |
Date | |
Msg-id | 200505302011.j4UKB8R23097@candle.pha.pa.us |
In response to | Re: Escape handling in COPY, strings, psql (Bruce Momjian <pgman@candle.pha.pa.us>) |
Responses | Re: Backslash handling in strings |
List | pgsql-hackers |
Bruce Momjian wrote:
> Peter Eisentraut wrote:
> > Bruce Momjian wrote:
> > > I was suggesting ESCAPE 'string' or ESC 'string'. The marker has to
> > > be before the string so scan.l can alter its processing of the string
> > > --- after the string is too late --- there is no way to undo any
> > > escaping that has happened, and it might already be used by gram.y.
> >
> > That pretty much corresponds to my E'string' proposal. Both are
> > probably equally trivial to implement.
>
> Right. I think your E'' idea has the benefit of fitting with our
> existing X'' and B'' modifiers. It is also simpler and cleaner to do in
> scan.l, so I think your idea is best.

[ CC list trimmed. ]

OK, I talked to Tom and Peter and I have come up with a tentative plan.

The goal, at some point, is that we would have two types of strings, '' strings and E'' strings. '' strings have no special backslash handling, for compatibility with the ANSI spec and all other databases except MySQL (and in MySQL it is now optional). E'' strings behave just like our strings do now, with backslash handling.

In 8.0.X, we add support for E'' strings, but it is a no-op. This is done just for portability with future releases. We also state that users should stop using \' to escape quotes in strings, and instead use '', and that we will throw a warning in 8.1 if we see a \' in a non-E string. (We could probably throw a warning for E'' use of \' too, but I want to give users the ability to avoid the warning if they can't change from using \' to ''.)

In 8.1, we start issuing the warning for \' in non-E strings, plus we tell users who want escape processing that they will have to use E'' strings for this starting in release 8.2, and they should start migrating their escaped strings over to E''.
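To make the intended end state concrete, here is a rough Python sketch of how the two string types would decode the text between the quotes. It is a toy model, not the scan.l logic: the function name decode_literal, the escape flag, and the small escape table are assumptions made for illustration, and octal escapes are left out.

```python
# Toy model of the two literal types in the plan; NOT the scan.l implementation.
# decode_literal, its `escape` flag, and SIMPLE_ESCAPES are hypothetical names.

# Assumed subset of backslash escapes that an E'' string would process.
SIMPLE_ESCAPES = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}


def decode_literal(body: str, escape: bool) -> str:
    """Decode the text found between the quotes of a string literal.

    escape=False models a plain '' string: only a doubled quote is special.
    escape=True models an E'' string: backslash escapes are processed too.
    A real lexer would end the string at a lone quote; this sketch simply
    copies it through.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'" and body[i + 1 : i + 2] == "'":
            out.append("'")  # '' always means one literal quote
            i += 2
        elif escape and ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            # Known escapes map to control characters; anything else
            # (including \' and \\) stands for the character itself.
            out.append(SIMPLE_ESCAPES.get(nxt, nxt))
            i += 2
        else:
            out.append(ch)  # plain character, or a backslash in a '' string
            i += 1
    return "".join(out)
```

With this model, the body a\nb stays as four characters in a plain '' string but becomes "a", newline, "b" in an E'' string, while it''s decodes to it's in both.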
Tom also suggested a read-only GUC variable that is sent to clients and indicates whether simple strings have backslash handling, for use by applications that are doing escapes themselves, perhaps 'escape_all_strings'. PQescapeString() and PQescapeBytea() can still be used, but only with E'' strings in 8.2. We could create PQquoteString() for 8.1 and later to allow for just single-quote doubling for non-E strings.

Tom asked about how to handle pg_dump contents that have strings, like function bodies. We could start using E'' for those in 8.0 but it does break backward movement of dumps, and someone upgrading from 7.1 to 8.2 would be in trouble. :-( Perhaps we will have another round of subrelease fixes and we can bundle this into that and tell people they have to upgrade to the newest subrelease before going to 8.2. I think we have had that requirement in the past when we had broken pg_dump processing.

The good news is that once everyone uses only '' to escape quotes in strings, we will not have any data security issues with this change. The only potential problem is the mishandling of backslash characters if there is a mismatch between what the client expects and what the server uses. By backpatching E'' perhaps even to 7.4 and earlier (as a no-op), we could minimize this problem.

Is this whole thing ugly? Yes. Can we just close our eyes and hope we can continue with our current behavior while growing a larger userbase --- probably not.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
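For client code that builds its own literals, the difference between the two quoting styles discussed above can be sketched in Python as follows. Both helper names are hypothetical: quote_plain only models the single-quote doubling that a PQquoteString() might do, and quote_escape models one possible way to produce an E'' literal. Neither is a libpq API, and neither deals with client encodings.

```python
# Hypothetical helpers for illustration only; these are not libpq functions.


def quote_plain(value: str) -> str:
    """Quote for a plain '' string: double single quotes, leave backslashes alone."""
    return "'" + value.replace("'", "''") + "'"


def quote_escape(value: str) -> str:
    """Quote for an E'' string: double backslashes first, then single quotes."""
    return "E'" + value.replace("\\", "\\\\").replace("'", "''") + "'"
```

The mismatch risk is visible here: a value containing a backslash survives quote_plain() unchanged, so it is only safe if the server really treats '' strings as having no backslash handling.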