Re: [HACKERS] backslash in psql output - Mailing list pgsql-hackers
From | Massimo Dal Zotto |
---|---|
Subject | Re: [HACKERS] backslash in psql output |
Date | |
Msg-id | 199810111224.OAA02257@nikita.wizard.net Whole thread Raw |
In response to | Re: [HACKERS] backslash in psql output (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [HACKERS] backslash in psql output
|
List | pgsql-hackers |
> > Bruce Momjian <maillist@candle.pha.pa.us> writes: > > I realize the double-backslash is confusing, but I don't think we can > > make such a user-visible change at this time. I think we need to open > > discussion on this issue on the general list, and to include discussion > > of NULL displays, and any other issues, as well as how to properly > > output the column separation character if that appears in the data. > > So, I think we have to put it back to the old way, and open discussion > > about this after 6.4. > > Well, actually there *was* public discussion of this issue, on the > pgsql-interfaces list around 12/13 August. The consensus was that > unnecessary backslashing was a bad idea --- in fact, I didn't see > *anyone* arguing in favor of the old behavior, and the people who > actually had backslashes in their data definitely didn't want it. > Admittedly it was a pretty small sample (Tom Lockhart and I were > two of the primary complainers) but there wasn't any sentiment > for keeping the old behavior. > > Keep in mind that what we are discussing here is the behavior of > PQprint(), not the behavior of FE/BE transport protocol or anything > else that affects data received by applications. PQprint's goal in > life is to present data in a reasonably human-friendly way, *not* > to produce a completely unambiguous machine-readable syntax. Its > output format is in fact very ambiguous. Here's an example: > > play=> create table test(id int4, val text); > play=> insert into test values (1, NULL); > play=> insert into test values (2, ' '); > play=> insert into test values (3, 'foobar'); > play=> insert into test values (4, 'oneback\\slash'); > play=> insert into test values (5, 'onevert|bar'); > play=> select * from test; > id|val > --+------------- > 1| > 2| > 3|foobar > 4|oneback\slash > 5|onevert|bar > (5 rows) > > You can't tell the difference between a NULL field and an all-blanks > value in this format; nor can you really be sure how many trailing > blanks there are in tuples 3 and 5. So the goal is readability, > not lack of ambiguity. Given that goal, I don't see the value of > printing backslash escapes. Are you really having difficulty telling > the data vertical bar from the ones used as column separators? > Physical alignment is the cue the eye relies on, I think. > > The only cases that PQprint inserted backslashes for were the column > separator char (unnecessary per above example), newlines (also not > exactly hard to recognize), and backslash itself. All of these > seem unnecessary and confusing to me. > > I'm sorry that this change sat in my to-do queue for so long, but > I don't see it as a last-minute thing. The consensus to do it was > established two months ago. > > regards, tom lane > > > In my opinion we should privilege machine-readableness first and then provide some user option to enable user-friendly conversion in psql output if one really needs it. In situations where data is processed by other programs it is very important that there is no ambiguity in strings exchanged between the application and the backend. This is already done for input, which supports C-like escape, but not yet for output, which can produce ambiguous data when nulls, arrays or non-printing characters are involved. This is the reason why I always use my C-like output functions (contrib/string-io) in all my applications. These arguments apply also to the copy command which uses the same output functions. Consider the case where a text field contains a multi-line string with newlines embedded; if you export the table into an external files the field is split into many lines which are interpreted as separate records by commonly used line-oriented filters like awk or grep. I believe that the right way to handle all this stuff is the following: input: binary data escaped data | | (user conversion) (psql input) | | +-----------------------+ | escaped query | (libpq) | escaped query escaped data | | (parser unescape) (copy-from unescape) | | +-----------------------+ | binary data | (input function) | internal data output: internal data | (output function) | escaped data | +-----------------------+ | | (libpq) (copy-to) | | escaped data escaped data | | +-----------------------+-----------------------+ | | | (user conversion) (psql output) (psql unescape) | | | binary data escaped data binary data In the above schema binary data means the external representation of data containing non-printing or delimiters characters like quotes or newlines. In this schema all the data exchanged with the backend should be escaped in order to guarantee unambiguity to applications. The input and output user conversion functions could be provided by libpq as utilities, and the conversion could possibly be done automatically by libpq itself if some global flag is set by the application. Psql input should accept only escaped data while the output could be escaped (default) or binary depending on a user supplied switch. Files read or written by the copy command should be always escaped with exactly one record for line. Pg_dump should produce escaped strings. All this stuff requires the use of new output functions like those provided in contrib/string-io. There is still the problem of distinguishing between scalars and arrays which is necessary for user output conversion. In my output functions I solved the problem by escaping the first '{' of each field if it is not an array. Another problem is that array input requires a double escaping, one for the query parser and a second one for the array parser. Also nulls (\0) are not handled by the input code. This should be fixed if we want true binary data. I don't know if C-escapes violate the ansi sql standard but I believe they makes life easier for the programmer. And if we add some global flag in libpq we could also do automatic conversion to be compatible with ansi sql and old applications. Note that arrays aren't ansi sql anyway. Anyway a runtime switch is preferable to a configure switch. -- Massimo Dal Zotto +----------------------------------------------------------------------+ | Massimo Dal Zotto email: dz@cs.unitn.it | | Via Marconi, 141 phone: ++39-461-534251 | | 38057 Pergine Valsugana (TN) www: http://www.cs.unitn.it/~dz/ | | Italy pgp: finger dz@tango.cs.unitn.it | +----------------------------------------------------------------------+
pgsql-hackers by date: