Thread: BUG #17142: COPY ignores client_encoding for octal digit characters
BUG #17142: COPY ignores client_encoding for octal digit characters
From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference: 17142
Logged by: Andreas Grob
Email address: vilarion@illarion.org
PostgreSQL version: 13.3
Operating system: Debian GNU/Linux 11 (bullseye)
Description:

Test db and table:
```
CREATE DATABASE test WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE = 'C' LC_CTYPE = 'C';
CREATE TABLE test (text character varying(50));
```

Test program in C:
```
#include <postgresql/libpq-fe.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv)
{
    const char *conninfo;
    char *errmsg;
    PGconn *conn;
    PGresult *res;
    int a, b;
    ExecStatusType status;
    int enc;
    char buffer[] = "\\304\\366\\337"; //Äöß
    // char buffer[] = "\304\366\337"; //Äöß

    if (argc > 1)
        conninfo = argv[1];
    else
        conninfo = "user=postgres dbname=test port=5433 client_encoding=LATIN1";

    /* Make a connection to the database */
    conn = PQconnectdb(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "Connection to database failed: %s",
                PQerrorMessage(conn));
        PQfinish(conn);
        exit(1);
    }

    res = PQexec(conn, "BEGIN");
    res = PQexec(conn, "COPY public.test(text) from STDIN;");
    a = PQputCopyData(conn, buffer, strlen(buffer));
    b = PQputCopyEnd(conn, NULL);
    res = PQgetResult(conn);
    status = PQresultStatus(res);
    enc = PQclientEncoding(conn);
    errmsg = PQresultErrorMessage(res);
    printf("status=%d a=%d,b=%d, enc=%d\n", status, a, b, enc);

    if (status != PGRES_COMMAND_OK)
        printf("%s\n", errmsg);
    else
        printf("worked.\n");

    res = PQexec(conn, "COMMIT");

    /* close the connection to the database and cleanup */
    PQfinish(conn);

    return 0;
}
```

Output:
```
status=7 a=1,b=1, enc=8
ERROR:  invalid byte sequence for encoding "UTF8": 0xc4 0xf6
CONTEXT:  COPY test, line 1: "\304\366\337"
```

Expected output:
```
status=1 a=1,b=1, enc=8
worked.
```
(Äöß got inserted into the table.)

Characters in octal digits should be possible as per
https://www.postgresql.org/docs/13/sql-copy.html
When using characters directly (char buffer[] = "\304\366\337") the expected
output is displayed.

My apologies if I misunderstood something.
Re: BUG #17142: COPY ignores client_encoding for octal digit characters
From
Heikki Linnakangas
Date:
On 12/08/2021 00:24, PG Bug reporting form wrote:
> Characters in octal digits should be possible as per
> https://www.postgresql.org/docs/13/sql-copy.html
> When using characters directly (char buffer[] = "\304\366\337") the expected
> output is displayed.
>
> My apologies if I misunderstood something.

The code is pretty clear that the \123 and \x12 escapes are evaluated
after encoding conversion. That means, the escapes are interpreted using
the database encoding, regardless of client encoding. The documentation
doesn't say anything about that, though. We should fix the docs. How does
the attached patch look?

You could get weird results if you use the escapes for some bytes in a
multi-byte character. Mostly you'd get invalid byte sequence errors, but
I think with the right combination of the client and database encodings,
it could get more strange. I think the wording in the attached docs patch
is enough to cover that, though.

- Heikki
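
The ordering described above (encoding conversion first, escape evaluation second) can be illustrated with a small stand-alone model. This is only a sketch, not PostgreSQL's actual code: the helper names `latin1_to_utf8`, `decode_octal_escapes` and `is_valid_utf8` are hypothetical, the conversion handles only LATIN1 to UTF8, only octal escapes are decoded, and the validity check is simplified (it does not reject overlong forms).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the client-to-database encoding conversion:
 * LATIN1 bytes to UTF-8. `out` must hold at least 2*n + 1 bytes. */
static size_t latin1_to_utf8(const unsigned char *in, size_t n,
                             unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] < 0x80) {
            out[o++] = in[i];               /* ASCII passes through */
        } else {
            out[o++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[o++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    out[o] = '\0';
    return o;
}

/* Hypothetical model of escape evaluation: a backslash followed by one
 * to three octal digits becomes a single raw byte. Everything else is
 * copied unchanged. `out` must hold at least n + 1 bytes. */
static size_t decode_octal_escapes(const unsigned char *in, size_t n,
                                   unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == '\\' && i + 1 < n &&
            in[i + 1] >= '0' && in[i + 1] <= '7') {
            unsigned value = 0;
            int digits = 0;
            while (digits < 3 && i + 1 < n &&
                   in[i + 1] >= '0' && in[i + 1] <= '7') {
                value = value * 8 + (unsigned)(in[i + 1] - '0');
                i++;
                digits++;
            }
            out[o++] = (unsigned char)(value & 0xFF);
        } else {
            out[o++] = in[i];
        }
    }
    out[o] = '\0';
    return o;
}

/* Simplified structural UTF-8 check: lead byte plus the right number of
 * continuation bytes. Overlong forms and surrogates are not rejected. */
static bool is_valid_utf8(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        size_t len;
        if (s[i] < 0x80)                len = 1;
        else if ((s[i] & 0xE0) == 0xC0) len = 2;
        else if ((s[i] & 0xF0) == 0xE0) len = 3;
        else if ((s[i] & 0xF8) == 0xF0) len = 4;
        else return false;
        if (i + len > n)
            return false;
        for (size_t k = 1; k < len; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;
        i += len;
    }
    return true;
}
```

In this model, the escaped text `\304\366\337` is pure ASCII, so the conversion step leaves it untouched; decoding then yields the raw bytes 0xC4 0xF6 0xDF, which are not valid UTF-8, matching the error in the report. The same three bytes sent unescaped go through the conversion and come out as valid UTF-8. Escapes that spell out the UTF-8 bytes (`\303\204\303\266\303\237`) also decode to valid data in this model, since they are already expressed in the database encoding.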
Attachment
Re: BUG #17142: COPY ignores client_encoding for octal digit characters
From
vilarion@illarion.org
Date:
On 12.08.2021 09:40, Heikki Linnakangas wrote:
> On 12/08/2021 00:24, PG Bug reporting form wrote:
>> Characters in octal digits should be possible as per
>> https://www.postgresql.org/docs/13/sql-copy.html
>> When using characters directly (char buffer[] = "\304\366\337") the
>> expected
>> output is displayed.
>>
>> My apologies if I misunderstood something.
>
> The code is pretty clear that the \123 and \x12 escapes are evaluated
> after encoding conversion. That means, the escapes are interpreted
> using the database encoding, regardless of client encoding. The
> documentation doesn't say anything about that, though. We should fix
> the docs. How does the attached patch look?
>
> You could get weird results if you use the escapes for some bytes in a
> multi-byte character. Mostly you'd get invalid byte sequence errors,
> but I think with the right combination of the client and database
> encodings, it could get more strange. I think the wording in the
> attached docs patch is enough to cover that, though.
>
> - Heikki

Thanks for clarifying! This patch to the docs will allow me to file a
bug report against the library I am using (pqxx).

Andreas
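
One way a client could avoid the problem, given that escapes are interpreted in the database encoding, is to escape only the characters that COPY text format treats specially and to send every other byte unescaped, so that the normal client-to-database conversion applies to it. The following is a minimal sketch under that assumption. The function name `copy_escape_text` is hypothetical, it does not describe what pqxx or any other library actually does, and it assumes a client encoding such as LATIN1 or UTF8 in which ASCII byte values never occur inside a multi-byte character.

```c
#include <stddef.h>

/* Hypothetical helper: escape one field for COPY ... FROM STDIN in text
 * format. Only backslash, tab, newline and carriage return are escaped.
 * Bytes >= 0x80 are passed through as-is, so the server converts them
 * from the client encoding instead of reading them as raw escape values.
 * `out` must hold at least 2*n + 1 bytes. Returns the output length. */
static size_t copy_escape_text(const unsigned char *in, size_t n,
                               unsigned char *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        switch (in[i]) {
        case '\\': out[o++] = '\\'; out[o++] = '\\'; break;
        case '\t': out[o++] = '\\'; out[o++] = 't';  break;
        case '\n': out[o++] = '\\'; out[o++] = 'n';  break;
        case '\r': out[o++] = '\\'; out[o++] = 'r';  break;
        default:   out[o++] = in[i];                 break;
        }
    }
    out[o] = '\0';
    return o;
}
```

With this approach the LATIN1 byte 0xC4 stays a single raw byte in the COPY data, which corresponds to the "characters directly" case from the report that produced the expected output.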
Re: BUG #17142: COPY ignores client_encoding for octal digit characters
From
Heikki Linnakangas
Date:
On 12/08/2021 11:01, vilarion@illarion.org wrote:
> Thanks for clarifying! This patch to the docs will allow me to file a
> bug report against the library I am using (pqxx).

Pushed the docs patch now.

- Heikki