Re: confusing / inefficient "need_transcoding" handling in copy - Mailing list pgsql-hackers
From | Sutou Kouhei |
---|---|
Subject | Re: confusing / inefficient "need_transcoding" handling in copy |
Date | |
Msg-id | 20240214.114608.2091541942684063981.kou@clear-code.com |
In response to | Re: confusing / inefficient "need_transcoding" handling in copy (Michael Paquier <michael@paquier.xyz>) |
Responses | Re: confusing / inefficient "need_transcoding" handling in copy |
List | pgsql-hackers |
Hi,

In <ZcvlgMEjt3qY8eiL@paquier.xyz>
  "Re: confusing / inefficient "need_transcoding" handling in copy" on Wed, 14 Feb 2024 06:56:16 +0900,
  Michael Paquier <michael@paquier.xyz> wrote:

> We have a couple of non-ASCII characters in the tests, but I suspect
> that this one will not be digested correctly everywhere, even if
> EUC_JP should be OK to use for the check.  How about writing an
> arbitrary sequence of bytes into a temporary file that gets used for
> the COPY FROM instead?  See for example how we do that with
> abs_builddir in copy.sql.

It makes sense. How about the attached patch?


Thanks,
-- 
kou

From 6eb9669f97c54f8b85fac63db40ad80664692d12 Mon Sep 17 00:00:00 2001
From: Sutou Kouhei <kou@clear-code.com>
Date: Wed, 14 Feb 2024 11:44:13 +0900
Subject: [PATCH v2] Add a test for invalid encoding for COPY FROM

The test data use an UTF-8 character (U+3042 HIRAGANA LETTER A) but
the test specifies EUC_JP. So it's an invalid data.
---
 src/test/regress/expected/copyencoding.out | 13 +++++++++++++
 src/test/regress/parallel_schedule         |  2 +-
 src/test/regress/sql/copyencoding.sql      | 15 +++++++++++++++
 3 files changed, 29 insertions(+), 1 deletion(-)
 create mode 100644 src/test/regress/expected/copyencoding.out
 create mode 100644 src/test/regress/sql/copyencoding.sql

diff --git a/src/test/regress/expected/copyencoding.out b/src/test/regress/expected/copyencoding.out
new file mode 100644
index 0000000000..32a9d918fa
--- /dev/null
+++ b/src/test/regress/expected/copyencoding.out
@@ -0,0 +1,13 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+CREATE TABLE test (t text);
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+ERROR:  invalid byte sequence for encoding "EUC_JP": 0xe3 0x81
+CONTEXT:  COPY test, line 1
+DROP TABLE test;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 1d8a414eea..238cef28c4 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -36,7 +36,7 @@ test: geometry horology tstypes regex type_sanity opr_sanity misc_sanity comment
 # execute two copy tests in parallel, to check that copy itself
 # is concurrent safe.
 # ----------
-test: copy copyselect copydml insert insert_conflict
+test: copy copyselect copydml copyencoding insert insert_conflict
 
 # ----------
 # More groups of parallel tests
diff --git a/src/test/regress/sql/copyencoding.sql b/src/test/regress/sql/copyencoding.sql
new file mode 100644
index 0000000000..89e2124996
--- /dev/null
+++ b/src/test/regress/sql/copyencoding.sql
@@ -0,0 +1,15 @@
+--
+-- Test cases for COPY WITH (ENCODING)
+--
+
+-- directory paths are passed to us in environment variables
+\getenv abs_builddir PG_ABS_BUILDDIR
+
+CREATE TABLE test (t text);
+
+\set utf8_csv :abs_builddir '/results/copyencoding_utf8.csv'
+-- U+3042 HIRAGANA LETTER A
+COPY (SELECT E'\u3042') TO :'utf8_csv' WITH (FORMAT csv, ENCODING 'UTF8');
+COPY test FROM :'utf8_csv' WITH (FORMAT csv, ENCODING 'EUC_JP');
+
+DROP TABLE test;
-- 
2.43.0
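
For readers who want to check the byte-level claim in the commit message without a PostgreSQL build, here is a minimal sketch. It is not part of the patch. It assumes Python's "euc_jp" codec is a fair stand-in for PostgreSQL's EUC_JP validation; the two are separate implementations and may differ in which bytes they report. The helper names are made up for this illustration.

```python
# Sketch only: reproduces the idea of the test outside PostgreSQL.
# Assumption: Python's "euc_jp" codec approximates PostgreSQL's EUC_JP check.


def utf8_bytes_of_hiragana_a() -> bytes:
    """Return the UTF-8 encoding of U+3042 HIRAGANA LETTER A."""
    return "\u3042".encode("utf-8")


def is_valid_in(data: bytes, encoding: str) -> bool:
    """Report whether the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    data = utf8_bytes_of_hiragana_a()
    # Show the raw bytes that COPY ... TO would write for this character.
    print(" ".join(f"0x{b:02x}" for b in data))
    print("valid as UTF-8: ", is_valid_in(data, "utf-8"))
    print("valid as EUC_JP:", is_valid_in(data, "euc_jp"))
```

The first two bytes, 0xe3 0x81, are the pair named in the expected ERROR line of copyencoding.out: 0xe3 can start a two-byte EUC_JP character, but 0x81 is below the range allowed for the second byte.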