Re: raw output from copy - Mailing list pgsql-hackers

From Pavel Stehule
Subject Re: raw output from copy
Date
Msg-id CAFj8pRDu01zic74KZkG2=7vBUBddRRb1haPrXsmuv3zKHPwVuQ@mail.gmail.com
Whole thread Raw
In response to Re: raw output from copy  (Corey Huinker <corey.huinker@gmail.com>)
Responses Re: raw output from copy
List pgsql-hackers
Hi

2016-03-09 18:41 GMT+01:00 Corey Huinker <corey.huinker@gmail.com>:

The regression tests seem to adequately cover all new functionality, though I wonder if we should add some cases that highlight situations where BINARY mode is insufficient.


One thing I tried to test RAW was to load an existing json file.

My own personal test was to load an existing .json file into a 1x1 bytea table, which worked. From there I was able to 
select encode(col_name,'escape')::text::jsonb from test_table
and the json was correctly converted.

A similar test copying binary failed.

A write up of the test looks like this:

\copy (select '{"foo": "bar"}') to '/tmp/raw_test.jsonb' (format raw);
COPY 1
create temporary table raw_byte (b bytea);
CREATE TABLE
create temporary table raw_text (t text);
CREATE TABLE
\copy raw_jsonb from '/tmp/raw_test.blob' (format raw);
psql:/home/ubuntu/raw_test.sql:9: ERROR:  relation "raw_jsonb" does not exist
\copy raw_byte from '/tmp/raw_test.blob' (format raw);
COPY 1
select encode(b,'escape')::text::json from raw_byte;
     encode     
----------------
 {"foo": "bar"}
(1 row)

\copy raw_text from '/tmp/raw_test.blob' (format raw);
COPY 1
select t::jsonb from raw_text;
       t        
----------------
 {"foo": "bar"}
(1 row)

create temporary table binary_byte (b bytea);
CREATE TABLE
create temporary table binary_text (t text);
CREATE TABLE
\copy binary_byte from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:22: ERROR:  COPY file signature not recognized
select encode(b,'escape')::jsonb from binary_byte;
 encode 
--------
(0 rows)

\copy binary_text from '/tmp/raw_test.blob' (format binary);
psql:/home/ubuntu/raw_test.sql:26: ERROR:  COPY file signature not recognized
select t::jsonb from binary_text;
 t 
---
(0 rows)


So, if we want to add a regression test to demonstrate to posterity why we need RAW for cases that BINARY can't handle, I offer the attached file.

I don't think so regress tests should to do this demonstration. It is clean, so COPY BINARY should to fail every time, and then there is not any benefit from it in regress tests. There are lot of discussion in this thread, and we don't need to inject more "garbage" to regress tests.
 

Does anyone else see value in adding that to the regression tests?

 
Before I give my approval, I want to read it again more closely to make sure that no cases were skipped with regard to the  (binary || raw) and (binary || !raw) tests. Also, I want to use it on some of my problematic files. Maybe I'll find a good edge case. Probably not.

I don't know why I thought this, but when I looked at the patch, I assumed that the ( binary || raw ) tests were part of a large if/elseif/else waterfall. They are not. They stand alone. There are no edge cases to find.

This is organized to files by necessity to work with external files. The regress tests for COPY RAW has about 100 lines - so why need special files and infrastructure. COPY RAW, COPY BINARY tests well shares infrastructure.
 

Review complete and passed. I can re-review if we want to add the additional test.


Great, thank you very much. I hope so this feature really useful. It allow to simple export/import XML doc in different encodings, JSONs and can be enhanced future via options. The nice feature  (but not for this release) can be additional cast info for import -- like "COPY table(jsonb_column) FROM stdin (FORMAT RAW, CAST json_2_jsonb). Because there are the options, there are big space for other enhancing.

Regards

Pavel


pgsql-hackers by date:

Previous
From: Dilip Kumar
Date:
Subject: Re: Relation extension scalability
Next
From: Amit Kapila
Date:
Subject: Re: Explain [Analyze] produces parallel scan for select Into table statements.