From: Joe Conway
Subject: Re: Emitting JSON to file using COPY TO
Msg-id: 4d5688f4-9582-4093-8448-e1867bc9e2bc@joeconway.com
In response to: Re: Emitting JSON to file using COPY TO (Sehrope Sarkuni <sehrope@jackdb.com>)
Responses: Re: Emitting JSON to file using COPY TO
List: pgsql-hackers
On 12/6/23 11:28, Sehrope Sarkuni wrote:
> Big +1 to this overall feature.

Cool!

> Regarding the defaults for the output, I think JSON lines (rather than a
> JSON array of objects) would be preferred. It's more natural to combine
> them and generate that type of data on the fly rather than forcing
> aggregation into a single object.

So that is +2 (Sehrope and me) for the status quo (JSON lines), and +2 (Andrew and Davin) for defaulting to JSON arrays. Anyone else want to weigh in on that issue?

> A couple more features / use cases come to mind as well. Even if they're
> not part of a first round of this feature I think it'd be helpful to
> document them now as it might give some ideas for what does make that
> first cut:
>
> 1. Outputting a top level JSON object without the additional column
> keys. IIUC, the top level keys are always the column names. A common use
> case would be a single json/jsonb column that is already formatted
> exactly as the user would like for output. Rather than enveloping it in
> an object with a dedicated key, it would be nice to be able to output it
> directly. This would allow non-object results to be output as well
> (e.g., lines of JSON arrays, numbers, or strings). Due to how JSON is
> structured, I think this would play nicely with the JSON lines vs. array
> concept.
>
> COPY (SELECT json_build_object('foo', x) AS i_am_ignored FROM
> generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> SOME_OPTION_TO_NOT_ENVELOPE)
> {"foo":1}
> {"foo":2}
> {"foo":3}

Your example does not match what you describe, or do I misunderstand? I thought your goal was to eliminate the repeated "foo" from each row...

> 2. An option to ignore null fields so they are excluded from the output.
> This would not be a default but would allow shrinking the total size of
> the output data in many situations. This would be recursive to allow
> nested objects to be shrunk down (not just the top level). This might be
> worthwhile as a standalone JSON function though handling it during
> output would be more efficient as it'd only be read once.
>
> COPY (SELECT json_build_object('foo', CASE WHEN x > 1 THEN x END) FROM
> generate_series(1, 3) x) TO STDOUT WITH (FORMAT JSON,
> SOME_OPTION_TO_NOT_ENVELOPE, JSON_SKIP_NULLS)
> {}
> {"foo":2}
> {"foo":3}

Clear enough, I think.

> 3. Reverse of #2 when copying data in to allow defaulting missing fields
> to NULL.

Good to record the ask, but it applies to a different feature (COPY FROM instead of COPY TO).

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
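
For illustration, a minimal sketch of the two candidate default output shapes discussed above, using the FORMAT JSON syntax proposed in this thread (not a released COPY option); the exact array layout shown is an assumption, not the patch's verbatim output:

    COPY (SELECT x AS foo FROM generate_series(1, 3) x)
      TO STDOUT WITH (FORMAT JSON);

    -- JSON lines (the status quo): one object per row, keyed by column name
    {"foo":1}
    {"foo":2}
    {"foo":3}

    -- Single JSON array of objects (the alternative default), roughly:
    [{"foo":1},{"foo":2},{"foo":3}]

The first form can be generated and consumed row by row and concatenated freely; the second is one valid JSON document that generic parsers can load directly, at the cost of aggregating everything into a single object.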