Re: legacy assumptions - Mailing list pgsql-docs
From | Jonathan S. Katz |
---|---|
Subject | Re: legacy assumptions |
Date | |
Msg-id | cba4b70f-2ae8-83f8-0655-47f3bebea755@postgresql.org Whole thread Raw |
In response to | legacy assumptions (PG Doc comments form <noreply@postgresql.org>) |
Responses |
Re: legacy assumptions
|
List | pgsql-docs |
Hi, On 11/25/19 12:47 PM, PG Doc comments form wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/12/datatype-json.html > Description: > > I'm wondering if this one line of section 8.14 JSON Types > (https://www.postgresql.org/docs/current/datatype-json.html) can be edited > to remove the word "legacy": > > "In general, most applications should prefer to store JSON data as jsonb, > unless there are quite specialized needs, such as legacy assumptions about > ordering of object keys." > > I'm concerned that with the word "legacy" there, someone might come along > eventually and decide the json column type isn't needed anymore because it's > "legacy", where in fact there are modern and legitimate uses for a field > that allows you to retrieve the data exactly as it was stored and allows > JSON queries on that data (even if they are slower). While I'm certainly sensitive to this need as once upon a time I had a similar requirement, slightly less strict requirement, I made sure to not rely on the PostgreSQL JSON type itself to ensure ordering was preserved (and in my case I was able to rely on a solution external to PostgreSQL). The JSON RFC states that objects should be considered "unordered", and mentions that while different parsing libraries may preserve key ordering, "implementations whose behavior does not depend on member ordering will be interoperable in the sense that they will not be affected by these differences."[1] > An alternative would be to store the > plaintext as binary data for the integrity check and have a separate jsonb > column with a second copy of the same data. Since different applications > have different time/space tradeoffs, it's good to have the choice. Another approach is to leverage PostgreSQL's expression index capabilities, which would allow you to limit the data duplication. For example: CREATE TABLE docs (doc bytea); -- populating some test data INSERT INTO docs SELECT ('{"id": ' || x || ', "data": [1,2,3] }')::bytea FROM generate_series(1, 100000) x; -- create an expression index that maps to the operators supported by GIN CREATE INDEX docs_doc_json_idx ON docs USING gin(jsonb(encode(doc, 'escape'))); and in one test run: EXPLAIN SELECT doc FROM docs WHERE encode(doc, 'escape')::jsonb @> '{"id": 567}'; I got a plan similar to: QUERY PLAN ------------------------------------------------------------------------------------ Bitmap Heap Scan on docs (cost=28.77..306.00 rows=100 width=31) Recheck Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb) -> Bitmap Index Scan on docs_doc_json_idx (cost=0.00..28.75 rows=100 width=0) Index Cond: ((encode(doc, 'escape'::text))::jsonb @> '{"id": 567}'::jsonb) In this way, you can: - Keep the key ordering preserved and perform any integrity checks, etc. that your application requires - Limit your data duplication to that of the index - Still get the benefits of the JSONB lookup functions that work with the indexing - Still perform JSON validation: INSERT INTO docs VALUES ('{]'::bytea); ERROR: invalid input syntax for type json DETAIL: Expected string or "}", but found "]". CONTEXT: JSON data, line 1: {] > My suggestion for that sentence: > > "In general, most applications should prefer to store JSON data as jsonb, > unless there are quite specialized needs, such as assumptions about ordering > of object keys or the need to retrieve the data exactly as it was stored." My preference would be that we guide in the documentation on what to do if one has an application sensitive to ordering. I'm not opposed to the wording, but I'd prefer we encourage people to leverage JSONB for storage & retrieval. Thanks! Jonathan [1] https://tools.ietf.org/html/rfc7159#section-4
Attachment
pgsql-docs by date: