Re: additional json functionality - Mailing list pgsql-hackers
From | Andrew Dunstan |
---|---|
Subject | Re: additional json functionality |
Date | |
Msg-id | 52868605.8020900@dunslane.net Whole thread Raw |
In response to | Re: additional json functionality (Merlin Moncure <mmoncure@gmail.com>) |
Responses |
Re: additional json functionality
Re: additional json functionality |
List | pgsql-hackers |
On 11/15/2013 03:25 PM, Merlin Moncure wrote: > On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote: >> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> Here are the options on the table: >>> 1) convert existing json type to binary flavor (notwithstanding objections) >>> 2) maintain side by side types, one representing binary, one text. >>> unfortunately, i think the text one must get the name 'json' due to >>> unfortunate previous decision. >>> 3) merge the behaviors into a single type and get the best of both >>> worlds (as suggested upthread). >>> >>> I think we need to take a *very* hard look at #3 before exploring #1 >>> or #2: Haven't through it through yet but it may be possible to handle >>> this in such a way that will be mostly transparent to the end user and >>> may have other benefits such as a faster path for serialization. >> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (seehttp://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ fora couple of examples) --- without undue maintenance overhead, then great. >> >> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at$work. > Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: > > *) 'json' type internally has a binary as well a text representation. > The text representation is basically the current type behavior > (duduplicated unordered). The binary representation is the hstore-ish > variant. The text mode is discarded when it's deemed no longer > appropriate to be needed, and, once gone, can never be rebuilt as it > was. > > *) only the binary internal representation ever gets stored to disk > (or anything else). > > *) the text mode is preferred for output if it is there. otherwise, a > deduplicated, reordered text representation is generated > > *) When literal text is casted to json, the binary structure is built > up and kept alongside binary mode. So, if you went: 'select '{"a": > 1, "a": 2}'::json', you'd get the same thing back. (This is how > it works now.). but, if you went: 'insert into foo select '{"a": 1, > "a": 2}'::json returning *', you'd get {"a": 2} back essentially > (although technically that would be a kind of race). > > *) When the json is stored to table, the text representation gets > immediately discarded on the basis that it's no longer the true > representation of the data. > > *) Ditto when making any equality operation (not as sure on this point). > > *) Ditto when doing any operation that mutates the structure in any > way. the text representation is immutable except during serialization > and if it gets invalidated it gets destroyed. > > *) New API function: json_simplify(); or some such. It reorders and > dedups from user's point of view (but really just kills off the text > representation) > > *) once the text mode is gone, you get basically the proposed 'hstore' behavior. > > *) serialization functions are generally used in contexts that do not > store anything but get output as query data. They create *only* the > text mode. However, if the resultant json is stored anywhere, the > text mode is destroyed and replaced with binary variant. This is both > a concession to the current behavior and an optimization of > 'serialization-in-query' for which I think the binary mode is pessimal > performance wise. so, xxx_to_json serialization functions work > exactly as they do now which fixes my problem essentially. > > *) if you are unhappy with duplicates in the above, just get use to > calling json_simpify() on the serialized json (or deal with in on the > client side). > > This is all pretty glossy, but maybe there is a way forward... > It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs. I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend in whether or not something was ever stored, which is what this looks like. cheers andrew > >
pgsql-hackers by date: