Thread: additional json functionality
In the last year I have created a number of modules providing extra JSON functionality. I'd like to include some of these in release 9.4. Candidates for inclusion are:

json_build: provides extra functionality for building up non-regular and arbitrarily complex json, as well as a way of aggregating key/value pairs as a json record. See <https://github.com/pgexperts/json_build>

json_object: takes a one or two dimensional array of text and turns it into a json object, similar to the way hstore currently does. See <https://bitbucket.org/qooleot/json_object>

json_to_record: turns json into a record or recordset, using the same logic as json_populate_record and json_populate_recordset, but inferring the return type from the call instead of from the first argument. See <https://bitbucket.org/qooleot/json_to_record>

json_build is probably the most important in terms of additional functionality. Both json_build and json_to_record have had to copy non-exposed parts of the internal json code, which is one of the many reasons I would particularly like to include them. These extensions have all been created in response to client requirements, so there is a definite use case for all of them.

If there's agreement on taking these, I will prepare patches and submit them by the 15th.

cheers

andrew
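[A rough sketch of how the three modules might be invoked, going by the descriptions above; the exact signatures here are assumptions based on the extensions' READMEs, not a committed API:

    -- json_build: variadic constructors taking alternating keys and values,
    -- nestable for irregular structure (names per the extension):
    SELECT build_json_object('name', 'Fred', 'scores', build_json_array(1, 2, 3));

    -- json_object: a one-dimensional text array of alternating keys and values:
    SELECT json_object(ARRAY['a', '1', 'b', '2']);

    -- json_to_record: return type inferred from the call site via AS,
    -- rather than from a sample argument as json_populate_record requires:
    SELECT * FROM json_to_record('{"a": 1, "b": "x"}'::json) AS t(a int, b text);
]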
On 11/13/2013 02:34 AM, Andrew Dunstan wrote:
> If there's agreement on taking these, I will prepare patches and submit them by the 15th.

With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json.

It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically.

-- 
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 11/13/2013 02:34 AM, Andrew Dunstan wrote:
>> If there's agreement on taking these, I will prepare patches and submit them by the 15th.
>
> With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json.
>
> It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically.

I'm not so sure we should require hstore to do things like build arbitrary json objects, even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize). Andrew's patches just fill out a couple of missing cases that are handled in the existing API.

Putting all the patches together, ISTM there might be a function or two too many. I'm not sure why the json_ prefix was abandoned for build_json_object and build_json_array. Also, json_object is pretty weird to me: I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated." I think the caller should have no such responsibility. Keys should be able to be repeated. Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as arguments

merlin
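[The third suggestion, sketched; this is hypothetical, as no such two-array constructor exists at this point in the thread:

    -- a constructor taking parallel key and value arrays:
    SELECT json_object(ARRAY['a', 'b'], ARRAY['1', '2']);
    -- would presumably yield: {"a": "1", "b": "2"}
]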
On 11/13/2013 09:45 AM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 11/13/2013 02:34 AM, Andrew Dunstan wrote:
>>> If there's agreement on taking these, I will prepare patches and submit them by the 15th.
>> With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json.
>>
>> It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically.

I'm going to be fairly upset if I'm told I have to wait for the new format work, and then I'm later told it's too late to bring this into 9.4. I think these are really orthogonal issues. Adding a new serialization format (which I have been discussing with Oleg and Teodor, and which I hope to help in bringing to JSON) will make some things lots faster than they now are, and might make some things easier or possible where now they are hard or impossible, but it won't remove any functionality requirement.

In particular, json_build, which lets you build up arbitrarily complex and irregular json in a way that's just not possible without using a PL right now, is quite an important requirement. I've given talks about it and JSON users have been quite excited by the possibilities it opens up.

The patch for json_to_record is quite small (two functions), and it has the advantage that, unlike the json_populate_record functions, you don't need to have or create a named type to use it. I think that makes it worth having in itself.

> I'm not so sure we should require hstore to do things like build arbitrary json objects even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize).

I have no idea what this means. The plan with the work that Oleg and Teodor are doing is to provide a set of common code that can be used by either a binary json representation (which will be able to be distinguished from a text representation, so there would be no pg_upgrade problems) or nested hstore. In effect, nested hstore and json would have pretty much identical capabilities, so using one over the other should be largely a matter of preference rather than a forced choice. Frankly, I believe the audience for JSON is vastly larger, and I expect it to be the treeish data format of choice for almost all users.

> Andrew's patches just fill out a couple of missing cases that are handled in the existing API. Putting all the patches together, ISTM there might be a function or two too many. I'm not sure why the json_ prefix was abandoned for build_json_object and build_json_array.

I'm quite happy to change it.

> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format,

What? There is no new serialization format. This is a way to generate a json object in the existing format from a one or two dimensional array of text, c.f. the existing function hstore(text[]) => hstore.

> and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.". I think the caller should have no such responsibility. Keys should be able to be repeated.

They can be repeated, as they can in the current json text format. However, the function makes no attempt to deal with repeated keys. If a key is repeated in the input it will be repeated in the output. In this respect it differs from the hstore function. Note too, that one effect of moving to a non-text representation of json will be that duplicated keys will be resolved (last value will win). But that's a much wider issue than this function.

> Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build.

The extension was built before json_build. But it met a requirement that existed at the time. It probably wouldn't be a tragedy to leave it out, but there is probably a place for it just as there is for the hstore function.

> Putting it all together, I'd consider:
> *) dropping json_object (although maybe there is a case I'm not thinking about)
> *) changing json_build function names to get the json prefix
> *) adding a json object constructor that takes two parallel arrays as arguments.

The third point seems to conflict with the first. I'd only consider that if we *do* add the one-array version of json_object.

cheers

andrew
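[A minimal illustration of the behavior Andrew describes for the existing text-based 9.3 type; the lookup semantics are as stated later in this thread:

    -- the text-based json type stores its input verbatim, duplicates included:
    SELECT '{"a": 1, "a": 2}'::json;          -- {"a": 1, "a": 2}
    -- but key lookup resolves to the value of the last matching key:
    SELECT '{"a": 1, "a": 2}'::json -> 'a';   -- 2
]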
On Wed, Nov 13, 2013 at 9:32 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I'm not so sure we should require hstore to do things like build arbitrary json objects even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize).
>
> I have no idea what this means.

What I'm saying there is I agree: what's going on with hstore is not relevant here. json features currently provide standalone serialization and deserialization, minus a couple of edge cases that you are fixing up here. Hstore will emerge as a json manipulation engine.

Aside: I thought we blew it (as you know) by not unifying the hstore and json APIs in the 9.2 cycle, and now with the emerging json stuff I really think so... it's way too late to do anything about it now even if there was consensus on that point.

>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format,
>
> What? There is no new serialization format. This is a way to generate a json object in the existing format from a one or two dimensional array of text, c.f. the existing function hstore(text[]) => hstore.

Right -- I thought it took text, not text[] -- withdrawn. I consider a variant taking (text[], text[]) to be generally more practical than the one-argument version (either 1d or 2d variant). Dealing with 2d arrays is a headache unless you really know what you're doing.

>> and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.". I think the caller should have no such responsibility. Keys should be able to be repeated.
>
> They can be repeated, as they can in the current json text format. However, the function makes no attempt to deal with repeated keys. If a key is repeated in the input it will be repeated in the output. In this respect it differs from the hstore function.

Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.

> Note too, that one effect of moving to a non-text representation of json will be that duplicated keys will be resolved (last value will win). But that's a much wider issue than this function.

Right, exactly. And I think this is a pretty serious problem with 'non text json' unless there is a relatively robust and fast process to recompose the json properly for serialization purposes (but that's mostly off topic for your proposed patch).

>> Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build.
>
> The extension was built before json_build. But it met a requirement that existed at the time. It probably wouldn't be a tragedy to leave it out, but there is probably a place for it just as there is for the hstore function.
>
>> Putting it all together, I'd consider:
>> *) dropping json_object (although maybe there is a case I'm not thinking about)
>> *) changing json_build function names to get the json prefix
>> *) adding a json object constructor that takes two parallel arrays as arguments.
>
> The third point seems to conflict with the first. I'd only consider that if we *do* add the one-array version of json_object.

ok, agreed. So now I'm just saying to unify function names under the json_ prefix and maybe add a (text[], text[]) variant for the object builder, or maybe just drop json_object completely.

merlin
On 11/13/2013 11:37 AM, Merlin Moncure wrote:
> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.

I think you're probably going to lose any argument that says we should necessarily preserve key order (and possibly key duplication) in objects. The standard doesn't support such a contention, either:

    An object is an unordered collection of zero or more name/value pairs
    ...
    The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.

cheers

andrew
On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 11/13/2013 11:37 AM, Merlin Moncure wrote:
>> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.
>
> An object is an unordered collection of zero or more name/value pairs
> ...
> The names within an object SHOULD be unique.
>
> Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627:

"An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique."

And SHOULD means:

"3. SHOULD. This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."

As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. Another consequence is that creating particular legal constructions should be discouraged. I disagree with this.

This is simply not the case with many json consuming clients. It's a nice idea but not how things work universally, and that's exactly why the rules were hedged in the RFC. I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control.

merlin
On 11/13/2013 06:45 AM, Merlin Moncure wrote:
> I'm not so sure we should require hstore to do things like build ...
> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".

This is pretty standard in the programming languages I know of which use JSON.

> I think the caller should have no such responsibility. Keys should be able to be repeated.

Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines. Certainly the upcoming binary storage is going to require unique keys. For that matter, both MongoDB and CouchDB store unique, unordered keys. And even supporting CRUD functions (i.e. "update this key") is going to require uniqueness.

> Putting it all together, I'd consider:
> *) dropping json_object (although maybe there is a case I'm not thinking about)
> *) changing json_build function names to get the json prefix
> *) adding a json object constructor that takes two parallel arrays as arguments.

I was with you until the third idea. Huh?

The scripting languages I use (Perl, Python) have functions which convert a list/array to a hash/dictionary. In each case, the standard input is a single list/array in the form [ k, v, k, v, k, v ]. Now, while there are standard language functions which support munging two parallel arrays into one hash (such as Python's zip()), these are less frequently used. Supporting the zip() option without supporting the [ k, v ] array option would be a bizarre and puzzling approach to most programmers I know.

I can see three approaches which make sense:

1. we don't include json_object at all.
2. we include the existing json_object
3. we include json_object, plus a second json_object function which takes two arrays

Keep in mind that all of Andrew's functions came out of real-life use cases of writing applications which return JSON to the caller, so they are based on real needs to fill holes in our JSON-building function library. In the case of json_object, the need was to supply "column" labels where, usually due to calculated columns, none exist in the input.

Take the example where I want to return a bunch of aggregates from a table as a series of json objects with user-friendly labels:

SELECT build_json_object(
    'dept', department,
    'total_costs', sum(costs),
    'running_total', running_sum()
) FROM ....

Where it becomes even more useful is when you want the json label to be the result of a calculated expression:

SELECT build_json_object( department, sum() )

Yes, you could do this with a two-array version as well; it's just no more intuitive, and in cases where you have dozens of columns, puts you in column-counting hell.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 11/13/2013 04:58 PM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 11/13/2013 11:37 AM, Merlin Moncure wrote:
>>> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.
>> An object is an unordered collection of zero or more name/value pairs
>> ...
>> The names within an object SHOULD be unique.
>>
>> Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.
> SHOULD != MUST. Here is the definition of object per RFC 4627:
>
> "An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique."
>
> And SHOULD means:
> "3. SHOULD. This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."
>
> As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. Another consequence is that creating particular legal constructions should be discouraged. I disagree with this.
>
> This is simply not the case with many json consuming clients. It's a nice idea but not how things work universally, and that's exactly why the rules were hedged in the RFC. I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control.

I understand the difference between "should" and "must". But there is nothing that REQUIRES us to preserve key order or duplicate keys. If you really need textual preservation, you should probably store the data as text and convert it to json to do json-ish things to it. If not, we're going to face huge demands to implement another type which almost everyone but you will move to in rapid order because it performs so much better. The strong consensus I have seen in discussions at conferences and elsewhere is to go the way we're going, instead.

cheers

andrew
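[The fallback Andrew suggests works with stock 9.3 as-is; a minimal sketch, with table and key names invented for illustration:

    -- store the document as text to preserve it byte-for-byte,
    -- casting only when json operations are needed:
    CREATE TABLE docs (body text);
    SELECT body::json -> 'some_key' FROM docs;
]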
On 14/11/13 11:33, Andrew Dunstan wrote:
> On 11/13/2013 04:58 PM, Merlin Moncure wrote:
>> On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 11/13/2013 11:37 AM, Merlin Moncure wrote:
>>>> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.
>>> An object is an unordered collection of zero or more name/value pairs
>>> ...
>>> The names within an object SHOULD be unique.
>>>
>>> Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.
>> SHOULD != MUST. Here is the definition of object per RFC 4627:
>>
>> "An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique."
>>
>> And SHOULD means:
>> "3. SHOULD. This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."
>>
>> As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. Another consequence is that creating particular legal constructions should be discouraged. I disagree with this.
>>
>> This is simply not the case with many json consuming clients. It's a nice idea but not how things work universally, and that's exactly why the rules were hedged in the RFC. I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control.
>
> I understand the difference between "should" and "must". But there is nothing that REQUIRES us to preserve key order or duplicate keys. If you really need textual preservation, you should probably store the data as text and convert it to json to do json-ish things to it. If not, we're going to face huge demands to implement another type which almost everyone but you will move to in rapid order because it performs so much better. The strong consensus I have seen in discussions at conferences and elsewhere is to go the way we're going, instead.
>
> cheers
>
> andrew

I can see that both points of view are valid, in different contexts.

Would it be possible to have a boolean, such as 'strict', so that unique & ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice?

Cheers,
Gavin
On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 11/13/2013 06:45 AM, Merlin Moncure wrote:
>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".
>
> This is pretty standard in the programming languages I know of which use JSON.
>
>> I think the caller should have no such responsibility. Keys should be able to be repeated.
>
> Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines.

Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that?

I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.

merlin
On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Putting it all together, I'd consider:
> *) dropping json_object (although maybe there is a case I'm not thinking about)
> *) changing json_build function names to get the json prefix
> *) adding a json object constructor that takes two parallel arrays as
> arguments.
> I was with you until the third idea. Huh?
I actually had a use case for this today, though with hstore, importing a fixed length record with something along the lines of:
hstore(
ARRAY['field 1', 'field 2', 'field 3'],
regexp_matches(fixed_field,'(.{4})(.{10})(.{5})')
)
__________________________________________________________________________________
Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
Mike.Blackwell@rrd.com
http://www.rrdonnelley.com
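[The two-parallel-array json constructor discussed upthread would let the same pattern produce json directly; a hypothetical sketch mirroring the hstore call above, with the table name invented:

    SELECT json_object(
        ARRAY['field 1', 'field 2', 'field 3'],
        regexp_matches(fixed_field, '(.{4})(.{10})(.{5})')
    )
    FROM staging_import;
]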
Merlin,

> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.

It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

XML went through the same thing, which is part of how we got a bunch of incompatible "dialects" of XML.

Now, your use case does show us that there's a case to be made for still having text JSON even after we have binary JSON. There's a strong simplicity argument against that, though ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 11/14/2013 12:09 AM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 11/13/2013 06:45 AM, Merlin Moncure wrote:
>>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".
>> This is pretty standard in the programming languages I know of which use JSON.
>>
>>> I think the caller should have no such responsibility. Keys should be able to be repeated.
>> Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines.
> Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that?
>
> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not.

I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead. Maybe call it jsdoc or jdoc or jsobj or somesuch.

For some time I was also pretty perplexed by some PostgreSQL JSON type discussions where JSON was not really being defined as the type constructed from its string representation, nor as a string which results from serialising an existing javascript object, but rather as source code which can be parsed into a structured type. So the PostgreSQL "json" type is *not* a structured type like hstore is, but is really a string type with a few syntax checks. Some of the json_* functions are then defined on top of this "json-source" type and treat this source as if it were an actual structured type.

It is kind of like defining an "int-notation" type, which acts like an integer when added to another integer, but is required to also keep its original representation:

select '1+1'::"int-notation" + 2;
==> 4
select '1+1'::"int-notation";
==> 1+1

> I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.

I am pretty sure we cannot move to an internal object representation and preserve the current "json source" behaviour. This is why I recommend not replacing json, but rather adding another built-in, truly structured type. Then you can keep using the current json for the earlier-quoted uses of "processing instructions" and do real data manipulation on the jsdoc/jsobj type. Also, most of the current json functions should be moved to work on jsobj instead, with an explicit cast from json to jsobj.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On Nov 13, 2013, at 2:41 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

> Would it be possible to have a boolean, such as 'strict', so that unique & ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice?

It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

Best,

David
On 11/13/2013 07:01 PM, Hannu Krosing wrote:
> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.

Well, that's two voices for that course of action. Interesting that I don't think I heard a single voice for this either at pgCon or pgOpen, although I spent large amounts of time at both talking to people about JSON, so I'd be interested to hear more voices.

It would actually simplify things in a way if we do that - we've been working on a way of doing this that wouldn't upset pg_upgrade. This would render that effort unnecessary. However, it will complicate things for users, who will have to choose between the data types, and for function authors, who will possibly have to write versions of functions to work with both types.

> Also most of the current json functions should be moved to work on jsobj instead, with an explicit cast from json to jsobj.

Sure, we can overload them - that's probably the least of our worries.

cheers

andrew
On 11/13/2013 07:39 PM, David E. Wheeler wrote:
> On Nov 13, 2013, at 2:41 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
>
>> Would it be possible to have a boolean, such as 'strict', so that unique & ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice?
> It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff, order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?

cheers

andrew
On Nov 13, 2013, at 4:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

>> It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.
>
> I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff, order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?

I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable.

Best,

David
On Wed, Nov 13, 2013 at 04:50:49PM -0800, David E. Wheeler wrote:
> On Nov 13, 2013, at 4:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>>> It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.
>>
>> I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff, order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?
>
> I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable.

There is a canonical form: http://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

A canonical form would be very useful. That's a bit trickier than sorting the keys, and I don't know that there is an accepted canonical form for json yet that can represent all json documents. (The canonical form is not the pretty form, but I think the key ordering should be the same.)

It might be nice to have a more general canonical form if one emerges from somewhere that could encode any json, since without something like this, hashing can only be well specified for the 'sensible subset of json' used in security protocols.

Garick
On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> On 11/14/2013 12:09 AM, Merlin Moncure wrote:
>> On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 11/13/2013 06:45 AM, Merlin Moncure wrote:
>>>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".
>>> This is pretty standard in the programming languages I know of which use JSON.
>>>
>>>> I think the caller should have no such responsibility. Keys should be able to be repeated.
>>> Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines.
>> Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that?
>>
>> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not.
> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.
>
> Maybe call it jsdoc or jdoc or jsobj or somesuch.

This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.

Had the json type been stuffed into an extension, there would be a clearer path to get to where you want to go, since we could have walled off the old functionality and introduced side by side API calls. As things stand now, I don't see a clean path to do that.

On Wed, Nov 13, 2013 at 5:20 PM, Josh Berkus <josh@agliodbs.com> wrote:
> It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

That's not very convincing. What *exactly* is impossible, and why do you think it justifies breaking compatibility with current applications? The way forward seems pretty straightforward: given that hstore is getting nesting power and is moving closer to the json way of doing things, it is essentially 'binary mode json'. I'm ok with de-duplication and key ordering when moving into that structure since it's opt-in and doesn't break the serialization behaviors we have today. If you want to go further and unify the types then you have to go through the design work to maintain compatibility.

Furthermore, I bet the performance argument isn't so clear cut either. The current json type is probably faster at bulk serialization precisely because you *don't* need to deduplicate and reorder keys: the serialization operates without context. It will certainly be much better for in-place manipulations, but it's not nearly as simple as you are making it out to be.

merlin
On 11/14/2013 01:42 AM, Andrew Dunstan wrote:
> On 11/13/2013 07:01 PM, Hannu Krosing wrote:
>> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.
>
> Well, that's two voices for that course of action.

I am not really "for" it (I would have liked to have a json_object/json_structure instead of json_string as the meaning of "json"), but I think there is quite a strong argument for not breaking backwards compatibility.

> Interesting that I don't think I heard a single voice for this either at pgCon or pgOpen,

I attended neither, but I did voice my preferences for _not_ having the "json-as-source-code" type on the mailing lists during previous json discussions.

> although I spent large amounts of time at both talking to people about JSON, so I'd be interested to hear more voices.
>
> It would actually simplify things in a way if we do that - we've been working on a way of doing this that wouldn't upset pg_upgrade. This would render that effort unnecessary.

I wonder how hard it would be to rename the current json to json_source and have a new nested-hstore based json?

> However it will complicate things for users, who will have to choose between the data types, and for function authors, who will possibly have to write versions of functions to work with both types.

You mostly want the functions for the json-object type. This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values). You may want some new functions on json-source (maybe json_source_enumerate_key_values(json, key)) but the current ones are really for json-object.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 11/14/2013 12:20 AM, Josh Berkus wrote:
> Merlin,
>
>> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.
> It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.
>
> XML went through the same thing, which is part of how we got a bunch of incompatible "dialects" of XML.
>
> Now, your use case does show us that there's a case to be made for still having text JSON even after we have binary JSON.

text-json could easily be a domain (text + a check that it is convertible to json). Maybe it is even possible to teach pg_upgrade to do this automatically.

> There's a strong simplicity argument against that, though ...

I think it confuses most people, similar to how storing 1+1 as "processing instructions" instead of just evaluating it and storing 2 would :) OTOH we are in this mess now and have to solve the backwards compatibility somehow.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
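[Hannu's domain idea as a minimal sketch; the domain name is invented, and the CHECK simply forces the value through the json parser, so invalid input errors out while the stored representation stays plain text:

    CREATE DOMAIN json_text AS text
        CHECK (value::json IS NOT NULL);
]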
On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values).

yeah. hm. that's a good point.

Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does. In other words, the marshalling into binary structure could happen when:

*) stored to a column in a table
*) any modifying routine is called, updating a key, value, etc
*) manually via a function

but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant.

merlin
On 11/14/2013 04:07 PM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.
>>
>> Maybe call it jsdoc or jdoc or jsobj or somesuch.
> This is exactly what needs to be done, full stop (how about: hstore).

hstore has completely different i/o formats and thus has similar backwards compatibility problems.

> It really comes down to this: changing the serialization behaviors

It is really not "serialisation behaviours", as there is nothing you can sensibly serialise to have repeated keys. I agree that you can "generate" such JSON, which would be valid input to any json parser, but no JavaScript Object really serializes to such JSON.

> that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break.

We should have disallowed this from the beginning and should have encouraged using text as storage for JavaScript source code.

> It's going to break applications I've currently put into production with no clear workaround.

We could rename the old json type during pg_upgrade, but this would likely break at least implicit casts in functions.

> This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.

You could choose a sane serializer ;) The main argument here is still whether "json" is source code or the serialization result for a JavaScript Object (Notation).

> Had the json type been stuffed into an extension, there would be a clearer path to get to where you want to go, since we could have walled off the old functionality and introduced side by side API calls. As things stand now, I don't see a clean path to do that.
>> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument.

But one could argue that this is not "json" either, but rather some json-like input format for special parsers. The current recommendation is to use "text" for these kinds of things.

>> The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.
> It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

All these would be possible if we redefined json as another notation for XML instead of a string representation of a JavaScript Object :) And things could really be "in-place" only inside pl/language functions, as PostgreSQL is still MVCC. What should be faster is access to nested values, though I suspect that it is not significantly faster unless you have very large json documents.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 11/14/2013 05:06 PM, Merlin Moncure wrote:
> On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values).
> yeah. hm. that's a good point.
>
> Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does.

Do you have any xxx_to_json usage which can generate a field with multiple equal keys? Or is it just about preserving order?

> In other words, the marshalling into binary structure could happen when:
>
> *) stored to a column in a table
> *) any modifying routine is called, updating a key, value, etc
> *) manually via a function
>
> but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant.

Seems like this would not play nicely with how the PostgreSQL type system works in general, but it could be a way forward if you say that you really do not need to store the order-preserving, multi-valued json.

But in this case it could also be possible for these functions to just generate json-format "text", and with proper casts this would act exactly as you describe above, no?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On Thu, Nov 14, 2013 at 10:54 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> On 11/14/2013 05:06 PM, Merlin Moncure wrote:
>> On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>>> This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values).
>> yeah. hm. that's a good point.
>>
>> Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does.
>
> Do you have any xxx_to_json usage which can generate a field with multiple equal keys?

Absolutely -- that's what I've been saying all along. For example: IIRC the end consumer is jqgrid, although the structure format may be being done to satisfy some intermediate transformations, perhaps in GWT or in the browser itself. The point is I didn't define the structure (I think it sucks too); it was given to me to create and I did. The object's dynamic keys and values are moved into the json structure by passing two parallel arrays into a userland function, similar to what Andrew is proposing with the json_build functionality.

{
  "classDisplayName": null,
  "rows": [
    {
      "PropertyName": "xxx", "Row": 1, "Group": "Executive Dashboard", "MetricName": "Occupancy",
      "2012": "95.4%", "Q2": "96.5%", "Q3": "96.3%", "Q4": "94.8%",
      "2013": "95.1%", "Q2": "94.1%", "Q3": "96.0%", "Q4": "96.1%"
    },
    {
      "PropertyName": "xxx", "Row": 2, "Group": "Executive Dashboard", "MetricName": "Occupancy",
      "2012": "95.9%", "Q2": "97.3%", "Q3": "95.7%", "Q4": "95.2%",
      "2013": "93.9%", "Q2": "93.4%", "Q3": "95.3%", "Q4": "95.1%"
    }
  ]
}

>> but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant.
>
> Seems like this would not play nicely with how the PostgreSQL type system works in general, but it could be a way forward if you say that you really do not need to store the order-preserving, multi-valued json.

Yes, exactly. I'm OK with simplifying the structure for storage purposes because in that context postgres is the parser and gets to decide what the precise behaviors are. Simplifying the stored structures during upgrade is an OK concession to make, I think. It is not safe to assume the structure should be simplified when serializing.

> But in this case it could also be possible for these functions to just generate json-format "text", and with proper casts this would act exactly as you describe above, no?

I think so, if I'm following you correctly. Maybe you get the best of both worlds and (mostly) maintain compatibility by deferring the decomposition into the binary structure in certain contexts. I'd even throw in the equality operator (which, thankfully, we haven't defined yet) as a place where decomposition could happen. Pretty much any scenario that isn't involved in raw assembly and output.

merlin
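[A sketch of the kind of userland call described above; the function, table, and column names are all hypothetical, and the point is that the repeated "Q2"/"Q3"/"Q4" keys are produced deliberately:

    SELECT userland_json_object(      -- hypothetical userland function
        ARRAY['PropertyName', 'Row', 'Group', 'MetricName',
              '2012', 'Q2', 'Q3', 'Q4', '2013', 'Q2', 'Q3', 'Q4'],
        ARRAY[property_name, row_no::text, 'Executive Dashboard', 'Occupancy',
              fy2012, q2_12, q3_12, q4_12, fy2013, q2_13, q3_13, q4_13]
    ) FROM metrics;
]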
On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

> This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.

The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? Just because people have come to depend on something doesn't mean we can't change it. It's one thing if we said this was a feature you could depend on, but AFAIK we haven't. And frankly, the dupes have caused problems for some of my colleagues at work. To me, it's a bug (or, at best, a mis-feature) that causes more issues than it prevents.

In my experience, no JSON parser guarantees key order or duplication. You can't have dupes and there is no ordering in a Perl hash, Objective-C NSDictionary, or JavaScript object. There is of course order and there can be dupes in a JSON string, but not in the objects built from it. If you go in and out of a parser, dupes are eliminated and key order is not preserved. I expect the same from JSON storage.

With no guarantees of preserved ordering or duplication, and with no formal expectation of such by JSON parsers written for various programming languages, I think there is little to be lost by removing those aspects of the JSON type. For those (hopefully rare) situations where such expectations exist, the JSON should be stored as text, as Hannu suggests.

My $0.02.

Best,

David
On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler <david@justatheory.com> wrote:
> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.
>
> The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed?

It doesn't, but the row_to_json function has a very clear mechanism of action. And 'not being documented' is not the standard for latitude to make arbitrary changes to existing function behaviors.

> In my experience, no JSON parser guarantees key order or duplication.

I found one in about two seconds: http://docs.python.org/2/library/json.html

"object_pairs_hook, if specified will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority."

That makes the rest of your argument moot. Plus, I quite clearly am dealing with parsers that do.

merlin
On 11/14/2013 08:17 PM, Merlin Moncure wrote: > On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler > <david@justatheory.com> wrote: >> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> This is exactly what needs to be done, full stop (how about: hstore). >>> It really comes down to this: changing the serialization behaviors >>> that have been in production for 2 releases (three if you count the >>> extension) is bad enough, but making impossible some legal json >>> constructions which are currently possible is an unacceptable >>> compatibility break. It's going to break applications I've currently >>> put into production with no clear workaround. This is quite frankly >>> not ok and I'm calling foul. The RFC may claim that these >>> constructions are dubious but that's irrelevant. It's up to the >>> parser to decide that and when serializing you are not in control of >>> the parser. >> The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? > It doesn't, but the row_to_json function has a very clear mechanism of > action. And, 'not being documented' is not the standard for latitude > to make arbitrary changes to existing function behaviors. the whole hash*() function family was changed based on the "not documented" premise, so we do have a precedent. > >> In my experience, no JSON parser guarantees key order or duplication. > I found one in about two seconds. http://docs.python.org/2/library/json.html > > "object_pairs_hook, if specified will be called with the result of > every JSON object decoded with an ordered list of pairs. The return > value of object_pairs_hook will be used instead of the dict. This > feature can be used to implement custom decoders that rely on the > order that the key and value pairs are decoded (for example, > collections.OrderedDict() will remember the order of insertion). If > object_hook is also defined, the object_pairs_hook takes priority." > > That makes the rest of your argument moot. Plus, I quite clearly am > dealing with parsers that do. I am sure you could also devise a json encoding scheme where white space is significant ;) The question is, how much of it should the json *type* support. As discussed in the other thread, most of your requirements would be met by having json/row/row set-to-text serializer functions which output json-formatted "text". Then if you actually want to save this as an easy-to-manipulate json document, you can save this "text" to a field of type "json", which does de-duplication and loses order. So my suggestion is to upgrade the existing json data type to text - or maybe json_text with a format check - when upgrading to 9.4, to change the current functions which output "json" to output "text", and to have a new "json" type which stores a proper JavaScript Object - like structured data. I would like to go a step further and have it automatically support not only the json data types as data but all postgresql data types by including the type oid in the binary encoding, but this is probably not something for "json" but rather for a new "pgdoc" data type in 9.5 Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Hannu Krosing-5 wrote > On 11/14/2013 08:17 PM, Merlin Moncure wrote: >> On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler >> <david@justatheory.com> wrote: >>> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> >>>> This is exactly what needs to be done, full stop (how about: hstore). >>>> It really comes down to this: changing the serialization behaviors >>>> that have been in production for 2 releases (three if you count the >>>> extension) is bad enough, but making impossible some legal json >>>> constructions which are currently possible is an unacceptable >>>> compatibility break. The current json format is a minimally conforming (i.e., does not enforce the "should not contain duplicates" suggestion) structured json validating type that stores its input as-is once validated. Its presence is probably going to cause difficulties with the function API for reasons already mentioned, but its place in the core type library is already firmly established. Andrew's API additions seem like good things to have for this type. I haven't seen any comments on this, but do these functions facilitate creating json that can have duplicates and that maintains order? Even if we accept input to json with these limitations, we are not obligated to make our own json output minimally conforming - though we should at least maintain such behavior if it is already in place. > So my suggestion is to upgrade the existing json data type to > text - or maybe json_text with a format check - when upgrading > to 9.4, to change the current functions which output "json" to > output "text", and to have a new "json" type which stores a proper > JavaScript Object - like structured data. Technically a down-grade but anyway... How does this work with a pg_dump/pg_restore upgrade? If we want to have maximally conforming json type(s) we can still create them. I'd say we'd still want two versions, similar in a way to how we have "bytea" and "text" even though any text can technically be stored like "bytea". The constructor API for both would want to be identical, with the only real difference being that text->json_source would be layout preserving (i.e., validation only) while text->json_binary would be a true parsing conversion. Likewise json_source->text would output the same input while json_binary->text would output the canonical form (pretty-printing and such would need to be initiated via functions). If things are going to be a little more complex anyway why not just go and toss in the kitchen sink too? This way we provide maximal flexibility. From a development perspective some features (indexes, equality, in-place updates and related modification API) may only make sense on a subset of the available types, but trade-offs are a fact of life. David J.
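For concreteness, a hypothetical sketch of the two proposed casts; neither json_source nor json_binary exists, and the names and behavior are taken from the proposal above:

-- text -> json_source: validation only, layout preserved on output
SELECT '{"a": 1, "a": 2}'::json_source;
--  {"a": 1, "a": 2}

-- text -> json_binary: a true parsing conversion; output is the
-- canonical, deduplicated form
SELECT '{"a": 1, "a": 2}'::json_binary;
--  {"a": 2}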
On Thu, Nov 14, 2013 at 1:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > On 11/14/2013 08:17 PM, Merlin Moncure wrote: >> On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler >> <david@justatheory.com> wrote: >>> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> >>>> This is exactly what needs to be done, full stop (how about: hstore). >>>> It really comes down to this: changing the serialization behaviors >>>> that have been in production for 2 releases (three if you count the >>>> extension) is bad enough, but making impossible some legal json >>>> constructions which are currently possible is an unacceptable >>>> compatibility break. It's going to break applications I've currently >>>> put into production with no clear workaround. This is quite frankly >>>> not ok and I'm calling foul. The RFC may claim that these >>>> constructions are dubious but that's irrelevant. It's up to the >>>> parser to decide that and when serializing you are not in control of >>>> the parser. >>> The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? >> It doesn't, but the row_to_json function has a very clear mechanism of >> action. And, 'not being documented' is not the standard for latitude >> to make arbitrary changes to existing function behaviors. > the whole hash*() function family was changed based on the "not documented" > premise, so we do have a precedent. >> >>> In my experience, no JSON parser guarantees key order or duplication. >> I found one in about two seconds. http://docs.python.org/2/library/json.html >> >> "object_pairs_hook, if specified will be called with the result of >> every JSON object decoded with an ordered list of pairs. The return >> value of object_pairs_hook will be used instead of the dict. This >> feature can be used to implement custom decoders that rely on the >> order that the key and value pairs are decoded (for example, >> collections.OrderedDict() will remember the order of insertion). If >> object_hook is also defined, the object_pairs_hook takes priority." >> >> That makes the rest of your argument moot. Plus, I quite clearly am >> dealing with parsers that do. > I am sure you could also devise a json encoding scheme > where white space is significant ;) > > The question is, how much of it should the json *type* support. > > As discussed in the other thread, most of your requirements > would be met by having json/row/row set-to-text serializer > functions which output json-formatted "text". No, that would not work, putting aside the fact that it would require rewriting heaps of code. What I do now inside the json wrapping routines is create things like { "x": [ {dynamic object}, {dynamic object}, ... ], "y": ..., ... } The only way to do it is to build the 'dynamic object' into json in advance of the outer xxx_to_json call. The 'dynamic object' is created out of a json builder that takes a paired array -- basically a variant of Andrew's 'json_build' upthread. If the 'json serializer' outputted text, the 'outer' to_json call would then re-escape the object. I can't use hstore for that purpose precisely because of the transformations it does on the object. Stepping back, I'm using json serialization as a kind of 'supercharged crosstab'. To any client that can parse json, json serialization completely displaces crosstabbing -- it's superior in every way.
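A sketch of the pattern Merlin describes, using the build_json_object name from Andrew's json_build extension mentioned upthread; the table and columns are invented and the exact signature may differ:

SELECT row_to_json(q)
FROM (SELECT array_agg(build_json_object('id', id, 'attrs', attrs)) AS x,
             42 AS y
        FROM some_table) q;
-- Because the inner values are already of type json, the outer
-- row_to_json embeds them as json rather than re-escaping them as
-- strings -- which is exactly what would break if the builder
-- returned plain text instead.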
I am, if you will, kind of leading research efforts in the area, and I can tell you with absolute certainty that breaking this behavior is a mistake. Forcing hstore-ish output mechanisms removes the ability to handle certain important edge cases that work just fine today. If that ability were taken away, it would be a very bitter pill for me to swallow and would have certain ramifications for me professionally; I went out on a pretty big limb and pushed pg/json aggressively (over strenuous objection) in an analytics product which is now in the final stages of beta testing. I would hate to see the conclusion of the case study be "Ultimately we had to migrate the code back to Hibernate due to compatibility issues". Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, I think the text one must get the name 'json' due to an unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: I haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. merlin
On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Here are the options on the table: > 1) convert existing json type to binary flavor (notwithstanding objections) > 2) maintain side by side types, one representing binary, one text. > unfortunately, I think the text one must get the name 'json' due to > an unfortunate previous decision. > 3) merge the behaviors into a single type and get the best of both > worlds (as suggested upthread). > > I think we need to take a *very* hard look at #3 before exploring #1 > or #2: I haven't thought it through yet but it may be possible to handle > this in such a way that will be mostly transparent to the end user and > may have other benefits such as a faster path for serialization. If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Best, David
On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote: > On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > >> Here are the options on the table: >> 1) convert existing json type to binary flavor (notwithstanding objections) >> 2) maintain side by side types, one representing binary, one text. >> unfortunately, I think the text one must get the name 'json' due to >> an unfortunate previous decision. >> 3) merge the behaviors into a single type and get the best of both >> worlds (as suggested upthread). >> >> I think we need to take a *very* hard look at #3 before exploring #1 >> or #2: I haven't thought it through yet but it may be possible to handle >> this in such a way that will be mostly transparent to the end user and >> may have other benefits such as a faster path for serialization. > > If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. > > I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior (deduplicated, unordered). The binary representation is the hstore-ish variant. The text mode is discarded when it's deemed no longer appropriate to be needed, and, once gone, can never be rebuilt as it was. *) only the binary internal representation ever gets stored to disk (or anything else). *) the text mode is preferred for output if it is there. otherwise, a deduplicated, reordered text representation is generated *) When literal text is casted to json, the binary structure is built up and kept alongside the text. So, if you went: 'select '{"a": 1, "a": 2}'::json', you'd get the same thing back. (This is how it works now.) but, if you went: 'insert into foo select '{"a": 1, "a": 2}'::json returning *', you'd get {"a": 2} back essentially (although technically that would be a kind of race). *) When the json is stored to a table, the text representation gets immediately discarded on the basis that it's no longer the true representation of the data. *) Ditto when making any equality operation (not as sure on this point). *) Ditto when doing any operation that mutates the structure in any way. the text representation is immutable except during serialization and if it gets invalidated it gets destroyed. *) New API function: json_simplify(); or some such. It reorders and dedups from the user's point of view (but really just kills off the text representation) *) once the text mode is gone, you get basically the proposed 'hstore' behavior. *) serialization functions are generally used in contexts that do not store anything but get output as query data. They create *only* the text mode. However, if the resultant json is stored anywhere, the text mode is destroyed and replaced with the binary variant. This is both a concession to the current behavior and an optimization of 'serialization-in-query' for which I think the binary mode is pessimal performance-wise. so, xxx_to_json serialization functions work exactly as they do now which fixes my problem essentially.
*) if you are unhappy with duplicates in the above, just get used to calling json_simplify() on the serialized json (or deal with it on the client side). This is all pretty glossy, but maybe there is a way forward... merlin
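A hypothetical psql session illustrating the lifecycle in Merlin's sketch; none of this is implemented, json_simplify is his proposed name, and foo is an assumed table with a single json column:

CREATE TABLE foo (doc json);
-- A bare cast keeps the original text alongside the internal binary tree:
SELECT '{"a": 1, "a": 2}'::json;                              -- {"a": 1, "a": 2}
-- Storing the value discards the text mode, so only the canonical
-- (deduplicated) form can be produced afterwards:
INSERT INTO foo SELECT '{"a": 1, "a": 2}'::json RETURNING *;  -- {"a": 2}
-- Proposed function to drop the text mode explicitly:
SELECT json_simplify('{"a": 1, "a": 2}'::json);               -- {"a": 2}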
On 11/15/2013 03:25 PM, Merlin Moncure wrote: > On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote: >> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> Here are the options on the table: >>> 1) convert existing json type to binary flavor (notwithstanding objections) >>> 2) maintain side by side types, one representing binary, one text. >>> unfortunately, I think the text one must get the name 'json' due to >>> an unfortunate previous decision. >>> 3) merge the behaviors into a single type and get the best of both >>> worlds (as suggested upthread). >>> >>> I think we need to take a *very* hard look at #3 before exploring #1 >>> or #2: I haven't thought it through yet but it may be possible to handle >>> this in such a way that will be mostly transparent to the end user and >>> may have other benefits such as a faster path for serialization. >> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. >> >> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. > Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: > > *) 'json' type internally has a binary as well as a text representation. > The text representation is basically the current type behavior > (deduplicated, unordered). The binary representation is the hstore-ish > variant. The text mode is discarded when it's deemed no longer > appropriate to be needed, and, once gone, can never be rebuilt as it > was. > > *) only the binary internal representation ever gets stored to disk > (or anything else). > > *) the text mode is preferred for output if it is there. otherwise, a > deduplicated, reordered text representation is generated > > *) When literal text is casted to json, the binary structure is built > up and kept alongside the text. So, if you went: 'select '{"a": 1, "a": 2}'::json', > you'd get the same thing back. (This is how it works now.) but, if you went: > 'insert into foo select '{"a": 1, "a": 2}'::json returning *', you'd get {"a": 2} back essentially > (although technically that would be a kind of race). > > *) When the json is stored to a table, the text representation gets > immediately discarded on the basis that it's no longer the true > representation of the data. > > *) Ditto when making any equality operation (not as sure on this point). > > *) Ditto when doing any operation that mutates the structure in any > way. the text representation is immutable except during serialization > and if it gets invalidated it gets destroyed. > > *) New API function: json_simplify(); or some such. It reorders and > dedups from the user's point of view (but really just kills off the text > representation) > > *) once the text mode is gone, you get basically the proposed 'hstore' behavior. > > *) serialization functions are generally used in contexts that do not > store anything but get output as query data. They create *only* the > text mode. However, if the resultant json is stored anywhere, the > text mode is destroyed and replaced with the binary variant. This is both > a concession to the current behavior and an optimization of > 'serialization-in-query' for which I think the binary mode is pessimal > performance-wise.
so, xxx_to_json serialization functions work > exactly as they do now which fixes my problem essentially. > > *) if you are unhappy with duplicates in the above, just get used to > calling json_simplify() on the serialized json (or deal with it on the > client side). > > This is all pretty glossy, but maybe there is a way forward... > It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs. I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. cheers andrew
On Fri, Nov 15, 2013 at 2:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 11/15/2013 03:25 PM, Merlin Moncure wrote: >> >> On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> >> wrote: >>> >>> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> >>>> Here are the options on the table: >>>> 1) convert existing json type to binary flavor (notwithstanding >>>> objections) >>>> 2) maintain side by side types, one representing binary, one text. >>>> unfortunately, I think the text one must get the name 'json' due to >>>> an unfortunate previous decision. >>>> 3) merge the behaviors into a single type and get the best of both >>>> worlds (as suggested upthread). >>>> >>>> I think we need to take a *very* hard look at #3 before exploring #1 >>>> or #2: I haven't thought it through yet but it may be possible to handle >>>> this in such a way that will be mostly transparent to the end user and >>>> may have other benefits such as a faster path for serialization. >>> >>> If it’s possible to preserve order and still get the advantages of binary >>> representation --- which are substantial (see >>> http://theory.so/pg/2013/10/23/testing-nested-hstore/ and >>> http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of >>> examples) --- without undue maintenance overhead, then great. >>> >>> I am completely opposed to duplicate key preservation in JSON, though. It >>> has caused us a fair number of headaches at $work. >> >> Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: >> >> *) 'json' type internally has a binary as well as a text representation. >> The text representation is basically the current type behavior >> (deduplicated, unordered). The binary representation is the hstore-ish >> variant. The text mode is discarded when it's deemed no longer >> appropriate to be needed, and, once gone, can never be rebuilt as it >> was. >> >> *) only the binary internal representation ever gets stored to disk >> (or anything else). >> >> *) the text mode is preferred for output if it is there. otherwise, a >> deduplicated, reordered text representation is generated >> >> *) When literal text is casted to json, the binary structure is built >> up and kept alongside the text. So, if you went: 'select '{"a": >> 1, "a": 2}'::json', you'd get the same thing back. (This is how >> it works now.) but, if you went: 'insert into foo select '{"a": 1, >> "a": 2}'::json returning *', you'd get {"a": 2} back essentially >> (although technically that would be a kind of race). >> >> *) When the json is stored to a table, the text representation gets >> immediately discarded on the basis that it's no longer the true >> representation of the data. >> >> *) Ditto when making any equality operation (not as sure on this point). >> >> *) Ditto when doing any operation that mutates the structure in any >> way. the text representation is immutable except during serialization >> and if it gets invalidated it gets destroyed. >> >> *) New API function: json_simplify(); or some such. It reorders and >> dedups from the user's point of view (but really just kills off the text >> representation) >> >> *) once the text mode is gone, you get basically the proposed 'hstore' >> behavior. >> >> *) serialization functions are generally used in contexts that do not >> store anything but get output as query data. They create *only* the >> text mode. However, if the resultant json is stored anywhere, the >> text mode is destroyed and replaced with the binary variant.
This is both >> a concession to the current behavior and an optimization of >> 'serialization-in-query' for which I think the binary mode is pessimal >> performance-wise. so, xxx_to_json serialization functions work >> exactly as they do now which fixes my problem essentially. >> >> *) if you are unhappy with duplicates in the above, just get used to >> calling json_simplify() on the serialized json (or deal with it on the >> client side). >> >> This is all pretty glossy, but maybe there is a way forward... >> > > > It's making my head hurt, to be honest, and it sounds like a recipe for > years and years of inconsistencies and bugs. > > I don't want to have two types, but I think I'd probably rather have two > clean types than this. I can't imagine it being remotely acceptable to have > behaviour depend on whether or not something was ever stored, which is what > this looks like. Well, maybe so. My main gripe with the 'two types' solutions is that: 1) the current type is already in core (that is, not an extension). In hindsight, I think this was a huge mistake. 2) the current type has grabbed the 'json' type name and the 'json_xxx' API. 3) the current type is getting used all over the place 'Two types' means that (AIUI) you can't mess around with the existing API too much. And the new type (due out in 2016?) will be something of a second-class citizen. The ramifications of dealing with the bifurcation are what make *my* head hurt. Every day the json stuff is getting more and more widely adopted. 9.4 isn't going to drop until 2014 best case and it won't be widely deployed in the enterprise until 2015 and beyond. So you're going to have a huge code base operating on the 'legacy' json type. merlin
On 11/15/2013 12:25 PM, Merlin Moncure wrote: > Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: > > *) 'json' type internally has a binary as well as a text representation. > The text representation is basically the current type behavior <snip long detailed explanation of behavior-dependent type> That's not at all workable. Users would be completely unable to predict or understand the JSON type and how it acts. That's not just violating POLS; that's bashing POLS' head in with a shovel. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Nov 15, 2013 at 2:54 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/15/2013 12:25 PM, Merlin Moncure wrote: >> Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: >> >> *) 'json' type internally has a binary as well as a text representation. >> The text representation is basically the current type behavior > > <snip long detailed explanation of behavior-dependent type> > > That's not at all workable. Users would be completely unable to predict > or understand the JSON type and how it acts. That's not just violating > POLS; that's bashing POLS' head in with a shovel. All right: make a new type then, and leave the current one alone please. merlin
On 2013-11-15 12:54:53 -0800, Josh Berkus wrote: > On 11/15/2013 12:25 PM, Merlin Moncure wrote: > > Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: > > > > *) 'json' type internally has a binary as well as a text representation. > > The text representation is basically the current type behavior > > <snip long detailed explanation of behavior-dependent type> > > That's not at all workable. Users would be completely unable to predict > or understand the JSON type and how it acts. That's not just violating > POLS; that's bashing POLS' head in with a shovel. It's also not currently possible to implement such a behaviour inside a type's functions. You'd need core code cooperation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Nov 15, 2013, at 12:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs. > > I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. I dislike having two types (no, three -- there is hstore, too!). But if there is consensus for it (and I am not at all convinced that there is at this point), I can live with it. Docs would have to be pretty explicit, though. David
On 11/15/2013 01:12 PM, David E. Wheeler wrote: > On Nov 15, 2013, at 12:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > >> It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs. >> >> I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. > > I dislike having two types (no, three -- there is hstore, too!). But if there is consensus for it (and I am not at all convinced that there is at this point), I can live with it. Docs would have to be pretty explicit, though. I would be happy to do a survey on how common key ordering and/or duplicate keys are in postgresql+json. However, I'm not clear on what set of survey responses would decide us in either direction. Even as a sample of one, Merlin's case is a pretty persuasive example ... and, as he points out, there will be applications built around 9.3's JSON which haven't even been written yet. I believe this was a danger we recognized when we added the JSON type, including the possibility that a future binary type might need to be a separate type due to compatibility issues. The only sad thing is the naming; it would be better for the new type to carry the JSON name in the future, but there's no way to make that work that I can think of. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Merlin Moncure-2 wrote >> I don't want to have two types, but I think I'd probably rather have two >> clean types than this. I can't imagine it being remotely acceptable to >> have >> behaviour depend on whether or not something was ever stored, which is >> what >> this looks like. > > Well, maybe so. My main gripe with the 'two types' solutions is that: > 1) the current type is already in core (that is, not an extension). In > hindsight, I think this was a huge mistake. > 2) the current type has grabbed the 'json' type name and the 'json_xxx' API. > 3) the current type is getting used all over the place > > 'Two types' means that (AIUI) you can't mess around with the existing > API too much. And the new type (due out in 2016?) will be something of > a second-class citizen. The ramifications of dealing with the bifurcation > are what make *my* head hurt. Every day the json stuff is getting > more and more widely adopted. 9.4 isn't going to drop until 2014 best > case and it won't be widely deployed in the enterprise until 2015 and > beyond. So you're going to have a huge code base operating on the > 'legacy' json type. > > merlin The current type can store the exact same data as what a hash-like type could store. It can also store stuff a hash-like type would not be able to store. From my reading, the main reason for adding the new hash-like type would be to improve the performance characteristics of using said type. So: 1) if reasonable performance can be had with the current type, the new type would be unnecessary 2) if #1 is not possible, then the new type trades off leniency in format for performance improvements One implication of #2 is that existing json that wants the improved performance will need to undergo a full-table rewrite in order to be converted. Both textual output representations are identical, and function overloading and the API should be able to be maintained substantially identically between the two types. David J.
On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote: > > I believe this was a danger we recognized when we added the JSON type, > including the possibility that a future binary type might need to be a > separate type due to compatibility issues. The only sad thing is the > naming; it would be better for the new type to carry the JSON name in > the future, but there's no way to make that work that I can think of. > > -- > Josh Berkus > PostgreSQL Experts Inc. > http://pgexperts.com > What about a GUC for the json version? Then you could choose, and they could both be called json. Regards, Ken
"ktm@rice.edu" <ktm@rice.edu> writes: > On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote: >> I believe this was a danger we recognized when we added the JSON type, >> including the possibility that a future binary type might need to be a >> separate type due to compatibility issues. The only sad thing is the >> naming; it would be better for the new type to carry the JSON name in >> the future, but there's no way to make that work that I can think of. > What about a GUC for json version? Then you could choose and they > could both be call json. GUCs that change user-visible semantics have historically proven to be much less good ideas than they seem at first glance. regards, tom lane
On 11/15/2013 04:53 PM, Tom Lane wrote: > "ktm@rice.edu" <ktm@rice.edu> writes: >> On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote: >>> I believe this was a danger we recognized when we added the JSON type, >>> including the possibility that a future binary type might need to be a >>> separate type due to compatibility issues. The only sad thing is the >>> naming; it would be better for the new type to carry the JSON name in >>> the future, but there's no way to make that work that I can think of. >> What about a GUC for the json version? Then you could choose, and they >> could both be called json. > GUCs that change user-visible semantics have historically proven to be > much less good ideas than they seem at first glance. > > Yeah, it would be a total foot gun here I think. I've come to the conclusion that the only possible solution is to have a separate type. That's a bit sad, but there it is. The upside is that this will make the work Teodor has mentioned simpler. (Desperately making lemonade from lemons here.) cheers andrew
On Nov 15, 2013, at 2:02 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Yeah, it would be a total foot gun here I think. > > I've come to the conclusion that the only possible solution is to have a separate type. That's a bit sad, but there it is. The upside is that this will make the work Teodor has mentioned simpler. (Desperately making lemonade from lemons here.) Fine. My bikeshedding: Call the new type "jsonb". “B” for “binary.” Also, the old one is implicitly "jsona". Get it? David
On 11/15/2013 09:25 PM, Merlin Moncure wrote: > On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote: >> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> Here are the options on the table: >>> 1) convert existing json type to binary flavor (notwithstanding objections) >>> 2) maintain side by side types, one representing binary, one text. >>> unfortunately, I think the text one must get the name 'json' due to >>> an unfortunate previous decision. >>> 3) merge the behaviors into a single type and get the best of both >>> worlds (as suggested upthread). >>> >>> I think we need to take a *very* hard look at #3 before exploring #1 >>> or #2: I haven't thought it through yet but it may be possible to handle >>> this in such a way that will be mostly transparent to the end user and >>> may have other benefits such as a faster path for serialization. >> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. >> >> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Let's just change the return type of the current json-constructing functions to json_text, which is exactly like text with 2 extra properties: 1) it is syntax-checked for valid json (that is, it can be cast to json) and 2) if included in outer json as data, it is included directly and is not quoted like text With just these two it should be possible to have the following: a) Merlin and others can keep (ab)using json_text as this wonderfully versatile format for feeding json parsers and visualisers which accept duplicates and consider order significant b) cast this to a binary json object if de-duplication and fast access to internals is needed I do not think we need anything else for this As far as I understand, merlin is mostly ok with stored json being normalised and the problem is just with constructing "extended" json (a.k.a. "processing instructions") to be used as source for specialised parsers and renderers. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
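A hypothetical illustration of the two properties; json_text does not exist, and the splicing behavior of property 2 is the proposal, not current functionality:

-- Property 1: validation only; the text, duplicates and all, is kept verbatim:
SELECT '{"a": 1, "a": 2}'::json_text;        -- {"a": 1, "a": 2}

-- Property 2: embedded in outer json, a json_text value is spliced in as
-- json rather than escaped and quoted the way plain text would be:
SELECT row_to_json(t) FROM (SELECT '{"a": 1}'::json_text AS v) t;
--  {"v":{"a": 1}}   rather than   {"v":"{\"a\": 1}"}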
On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>>> I think we need to take a *very* hard look at #3 before exploring #1 >>>> or #2: I haven't thought it through yet but it may be possible to handle >>>> this in such a way that will be mostly transparent to the end user and >>>> may have other benefits such as a faster path for serialization. >>> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. >>> >>> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. > Let's just change the return type of the current json-constructing functions to > json_text, which is exactly like text with 2 extra properties: > > 1) it is syntax-checked for valid json (that is, it can be cast to json) > > and > > 2) if included in outer json as data, it is included directly and is not > quoted like text > > > With just these two it should be possible to have the following: > > a) Merlin and others can keep (ab)using json_text as this > wonderfully versatile format for feeding json parsers and > visualisers which accept duplicates and consider order significant > > b) cast this to a binary json object if de-duplication and fast access to > internals is needed > > I do not think we need anything else for this I think you may be on to something here. This might also be a way to opt in to fast(er) serialization (upthread it was noted this is unimportant; I'm skeptical). I deeply feel that two types is not the right path but I'm pretty sure that this can be finessed. > As far as I understand, merlin is mostly ok with stored json being > normalised and the problem is just with constructing "extended" > json (a.k.a. "processing instructions") to be used as source for > specialised parsers and renderers. yes, this is correct. merlin
On 11/15/2013 02:59 PM, Merlin Moncure wrote: > On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > I think you may be on to something here. This might also be a way > to opt in to fast(er) serialization (upthread it was noted this is > unimportant; I'm skeptical). I deeply feel that two types is not the > right path but I'm pretty sure that this can be finessed. > >> As far as I understand, merlin is mostly ok with stored json being >> normalised and the problem is just with constructing "extended" >> json (a.k.a. "processing instructions") to be used as source for >> specialised parsers and renderers. Thing is, I'm not particularly concerned about *Merlin's* specific use case, which there are ways around. What I am concerned about is that we may have users who have years of data stored in JSON text fields which won't survive an upgrade to binary JSON, because we will stop allowing certain things (ordering, duplicate keys) which are currently allowed in those columns. At the very least, if we're going to have that kind of backwards compatibility break we'll want to call the new version 10.0. That's why naming old JSON as "json_text" won't work; it'll be a hardened roadblock to upgrading. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote > On 11/15/2013 02:59 PM, Merlin Moncure wrote: >> On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I think you may be on to something here. This might also be a way >> to opt in to fast(er) serialization (upthread it was noted this is >> unimportant; I'm skeptical). I deeply feel that two types is not the >> right path but I'm pretty sure that this can be finessed. >> >>> As far as I understand, merlin is mostly ok with stored json being >>> normalised and the problem is just with constructing "extended" >>> json (a.k.a. "processing instructions") to be used as source for >>> specialised parsers and renderers. > > Thing is, I'm not particularly concerned about *Merlin's* specific use > case, which there are ways around. What I am concerned about is that we > may have users who have years of data stored in JSON text fields which > won't survive an upgrade to binary JSON, because we will stop allowing > certain things (ordering, duplicate keys) which are currently allowed in > those columns. At the very least, if we're going to have that kind of > backwards compatibility break we'll want to call the new version 10.0. > > That's why naming old JSON as "json_text" won't work; it'll be a > hardened roadblock to upgrading. Agreed. I can't imagine a use-case that would warrant breaking the current behavior of "json". Either we live with just one, text-oriented, json type and "finesse" whatever performance gains we can without breaking compatibility; or we introduce additional types (I personally like adding 2 instead of one, but just adding the binary one would be ok) which - barring an overwhelming desire by -core to group-self-flagellate - means giving the new type an as yet unused name. From a marketing perspective having 3 types with the following properties is an easy message to sell: 1) json - liberal interpretation w/ validation only; stored as text; output as-is 2) json_text - strict interpretation w/ validation only; stored as text; output as-is 3) json_binary - strict interpretation w/ validation & parsing; stored as binary; output "normalized" This way "json" seems less like a mistake and more like an intentional desire to introduce a liberal type that meets data exchange needs in the short term and now, later, a structured data storage mechanism similar to "hstore". Even if you have json_binary I can imagine that some people would want to be able to store the original strict json as-is. Sure, they can use text, but this way intent is made clear and validation is attached directly to the type as opposed to having to be done separately. The use-cases described for needing a liberal "json" prove this out. That said, "json" would be an acceptable replacement for "json_text" in many cases, and separate validation for "strict json" prior to storing into "json" isn't that heinous. David J.
Looking at this a different way: could we just implement BSON and leave json alone? http://bsonspec.org/ David J.
On 11/15/2013 04:00 PM, David Johnston wrote: > Looking at this a different way: could we just implement BSON and leave json > alone? > > http://bsonspec.org/ In short? No. For one thing, our storage format is different from theirs (better, frankly), and as a result is not compliant with their "standard". That's a reason why we won't use the name BSON, either, since it's a trademark of 10gen. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/15/2013 07:32 PM, Josh Berkus wrote: > On 11/15/2013 04:00 PM, David Johnston wrote: >> Looking at this a different way: could we just implement BSON and leave json >> alone? >> >> http://bsonspec.org/ > In short? No. > > For one thing, our storage format is different from theirs (better, > frankly), and as a result is not compliant with their "standard". > > That's a reason why we won't use the name BSON, either, since it's a > trademark of 10gen. > What is more, it has restrictions which we do not wish to have. See for example its treatment of numerics. cheers andrew
On 11/16/2013 12:15 AM, Josh Berkus wrote: > On 11/15/2013 02:59 PM, Merlin Moncure wrote: >> On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I think you may be on to something here. This might also be a way >> to opt in to fast(er) serialization (upthread it was noted this is >> unimportant; I'm skeptical). I deeply feel that two types is not the >> right path but I'm pretty sure that this can be finessed. >> >>> As far as I understand, merlin is mostly ok with stored json being >>> normalised and the problem is just with constructing "extended" >>> json (a.k.a. "processing instructions") to be used as source for >>> specialised parsers and renderers. > Thing is, I'm not particularly concerned about *Merlin's* specific use > case, which there are ways around. What I am concerned about is that we > may have users who have years of data stored in JSON text fields which > won't survive an upgrade to binary JSON, because we will stop allowing > certain things (ordering, duplicate keys) which are currently allowed in > those columns. At the very least, if we're going to have that kind of > backwards compatibility break we'll want to call the new version 10.0. > > That's why naming old JSON as "json_text" won't work; it'll be a > hardened roadblock to upgrading. Then perhaps name the "new binary json" as jsob (JavaScript Object Binary) or just jsobj (JavaScript Object) and keep current json for what it is, namely JavaScript Object Notation. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On Nov 16, 2013, at 12:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > Then perhaps name the "new binary json" as jsob (JavaScript Object Binary) > or just jsobj (JavaScript Object) and keep current json for what it is, > namely > JavaScript Object Notation. It’s still input and output as JSON, though. I still like JSONB best. David
On 11/16/2013 10:30 PM, David E. Wheeler wrote: > On Nov 16, 2013, at 12:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > >> Then perhaps name the "new binary json" as jsob (JavaScript Object Binary) >> or just jsobj (JavaScript Object) and keep current json for what it is, >> namely >> JavaScript Object Notation. > It’s still input and output as JSON, though. Yes, because JavaScript Object Notation *is* a serialization format (aka Notation) for converting JavaScript Objects to text format and back :) > I still like JSONB best. To me it feels redundant, like binarytextbinary. The binary representation of JSON is a JavaScript(-like) Object, not "binary json". So my vote would be either jsobj or jsdoc, as "document databases" tend to call the structured types "documents". Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 11/16/2013 02:04 PM, Hannu Krosing wrote: > On 11/16/2013 10:30 PM, David E. Wheeler wrote: >> I still like JSONB best. > To me it feels redundant, like binarytextbinary > > the binary representation of JSON is JavaScript(-like) Object, not > "binary json" JSONB is as close as we can get to "JSON", and it gives people the idea that this is the successor type to JSON. So +1 from me as well on "JSONB". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Nov 16, 2013, at 2:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> It’s still input and output as JSON, though. > Yes, because JavaScript Object Notation *is* a serialization format > (aka Notation) for converting JavaScript Objects to text format > and back :) >> I still like JSONB best. > To me it feels redundant, like binarytextbinary > > the binary representation of JSON is a JavaScript(-like) Object, not > "binary json" > > So my vote would be either jsobj or jsdoc, as "document databases" tend > to call the structured types "documents" You know that both types support scalar values, right? 'a'::JSON works now, and 'a'::hstore works with the WIP patch. For that reason I would not think that "doc" or "obj" would be good choices. I like JSONB because: 1. The "B" means "binary" 2. The "B" means "second" 3. It's short 4. See also BYTEA. Best, David
David E. Wheeler-3 wrote > I like JSONB because: > > 1. The "B" means "binary" > 2. The "B" means "second" > 3. It's short > 4. See also BYTEA. "json_strict" : Not sure about the "bytea" reference off-hand... I was pondering "jsons", which meets the short property just fine, and the trailing "s" would stand for "strict", which is the user-visible semantic that this type exhibits rather than some less-visible "binary" attribute which most users would not really care about. I dislike the implication of plural-ness that the "s" imparts, though. Implication of "second" doesn't seem that important since both types provide useful semantics. I can imagine where the short aspect will lead people to accidentally type "json" where they mean to use "jsonb", and having just a single extra character will increase the likelihood they will not notice. Knowing about and having used "json_strict" previously, it will be more probable that such users will noticeably feel something is missing if they drop the whole "_strict" suffix. So, I'll toss out "json_strict" for my bikeshed contribution. David J.
"David E. Wheeler" <david@justatheory.com> writes: > You know that both types support scalar values right? 'a'::JSON works now, > and 'a'::hstore works with the WIP patch. For that reason I would not think > that "doc" or "obj" would be good choices. I'm wondering about just pushing hstore in core (even if technically still an extension, install it by default, like we do for PLpgSQL), and calling it a day. If you need pre-9.4 JSON-is-text compatibility, use the json datatype, if you want something with general index support, use hstore. For bikeshedding purposes, what about calling it jstore, as in “we actually know how to store your json documents”? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 11/17/2013 09:02 PM, David E. Wheeler wrote: > On Nov 16, 2013, at 2:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > >>> It’s still input and output as JSON, though. >> Yes, because JavaScript Object Notation *is* a serialization format >> (aka Notation) for converting JavaScript Objects to text format >> and back :) >>> I still like JSONB best. >> To me it feels redundant, like binarytextbinary >> >> the binary representation of JSON is a JavaScript(-like) Object, not >> "binary json" >> >> So my vote would be either jsobj or jsdoc, as "document databases" tend >> to call the structured types "documents" > You know that both types support scalar values, right? > 'a'::JSON works now, Yeah, and I remember all the bikeshedding about how scalars should not be supported as they are "not really JSON" by the standard ... At that time I was also quite vocal about not painting ourselves into a corner by not normalising json on input and thus generating a backwards compatibility problem in case we would ever get "proper" json support. > and 'a'::hstore works with the WIP patch. For that reason I would not think that "doc" or "obj" would be good choices. this is like claiming that text should not be text because you can store a single character there as well. I feel that both "doc" and "obj" convey the meaning that it is a structured type meant for fast component lookup, as opposed to the jsoN(otation) type, which is text. Also jsdoc/jsobj would be a natural bridge to pgdoc/pgobj, which would be similar to "new json" but allow any type supported by postgresql as a value. (... and in several languages even scalars really are objects) > > I like JSONB because: > > 1. The "B" means "binary" "Binary" has really little to do with the fact that we normalise on input, which is the real significant feature of the new json type. > 2. The "B" means "second" Why not just json2, (you know, like varchar2 in a certain other database ;) > 3. It's short jsobj and jsdoc are exactly as short as jsonb > 4. See also BYTEA. BYTEA is "byte array", so not really relevant. (unless you try to rhyme a byte-a, json-b sequence ;) ) Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 11/17/2013 10:51 PM, Dimitri Fontaine wrote: > "David E. Wheeler" <david@justatheory.com> writes: >> You know that both types support scalar values, right? 'a'::JSON works now, >> and 'a'::hstore works with the WIP patch. For that reason I would not think >> that "doc" or "obj" would be good choices. > I'm wondering about just pushing hstore in core (even if technically > still an extension, install it by default, like we do for PLpgSQL), and > calling it a day. > > If you need pre-9.4 JSON-is-text compatibility, use the json datatype, > if you want something with general index support, use hstore. +1 for getting hstore in as well I think hstore needs to keep its text format compatible with older hstore (in this discussion let's call this text format "hson", short for HStore Object Notation, for added confusion :) > > For bikeshedding purposes, what about calling it jstore, +1 for jstore as well. I am happy with jstore, jsdoc, jsobj jstore/jsobj/jsdoc really is *not* JSON, but a bona-fide freeform structured datatype that happens to have JSON as a convenient I/O format. You may want to use jstore even if you have never needed JSON as a serialisation/transport format before. I do not like jsonB (sounds too much like json2, i.e. like we were trying to cover up a design accident) nor json_strict (as this is not really strict, as it does indeed accept scalars, not just Arrays/Lists and Objects/Dictionaries as per the JSON standard) > as in “we actually know how to store your json documents”? > > Regards, Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On Nov 17, 2013, at 1:51 PM, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: > I'm wondering about just pushing hstore in core (even if technically > still an extension, install it by default, like we do for PLpgSQL), and > calling it a day. Its syntax is different from JSON's, so one would need to convert to and from JSON all the time to parse and serialize. PITA. > For bikeshedding purposes, what about calling it jstore, as in “we > actually know how to store your json documents”? -1 Sounds like a Java storage API. David
"David E. Wheeler" <david@justatheory.com> writes: > On Nov 17, 2013, at 1:51 PM, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: >> I'm wondering about just pushing hstore in core (even if technically >> still an extension, install it by default, like we do for PLpgSQL), and >> calling it a day. > > It’s syntax is different than JSON, so one would need to convert to > and from JSON all the time to parse and serialize. PITA. Oh I misremembered about that, I though it would take JSON as input as-is and could be made to output JSON. And IIRC the community input at pgconf.eu has been to just always output json texts and get rid of the formating GUCs. Now, if it turns out that the new hstore is not dealing with json input and output, we could have json, jstore and hstore. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Nov 17, 2013, at 2:26 PM, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: >> Its syntax is different from JSON's, so one would need to convert to >> and from JSON all the time to parse and serialize. PITA. > > Oh, I misremembered about that, I thought it would take JSON as input > as-is and could be made to output JSON. And IIRC the community input at > pgconf.eu has been to just always output json texts and get rid of the > formatting GUCs. Yeah, but for back-compat, it has to use => instead of : to separate keys from values, and cannot use braces for a root-level object. :-( > Now, if it turns out that the new hstore is not dealing with json input > and output, we could have json, jstore and hstore. That's where this is headed, yes. David
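For reference, the syntax gap in question; the hstore literal below is the released flat form (where values are untyped strings), set against the JSON equivalent:

SELECT 'a => 1, b => 2'::hstore;    -- "a"=>"1", "b"=>"2"  (no outer braces)
SELECT '{"a": 1, "b": 2}'::json;    -- {"a": 1, "b": 2}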
On 11/17/2013 04:51 PM, Dimitri Fontaine wrote: > "David E. Wheeler" <david@justatheory.com> writes: >> You know that both types support scalar values right? 'a'::JSON works now, >> and 'a'::hstore works with the WIP patch. For that reason I would not think >> that "doc" or "obj" would be good choices. > I'm wondering about just pushing hstore in core (even if technically > still an extension, install it by default, like we do for PLpgSQL), and > calling it a day. That would be one of the silliest and most short-sighted decisions we have made in many years, IMNSHO. The demand for strong JSON support is enormous. I don't think I have ever received as many positive comments on any other feature I have worked on in the last 9 years. What these people want is not something jsonish, they want json, pure and simple. And they want it fast and featured and efficient. Much as I love hstore, it isn't json, and so it won't satisfy that demand. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > That would be one of the silliest and most short-sighted decisions we have > made in many years, IMNSHO. The demand for strong JSON support is enormous. One of the silliest and most short-sighted decisions we made recently might have been to actually ship that json variant in 9.2, after all. The most popular PostgreSQL commands in 9.4 are going to be: $ sudo apt-get install postgresql-contrib-9.4 # create extension jstore; # alter table foo alter column documents type jstore; # create index on foo using gist(documents); Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 18/11/13 09:02, David E. Wheeler wrote: > On Nov 16, 2013, at 2:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > >>> It’s still input and output as JSON, though. >> Yes, because JavaScript Object Notation *is* a serialization format >> (aka Notation) for converting JavaScript Objects to text format >> and back :) >>> I still like JSONB best. >> To me it feels redundant, like binarytextbinary >> >> the binary representation of JSON is JavaScript(-like) Object, not >> "binary json" >> >> So my vote would be either jsobj or jsdoc (as "document databases" tend >> to call the structured types "documents") > You know that both types support scalar values right? 'a'::JSON works now, and 'a'::hstore works with the WIP patch. For that reason I would not think that "doc" or "obj" would be good choices. > > I like JSONB because: > > 1. The "B" means "binary" > 2. The "B" means "second" > 3. It's short > 4. See also BYTEA. > > Best, > > David > > > > Whatever, I think the first 4 characters have to be 'JSON' - for easy identification. Cheers, Gavin
On 18/11/13 09:45, David Johnston wrote: > David E. Wheeler-3 wrote >> I like JSONB because: >> >> 1. The "B" means "binary" >> 2. The "B" means "second" >> 3. It's short >> 4. See also BYTEA. > "json_strict": > > Not sure about the "bytea" reference off-hand... > > I was pondering "jsons", which meets the short property just fine, and the > trailing "s" would stand for "strict", which is the user-visible semantic > that this type exhibits, rather than some less-visible "binary" attribute > which most users would not really care about. I dislike the implication of > plural-ness that the "s" imparts, though. > > Implication of "second" doesn't seem that important since both types provide > useful semantics. > > I can imagine the short aspect will lead people to accidentally type > "json" where they mean to use "jsonb", and having just a single extra > character will increase the likelihood they will not notice. Knowing about > and having used "json_strict" previously, it will be more probable that such > users will noticeably feel something is missing if they drop the whole > "_strict" suffix. > > So, I'll toss out "json_strict" for my bikeshed contribution. > > David J. +1
On 11/17/2013 05:44 PM, Dimitri Fontaine wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> That would be one of the silliest and most short-sighted decisions we have >> made in many years, IMNSHO. The demand for strong JSON support is enormous. > One of the silliest and most short-sighted decisions we made recently > might have been to actually ship that json variant in 9.2, after all. The fact is that we had been going round and round on Json for a while, and Robert Haas rightly made a move to break the Gordian knot. We would not have done ourselves any service by not accepting it. Hindsight is always 20-20, but even with what we have today people get excited, and more people move to use Postgres every day because we have that support. > > The most popular PostgreSQL commands in 9.4 are going to be: > > $ sudo apt-get install postgresql-contrib-9.4 > > # create extension jstore; > # alter table foo alter column documents type jstore; > # create index on foo using gist(documents); > Umm, not if I have anything to do with it. cheers andrew
> Now, if it turns out that the new hstore is not dealing with json input > and output, we could have json, jstore and hstore. Jstore isn't the worst name suggestion I've heard on this thread. The reason I prefer JSONB though, is that a new user looking for a place to put JSON data will clearly realize that JSON and JSONB are alternatives and related in some way. They won't necessarily expect that "jstore" has anything to do with JSON, especially when there is another type called "JSON". Quite a few people are liable to think it's something to do with Java. Besides, we might get sued by these people: http://www.jstor.org/ ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Nov 17, 2013, at 5:49 PM, Josh Berkus <josh@agliodbs.com> wrote: > Jstore isn't the worst name suggestion I've heard on this thread. The > reason I prefer JSONB though, is that a new user looking for a place to > put JSON data will clearly realize that JSON and JSONB are alternatives > and related in some way. They won't necessarily expect that "jstore" > has anything to do with JSON, especially when there is another type > called "JSON". Quite a few people are liable to think it's something to > do with Java. > > Besides, we might get sued by these people: http://www.jstor.org/ ;-) Okay, how about JDATE? ;-P David
On 18/11/13 14:51, David E. Wheeler wrote: > On Nov 17, 2013, at 5:49 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> Jstore isn't the worst name suggestion I've heard on this thread. The >> reason I prefer JSONB though, is that a new user looking for a place to >> put JSON data will clearly realize that JSON and JSONB are alternatives >> and related in some way. They won't necessarily expect that "jstore" >> has anything to do with JSON, especially when there is another type >> called "JSON". Quite a few people are liable to think it's something to >> do with Java. >> >> Besides, we might get sued by these people: http://www.jstor.org/ ;-) > Okay, how about JDATE? ;-P > > David > > > I don't want a Japanese Date - would cause complications with my wife! :-) Cheers, Gavin
On 11/17/2013 08:49 PM, Josh Berkus wrote: >> Now, if it turns out that the new hstore is not dealing with json input >> and output, we could have json, jstore and hstore. > Jstore isn't the worst name suggestion I've heard on this thread. The > reason I prefer JSONB though, is that a new user looking for a place to > put JSON data will clearly realize that JSON and JSONB are alternatives > and related in some way. They won't necessarily expect that "jstore" > has anything to do with JSON, especially when there is another type > called "JSON". Quite a few people are liable to think it's something to > do with Java. > > Besides, we might get sued by these people: http://www.jstor.org/ ;-) > I don't think any name that doesn't begin with "json" is acceptable. I could live with "jsonb". It has the merit of brevity, but maybe it's a tad too close to "json" to be the right answer. cheers andrew
On Nov 17, 2013, at 8:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > I don't think any name that doesn't begin with "json" is acceptable. I could live with "jsonb". It has the merit of brevity,but maybe it's a tad too close to "json" to be the right answer. JSONFTW. David
On 11/18/2013 05:19 AM, Andrew Dunstan wrote: > > On 11/17/2013 08:49 PM, Josh Berkus wrote: >>> Now, if it turns out that the new hstore is not dealing with json input >>> and output, we could have json, jstore and hstore. >> Jstore isn't the worst name suggestion I've heard on this thread. The >> reason I prefer JSONB though, is that a new user looking for a place to >> put JSON data will clearly realize that JSON and JSONB are alternatives >> and related in some way. They won't necessarily expect that "jstore" >> has anything to do with JSON, especially when there is another type >> called "JSON". Quite a few people are liable to think it's something to >> do with Java. >> >> Besides, we might get sued by these people: http://www.jstor.org/ ;-) >> > > I don't think any name that doesn't begin with "json" is acceptable. > I could live with "jsonb". It has the merit of brevity, but maybe it's > a tad > too close to "json" to be the right answer. How about jsondoc or jsonobj? It is still reasonably 'json' but not too easy to confuse with the existing json when typing. And it perhaps hints better at the main difference from string-json, namely that it is an object and not textual source code / notation / processing info. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 11/15/13, 6:15 PM, Josh Berkus wrote: > Thing is, I'm not particularly concerned about *Merlin's* specific use > case, which there are ways around. What I am concerned about is that we > may have users who have years of data stored in JSON text fields which > won't survive an upgrade to binary JSON, because we will stop allowing > certain things (ordering, duplicate keys) which are currently allowed in > those columns. At the very least, if we're going to have that kind of > backwards compatibility break we'll want to call the new version 10.0. We could do something like SQL/XML and specify the level of "validity" in a typmod, e.g., json(loose), json(strict), etc.
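For illustration, the typmod idea might look something like this at the DDL level (hypothetical syntax: no such typmod exists today, and as discussed below it may not be workable):

    create table docs (
        body json(strict)  -- hypothetical: normalized, duplicate-key-free storage
    );
    -- switching validity levels would then be a column type change:
    alter table docs alter column body type json(loose);  -- hypothetical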
On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > I don't think any name that doesn't begin with "json" is acceptable. I could > live with "jsonb". It has the merit of brevity, but maybe it's a tad too > close to "json" to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? *) How much of the existing json API has to be copied over to the jsonb type, and how exactly is that going to happen? For example, I figure we'd need a "record_to_jsonb" etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API were not strongly bifurcated: this would cause quite a bit of mindspace confusion. merlin
On 11/18/2013 09:38 AM, Merlin Moncure wrote: > On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> I don't think any name that doesn't begin with "json" is acceptable. I could >> live with "jsonb". It has the merit of brevity, but maybe it's a tad too >> close to "json" to be the right answer. > I think that seems right. Couple thoughts: > > *) Aside from the text in and out routines, how is 'jsonb' different > from the coming 'nested hstore'? Enough to justify two code bases? The discussion has been around making a common library that would be used for both. > > *) How much of the existing json API has to be copied over to the > jsonb type and how exactly is that going to happen? For example, I > figure we'd need a "record_to_jsonb" etc. for sure, but do we also > need a jsonb_each()...can't we overload instead? Overloading is what I was planning to do. cheers andrew
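To sketch what that overloading could look like: one function name, resolved by argument type, so callers don't have to care which variant they hit. The signatures below are simplified and hypothetical; the real catalog entries would differ.

    -- existing text-json version (simplified)
    create function json_each(doc json) returns table(key text, value json) as ...;
    -- a jsonb twin under the same name (hypothetical)
    create function json_each(doc jsonb) returns table(key text, value jsonb) as ...;
    -- select * from json_each(col) then works for a column of either type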
On 11/18/2013 06:13 AM, Peter Eisentraut wrote: > On 11/15/13, 6:15 PM, Josh Berkus wrote: >> Thing is, I'm not particularly concerned about *Merlin's* specific use >> case, which there are ways around. What I am concerned about is that we >> may have users who have years of data stored in JSON text fields which >> won't survive an upgrade to binary JSON, because we will stop allowing >> certain things (ordering, duplicate keys) which are currently allowed in >> those columns. At the very least, if we're going to have that kind of >> backwards compatibility break we'll want to call the new version 10.0. > > We could do something like SQL/XML and specify the level of "validity" > in a typmod, e.g., json(loose), json(strict), etc. Doesn't work; with XML, the underlying storage format didn't change. With JSONB, it will ... so changing the typmod would require a total rewrite of the table. That's a POLS violation if I ever saw one. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Merlin, > *) Aside from the text in and out routines, how is 'jsonb' different > from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. > Maybe we can cheat a little bit and overload the functions so that one of the > existing APIs (hstore or json) can be recovered -- only adding what > minimal functionality needs to be added to handle the type distinction > (mostly on serialization routines and casts). What I'm driving at > here is that it would be nice if the API were not strongly bifurcated: > this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of "first alternate data type ambiguity" again for overloading builtin functions. We really need that method of "preferring" a specific version of the function ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Nov 18, 2013 at 12:10 PM, Josh Berkus <josh@agliodbs.com> wrote: > Merlin, > >> *) Aside from the text in and out routines, how is 'jsonb' different >> from the coming 'nested hstore'? Enough to justify two code bases? > > In/out functions and defaults are all different. Otherwise, the two > types will be accessing the same code base, so no duplication. Think of > it as XML vs. TEXT. > >> Maybe we can cheat a little bit and overload the functions so that one of the >> existing APIs (hstore or json) can be recovered -- only adding what >> minimal functionality needs to be added to handle the type distinction >> (mostly on serialization routines and casts). What I'm driving at >> here is that it would be nice if the API were not strongly bifurcated: >> this would cause quite a bit of mindspace confusion. > > I'll also note that for functions designed for output to the client, it > doesn't make much of a difference whether the result is JSON or JSONB, > since the string representation will be identical. However, it makes a > difference if the data is going to be stored, since a double conversion > on a large JSON value would be expensive. Hm, but it would matter, wouldn't it...the jsonb representation would give output with the record fields reordered, deduplicated, etc. Also, presumably, the jsonb serialization would be necessarily slower for exactly that reason, although perhaps not enough to matter much. > In other words, we need a version of each function which outputs JSONB, > but that version doesn't have to be the default if users don't specify. > > Note that this raises the issue of "first alternate data type ambiguity" > again for overloading builtin functions. We really need that method of > "preferring" a specific version of the function ... You'd need explicit jsonb-creating functions: record_to_jsonb, array_to_jsonb, etc. AFAIK, these functions would be the only ones that would have to explicitly reference the jsonb type, if you don't count casts. It's tempting to *not* make those functions, as that would only require the user to specify jsonb for table columns...you'd then go from json to jsonb with a cast, perhaps even an implicit one. The disadvantage there is that you'd then always get unsimplified json. Hm -- on that note, is it technically feasible to *not* duplicate the json API implementations, but just (ab)use implicit casting between the APIs? That way the text API would own all the serialization routines as it does now, but you'd run mutation and searching through jsonb... merlin
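Concretely, the cast-based route sketched above would let the existing text-json serialization API feed jsonb storage. Everything here assumes a json-to-jsonb cast is provided; both the type and the cast are hypothetical at this point.

    create table foo (doc jsonb);                            -- hypothetical type
    -- build with the existing text-json functions, cast on the way in
    insert into foo select row_to_json(t)::jsonb from bar t; -- hypothetical cast
    -- with an implicit cast, even the ::jsonb could be dropped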
On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > I am sure you could also devise an json encoding scheme > where white space is significant ;) I don't even have to think hard. If you want your JSON to be human-readable, it's entirely possible that you want it stored using the same whitespace that was present on input. There is a valid use case for normalizing whitespace, too, of course. Everyone on this thread who thinks that there is Only One Right Way To Do It should take a chill pill. There is, in fact, more than one right way to do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Nov 19, 2013, at 8:14 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Everyone on this thread who thinks that there is Only One Right Way To > Do It should take a chill pill. There is, in fact, more than one > right way to do it. You shoulda been a Perl hacker, Robert. D
On Tue, Nov 19, 2013 at 12:27 PM, David E. Wheeler <david@justatheory.com> wrote: > On Nov 19, 2013, at 8:14 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Everyone on this thread who thinks that there is Only One Right Way To >> Do It should take a chill pill. There is, in fact, more than one >> right way to do it. > > You shoulda been a Perl hacker, Robert. I don't hack on Perl, but I spent about 10 years hacking *in* Perl, so the phrasing was not a coincidence. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/19/2013 08:14 AM, Robert Haas wrote: > On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I am sure you could also devise an json encoding scheme >> where white space is significant ;) > > I don't even have to think hard. If you want your JSON to be > human-readable, it's entirely possible that you want it stored using > the same whitespace that was present on input. There is a valid use > case for normalizing whitespace, too, of course. Given that JSON is a data interchange format, I suspect that there are an extremely large combination of factors which would result in an unimplementably large number of parser settings. For example, I personally would have use for a type which allowed the storage of JSON *fragments*. Therefore I am interested only in supporting two:

a) the legacy behavior from 9.2 and 9.3, so we don't destroy people's apps, and

b) the optimal behavior for Hstore2/JSONB.

(a) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* no reordering of keys
* duplicate keys allowed

(b) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* reordering of keys
* duplicate keys prohibited

If people want other manglings of JSON, they can use TEXT fields and custom parsers written in PL/v8, the same way I do. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Nov 19, 2013 at 12:59 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/19/2013 08:14 AM, Robert Haas wrote: >> On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>> I am sure you could also devise an json encoding scheme >>> where white space is significant ;) >> >> I don't even have to think hard. If you want your JSON to be >> human-readable, it's entirely possible that you want it stored using >> the same whitespace that was present on input. There is a valid use >> case for normalizing whitespace, too, of course. > > Given that JSON is a data interchange format, I suspect that there are > an extremely large combination of factors which would result in an > unimplementably large number of parser settings. For example, I > personally would have use for a type which allowed the storage of JSON > *fragments*. Therefore I am interested only in supporting two: > > a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's > apps, and > > b) the optimal behavior for Hstore2/JSONB. > > (a) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * no reordering of keys > * duplicate keys allowed > > (b) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * reordering of keys > * duplicate keys prohibited > > If people want other manglings of JSON, they can use TEXT fields and > custom parsers written in PL/v8, the same way I do. For (a), I assume you mean "whitespace not significant, but preserved", because that is the current behavior, whereas for (b), I think we would want to say "whitespace neither significant nor preserved". Other than that, I completely agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
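The "preserved" point is easy to demonstrate with the existing text-based type, which hands back exactly the characters it was given:

    select '{"a":    1}'::json;  -- returns {"a":    1}, spacing intact
    -- a normalizing type along the lines of (b) would return {"a": 1} instead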
On 11/19/2013 12:59 PM, Josh Berkus wrote: > On 11/19/2013 08:14 AM, Robert Haas wrote: >> On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>> I am sure you could also devise an json encoding scheme >>> where white space is significant ;) >> I don't even have to think hard. If you want your JSON to be >> human-readable, it's entirely possible that you want it stored using >> the same whitespace that was present on input. There is a valid use >> case for normalizing whitespace, too, of course. > Given that JSON is a data interchange format, I suspect that there are > an extremely large combination of factors which would result in an > unimplementably large number of parser settings. For example, I > personally would have use for a type which allowed the storage of JSON > *fragments*. Therefore I am interested only in supporting two: > > a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's > apps, and > > b) the optimal behavior for Hstore2/JSONB. > > (a) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * no reordering of keys > * duplicate keys allowed > > (b) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * reordering of keys > * duplicate keys prohibited > > If people want other manglings of JSON, they can use TEXT fields and > custom parsers written in PL/v8, the same way I do. > Fragments are currently allowed in a):

andrew=# select '"a"'::json;
 json
------
 "a"

Given that, I'm not sure we shouldn't permit them in b) either. I think I lost that argument back in the 9.2 dev cycle. I really don't want to get to a situation where foo::json::jsonb can produce an error. cheers andrew
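The concern, concretely: since the text type accepts scalar fragments today, a chain like the one below has to keep working if jsonb is to be a safe storage target (the jsonb type and the cast are still hypothetical here):

    select '"a"'::json::jsonb;  -- must not error if scalar roots are accepted in both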
Andrew Dunstan wrote > Given that, I'm not sure we shouldn't permit them in b) either. I think > I lost that argument back in the 9.2 dev cycle. I really don't want to > get to a situation where foo::json::jsonb can produce an error. So what do you propose happens when the input json has duplicate keys? IMO a reasonable default cast function should error if the json contents require anything more than a straight parse to be stored into jsonb. If the user still needs to make the conversion, we should have a standard and configurable parser function with json input and jsonb output. In this case the key-keep options would be "keep first encountered", "keep last encountered", or "fail on duplicate", the last of which would be the default. I have not really pondered storing scalars into jsonb, but before pondering usability, are there any technical concerns? If the goal is to share the backend with hstore, then current hstore does not allow for this, and so the json aspect would either transfer back over or it would need customized code. David J.
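A sketch of the configurable conversion proposed above, with entirely hypothetical function and option names:

    -- explicit, policy-driven conversion instead of a silently lossy cast
    select json_to_jsonb('{"a":1,"a":2}'::json, dup_keys := 'keep_last');
    -- with dup_keys := 'error' (the proposed default) this input would be rejected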
On 20/11/13 05:14, Robert Haas wrote: > On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I am sure you could also devise an json encoding scheme >> where white space is significant ;) > I don't even have to think hard. If you want your JSON to be > human-readable, it's entirely possible that you want it stored using > the same whitespace that was present on input. There is a valid use > case for normalizing whitespace, too, of course. > > Everyone on this thread who thinks that there is Only One Right Way To > Do It should take a chill pill. There is, in fact, more than one > right way to do it. > There is only one obvious 'Right Way', and that is 'My Way'! :-) More seriously, there are obviously variants in what people consider a useful human-readable form of JSON output, but it is probably inefficient to store white space. This suggests it might be useful to allow users to store rules so that the output can include the white space that they want. However, this is non-trivial - for example, Eclipse allows Java/XML source to be formatted in different ways (here the source files are stored with white space), but lacks a couple of options that I would like. Possibly, JSON output of white space would be less problematic? Cheers, Gavin
On 11/19/2013 01:43 PM, David Johnston wrote: > Andrew Dunstan wrote >> Given that, I'm not sure we shouldn't permit them in b) either. I think >> I lost that argument back in the 9.2 dev cycle. I really don't want to >> get to a situation where foo::json::jsonb can produce an error. > So what do you propose happens when the input json has duplicate keys? I propose that we do what V8 does, what hstore does (yes, I know it's not json, but consistency is good), what we do now with the json_get() functions, what almost every other JSON engine I know of does, namely that the last key wins. cheers andrew
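That behavior is visible with the existing accessors today, e.g. on 9.3:

    select '{"a": 1, "a": 2}'::json -> 'a';  -- yields 2: the last "a" wins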
Gavin Flower-2 wrote > More seriously, there are obviously variants in what people consider > a useful human-readable form of JSON output, but it is probably > inefficient to store white space. Enough to matter? Maybe the extra whitespace causes a marginal value to be toasted but, IIUC, for a value that is going to be toasted anyway, the compression factors for both speed and space are going to make whitespace considerations insignificant. David J.
On 11/19/2013 01:43 PM, David Johnston wrote: > > I have not really pondered storing scalars into jsonb, but before pondering > usability, are there any technical concerns? If the goal is to share the > backend with hstore, then current hstore does not allow for this, and so the > json aspect would either transfer back over or it would need customized > code. > Your premise here is simply wrong. The new hstore code does support scalar root elements. cheers andrew
On Tue, Nov 19, 2013 at 11:59 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/19/2013 08:14 AM, Robert Haas wrote: >> On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>> I am sure you could also devise an json encoding scheme >>> where white space is significant ;) >> >> I don't even have to think hard. If you want your JSON to be >> human-readable, it's entirely possible that you want it stored using >> the same whitespace that was present on input. There is a valid use >> case for normalizing whitespace, too, of course. > > Given that JSON is a data interchange format, I suspect that there are > an extremely large combination of factors which would result in an > unimplementably large number of parser settings. For example, I > personally would have use for a type which allowed the storage of JSON > *fragments*. Therefore I am interested only in supporting two: > > a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's I'm uncomfortable with the word 'legacy'. This suggests the new type will essentially deprecate the old type. jsonb will likely be pessimal for large serializations. If you're not manipulating and searching the documents, why would you use it? It's going to take more space on disk and memory and should provide little benefit for present *as well as future code*. (Note, it will provide extreme benefits for nosql-type uses, which are of huge strategic importance for the project.) json and jsonb APIs should work together cleanly, and the documentation should explain how the types differ and which is better for which cases. merlin
On 11/19/2013 03:06 PM, Merlin Moncure wrote: >> Therefore I am interested only in supporting two: >> >> a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's > I'm uncomfortable with the word 'legacy'. This suggests the new type > will essentially deprecate the old type. "Existing" might be a better word. > jsonb will likely be > pessimal for large serializations. If you're not manipulating and > searching the documents, why would you use it? It's going to take > more space on disk and memory and should provide little benefit for > present *as well as future code*. (Note, it will provide extreme > benefits for nosql-type uses, which are of huge strategic importance for > the project.) json and jsonb APIs should work together cleanly, and > the documentation should explain how the types differ and which is better > for which cases. I agree with most of this. cheers andrew
On Tue, Nov 19, 2013 at 1:43 PM, David Johnston <polobo@yahoo.com> wrote: > IMO a reasonable default cast function should error if the json contents > require anything more than a straight parse to be stored into jsonb. If the > user still needs to make the conversion, we should have a standard and > configurable parser function with json input and jsonb output. In this case > the key-keep options would be "keep first encountered", "keep last > encountered", or "fail on duplicate", the last of which would be the default. > > I have not really pondered storing scalars into jsonb, but before pondering > usability, are there any technical concerns? If the goal is to share the > backend with hstore, then current hstore does not allow for this, and so the > json aspect would either transfer back over or it would need customized > code. I confess to being a bit perplexed by why we want hstore and json to share a common binary format. hstore doesn't store hierarchical data; json does. If we design a binary format for json, doesn't that just obsolete hstore? Why go to a lot of trouble to extend our home-grown format if we've got a standard format to plug into? The thing that's really missing in all of these discussions (AFAICS) is the issue of creating index support for these types. If using some variant of the existing hstore format makes that easier, then I guess I understand the connection - but I'm not sure why or how that would be the case, and it would be nice to make the connection more explicit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/20/2013 07:52 AM, Robert Haas wrote: > On Tue, Nov 19, 2013 at 1:43 PM, David Johnston <polobo@yahoo.com> wrote: >> IMO a reasonable default cast function should error if the json contents >> require anything more than a straight parse to be stored into jsonb. If the >> user still needs to make the conversion, we should have a standard and >> configurable parser function with json input and jsonb output. In this case >> the key-keep options would be "keep first encountered", "keep last >> encountered", or "fail on duplicate", the last of which would be the default. >> >> I have not really pondered storing scalars into jsonb, but before pondering >> usability, are there any technical concerns? If the goal is to share the >> backend with hstore, then current hstore does not allow for this, and so the >> json aspect would either transfer back over or it would need customized >> code. > I confess to being a bit perplexed by why we want hstore and json to > share a common binary format. hstore doesn't store hierarchical data; > json does. If we design a binary format for json, doesn't that just > obsolete hstore? Why go to a lot of trouble to extend our home-grown > format if we've got a standard format to plug into? > > The thing that's really missing in all of these discussions (AFAICS) > is the issue of creating index support for these types. If using some > variant of the existing hstore format makes that easier, then I guess > I understand the connection - but I'm not sure why or how that would > be the case, and it would be nice to make the connection more > explicit. > Oleg and Teodor have done quite a lot of work on a version of hstore that supports nested structures. See their pgcon talk. With some additions it has become in effect a non-standard notation for json. Rather than repeat that work, my suggestion has been that they abstract the common parts into a library that can be used by jsonb, or whatever we end up calling it, as well as nested hstore. I understand Teodor is working on this. In general I share your feelings, though. cheers andrew
On 11/18/2013 06:49 PM, Josh Berkus wrote: > On 11/18/2013 06:13 AM, Peter Eisentraut wrote: >> On 11/15/13, 6:15 PM, Josh Berkus wrote: >>> Thing is, I'm not particularly concerned about *Merlin's* specific use >>> case, which there are ways around. What I am concerned about is that we >>> may have users who have years of data stored in JSON text fields which >>> won't survive an upgrade to binary JSON, because we will stop allowing >>> certain things (ordering, duplicate keys) which are currently allowed in >>> those columns. At the very least, if we're going to have that kind of >>> backwards compatibility break we'll want to call the new version 10.0. >> We could do something like SQL/XML and specify the level of "validity" >> in a typmod, e.g., json(loose), json(strict), etc. > Doesn't work; with XML, the underlying storage format didn't change. > With JSONB, it will ... so changing the typmod would require a total > rewrite of the table. That's a POLS violation if I ever saw one. We do rewrites on typmod changes already. To me, having json(string) and json(hstore) does not seem too bad. Cheers Hannu
Hannu Krosing-3 wrote > On 11/18/2013 06:49 PM, Josh Berkus wrote: >> On 11/18/2013 06:13 AM, Peter Eisentraut wrote: >>> On 11/15/13, 6:15 PM, Josh Berkus wrote: >>>> Thing is, I'm not particularly concerned about *Merlin's* specific use >>>> case, which there are ways around. What I am concerned about is that we >>>> may have users who have years of data stored in JSON text fields which >>>> won't survive an upgrade to binary JSON, because we will stop allowing >>>> certain things (ordering, duplicate keys) which are currently allowed >>>> in >>>> those columns. At the very least, if we're going to have that kind of >>>> backwards compatibility break we'll want to call the new version 10.0. >>> We could do something like SQL/XML and specify the level of "validity" >>> in a typmod, e.g., json(loose), json(strict), etc. >> Doesn't work; with XML, the underlying storage format didn't change. >> With JSONB, it will ... so changing the typmod would require a total >> rewrite of the table. That's a POLS violation if I ever saw one. > We do rewrites on typmod changes already. > > To me, having json(string) and json(hstore) does not seem too bad. Three things: 1) How would this work in the face of functions that erase typmod information? 2) json [no typmod] would have to effectively default to json(string)? 3) How would #1 and #2 interact? I pondered the general idea, but my (admittedly limited) gut feeling is that using typmod would possibly be technically untenable and, from an end-user perspective, would be even more confusing than having two types. David J.
On 11/20/2013 04:52 AM, Robert Haas wrote: > I confess to being a bit perplexed by why we want hstore and json to > share a common binary format. hstore doesn't store hierarchical data; > json does. If we design a binary format for json, doesn't that just > obsolete hstore? Why go to a lot of trouble to extend our home-grown > format if we've got a standard format to plug into? See the hstore2 patch from Teodor. That's what we're talking about, not hstore1, which as you point out is non-hierarchical. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Sat, Nov 16, 2013 at 12:32 AM, Josh Berkus <spandir="ltr"><<a href="mailto:josh@agliodbs.com" target="_blank">josh@agliodbs.com</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On11/15/2013 04:00 PM, David Johnston wrote:<br /> > Looking at this a different way: could we just implementBSON and leave json<br /> > alone?<br /> ><br /> > <a href="http://bsonspec.org/" target="_blank">http://bsonspec.org/</a><br/><br /></div>In short? No.<br /><br /> For one thing, our storage format isdifferent from theirs (better,<br /> frankly), and as a result is not compliant with their "standard".</blockquote></div><br/></div><div class="gmail_extra">Not being super familiar with either BSON our JSONB whatadvantages are we gaining from the difference?<br /><br /></div><div class="gmail_extra">It might be interesting if wesupported the same binary representation so we could have a binary send/recv routines that don't need to do any serialization/deserialization.Especially since a standard format would potentially be skipping the serialization/deserializationon both the server and client.<br /><br /></div><div class="gmail_extra"><br clear="all" /><br/>-- <br />greg<br /></div></div>
On 11/20/2013 12:50 PM, Greg Stark wrote: > On Sat, Nov 16, 2013 at 12:32 AM, Josh Berkus <josh@agliodbs.com> wrote: > > On 11/15/2013 04:00 PM, David Johnston wrote: > > Looking at this a different way: could we just implement BSON > and leave json > > alone? > > > > http://bsonspec.org/ > > In short? No. > > For one thing, our storage format is different from theirs (better, > frankly), and as a result is not compliant with their "standard". > > > Not being super familiar with either BSON or JSONB, what advantages > are we gaining from the difference? > > It might be interesting if we supported the same binary representation > so we could have binary send/recv routines that don't need to do any > serialization/deserialization. Especially since a standard format > would potentially be skipping the serialization/deserialization on > both the server and client. > > > To start with, it doesn't support arbitrary precision numerics. That means that right off the bat it's only accepting a subset of what the JSON spec allows. 'Nuff said, I think. cheers andrew
Greg, > Not being super familiar with either BSON or JSONB, what advantages are we > gaining from the difference? We have the JSONB/Hstore2 format *now*, and it can go into 9.4. This makes it inherently superior to any theoretical format. So any further discussion (below) is strictly academic, for the archives. > It might be interesting if we supported the same binary representation so > we could have binary send/recv routines that don't need to do any > serialization/deserialization. Especially since a standard format would > potentially be skipping the serialization/deserialization on both the > server and client. Leaving aside that we don't want to implement 10gen's spec (because of major omissions like decimal numbers), the fundamental issue with binary-update-in-place is that nobody (certainly not 10gen) has devised a way to do it without having ginormous amounts of bloat in value storage. The way BSON allows for binary-update-in-place, as I understand it, is to pad all values with lots of zero bytes and to prohibit compression, either of which is a much larger loss for performance than serialization is. In other words, binary-update-in-place seems like a clear win for hierarchical data storage, but practically it's not. Of course, an investigation into this by someone with much more knowledge of low-level storage than me (most of this list) is welcome. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
David Johnston <polobo@yahoo.com> writes: >> On 11/18/2013 06:13 AM, Peter Eisentraut wrote: >>> We could do something like SQL/XML and specify the level of "validity" >>> in a typmod, e.g., json(loose), json(strict), etc. > Three things: > 1) How would this work in the face of functions that erase typemod > information? You'd have to make the data self-identifying (which I think was the plan already), and ensure that *every* function taking "json" could cope with both formats on input. The typmod could only be expected to be enforced when storing or explicitly casting to one subformat, much like operations on "numeric" pay little attention to the original precision/scale if any. I agree that this solution isn't terribly workable, mainly because it'd break any third-party C functions that take json today. regards, tom lane
On 11/20/2013 01:36 PM, Tom Lane wrote: > > You'd have to make the data self-identifying (which I think was the plan > already), and ensure that *every* function taking "json" could cope with > both formats on input. The typmod could only be expected to be enforced > when storing or explicitly casting to one subformat, much like operations > on "numeric" pay little attention to the original precision/scale if any. > > I agree that this solution isn't terribly workable, mainly because it'd > break any third-party C functions that take json today. > > Yeah, I had come to this conclusion. I don't think we can bolt on typmods after the event. I don't think having a separate "jsonb" type will be a tragedy. I'm already planning on overloading the existing json functions and operators. cheers andrew
On Wed, Nov 20, 2013 at 12:50 PM, Greg Stark <stark@mit.edu> wrote: >> For one thing, our storage format is different from theirs (better, >> frankly), and as a result is not compliant with their "standard". > > Not being super familiar with either BSON our JSONB what advantages are we > gaining from the difference? BSON assumes, for example, that all integers fit in 64-bits and all floating point values can be accurately represented as float8. So not all JSON objects can be represented as BSON without loss of information. BSON also adds a bunch of extra types that are not part of JSON, like timestamps, regular expressions, and embedded documents. So not all BSON objects can be represented as JSON without loss of information. While it's tempting to think that BSON is a serialization format for JSON, and the name is meant to suggest that, it really isn't. It's just a serialization format for approximately whatever the authors thought would be useful, which happens to be kinda like JSON. Sorta. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
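A concrete instance of the first limitation: the value below is valid JSON and round-trips verbatim through Postgres's text-based json type, but it fits neither a 64-bit integer nor an exactly-representable float8, so a BSON encoding would silently lose digits.

    select '123456789012345678901234567890'::json;  -- stored and returned exactly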
On 2013-11-20 10:01:28 -0800, Josh Berkus wrote: > Greg, > > > Not being super familiar with either BSON our JSONB what advantages are we > > gaining from the difference? > > We have the JSONB/Hstore2 format *now*, and it can go into 9.4. That patch needs a *fair* amount of work though. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi everyone
I used to work on a project storing large quantities of schema-less data, initially using MongoDB, then Postgres with JSON, and eventually I implemented BSON support for Postgres to get the best of both worlds: https://github.com/maciekgajewski/postgresbson
I don't think that JSONB is a good idea. There is a lot to learn from MongoDB's mistakes in this area.
1. As noted in this thread previously, JSON is a serialization format, not a document format.
2. Almost any structured data type, self-describing or not, can be serialized to/from JSON, but always using only a subset of it, and interpreting it in its own specific way.
3. JSON's greatest strength is interoperability. It is a great feature of Postgres that JSON is stored as text; it's basically 'text, but you can do something with it'. There are many JSON implementations out there, and one should make no assumptions about an application's expectations.
For instance: the JSON standard (RFC 4627) defines all numbers to be doubles. Yet I've seen applications storing 64-bit integers (which wouldn't fit in double precision), or even arbitrary precision integers. Most parsers are OK with that.
4. JSON's greatest weakness is performance. Because of 1., it needs to be parsed before any useful information can be extracted.
5. 1. and 3. are mutually exclusive; this is one of the most valuable takeaways I have from working with Mongo and BSON in particular. BSON is an attempt to create 'binary JSON', and a failed one. It is a poor serialization format: faster than JSON, but less flexible. Being binary, it is strongly typed, and it uses various gimmicks to preserve flexibility: implicit type casts, 3 different equality comparison functions etc. And it's not fully convertible to/from JSON; no binary format is.
It is a poor document format as well: it retains some of JSON's performance problems, namely its serial nature and inefficient storage.
6. Speed matters to some, and being able to generate binary data in the application and send it to the database without any serialization/parsing in between provides a great optimization opportunity. One thing that the Mongo guys got right is the fixed, well-defined binary representation. The application can use a provided library to generate objects, and doesn't need to worry about the server's version or endianness.
In the application I've mentioned before, switching from JSON to BSON (in Postgres 9.2, using postgresbson) increased throughput by an order of magnitude. It was an insert-heavy database with indexes on object fields. Both serializing in the application and deserializing in the server were ~10x faster.
7. It seems to me that JSONB is going to repeat all the mistakes of BSON; it's going to be 'neither'. If there is agreement that Postgres needs a 'document' format, why not acknowledge 5. and simply adopt one of the existing formats? Or even better: adopt none, provide many, provide binary send/recv and conversion to and from JSON, and let the user choose.
The world is full of self-describing binary formats: BSON, MessagePack (http://msgpack.org/), protobuf, hierarchical H-Store is coming along. Adding another one would create confusion, and a situation similar to this: http://xkcd.com/927/
And a side note:
Postgres' greatest and most under-advertised feature is its extensibility. People tend to notice only the features present in the core package, while there should be a huge banner on top of http://www.postgresql.org/: "Kids, we support all data types: we have XML, we have JSON, we have H-store, we have BSON, and all of it with built-in indexing, storage compression and full transaction support!"
Maciej G.
Maciej, Thanks for the feedback -- it's helpful to get a different perspective on this. > I used to work on a project storing large quantities of schema-less data, > initially using MongoDB, then Postgres with JSON, and eventually I > implemented BSON support for Postgres to get the best of both worlds: > https://github.com/maciekgajewski/postgresbson Huh, is that linked anywhere else? > > I don't think that JSONB is a good idea. There is a lot to learn from > MongoDB's mistakes in this area. > > 1. As noted in this thread previously, JSON is a serialization format, not > a document format. The momentum of the industry would argue against you. > 5. 1. and 3. are mutually exclusive; this is one of the most valuable > takeaways I have from working with Mongo and BSON in particular. BSON is an > attempt to create 'binary JSON', and a failed one. BSON was also developed part-time, by an inexperienced developer, as a side effect of a different project. And they froze the API too early. And they only made a halfhearted effort to be compatible with JSON. So the fact that BSON is a failure doesn't tell us any more than "don't do this in a half-assed way". > 7. It seems to me that JSONB is going to repeat all the mistakes of BSON; > it's going to be 'neither'. If there is agreement that Postgres needs a > 'document' format, why not acknowledge 5. and simply adopt one of the > existing formats? Or even better: adopt none, provide many, provide binary > send/recv and conversion to and from JSON, and let the user choose. Well, I think others have already documented why we don't want BSON. > The world is full of self-describing binary formats: BSON, MessagePack ( > http://msgpack.org/), protobuf, hierarchical H-Store is coming along. > Adding another one would create confusion, and a situation similar to this: > http://xkcd.com/927/ Apparently you missed that JSONB is just Hstore2 with different in/out functions? > Postgres' greatest and most under-advertised feature is its extensibility. I agree that we don't do enough to promote our extensions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com