Thread: additional json functionality
In the last year I have created a number of modules providing extra JSON functionality. I'd like to include some of these in release 9.4. Candidates for inclusion are:

json_build: provides extra functionality for building up non-regular and arbitrarily complex json, as well as a way of aggregating key/value pairs as a json record. See <https://github.com/pgexperts/json_build>

json_object: takes a one or two dimensional array of text and turns it into a json object, similar to the way hstore currently does. See <https://bitbucket.org/qooleot/json_object>

json_to_record: turns json into a record or recordset, using the same logic as json_populate_record and json_populate_recordset, but inferring the return type from the call instead of from the first argument. See <https://bitbucket.org/qooleot/json_to_record>

json_build is probably the most important in terms of additional functionality. Both json_build and json_to_record have had to copy non-exposed parts of the internal json code, which is one of the many reasons I would particularly like to include them. These extensions have all been created in response to client requirements, so there is a definite use case for all of them.

If there's agreement on taking these, I will prepare patches and submit them by the 15th.

cheers

andrew
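[A rough sketch of how the three modules might be invoked, going by the descriptions above; the exact signatures here are assumptions based on the extensions' READMEs, not a committed API:

    -- json_build: variadic constructors taking alternating keys and values,
    -- nestable for irregular structure (names per the extension):
    SELECT build_json_object('name', 'Fred', 'scores', build_json_array(1, 2, 3));

    -- json_object: a one-dimensional text array of alternating keys and values:
    SELECT json_object(ARRAY['a', '1', 'b', '2']);

    -- json_to_record: return type inferred from the call site via AS,
    -- rather than from a sample argument as json_populate_record requires:
    SELECT * FROM json_to_record('{"a": 1, "b": "x"}'::json) AS t(a int, b text);
]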
On 11/13/2013 02:34 AM, Andrew Dunstan wrote:
> If there's agreement on taking these, I will prepare patches and submit them by the 15th.

With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json.

It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically.

-- 
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 11/13/2013 02:34 AM, Andrew Dunstan wrote:
>> If there's agreement on taking these, I will prepare patches and submit them by the 15th.
>
> With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json.
>
> It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically.

I'm not so sure we should require hstore to do things like build arbitrary json objects, even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize). Andrew's patches just fill out a couple of missing cases that are handled in the existing API.

Putting all the patches together, ISTM there might be a function or two too many. I'm not sure why the json_ prefix was abandoned for build_json_object and build_json_array. Also, json_object is pretty weird to me: I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated." I think the caller should have no such responsibility. Keys should be able to be repeated. Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build.

Putting it all together, I'd consider:
*) dropping json_object (although maybe there is a case I'm not thinking about)
*) changing json_build function names to get the json prefix
*) adding a json object constructor that takes two parallel arrays as arguments

merlin
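[The third suggestion, sketched; this is hypothetical, as no such two-array constructor exists at this point in the thread:

    -- a constructor taking parallel key and value arrays:
    SELECT json_object(ARRAY['a', 'b'], ARRAY['1', '2']);
    -- would presumably yield: {"a": "1", "b": "2"}
]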
On 11/13/2013 09:45 AM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 1:33 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 11/13/2013 02:34 AM, Andrew Dunstan wrote:
>>> If there's agreement on taking these, I will prepare patches and submit them by the 15th.
>> With JSON enhancement, my only concern is that there's work ongoing to integrate the v2 development version of hstore with json, providing typed hstore and an efficient binary storage format for json.
>>
>> It might be worth seeing how that work is going and what functionality needs to be added to it, rather than enhancing the existing json support that may soon change dramatically.

I'm going to be fairly upset if I'm told I have to wait for the new format work, and then I'm later told it's too late to bring this into 9.4. I think these are really orthogonal issues. Adding a new serialization format (which I have been discussing with Oleg and Teodor, and which I hope to help in bringing to JSON) will make some things lots faster than they now are, and might make some things easier or possible where now they are hard or impossible, but it won't remove any functionality requirement.

In particular, json_build, which lets you build up arbitrarily complex and irregular json in a way that's just not possible without using a PL right now, is quite an important requirement. I've given talks about it and JSON users have been quite excited by the possibilities it opens up.

The patch for json_to_record is quite small (two functions), and it has the advantage that, unlike the json_populate_record functions, you don't need to have or create a named type to use it. I think that makes it worth having in itself.

> I'm not so sure we should require hstore to do things like build arbitrary json objects even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize).

I have no idea what this means. The plan with the work that Oleg and Teodor are doing is to provide a set of common code that can be used by either a binary json representation (which will be able to be distinguished from a text representation, so there would be no pg_upgrade problems) or nested hstore. In effect, nested hstore and json would have pretty much identical capabilities, so using one over the other should be largely a matter of preference rather than a forced choice. Frankly, I believe the audience for JSON is vastly larger, and I expect it to be the treeish data format of choice for almost all users.

> Andrew's patches just fill out a couple of missing cases that are handled in the existing API. Putting all the patches together, ISTM there might be a function or two too many. I'm not sure why the json_ prefix was abandoned for build_json_object and build_json_array.

I'm quite happy to change it.

> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format,

What? There is no new serialization format. This is a way to generate a json object in the existing format from a one or two dimensional array of text, c.f. the existing function hstore(text[]) => hstore.

> and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.". I think the caller should have no such responsibility. Keys should be able to be repeated.

They can be repeated, as they can in the current json text format. However, the function makes no attempt to deal with repeated keys. If a key is repeated in the input it will be repeated in the output. In this respect it differs from the hstore function. Note too, that one effect of moving to a non-text representation of json will be that duplicated keys will be resolved (last value will win). But that's a much wider issue than this function.

> Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build.

The extension was built before json_build. But it met a requirement that existed at the time. It probably wouldn't be a tragedy to leave it out, but there is probably a place for it just as there is for the hstore function.

> Putting it all together, I'd consider:
> *) dropping json_object (although maybe there is a case I'm not thinking about)
> *) changing json_build function names to get the json prefix
> *) adding a json object constructor that takes two parallel arrays as arguments.

The third point seems to conflict with the first. I'd only consider that if we *do* add the one-array version of json_object.

cheers

andrew
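[A minimal illustration of the behavior Andrew describes for the existing text-based 9.3 type; the lookup semantics are as stated later in this thread:

    -- the text-based json type stores its input verbatim, duplicates included:
    SELECT '{"a": 1, "a": 2}'::json;          -- {"a": 1, "a": 2}
    -- but key lookup resolves to the value of the last matching key:
    SELECT '{"a": 1, "a": 2}'::json -> 'a';   -- 2
]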
On Wed, Nov 13, 2013 at 9:32 AM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> I'm not so sure we should require hstore to do things like build arbitrary json objects even though I agree that hstore will probably displace json for most cases where you want to store nested data (as opposed to (de-)serialize).
>
> I have no idea what this means.

What I'm saying there is I agree: what's going on with hstore is not relevant here. json features currently provide standalone serialization and deserialization, minus a couple of edge cases that you are fixing up here. Hstore will emerge as a json manipulation engine.

Aside: I thought we blew it (as you know) by not unifying the hstore and json APIs in the 9.2 cycle, and now with the emerging json stuff I really think so... it's way too late to do anything about it now even if there was consensus on that point.

>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format,
>
> What? There is no new serialization format. This is a way to generate a json object in the existing format from a one or two dimensional array of text, c.f. the existing function hstore(text[]) => hstore.

Right -- I thought it took text, not text[] -- withdrawn. I consider a variant taking (text[], text[]) to be generally more practical than the one-argument version (either 1d or 2d variant). Dealing with 2d arrays is a headache unless you really know what you're doing.

>> and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.". I think the caller should have no such responsibility. Keys should be able to be repeated.
>
> They can be repeated, as they can in the current json text format. However, the function makes no attempt to deal with repeated keys. If a key is repeated in the input it will be repeated in the output. In this respect it differs from the hstore function.

Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.

> Note too, that one effect of moving to a non-text representation of json will be that duplicated keys will be resolved (last value will win). But that's a much wider issue than this function.

Right, exactly. And I think this is a pretty serious problem with 'non text json' unless there is a relatively robust and fast process to recompose the json properly for serialization purposes (but that's mostly off topic for your proposed patch).

>> Also, I'm not sure how the {k,v,k,v,k,v} convention serialized into a string is very useful in general practice. I greatly prefer the aggregation and the variadic methods in json_build.
>
> The extension was built before json_build. But it met a requirement that existed at the time. It probably wouldn't be a tragedy to leave it out, but there is probably a place for it just as there is for the hstore function.
>
>> Putting it all together, I'd consider:
>> *) dropping json_object (although maybe there is a case I'm not thinking about)
>> *) changing json_build function names to get the json prefix
>> *) adding a json object constructor that takes two parallel arrays as arguments.
>
> The third point seems to conflict with the first. I'd only consider that if we *do* add the one-array version of json_object.

ok, agreed. So now I'm just saying to unify function names under the json_ prefix and maybe add a (text[], text[]) variant for the object builder, or maybe just drop json_object completely.

merlin
On 11/13/2013 11:37 AM, Merlin Moncure wrote:
> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.

I think you're probably going to lose any argument that says we should necessarily preserve key order (and possibly key duplication) in objects. The standard doesn't support such a contention, either:

    An object is an unordered collection of zero or more name/value pairs
    ...
    The names within an object SHOULD be unique.

Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.

cheers

andrew
On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
> On 11/13/2013 11:37 AM, Merlin Moncure wrote:
>> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.
>
> An object is an unordered collection of zero or more name/value pairs
> ...
> The names within an object SHOULD be unique.
>
> Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.

SHOULD != MUST. Here is the definition of object per RFC 4627:

"An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique."

And SHOULD means:

"3. SHOULD. This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."

As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. Another consequence is that creating particular legal constructions should be discouraged. I disagree with this.

This is simply not the case with many json consuming clients. It's a nice idea but not how things work universally, and that's exactly why the rules were hedged in the RFC. I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control.

merlin
On 11/13/2013 06:45 AM, Merlin Moncure wrote:
> I'm not so sure we should require hstore to do things like build ...
> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".

This is pretty standard in the programming languages I know of which use JSON.

> I think the caller should have no such responsibility. Keys should be able to be repeated.

Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines. Certainly the upcoming binary storage is going to require unique keys. For that matter, both MongoDB and CouchDB store unique, unordered keys. And even supporting CRUD functions (i.e. "update this key") is going to require uniqueness.

> Putting it all together, I'd consider:
> *) dropping json_object (although maybe there is a case I'm not thinking about)
> *) changing json_build function names to get the json prefix
> *) adding a json object constructor that takes two parallel arrays as arguments.

I was with you until the third idea. Huh?

The scripting languages I use (Perl, Python) have functions which convert a list/array to a hash/dictionary. In each case, the standard input is a single list/array in the form [ k, v, k, v, k, v ]. Now, while there are standard language functions which support munging two parallel arrays into one hash (such as Python's zip()), these are less frequently used. Supporting the zip() option without supporting the [ k, v ] array option would be a bizarre and puzzling approach to most programmers I know.

I can see three approaches which make sense:

1. we don't include json_object at all.
2. we include the existing json_object
3. we include json_object, plus a second json_object function which takes two arrays

Keep in mind that all of Andrew's functions came out of real-life use cases of writing applications which return JSON to the caller, so they are based on real needs to fill holes in our JSON-building function library. In the case of json_object, the need was to supply "column" labels where, usually due to calculated columns, none exist in the input.

Take the example where I want to return a bunch of aggregates from a table as a series of json objects with user-friendly labels:

SELECT build_json_object(
    'dept', department,
    'total_costs', sum(costs),
    'running_total', running_sum()
) FROM ....

Where it becomes even more useful is when you want the json label to be the result of a calculated expression:

SELECT build_json_object( department, sum() )

Yes, you could do this with a two-array version as well; it's just no more intuitive, and in cases where you have dozens of columns, puts you in column-counting hell.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 11/13/2013 04:58 PM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>> On 11/13/2013 11:37 AM, Merlin Moncure wrote:
>>> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.
>> An object is an unordered collection of zero or more name/value pairs
>> ...
>> The names within an object SHOULD be unique.
>>
>> Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.
> SHOULD != MUST. Here is the definition of object per RFC 4627:
>
> "An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique."
>
> And SHOULD means:
> "3. SHOULD. This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."
>
> As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. Another consequence is that creating particular legal constructions should be discouraged. I disagree with this.
>
> This is simply not the case with many json consuming clients. It's a nice idea but not how things work universally, and that's exactly why the rules were hedged in the RFC. I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control.

I understand the difference between "should" and "must". But there is nothing that REQUIRES us to preserve key order or duplicate keys. If you really need textual preservation, you should probably store the data as text and convert it to json to do json-ish things to it. If not, we're going to face huge demands to implement another type which almost everyone but you will move to in rapid order because it performs so much better. The strong consensus I have seen in discussions at conferences and elsewhere is to go the way we're going, instead.

cheers

andrew
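[The fallback Andrew suggests works with stock 9.3 as-is; a minimal sketch, with table and key names invented for illustration:

    -- store the document as text to preserve it byte-for-byte,
    -- casting only when json operations are needed:
    CREATE TABLE docs (body text);
    SELECT body::json -> 'some_key' FROM docs;
]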
On 14/11/13 11:33, Andrew Dunstan wrote:
> On 11/13/2013 04:58 PM, Merlin Moncure wrote:
>> On Wed, Nov 13, 2013 at 1:25 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>>> On 11/13/2013 11:37 AM, Merlin Moncure wrote:
>>>> Yes, and I think this is one of the major advantages of the json API vs hstore: you can serialize objects that hstore cannot -- at least not without extra scaffolding (at least, AIUI, I haven't fully grappled with the coming hstore stuff yet). In other words, just because key order and cardinality is unimportant in an associative array, it does not in any way follow that it is similarly unimportant for object serialization.
>>> An object is an unordered collection of zero or more name/value pairs
>>> ...
>>> The names within an object SHOULD be unique.
>>>
>>> Forcing us to preserve order and key duplication would be a pretty effective barrier to any performance improvements.
>> SHOULD != MUST. Here is the definition of object per RFC 4627:
>>
>> "An object structure is represented as a pair of curly brackets surrounding zero or more name/value pairs (or members). A name is a string. A single colon comes after each name, separating the name from the value. A single comma separates a value from a following name. The names within an object SHOULD be unique."
>>
>> And SHOULD means:
>> "3. SHOULD. This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course."
>>
>> As far as I'm concerned, that settles things right there. Beyond that (although they do say 'unordered' above), as a consequence of your argument the json strings {"a": 1, "b": 2} and {"b": 1, "a": 2} should be considered equivalent. Another consequence is that creating particular legal constructions should be discouraged. I disagree with this.
>>
>> This is simply not the case with many json consuming clients. It's a nice idea but not how things work universally, and that's exactly why the rules were hedged in the RFC. I have a couple of cases right now where I'm producing key-order-sensitive json for some (admittedly not very well designed) json consuming clients that are out of my control.
>
> I understand the difference between "should" and "must". But there is nothing that REQUIRES us to preserve key order or duplicate keys. If you really need textual preservation, you should probably store the data as text and convert it to json to do json-ish things to it. If not, we're going to face huge demands to implement another type which almost everyone but you will move to in rapid order because it performs so much better. The strong consensus I have seen in discussions at conferences and elsewhere is to go the way we're going, instead.
>
> cheers
>
> andrew

I can see that both points of view are valid, in different contexts.

Would it be possible to have a boolean, such as 'strict', so that unique & ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice?

Cheers,
Gavin
On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
> On 11/13/2013 06:45 AM, Merlin Moncure wrote:
>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".
>
> This is pretty standard in the programming languages I know of which use JSON.
>
>> I think the caller should have no such responsibility. Keys should be able to be repeated.
>
> Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines.

Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that?

I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.

merlin
On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
> Putting it all together, I'd consider:
> *) dropping json_object (although maybe there is a case I'm not thinking about)
> *) changing json_build function names to get the json prefix
> *) adding a json object constructor that takes two parallel arrays as
> arguments.
> I was with you until the third idea. Huh?
I actually had a use case for this today, though with hstore, importing a fixed length record with something along the lines of:
hstore(
ARRAY['field 1', 'field 2', 'field 3'],
regexp_matches(fixed_field,'(.{4})(.{10})(.{5})')
)
__________________________________________________________________________________
Mike Blackwell | Technical Analyst, Distribution Services/Rollout Management | RR Donnelley
1750 Wallace Ave | St Charles, IL 60174-3401
Office: 630.313.7818
Mike.Blackwell@rrd.com
http://www.rrdonnelley.com
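[The two-parallel-array json constructor discussed upthread would let the same pattern produce json directly; a hypothetical sketch mirroring the hstore call above, with the table name invented:

    SELECT json_object(
        ARRAY['field 1', 'field 2', 'field 3'],
        regexp_matches(fixed_field, '(.{4})(.{10})(.{5})')
    )
    FROM staging_import;
]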
Merlin,

> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.

It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

XML went through the same thing, which is part of how we got a bunch of incompatible "dialects" of XML.

Now, your use case does show us that there's a case to be made for still having text JSON even after we have binary JSON. There's a strong simplicity argument against that, though ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
On 11/14/2013 12:09 AM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
>> On 11/13/2013 06:45 AM, Merlin Moncure wrote:
>>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".
>> This is pretty standard in the programming languages I know of which use JSON.
>>
>>> I think the caller should have no such responsibility. Keys should be able to be repeated.
>> Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines.
> Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that?
>
> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not.

I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead. Maybe call it jsdoc or jdoc or jsobj or somesuch.

For some time I was also pretty perplexed by some PostgreSQL JSON type discussions where JSON was not really being defined as the type constructed from its string representation, nor as a string which results from serialising an existing javascript object, but rather as source code which can be parsed into a structured type. So the PostgreSQL "json" type is *not* a structured type like hstore is, but is really a string type with a few syntax checks. Some of the json_* functions are then defined on top of this "json-source" type and treat this source as if it were an actual structured type.

It is kind of like defining an "int-notation" type, which acts like an integer when added to another integer, but is required to also keep its original representation:

select '1+1'::"int-notation" + 2;
==> 4
select '1+1'::"int-notation";
==> 1+1

> I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.

I am pretty sure we cannot move to an internal object representation and preserve the current "json source" behaviour. This is why I recommend not replacing json, but rather adding another built-in, truly structured type. Then you can keep using the current json for the earlier-quoted uses of "processing instructions" and do real data manipulation on the jsdoc/jsobj type. Also, most of the current json functions should be moved to work on jsobj instead, with an explicit cast from json to jsobj.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On Nov 13, 2013, at 2:41 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:

> Would it be possible to have a boolean, such as 'strict', so that unique & ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice?

It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

Best,

David
On 11/13/2013 07:01 PM, Hannu Krosing wrote:
> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.

Well, that's two voices for that course of action. Interesting that I don't think I heard a single voice for this either at pgCon or pgOpen, although I spent large amounts of time at both talking to people about JSON, so I'd be interested to hear more voices.

It would actually simplify things in a way if we do that - we've been working on a way of doing this that wouldn't upset pg_upgrade. This would render that effort unnecessary. However, it will complicate things for users, who will have to choose between the data types, and for function authors, who will possibly have to write versions of functions to work with both types.

> Also most of the current json functions should be moved to work on jsobj instead, with an explicit cast from json to jsobj.

Sure, we can overload them - that's probably the least of our worries.

cheers

andrew
On 11/13/2013 07:39 PM, David E. Wheeler wrote:
> On Nov 13, 2013, at 2:41 PM, Gavin Flower <GavinFlower@archidevsys.co.nz> wrote:
>
>> Would it be possible to have a boolean, such as 'strict', so that unique & ordered was only imposed when strict was TRUE? Alternately, separate functions to allow the same choice?
> It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.

I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff, order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?

cheers

andrew
On Nov 13, 2013, at 4:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:

>> It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.
>
> I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff, order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?

I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable.

Best,

David
On Wed, Nov 13, 2013 at 04:50:49PM -0800, David E. Wheeler wrote:
> On Nov 13, 2013, at 4:45 PM, Andrew Dunstan <andrew@dunslane.net> wrote:
>
>>> It should be a pretty-printing function option, perhaps, but not core to the type itself, IMO.
>>
>> I don't in the least understand how it could be a pretty printing option. If we move to a binary rep using the hstore stuff, order will be destroyed and not stored anywhere, and duplicate keys will be lost. Once that's done, how would a pretty print option restore the lost info?
>
> I meant ordering the keys, usually in lexicographic order. I agree that preserving order is untenable.

There is a canonical form: http://tools.ietf.org/html/draft-staykov-hu-json-canonical-form-00

A canonical form would be very useful. That's a bit trickier than sorting the keys, and I don't know that there is an accepted canonical form for json yet that can represent all json documents. (The canonical form is not the pretty form, but I think the key ordering should be the same.)

It might be nice to have a more general canonical form if one emerges from somewhere that could encode any json, since without something like this, hashing can only be well specified for the 'sensible subset of json' used in security protocols.

Garick
On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> On 11/14/2013 12:09 AM, Merlin Moncure wrote:
>> On Wed, Nov 13, 2013 at 4:16 PM, Josh Berkus <josh@agliodbs.com> wrote:
>>> On 11/13/2013 06:45 AM, Merlin Moncure wrote:
>>>> Also, json_object is pretty weird to me, I'm not sure I see the advantage of a new serialization format, and I don't agree with the statement "but it is the caller's responsibility to ensure that keys are not repeated.".
>>> This is pretty standard in the programming languages I know of which use JSON.
>>>
>>>> I think the caller should have no such responsibility. Keys should be able to be repeated.
>>> Apparently your experience with using JSON in practice has been fairly different from mine; in the projects I work on, the JSON is being constantly converted back and forth to hashes and dictionaries, which means that ordering is not preserved and keys have to be unique (or become unique within one conversion cycle). I think, based on the language of the RFC and common practice, that it's completely valid for us to require unique keys within JSON-manipulation routines.
>> Common practice? The internet is littered with complaints about documents being spontaneously re-ordered and/or de-duplicated in various stacks. Other stacks provide mechanisms for explicit key order handling (see here: http://docs.python.org/2/library/json.html). Why do you think they did that?
>>
>> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not.
> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.
>
> Maybe call it jsdoc or jdoc or jsobj or somesuch.

This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.

Had the json type been stuffed into an extension, there would be a clearer path to get to where you want to go, since we could have walled off the old functionality and introduced side by side API calls. As things stand now, I don't see a clean path to do that.

On Wed, Nov 13, 2013 at 5:20 PM, Josh Berkus <josh@agliodbs.com> wrote:
> It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

That's not very convincing. What *exactly* is impossible, and why do you think it justifies breaking compatibility with current applications? The way forward seems pretty straightforward: given that hstore is getting nesting power and is moving closer to the json way of doing things, it is essentially 'binary mode json'. I'm ok with de-duplication and key ordering when moving into that structure since it's opt-in and doesn't break the serialization behaviors we have today. If you want to go further and unify the types then you have to go through the design work to maintain compatibility.

Furthermore, I bet the performance argument isn't so clear cut either. The current json type is probably faster at bulk serialization precisely because you *don't* need to deduplicate and reorder keys: the serialization operates without context. It will certainly be much better for in-place manipulations, but it's not nearly as simple as you are making it out to be.

merlin
On 11/14/2013 01:42 AM, Andrew Dunstan wrote:
> On 11/13/2013 07:01 PM, Hannu Krosing wrote:
>> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.
>
> Well, that's two voices for that course of action.

I am not really "for" it (I would have liked to have a json_object/json_structure instead of json_string as the meaning of "json"), but I think there is quite a strong argument for not breaking backwards compatibility.

> Interesting that I don't think I heard a single voice for this either at pgCon or pgOpen,

I attended neither, but I did voice my preferences for _not_ having the "json-as-source-code" type on the mailing lists during previous json discussions.

> although I spent large amounts of time at both talking to people about JSON, so I'd be interested to hear more voices.
>
> It would actually simplify things in a way if we do that - we've been working on a way of doing this that wouldn't upset pg_upgrade. This would render that effort unnecessary.

I wonder how hard it would be to rename the current json to json_source and have a new nested-hstore based json?

> However it will complicate things for users, who will have to choose between the data types, and for function authors, who will possibly have to write versions of functions to work with both types.

You mostly want the functions for the json-object type. This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values). You may want some new functions on json-source (maybe json_source_enumerate_key_values(json, key)) but the current ones are really for json-object.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 11/14/2013 12:20 AM, Josh Berkus wrote:
> Merlin,
>
>> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument. The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.
> It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.
>
> XML went through the same thing, which is part of how we got a bunch of incompatible "dialects" of XML.
>
> Now, your use case does show us that there's a case to be made for still having text JSON even after we have binary JSON.

text-json could easily be a domain (text + a check that it is convertible to json). Maybe it is even possible to teach pg_upgrade to do this automatically.

> There's a strong simplicity argument against that, though ...

I think it confuses most people, similar to how storing 1+1 as "processing instructions" instead of just evaluating it and storing 2 would :) OTOH we are in this mess now and have to solve the backwards compatibility somehow.

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
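[Hannu's domain idea as a minimal sketch; the domain name is invented, and the CHECK simply forces the value through the json parser, so invalid input errors out while the stored representation stays plain text:

    CREATE DOMAIN json_text AS text
        CHECK (value::json IS NOT NULL);
]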
On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values).

yeah. hm. that's a good point.

Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does. In other words, the marshalling into binary structure could happen when:

*) stored to a column in a table
*) any modifying routine is called, updating a key, value, etc
*) manually via a function

but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant.

merlin
On 11/14/2013 04:07 PM, Merlin Moncure wrote:
> On Wed, Nov 13, 2013 at 6:01 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> I guess we should not replace the current JSON type with an hstore based one, but add something json-like based on nested hstore instead.
>>
>> Maybe call it jsdoc or jdoc or jsobj or somesuch.
> This is exactly what needs to be done, full stop (how about: hstore).

hstore has completely different i/o formats and thus has similar backwards compatibility problems.

> It really comes down to this: changing the serialization behaviors

It is really not "serialisation behaviours", as there is nothing you can sensibly serialise to have repeated keys. I agree that you can "generate" such JSON, which would be valid input to any json parser, but no JavaScript Object really serializes to such JSON.

> that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break.

We should have disallowed this from the beginning and should have encouraged using text as storage for JavaScript source code.

> It's going to break applications I've currently put into production with no clear workaround.

We could rename the old json type during pg_upgrade, but this would likely break at least implicit casts in functions.

> This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.

You could choose a sane serializer ;) The main argument here is still whether "json" is source code or the serialization result for a JavaScript Object (Notation).

> Had the json type been stuffed into an extension, there would be a clearer path to get to where you want to go, since we could have walled off the old functionality and introduced side by side API calls. As things stand now, I don't see a clean path to do that.
>> I use pg/JSON all over the place. In several cases I have to create documents with ordered keys because the parser on the other side wants them that way -- this is not a hypothetical argument.

But one could argue that this is not "json" either, but rather some json-like input format for special parsers. The current recommendation is to use "text" for these kinds of things.

>> The current json serialization API handles that just fine and the hstore stuff coming down the pike will not. I guess that's a done deal based on 'performance'. I'm clearly not the only one to have complained about this though.
> It's not just a matter of "performance". It's the basic conflict of JSON as document format vs. JSON as data storage. For the latter, unique, unordered keys are required, or certain functionality isn't remotely possible: indexing, in-place key update, transformations, etc.

All these would be possible if we redefined json as another notation for XML instead of a string representation of a JavaScript Object :) And things could really be "in-place" only inside pl/language functions, as PostgreSQL is still MVCC. What should be faster is access to nested values, though I suspect that it is not significantly faster unless you have very large json documents.

Cheers

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On 11/14/2013 05:06 PM, Merlin Moncure wrote:
> On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>> This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values).
> yeah. hm. that's a good point.
>
> Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does.

Do you have any xxx_to_json usage which can generate a field with multiple equal keys? Or is it just about preserving order?

> In other words, the marshalling into binary structure could happen when:
>
> *) stored to a column in a table
> *) any modifying routine is called, updating a key, value, etc
> *) manually via a function
>
> but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant.

Seems like this would not play nicely with how the PostgreSQL type system works in general, but it could be a way forward if you say that you really do not need to store the order-preserving, multi-valued json.

But in this case it could also be possible for these functions to just generate json-format "text", and with proper casts this would act exactly as you describe above, no?

-- 
Hannu Krosing
PostgreSQL Consultant
Performance, Scalability and High Availability
2ndQuadrant Nordic OÜ
On Thu, Nov 14, 2013 at 10:54 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
> On 11/14/2013 05:06 PM, Merlin Moncure wrote:
>> On Thu, Nov 14, 2013 at 9:42 AM, Hannu Krosing <hannu@2ndquadrant.com> wrote:
>>> This is supported by the fact that current functions on json-source treat it as json-object (for example key lookup gives you the value of the latest key and not a list of all matching key values).
>> yeah. hm. that's a good point.
>>
>> Maybe there's a middle ground here: I bet the compatibility issues would be minimized to an acceptable level if the 'xxx_to_json' functions maintained their current behaviors; they would construct the json type in a special internal mode that would behave like the current type does.
>
> Do you have any xxx_to_json usage which can generate a field with multiple equal keys?

Absolutely -- that's what I've been saying all along. For example: IIRC the end consumer is jqgrid, although the structure format may be being done to satisfy some intermediate transformations, perhaps in GWT or in the browser itself. The point is I didn't define the structure (I think it sucks too); it was given to me to create and I did. The object's dynamic keys and values are moved into the json structure by passing two parallel arrays into a userland function, similar to what Andrew is proposing with the json_build functionality.

{
  "classDisplayName": null,
  "rows": [
    {
      "PropertyName": "xxx", "Row": 1, "Group": "Executive Dashboard", "MetricName": "Occupancy",
      "2012": "95.4%", "Q2": "96.5%", "Q3": "96.3%", "Q4": "94.8%",
      "2013": "95.1%", "Q2": "94.1%", "Q3": "96.0%", "Q4": "96.1%"
    },
    {
      "PropertyName": "xxx", "Row": 2, "Group": "Executive Dashboard", "MetricName": "Occupancy",
      "2012": "95.9%", "Q2": "97.3%", "Q3": "95.7%", "Q4": "95.2%",
      "2013": "93.9%", "Q2": "93.4%", "Q3": "95.3%", "Q4": "95.1%"
    }
  ]
}

>> but not at cast time. This preserves compatibility for the important points and allows serialization of structures that are difficult with the binary mode variant.
>
> Seems like this would not play nicely with how the PostgreSQL type system works in general, but it could be a way forward if you say that you really do not need to store the order-preserving, multi-valued json.

Yes, exactly. I'm OK with simplifying the structure for storage purposes because in that context postgres is the parser and gets to decide what the precise behaviors are. Simplifying the stored structures during upgrade is an OK concession to make, I think. It is not safe to assume the structure should be simplified when serializing.

> But in this case it could also be possible for these functions to just generate json-format "text", and with proper casts this would act exactly as you describe above, no?

I think so, if I'm following you correctly. Maybe you get the best of both worlds and (mostly) maintain compatibility by deferring the decomposition into the binary structure in certain contexts. I'd even throw in the equality operator (which, thankfully, we haven't defined yet) as a place where decomposition could happen. Pretty much any scenario that isn't involved in raw assembly and output.

merlin
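[A sketch of the kind of userland call described above; the function, table, and column names are all hypothetical, and the point is that the repeated "Q2"/"Q3"/"Q4" keys are produced deliberately:

    SELECT userland_json_object(      -- hypothetical userland function
        ARRAY['PropertyName', 'Row', 'Group', 'MetricName',
              '2012', 'Q2', 'Q3', 'Q4', '2013', 'Q2', 'Q3', 'Q4'],
        ARRAY[property_name, row_no::text, 'Executive Dashboard', 'Occupancy',
              fy2012, q2_12, q3_12, q4_12, fy2013, q2_13, q3_13, q4_13]
    ) FROM metrics;
]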
On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote:

> This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.

The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? Just because people have come to depend on something doesn't mean we can't change it. It's one thing if we said this was a feature you could depend on, but AFAIK we haven't. And frankly, the dupes have caused problems for some of my colleagues at work. To me, it's a bug (or, at best, a mis-feature) that causes more issues than it prevents.

In my experience, no JSON parser guarantees key order or duplication. You can't have dupes and there is no ordering in a Perl hash, Objective-C NSDictionary, or JavaScript object. There is of course order and there can be dupes in a JSON string, but not in the objects built from it. If you go in and out of a parser, dupes are eliminated and key order is not preserved. I expect the same from JSON storage.

With no guarantees of preserved ordering or duplication, and with no formal expectation of such by JSON parsers written for various programming languages, I think there is little to be lost by removing those aspects of the JSON type. For those (hopefully rare) situations where such expectations exist, the JSON should be stored as text, as Hannu suggests.

My $0.02.

Best,

David
On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler <david@justatheory.com> wrote:
> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
>
>> This is exactly what needs to be done, full stop (how about: hstore). It really comes down to this: changing the serialization behaviors that have been in production for 2 releases (three if you count the extension) is bad enough, but making impossible some legal json constructions which are currently possible is an unacceptable compatibility break. It's going to break applications I've currently put into production with no clear workaround. This is quite frankly not ok and I'm calling foul. The RFC may claim that these constructions are dubious but that's irrelevant. It's up to the parser to decide that, and when serializing you are not in control of the parser.
>
> The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed?

It doesn't, but the row_to_json function has a very clear mechanism of action. And 'not being documented' is not the standard for latitude to make arbitrary changes to existing function behaviors.

> In my experience, no JSON parser guarantees key order or duplication.

I found one in about two seconds: http://docs.python.org/2/library/json.html

"object_pairs_hook, if specified will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of object_pairs_hook will be used instead of the dict. This feature can be used to implement custom decoders that rely on the order that the key and value pairs are decoded (for example, collections.OrderedDict() will remember the order of insertion). If object_hook is also defined, the object_pairs_hook takes priority."

That makes the rest of your argument moot. Plus, I quite clearly am dealing with parsers that do.

merlin
On 11/14/2013 08:17 PM, Merlin Moncure wrote: > On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler > <david@justatheory.com> wrote: >> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> This is exactly what needs to be done, full stop (how about: hstore). >>> It really comes down to this: changing the serialization behaviors >>> that have been in production for 2 releases (three if you count the >>> extension) is bad enough, but making impossible some legal json >>> constructions which are currently possible is an unacceptable >>> compatibility break. It's going to break applications I've currently >>> put into production with no clear workaround. This is quite frankly >>> not ok and I'm calling foul. The RFC may claim that these >>> constructions are dubious but that's irrelevant. It's up to the >>> parser to decide that and when serializing you are not in control of >>> the parser. >> The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? > It doesn't, but the row_to_json function has a very clear mechanism of > action. And, 'not being documented' is not the standard for latitude > to make arbitrary changes to existing function behaviors. the whole hash*() function family was changed based on the "not documented" premise, so we do have a precedent. > >> In my experience, no JSON parser guarantees key order or duplication. > I found one in about two seconds. http://docs.python.org/2/library/json.html > > "object_pairs_hook, if specified will be called with the result of > every JSON object decoded with an ordered list of pairs. The return > value of object_pairs_hook will be used instead of the dict. This > feature can be used to implement custom decoders that rely on the > order that the key and value pairs are decoded (for example, > collections.OrderedDict() will remember the order of insertion). If > object_hook is also defined, the object_pairs_hook takes priority." > > That makes the rest of your argument moot. Plus, I quite clearly am > dealing with parsers that do. I am sure you could also devise a json encoding scheme where white space is significant ;) The question is, how much of it should the json *type* support. As discussed in the other thread, most of your requirements would be met by having json/row/row set-to-text serializer functions which output json-formatted "text". Then if you actually want to save this as an easy-to-manipulate json document, you can save this "text" to a field of type "json", which does de-duplication and loses order. So my suggestion is to upgrade the existing json data type to text - or maybe json_text with a format check - when upgrading to 9.4, to change the current functions which output "json" to output "text", and to have a new "json" type which stores a proper JavaScript Object - like structured data. I would like to go a step further and have it automatically support not only the json data types as data but all postgresql data types by including the type oid in the binary encoding, but this is probably not something for "json" but rather for a new "pgdoc" data type in 9.5 Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
Hannu Krosing-5 wrote > On 11/14/2013 08:17 PM, Merlin Moncure wrote: >> On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler >> <david@justatheory.com> wrote: >>> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> >>>> This is exactly what needs to be done, full stop (how about: hstore). >>>> It really comes down to this: changing the serialization behaviors >>>> that have been in production for 2 releases (three if you count the >>>> extension) is bad enough, but making impossible some legal json >>>> constructions which are currently possible is an unacceptable >>>> compatibility break. The current json format is a minimally conforming (i.e., does not enforce the "should not contain duplicates" suggestion) structured json validating type that stores its input as-is once validated. Its presence is probably going to cause difficulties with the function API for reasons already mentioned, but its place in the core type library is already firmly established. Andrew's API additions seem like good things to have for this type. I haven't seen any comments on this, but do these functions facilitate creating json that can have duplicates and that maintains order? Even if we accept input to json with these limitations, we are not obligated to make our own json output minimally conforming - though we should at least maintain such behavior if it is already in place. > So my suggestion is to upgrade the existing json data type to > text - or maybe json_text with a format check - when upgrading > to 9.4, to change the current functions which output "json" to > output "text", and to have a new "json" type which stores a proper > JavaScript Object - like structured data. Technically a down-grade but anyway... How does this work with a pg_dump/pg_restore upgrade? If we want to have maximally conforming json type(s) we can still create them. I'd say we'd still want two versions, similar in a way to how we have "bytea" and "text" even though any text can technically be stored like "bytea". The constructor API for both would want to be identical, with the only real difference being that text->json_source would be layout preserving (i.e., validation only) while text->json_binary would be a true parsing conversion. Likewise json_source->text would output the same input while json_binary->text would output the canonical form (pretty-printing and such would need to be initiated via functions). If things are going to be a little more complex anyway why not just go and toss in the kitchen sink too? This way we provide maximal flexibility. From a development perspective some features (indexes, equality, in-place updates and related modification API) may only make sense on a subset of the available types, but trade-offs are a fact of life. David J.
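For concreteness, a hypothetical sketch of the two proposed casts; neither json_source nor json_binary exists, and the names and behavior are taken from the proposal above:

-- text -> json_source: validation only, layout preserved on output
SELECT '{"a": 1, "a": 2}'::json_source;
--  {"a": 1, "a": 2}

-- text -> json_binary: a true parsing conversion; output is the
-- canonical, deduplicated form
SELECT '{"a": 1, "a": 2}'::json_binary;
--  {"a": 2}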
On Thu, Nov 14, 2013 at 1:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > On 11/14/2013 08:17 PM, Merlin Moncure wrote: >> On Thu, Nov 14, 2013 at 11:34 AM, David E. Wheeler >> <david@justatheory.com> wrote: >>> On Nov 14, 2013, at 7:07 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> >>>> This is exactly what needs to be done, full stop (how about: hstore). >>>> It really comes down to this: changing the serialization behaviors >>>> that have been in production for 2 releases (three if you count the >>>> extension) is bad enough, but making impossible some legal json >>>> constructions which are currently possible is an unacceptable >>>> compatibility break. It's going to break applications I've currently >>>> put into production with no clear workaround. This is quite frankly >>>> not ok and I'm calling foul. The RFC may claim that these >>>> constructions are dubious but that's irrelevant. It's up to the >>>> parser to decide that and when serializing you are not in control of >>>> the parser. >>> The current JSON type preserves key order and duplicates. But is it documented that this is a feature, or something to be guaranteed? >> It doesn't, but the row_to_json function has a very clear mechanism of >> action. And, 'not being documented' is not the standard for latitude >> to make arbitrary changes to existing function behaviors. > the whole hash*() function family was changed based on the "not documented" > premise, so we do have a precedent. >> >>> In my experience, no JSON parser guarantees key order or duplication. >> I found one in about two seconds. http://docs.python.org/2/library/json.html >> >> "object_pairs_hook, if specified will be called with the result of >> every JSON object decoded with an ordered list of pairs. The return >> value of object_pairs_hook will be used instead of the dict. This >> feature can be used to implement custom decoders that rely on the >> order that the key and value pairs are decoded (for example, >> collections.OrderedDict() will remember the order of insertion). If >> object_hook is also defined, the object_pairs_hook takes priority." >> >> That makes the rest of your argument moot. Plus, I quite clearly am >> dealing with parsers that do. > I am sure you could also devise a json encoding scheme > where white space is significant ;) > > The question is, how much of it should the json *type* support. > > As discussed in the other thread, most of your requirements > would be met by having json/row/row set-to-text serializer > functions which output json-formatted "text". No, that would not work, putting aside the fact that it would require rewriting heaps of code. What I do now inside the json wrapping routines is create things like { "x": [ {dynamic object}, {dynamic object}, ... ], "y": ..., ... } The only way to do it is to build the 'dynamic object' into json in advance of the outer xxx_to_json call. The 'dynamic object' is created out of a json builder that takes a paired array -- basically a variant of Andrew's 'json_build' upthread. If the 'json serializer' outputted text, the 'outer' to_json call would then re-escape the object. I can't use hstore for that purpose precisely because of the transformations it does on the object. Stepping back, I'm using json serialization as a kind of 'supercharged crosstab'. To any client that can parse json, json serialization completely displaces crosstabbing -- it's superior in every way.
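A sketch of the pattern Merlin describes, using the build_json_object name from Andrew's json_build extension mentioned upthread; the table and columns are invented and the exact signature may differ:

SELECT row_to_json(q)
FROM (SELECT array_agg(build_json_object('id', id, 'attrs', attrs)) AS x,
             42 AS y
        FROM some_table) q;
-- Because the inner values are already of type json, the outer
-- row_to_json embeds them as json rather than re-escaping them as
-- strings -- which is exactly what would break if the builder
-- returned plain text instead.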
I am, if you will, kind of leading research efforts in the area, and I can tell you with absolute certainty that breaking this behavior is a mistake. Forcing hstore-ish output mechanisms removes the ability to handle certain important edge cases that work just fine today. If that ability were taken away, it would be a very bitter pill for me to swallow and would have certain ramifications for me professionally; I went out on a pretty big limb and pushed pg/json aggressively (over strenuous objection) in an analytics product which is now in the final stages of beta testing. I would hate to see the conclusion of the case study be "Ultimately we had to migrate the code back to Hibernate due to compatibility issues". Here are the options on the table: 1) convert existing json type to binary flavor (notwithstanding objections) 2) maintain side by side types, one representing binary, one text. unfortunately, I think the text one must get the name 'json' due to an unfortunate previous decision. 3) merge the behaviors into a single type and get the best of both worlds (as suggested upthread). I think we need to take a *very* hard look at #3 before exploring #1 or #2: I haven't thought it through yet but it may be possible to handle this in such a way that will be mostly transparent to the end user and may have other benefits such as a faster path for serialization. merlin
On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > Here are the options on the table: > 1) convert existing json type to binary flavor (notwithstanding objections) > 2) maintain side by side types, one representing binary, one text. > unfortunately, I think the text one must get the name 'json' due to > an unfortunate previous decision. > 3) merge the behaviors into a single type and get the best of both > worlds (as suggested upthread). > > I think we need to take a *very* hard look at #3 before exploring #1 > or #2: I haven't thought it through yet but it may be possible to handle > this in such a way that will be mostly transparent to the end user and > may have other benefits such as a faster path for serialization. If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Best, David
On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote: > On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: > >> Here are the options on the table: >> 1) convert existing json type to binary flavor (notwithstanding objections) >> 2) maintain side by side types, one representing binary, one text. >> unfortunately, I think the text one must get the name 'json' due to >> an unfortunate previous decision. >> 3) merge the behaviors into a single type and get the best of both >> worlds (as suggested upthread). >> >> I think we need to take a *very* hard look at #3 before exploring #1 >> or #2: I haven't thought it through yet but it may be possible to handle >> this in such a way that will be mostly transparent to the end user and >> may have other benefits such as a faster path for serialization. > > If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. > > I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: *) 'json' type internally has a binary as well as a text representation. The text representation is basically the current type behavior (deduplicated, unordered). The binary representation is the hstore-ish variant. The text mode is discarded when it's deemed no longer appropriate to be needed, and, once gone, can never be rebuilt as it was. *) only the binary internal representation ever gets stored to disk (or anything else). *) the text mode is preferred for output if it is there. otherwise, a deduplicated, reordered text representation is generated *) When literal text is casted to json, the binary structure is built up and kept alongside the text. So, if you went: 'select '{"a": 1, "a": 2}'::json', you'd get the same thing back. (This is how it works now.) but, if you went: 'insert into foo select '{"a": 1, "a": 2}'::json returning *', you'd get {"a": 2} back essentially (although technically that would be a kind of race). *) When the json is stored to a table, the text representation gets immediately discarded on the basis that it's no longer the true representation of the data. *) Ditto when making any equality operation (not as sure on this point). *) Ditto when doing any operation that mutates the structure in any way. the text representation is immutable except during serialization and if it gets invalidated it gets destroyed. *) New API function: json_simplify(); or some such. It reorders and dedups from the user's point of view (but really just kills off the text representation) *) once the text mode is gone, you get basically the proposed 'hstore' behavior. *) serialization functions are generally used in contexts that do not store anything but get output as query data. They create *only* the text mode. However, if the resultant json is stored anywhere, the text mode is destroyed and replaced with the binary variant. This is both a concession to the current behavior and an optimization of 'serialization-in-query' for which I think the binary mode is pessimal performance-wise. so, xxx_to_json serialization functions work exactly as they do now which fixes my problem essentially.
*) if you are unhappy with duplicates in the above, just get used to calling json_simplify() on the serialized json (or deal with it on the client side). This is all pretty glossy, but maybe there is a way forward... merlin
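A hypothetical psql session illustrating the lifecycle in Merlin's sketch; none of this is implemented, json_simplify is his proposed name, and foo is an assumed table with a single json column:

CREATE TABLE foo (doc json);
-- A bare cast keeps the original text alongside the internal binary tree:
SELECT '{"a": 1, "a": 2}'::json;                              -- {"a": 1, "a": 2}
-- Storing the value discards the text mode, so only the canonical
-- (deduplicated) form can be produced afterwards:
INSERT INTO foo SELECT '{"a": 1, "a": 2}'::json RETURNING *;  -- {"a": 2}
-- Proposed function to drop the text mode explicitly:
SELECT json_simplify('{"a": 1, "a": 2}'::json);               -- {"a": 2}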
On 11/15/2013 03:25 PM, Merlin Moncure wrote: > On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote: >> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> Here are the options on the table: >>> 1) convert existing json type to binary flavor (notwithstanding objections) >>> 2) maintain side by side types, one representing binary, one text. >>> unfortunately, I think the text one must get the name 'json' due to >>> an unfortunate previous decision. >>> 3) merge the behaviors into a single type and get the best of both >>> worlds (as suggested upthread). >>> >>> I think we need to take a *very* hard look at #3 before exploring #1 >>> or #2: I haven't thought it through yet but it may be possible to handle >>> this in such a way that will be mostly transparent to the end user and >>> may have other benefits such as a faster path for serialization. >> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. >> >> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. > Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: > > *) 'json' type internally has a binary as well as a text representation. > The text representation is basically the current type behavior > (deduplicated, unordered). The binary representation is the hstore-ish > variant. The text mode is discarded when it's deemed no longer > appropriate to be needed, and, once gone, can never be rebuilt as it > was. > > *) only the binary internal representation ever gets stored to disk > (or anything else). > > *) the text mode is preferred for output if it is there. otherwise, a > deduplicated, reordered text representation is generated > > *) When literal text is casted to json, the binary structure is built > up and kept alongside the text. So, if you went: 'select '{"a": 1, "a": 2}'::json', > you'd get the same thing back. (This is how it works now.) but, if you went: > 'insert into foo select '{"a": 1, "a": 2}'::json returning *', you'd get {"a": 2} back essentially > (although technically that would be a kind of race). > > *) When the json is stored to a table, the text representation gets > immediately discarded on the basis that it's no longer the true > representation of the data. > > *) Ditto when making any equality operation (not as sure on this point). > > *) Ditto when doing any operation that mutates the structure in any > way. the text representation is immutable except during serialization > and if it gets invalidated it gets destroyed. > > *) New API function: json_simplify(); or some such. It reorders and > dedups from the user's point of view (but really just kills off the text > representation) > > *) once the text mode is gone, you get basically the proposed 'hstore' behavior. > > *) serialization functions are generally used in contexts that do not > store anything but get output as query data. They create *only* the > text mode. However, if the resultant json is stored anywhere, the > text mode is destroyed and replaced with the binary variant. This is both > a concession to the current behavior and an optimization of > 'serialization-in-query' for which I think the binary mode is pessimal > performance-wise.
so, xxx_to_json serialization functions work > exactly as they do now which fixes my problem essentially. > > *) if you are unhappy with duplicates in the above, just get used to > calling json_simplify() on the serialized json (or deal with it on the > client side). > > This is all pretty glossy, but maybe there is a way forward... > It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs. I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. cheers andrew
On Fri, Nov 15, 2013 at 2:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > > On 11/15/2013 03:25 PM, Merlin Moncure wrote: >> >> On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> >> wrote: >>> >>> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >>> >>>> Here are the options on the table: >>>> 1) convert existing json type to binary flavor (notwithstanding >>>> objections) >>>> 2) maintain side by side types, one representing binary, one text. >>>> unfortunately, I think the text one must get the name 'json' due to >>>> an unfortunate previous decision. >>>> 3) merge the behaviors into a single type and get the best of both >>>> worlds (as suggested upthread). >>>> >>>> I think we need to take a *very* hard look at #3 before exploring #1 >>>> or #2: I haven't thought it through yet but it may be possible to handle >>>> this in such a way that will be mostly transparent to the end user and >>>> may have other benefits such as a faster path for serialization. >>> >>> If it’s possible to preserve order and still get the advantages of binary >>> representation --- which are substantial (see >>> http://theory.so/pg/2013/10/23/testing-nested-hstore/ and >>> http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of >>> examples) --- without undue maintenance overhead, then great. >>> >>> I am completely opposed to duplicate key preservation in JSON, though. It >>> has caused us a fair number of headaches at $work. >> >> Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: >> >> *) 'json' type internally has a binary as well as a text representation. >> The text representation is basically the current type behavior >> (deduplicated, unordered). The binary representation is the hstore-ish >> variant. The text mode is discarded when it's deemed no longer >> appropriate to be needed, and, once gone, can never be rebuilt as it >> was. >> >> *) only the binary internal representation ever gets stored to disk >> (or anything else). >> >> *) the text mode is preferred for output if it is there. otherwise, a >> deduplicated, reordered text representation is generated >> >> *) When literal text is casted to json, the binary structure is built >> up and kept alongside the text. So, if you went: 'select '{"a": >> 1, "a": 2}'::json', you'd get the same thing back. (This is how >> it works now.) but, if you went: 'insert into foo select '{"a": 1, >> "a": 2}'::json returning *', you'd get {"a": 2} back essentially >> (although technically that would be a kind of race). >> >> *) When the json is stored to a table, the text representation gets >> immediately discarded on the basis that it's no longer the true >> representation of the data. >> >> *) Ditto when making any equality operation (not as sure on this point). >> >> *) Ditto when doing any operation that mutates the structure in any >> way. the text representation is immutable except during serialization >> and if it gets invalidated it gets destroyed. >> >> *) New API function: json_simplify(); or some such. It reorders and >> dedups from the user's point of view (but really just kills off the text >> representation) >> >> *) once the text mode is gone, you get basically the proposed 'hstore' >> behavior. >> >> *) serialization functions are generally used in contexts that do not >> store anything but get output as query data. They create *only* the >> text mode. However, if the resultant json is stored anywhere, the >> text mode is destroyed and replaced with the binary variant.
This is both >> a concession to the current behavior and an optimization of >> 'serialization-in-query' for which I think the binary mode is pessimal >> performance-wise. so, xxx_to_json serialization functions work >> exactly as they do now which fixes my problem essentially. >> >> *) if you are unhappy with duplicates in the above, just get used to >> calling json_simplify() on the serialized json (or deal with it on the >> client side). >> >> This is all pretty glossy, but maybe there is a way forward... >> > > > It's making my head hurt, to be honest, and it sounds like a recipe for > years and years of inconsistencies and bugs. > > I don't want to have two types, but I think I'd probably rather have two > clean types than this. I can't imagine it being remotely acceptable to have > behaviour depend on whether or not something was ever stored, which is what > this looks like. Well, maybe so. My main gripe with the 'two types' solutions is that: 1) the current type is already in core (that is, not an extension). In hindsight, I think this was a huge mistake. 2) the current type has grabbed the 'json' type name and the 'json_xxx' API. 3) the current type is getting used all over the place 'Two types' means that (AIUI) you can't mess around with the existing API too much. And the new type (due out in 2016?) will be something of a second-class citizen. The ramifications of dealing with the bifurcation are what make *my* head hurt. Every day the json stuff is getting more and more widely adopted. 9.4 isn't going to drop until 2014 best case and it won't be widely deployed in the enterprise until 2015 and beyond. So you're going to have a huge code base operating on the 'legacy' json type. merlin
On 11/15/2013 12:25 PM, Merlin Moncure wrote: > Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: > > *) 'json' type internally has a binary as well as a text representation. > The text representation is basically the current type behavior <snip long detailed explanation of behavior-dependent type> That's not at all workable. Users would be completely unable to predict or understand the JSON type and how it acts. That's not just violating POLS; that's bashing POLS' head in with a shovel. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Fri, Nov 15, 2013 at 2:54 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/15/2013 12:25 PM, Merlin Moncure wrote: >> Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: >> >> *) 'json' type internally has a binary as well as a text representation. >> The text representation is basically the current type behavior > > <snip long detailed explanation of behavior-dependent type> > > That's not at all workable. Users would be completely unable to predict > or understand the JSON type and how it acts. That's not just violating > POLS; that's bashing POLS' head in with a shovel. All right: make a new type then, and leave the current one alone please. merlin
On 2013-11-15 12:54:53 -0800, Josh Berkus wrote: > On 11/15/2013 12:25 PM, Merlin Moncure wrote: > > Kinda yes, kinda no. Here's a rough sketch of what I'm thinking: > > > > *) 'json' type internally has a binary as well as a text representation. > > The text representation is basically the current type behavior > > <snip long detailed explanation of behavior-dependent type> > > That's not at all workable. Users would be completely unable to predict > or understand the JSON type and how it acts. That's not just violating > POLS; that's bashing POLS' head in with a shovel. It's also not currently possible to implement such a behaviour inside a type's functions. You'd need core code cooperation. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
On Nov 15, 2013, at 12:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs. > > I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. I dislike having two types (no, three -- there is hstore, too!). But if there is consensus for it (and I am not at all convinced that there is at this point), I can live with it. Docs would have to be pretty explicit, though. David
On 11/15/2013 01:12 PM, David E. Wheeler wrote: > On Nov 15, 2013, at 12:37 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > >> It's making my head hurt, to be honest, and it sounds like a recipe for years and years of inconsistencies and bugs. >> >> I don't want to have two types, but I think I'd probably rather have two clean types than this. I can't imagine it being remotely acceptable to have behaviour depend on whether or not something was ever stored, which is what this looks like. > > I dislike having two types (no, three -- there is hstore, too!). But if there is consensus for it (and I am not at all convinced that there is at this point), I can live with it. Docs would have to be pretty explicit, though. I would be happy to do a survey on how common key ordering and/or duplicate keys are in postgresql+json. However, I'm not clear on what set of survey responses would decide us in either direction. Even as a sample of one, Merlin's case is a pretty persuasive example ... and, as he points out, there will be applications built around 9.3's JSON which haven't even been written yet. I believe this was a danger we recognized when we added the JSON type, including the possibility that a future binary type might need to be a separate type due to compatibility issues. The only sad thing is the naming; it would be better for the new type to carry the JSON name in the future, but there's no way to make that work that I can think of. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Merlin Moncure-2 wrote >> I don't want to have two types, but I think I'd probably rather have two >> clean types than this. I can't imagine it being remotely acceptable to >> have >> behaviour depend on whether or not something was ever stored, which is >> what >> this looks like. > > Well, maybe so. My main gripe with the 'two types' solutions is that: > 1) the current type is already in core (that is, not an extension). In > hindsight, I think this was a huge mistake. > 2) the current type has grabbed the 'json' type name and the 'json_xxx' API. > 3) the current type is getting used all over the place > > 'Two types' means that (AIUI) you can't mess around with the existing > API too much. And the new type (due out in 2016?) will be something of > a second-class citizen. The ramifications of dealing with the bifurcation > are what make *my* head hurt. Every day the json stuff is getting > more and more widely adopted. 9.4 isn't going to drop until 2014 best > case and it won't be widely deployed in the enterprise until 2015 and > beyond. So you're going to have a huge code base operating on the > 'legacy' json type. > > merlin The current type can store the exact same data as what a hash-like type could store. It can also store stuff a hash-like type would not be able to store. From my reading, the main reason for adding the new hash-like type would be to improve the performance characteristics of using said type. So: 1) if reasonable performance can be had with the current type, the new type would be unnecessary 2) if #1 is not possible, then the new type trades off leniency in format for performance improvements One implication of #2 is that existing json that wants the improved performance will need to undergo a full-table rewrite in order to be converted. Both textual output representations are identical, and function overloading and the API should be able to be maintained substantially identically between the two types. David J.
On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote: > > I believe this was a danger we recognized when we added the JSON type, > including the possibility that a future binary type might need to be a > separate type due to compatibility issues. The only sad thing is the > naming; it would be better for the new type to carry the JSON name in > the future, but there's no way to make that work that I can think of. > > -- > Josh Berkus > PostgreSQL Experts Inc. > http://pgexperts.com > What about a GUC for the json version? Then you could choose, and they could both be called json. Regards, Ken
"ktm@rice.edu" <ktm@rice.edu> writes: > On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote: >> I believe this was a danger we recognized when we added the JSON type, >> including the possibility that a future binary type might need to be a >> separate type due to compatibility issues. The only sad thing is the >> naming; it would be better for the new type to carry the JSON name in >> the future, but there's no way to make that work that I can think of. > What about a GUC for json version? Then you could choose and they > could both be call json. GUCs that change user-visible semantics have historically proven to be much less good ideas than they seem at first glance. regards, tom lane
On 11/15/2013 04:53 PM, Tom Lane wrote: > "ktm@rice.edu" <ktm@rice.edu> writes: >> On Fri, Nov 15, 2013 at 01:18:22PM -0800, Josh Berkus wrote: >>> I believe this was a danger we recognized when we added the JSON type, >>> including the possibility that a future binary type might need to be a >>> separate type due to compatibility issues. The only sad thing is the >>> naming; it would be better for the new type to carry the JSON name in >>> the future, but there's no way to make that work that I can think of. >> What about a GUC for the json version? Then you could choose, and they >> could both be called json. > GUCs that change user-visible semantics have historically proven to be > much less good ideas than they seem at first glance. > > Yeah, it would be a total foot gun here I think. I've come to the conclusion that the only possible solution is to have a separate type. That's a bit sad, but there it is. The upside is that this will make the work Teodor has mentioned simpler. (Desperately making lemonade from lemons here.) cheers andrew
On Nov 15, 2013, at 2:02 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > Yeah, it would be a total foot gun here I think. > > I've come to the conclusion that the only possible solution is to have a separate type. That's a bit sad, but there it is. The upside is that this will make the work Teodor has mentioned simpler. (Desperately making lemonade from lemons here.) Fine. My bikeshedding: Call the new type "jsonb". “B” for “binary.” Also, the old one is implicitly "jsona". Get it? David
On 11/15/2013 09:25 PM, Merlin Moncure wrote: > On Fri, Nov 15, 2013 at 1:51 PM, David E. Wheeler <david@justatheory.com> wrote: >> On Nov 15, 2013, at 6:35 AM, Merlin Moncure <mmoncure@gmail.com> wrote: >> >>> Here are the options on the table: >>> 1) convert existing json type to binary flavor (notwithstanding objections) >>> 2) maintain side by side types, one representing binary, one text. >>> unfortunately, I think the text one must get the name 'json' due to >>> an unfortunate previous decision. >>> 3) merge the behaviors into a single type and get the best of both >>> worlds (as suggested upthread). >>> >>> I think we need to take a *very* hard look at #3 before exploring #1 >>> or #2: I haven't thought it through yet but it may be possible to handle >>> this in such a way that will be mostly transparent to the end user and >>> may have other benefits such as a faster path for serialization. >> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. >> >> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. Let's just change the return type of the current json-constructing functions to json_text, which is exactly like text with 2 extra properties: 1) it is syntax-checked for valid json (that is, it can be cast to json) and 2) if included in outer json as data, it is included directly and is not quoted like text With just these two it should be possible to have the following: a) Merlin and others can keep (ab)using json_text as this wonderfully versatile format for feeding json parsers and visualisers which accept duplicates and consider order significant b) cast this to a binary json object if de-duplication and fast access to internals is needed I do not think we need anything else for this As far as I understand, merlin is mostly ok with stored json being normalised and the problem is just with constructing "extended" json (a.k.a. "processing instructions") to be used as source for specialised parsers and renderers. -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
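A hypothetical illustration of the two properties; json_text does not exist, and the splicing behavior of property 2 is the proposal, not current functionality:

-- Property 1: validation only; the text, duplicates and all, is kept verbatim:
SELECT '{"a": 1, "a": 2}'::json_text;        -- {"a": 1, "a": 2}

-- Property 2: embedded in outer json, a json_text value is spliced in as
-- json rather than escaped and quoted the way plain text would be:
SELECT row_to_json(t) FROM (SELECT '{"a": 1}'::json_text AS v) t;
--  {"v":{"a": 1}}   rather than   {"v":"{\"a\": 1}"}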
On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>>> I think we need to take a *very* hard look at #3 before exploring #1 >>>> or #2: I haven't thought it through yet but it may be possible to handle >>>> this in such a way that will be mostly transparent to the end user and >>>> may have other benefits such as a faster path for serialization. >>> If it’s possible to preserve order and still get the advantages of binary representation --- which are substantial (see http://theory.so/pg/2013/10/23/testing-nested-hstore/ and http://theory.so/pg/2013/10/25/indexing-nested-hstore/ for a couple of examples) --- without undue maintenance overhead, then great. >>> >>> I am completely opposed to duplicate key preservation in JSON, though. It has caused us a fair number of headaches at $work. > Let's just change the return type of the current json-constructing functions to > json_text, which is exactly like text with 2 extra properties: > > 1) it is syntax-checked for valid json (that is, it can be cast to json) > > and > > 2) if included in outer json as data, it is included directly and is not > quoted like text > > > With just these two it should be possible to have the following: > > a) Merlin and others can keep (ab)using json_text as this > wonderfully versatile format for feeding json parsers and > visualisers which accept duplicates and consider order significant > > b) cast this to a binary json object if de-duplication and fast access to > internals is needed > > I do not think we need anything else for this I think you may be on to something here. This might also be a way to opt in to fast(er) serialization (upthread it was noted this is unimportant; I'm skeptical). I deeply feel that two types is not the right path but I'm pretty sure that this can be finessed. > As far as I understand, merlin is mostly ok with stored json being > normalised and the problem is just with constructing "extended" > json (a.k.a. "processing instructions") to be used as source for > specialised parsers and renderers. yes, this is correct. merlin
On 11/15/2013 02:59 PM, Merlin Moncure wrote: > On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > I think you may be on to something here. This might also be a way > to opt in to fast(er) serialization (upthread it was noted this is > unimportant; I'm skeptical). I deeply feel that two types is not the > right path but I'm pretty sure that this can be finessed. > >> As far as I understand, merlin is mostly ok with stored json being >> normalised and the problem is just with constructing "extended" >> json (a.k.a. "processing instructions") to be used as source for >> specialised parsers and renderers. Thing is, I'm not particularly concerned about *Merlin's* specific use case, which there are ways around. What I am concerned about is that we may have users who have years of data stored in JSON text fields which won't survive an upgrade to binary JSON, because we will stop allowing certain things (ordering, duplicate keys) which are currently allowed in those columns. At the very least, if we're going to have that kind of backwards compatibility break we'll want to call the new version 10.0. That's why naming old JSON as "json_text" won't work; it'll be a hardened roadblock to upgrading. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Josh Berkus wrote > On 11/15/2013 02:59 PM, Merlin Moncure wrote: >> On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I think you may be on to something here. This might also be a way >> to opt in to fast(er) serialization (upthread it was noted this is >> unimportant; I'm skeptical). I deeply feel that two types is not the >> right path but I'm pretty sure that this can be finessed. >> >>> As far as I understand, merlin is mostly ok with stored json being >>> normalised and the problem is just with constructing "extended" >>> json (a.k.a. "processing instructions") to be used as source for >>> specialised parsers and renderers. > > Thing is, I'm not particularly concerned about *Merlin's* specific use > case, which there are ways around. What I am concerned about is that we > may have users who have years of data stored in JSON text fields which > won't survive an upgrade to binary JSON, because we will stop allowing > certain things (ordering, duplicate keys) which are currently allowed in > those columns. At the very least, if we're going to have that kind of > backwards compatibility break we'll want to call the new version 10.0. > > That's why naming old JSON as "json_text" won't work; it'll be a > hardened roadblock to upgrading. Agreed. I can't imagine a use-case that would warrant breaking the current behavior of "json". Either we live with just one, text-oriented, json type and "finesse" whatever performance gains we can without breaking compatibility; or we introduce additional types (I personally like adding 2 instead of one, but just adding the binary one would be ok) which - barring an overwhelming desire by -core to group-self-flagellate - means giving the new type an as yet unused name. From a marketing perspective having 3 types with the following properties is an easy message to sell: 1) json - liberal interpretation w/ validation only; stored as text; output as-is 2) json_text - strict interpretation w/ validation only; stored as text; output as-is 3) json_binary - strict interpretation w/ validation & parsing; stored as binary; output "normalized" This way "json" seems less like a mistake and more like an intentional desire to introduce a liberal type that meets data exchange needs in the short term and now, later, a structured data storage mechanism similar to "hstore". Even if you have json_binary I can imagine that some people would want to be able to store the original strict json as-is. Sure, they can use text, but this way intent is made clear and validation is attached directly to the type as opposed to having to be done separately. The use-cases described for needing a liberal "json" prove this out. That said, "json" would be an acceptable replacement for "json_text" in many cases, and separate validation for "strict json" prior to storing into "json" isn't that heinous. David J.
Looking at this a different way: could we just implement BSON and leave json alone? http://bsonspec.org/ David J.
On 11/15/2013 04:00 PM, David Johnston wrote: > Looking at this a different way: could we just implement BSON and leave json > alone? > > http://bsonspec.org/ In short? No. For one thing, our storage format is different from theirs (better, frankly), and as a result is not compliant with their "standard". That's a reason why we won't use the name BSON, either, since it's a trademark of 10gen. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On 11/15/2013 07:32 PM, Josh Berkus wrote: > On 11/15/2013 04:00 PM, David Johnston wrote: >> Looking at this a different way: could we just implement BSON and leave json >> alone? >> >> http://bsonspec.org/ > In short? No. > > For one thing, our storage format is different from theirs (better, > frankly), and as a result is not compliant with their "standard". > > That's a reason why we won't use the name BSON, either, since it's a > trademark of 10gen. > What is more, it has restrictions which we do not wish to have. See for example its treatment of numerics. cheers andrew
On 11/16/2013 12:15 AM, Josh Berkus wrote: > On 11/15/2013 02:59 PM, Merlin Moncure wrote: >> On Fri, Nov 15, 2013 at 4:31 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I think you may be on to something here. This might also be a way >> to opt in to fast(er) serialization (upthread it was noted this is >> unimportant; I'm skeptical). I deeply feel that two types is not the >> right path but I'm pretty sure that this can be finessed. >> >>> As far as I understand, merlin is mostly ok with stored json being >>> normalised and the problem is just with constructing "extended" >>> json (a.k.a. "processing instructions") to be used as source for >>> specialised parsers and renderers. > Thing is, I'm not particularly concerned about *Merlin's* specific use > case, which there are ways around. What I am concerned about is that we > may have users who have years of data stored in JSON text fields which > won't survive an upgrade to binary JSON, because we will stop allowing > certain things (ordering, duplicate keys) which are currently allowed in > those columns. At the very least, if we're going to have that kind of > backwards compatibility break we'll want to call the new version 10.0. > > That's why naming old JSON as "json_text" won't work; it'll be a > hardened roadblock to upgrading. Then perhaps name the "new binary json" as jsob (JavaScript Object Binary) or just jsobj (JavaScript Object) and keep current json for what it is, namely JavaScript Object Notation. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On Nov 16, 2013, at 12:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > Then perhaps name the "new binary json" as jsob (JavaScript Object Binary) > or just jsobj (JavaScript Object) and keep current json for what it is, > namely > JavaScript Object Notation. It’s still input and output as JSON, though. I still like JSONB best. David
On 11/16/2013 10:30 PM, David E. Wheeler wrote: > On Nov 16, 2013, at 12:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > >> Then perhaps name the "new binary json" as jsob (JavaScript Object Binary) >> or just jsobj (JavaScript Object) and keep current json for what it is, >> namely >> JavaScript Object Notation. > It’s still input and output as JSON, though. Yes, because JavaScript Object Notation *is* a serialization format (aka Notation) for converting JavaScript Objects to text format and back :) > I still like JSONB best. To me it feels redundant, like binarytextbinary. The binary representation of JSON is a JavaScript(-like) Object, not "binary json". So my vote would be either jsobj or jsdoc, as "document databases" tend to call the structured types "documents". Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 11/16/2013 02:04 PM, Hannu Krosing wrote: > On 11/16/2013 10:30 PM, David E. Wheeler wrote: >> I still like JSONB best. > To me it feels redundant, like binarytextbinary > > the binary representation of JSON is JavaScript(-like) Object, not > "binary json" JSONB is as close as we can get to "JSON", and it gives people the idea that this is the successor type to JSON. So +1 from me as well on "JSONB". -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Nov 16, 2013, at 2:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> It’s still input and output as JSON, though. > Yes, because JavaScript Object Notation *is* a serialization format > (aka Notation) for converting JavaScript Objects to text format > and back :) >> I still like JSONB best. > To me it feels redundant, like binarytextbinary > > the binary representation of JSON is a JavaScript(-like) Object, not > "binary json" > > So my vote would be either jsobj or jsdoc, as "document databases" tend > to call the structured types "documents" You know that both types support scalar values, right? 'a'::JSON works now, and 'a'::hstore works with the WIP patch. For that reason I would not think that "doc" or "obj" would be good choices. I like JSONB because: 1. The "B" means "binary" 2. The "B" means "second" 3. It's short 4. See also BYTEA. Best, David
David E. Wheeler-3 wrote > I like JSONB because: > > 1. The "B" means "binary" > 2. The "B" means "second" > 3. It's short > 4. See also BYTEA. "json_strict" : Not sure about the "bytea" reference off-hand... I was pondering "jsons", which meets the short property just fine, and the trailing "s" would stand for "strict", which is the user-visible semantic that this type exhibits rather than some less-visible "binary" attribute which most users would not really care about. I dislike the implication of plural-ness that the "s" imparts, though. Implication of "second" doesn't seem that important since both types provide useful semantics. I can imagine where the short aspect will lead people to accidentally type "json" where they mean to use "jsonb", and having just a single extra character will increase the likelihood they will not notice. Knowing about and having used "json_strict" previously, it will be more probable that such users will noticeably feel something is missing if they drop the whole "_strict" suffix. So, I'll toss out "json_strict" for my bikeshed contribution. David J.
"David E. Wheeler" <david@justatheory.com> writes: > You know that both types support scalar values right? 'a'::JSON works now, > and 'a'::hstore works with the WIP patch. For that reason I would not think > that "doc" or "obj" would be good choices. I'm wondering about just pushing hstore in core (even if technically still an extension, install it by default, like we do for PLpgSQL), and calling it a day. If you need pre-9.4 JSON-is-text compatibility, use the json datatype, if you want something with general index support, use hstore. For bikeshedding purposes, what about calling it jstore, as in “we actually know how to store your json documents”? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 11/17/2013 09:02 PM, David E. Wheeler wrote: > On Nov 16, 2013, at 2:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > >>> It’s still input and output as JSON, though. >> Yes, because JavaScript Object Notation *is* a serialization format >> (aka Notation) for converting JavaScript Objects to text format >> and back :) >>> I still like JSONB best. >> To me it feels redundant, like binarytextbinary >> >> the binary representation of JSON is a JavaScript(-like) Object, not >> "binary json" >> >> So my vote would be either jsobj or jsdoc, as "document databases" tend >> to call the structured types "documents" > You know that both types support scalar values, right? > 'a'::JSON works now, Yeah, and I remember all the bikeshedding about how scalars should not be supported as they are "not really JSON" by the standard ... At that time I was also quite vocal about not painting ourselves into a corner by not normalising json on input and thus generating a backwards compatibility problem in case we would ever get "proper" json support. > and 'a'::hstore works with the WIP patch. For that reason I would not think that "doc" or "obj" would be good choices. this is like claiming that text should not be text because you can store a single character there as well. I feel that both "doc" and "obj" convey the meaning that it is a structured type meant for fast component lookup, as opposed to the jsoN(otation) type, which is text. Also jsdoc/jsobj would be a natural bridge to pgdoc/pgobj, which would be similar to "new json" but allow any type supported by postgresql as a value. (... and in several languages even scalars really are objects) > > I like JSONB because: > > 1. The "B" means "binary" "Binary" has really little to do with the fact that we normalise on input, which is the real significant feature of the new json type. > 2. The "B" means "second" Why not just json2, (you know, like varchar2 in a certain other database ;) > 3. It's short jsobj and jsdoc are exactly as short as jsonb > 4. See also BYTEA. BYTEA is "byte array", so not really relevant. (unless you try to rhyme a byte-a, json-b sequence ;) ) Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 11/17/2013 10:51 PM, Dimitri Fontaine wrote: > "David E. Wheeler" <david@justatheory.com> writes: >> You know that both types support scalar values, right? 'a'::JSON works now, >> and 'a'::hstore works with the WIP patch. For that reason I would not think >> that "doc" or "obj" would be good choices. > I'm wondering about just pushing hstore in core (even if technically > still an extension, install it by default, like we do for PLpgSQL), and > calling it a day. > > If you need pre-9.4 JSON-is-text compatibility, use the json datatype, > if you want something with general index support, use hstore. +1 for getting hstore in as well I think hstore needs to keep its text format compatible with older hstore (in this discussion let's call this text format "hson", short for HStore Object Notation, for added confusion :) > > For bikeshedding purposes, what about calling it jstore, +1 for jstore as well. I am happy with jstore, jsdoc, jsobj jstore/jsobj/jsdoc really is *not* JSON, but a bona-fide freeform structured datatype that happens to have JSON as a convenient I/O format. You may want to use jstore even if you have never needed JSON as a serialisation/transport format before. I do not like jsonB (sounds too much like json2, i.e. like we were trying to cover up a design accident) nor json_strict (as this is not really strict, as it does indeed accept scalars, not just Arrays/Lists and Objects/Dictionaries as per the JSON standard) > as in “we actually know how to store your json documents”? > > Regards, Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On Nov 17, 2013, at 1:51 PM, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: > I'm wondering about just pushing hstore in core (even if technically > still an extension, install it by default, like we do for PLpgSQL), and > calling it a day. Its syntax is different from JSON's, so one would need to convert to and from JSON all the time to parse and serialize. PITA. > For bikeshedding purposes, what about calling it jstore, as in “we > actually know how to store your json documents”? -1 Sounds like a Java storage API. David
"David E. Wheeler" <david@justatheory.com> writes: > On Nov 17, 2013, at 1:51 PM, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: >> I'm wondering about just pushing hstore in core (even if technically >> still an extension, install it by default, like we do for PLpgSQL), and >> calling it a day. > > It’s syntax is different than JSON, so one would need to convert to > and from JSON all the time to parse and serialize. PITA. Oh I misremembered about that, I though it would take JSON as input as-is and could be made to output JSON. And IIRC the community input at pgconf.eu has been to just always output json texts and get rid of the formating GUCs. Now, if it turns out that the new hstore is not dealing with json input and output, we could have json, jstore and hstore. -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Nov 17, 2013, at 2:26 PM, Dimitri Fontaine <dimitri@2ndQuadrant.fr> wrote: >> Its syntax is different from JSON's, so one would need to convert to >> and from JSON all the time to parse and serialize. PITA. > > Oh, I misremembered about that, I thought it would take JSON as input > as-is and could be made to output JSON. And IIRC the community input at > pgconf.eu has been to just always output json texts and get rid of the > formatting GUCs. Yeah, but for back-compat, it has to use => instead of : to separate keys from values, and cannot use braces for a root-level object. :-( > Now, if it turns out that the new hstore is not dealing with json input > and output, we could have json, jstore and hstore. That's where this is headed, yes. David
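For reference, the syntax gap in question; the hstore literal below is the released flat form (where values are untyped strings), set against the JSON equivalent:

SELECT 'a => 1, b => 2'::hstore;    -- "a"=>"1", "b"=>"2"  (no outer braces)
SELECT '{"a": 1, "b": 2}'::json;    -- {"a": 1, "b": 2}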
On 11/17/2013 04:51 PM, Dimitri Fontaine wrote: > "David E. Wheeler" <david@justatheory.com> writes: >> You know that both types support scalar values right? 'a'::JSON works now, >> and 'a'::hstore works with the WIP patch. For that reason I would not think >> that "doc" or "obj" would be good choices. > I'm wondering about just pushing hstore in core (even if technically > still an extension, install it by default, like we do for PLpgSQL), and > calling it a day. That would be one of the silliest and most short-sighted decisions we have made in many years, IMNSHO. The demand for strong JSON support is enormous. I don't think I have ever received as many positive comments on any other feature I have worked on in the last 9 years. What these people want is not something jsonish, they want json, pure and simple. And they want it fast and featured and efficient. Much as I love hstore, it isn't json, and so it won't satisfy that demand. cheers andrew
Andrew Dunstan <andrew@dunslane.net> writes: > That would be one of the silliest and most short-sighted decisions we have > made in many years, IMNSHO. The demand for strong JSON support is enormous. One of the silliest and most short-sighted decisions we made recently might have been to actually ship that json variant in 9.2, after all. The most popular PostgreSQL commands in 9.4 are going to be: $ sudo apt-get install postgresql-contrib-9.4 # create extension jstore; # alter table foo alter column documents type jstore; # create index on foo using gist(documents); Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 18/11/13 09:02, David E. Wheeler wrote: > On Nov 16, 2013, at 2:04 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > >>> It’s still input and output as JSON, though. >> Yes, because JavaScript Object Notation *is* a serialization format >> (aka Notation) for converting JavaScript Objects to text format >> and back :) >>> I still like JSONB best. >> To me it feels redundant, like binarytextbinary >> >> the binary representation of JSON is JavaScript(-like) Object, not >> "binary json" >> >> So my vote would be either jsobj or jsdoc (as "document databases" tend >> to call the structured types "documents") > You know that both types support scalar values right? 'a'::JSON works now, and 'a'::hstore works with the WIP patch. For that reason I would not think that "doc" or "obj" would be good choices. > > I like JSONB because: > > 1. The "B" means "binary" > 2. The "B" means "second" > 3. It's short > 4. See also BYTEA. > > Best, > > David > > > > Whatever, I think the first 4 characters have to be 'JSON' - for easy identification. Cheers, Gavin
On 18/11/13 09:45, David Johnston wrote: > David E. Wheeler-3 wrote >> I like JSONB because: >> >> 1. The "B" means "binary" >> 2. The "B" means "second" >> 3. It's short >> 4. See also BYTEA. > "json_strict": > > Not sure about the "bytea" reference off-hand... > > I was pondering "jsons", which meets the short property just fine, and the > trailing "s" would stand for "strict", which is the user-visible semantic > that this type exhibits, rather than some less-visible "binary" attribute > which most users would not really care about. I dislike the implication of > plural-ness that the "s" imparts, though. > > Implication of "second" doesn't seem that important since both types provide > useful semantics. > > I can imagine the short aspect will lead people to accidentally type > "json" where they mean to use "jsonb", and having just a single extra > character will increase the likelihood they will not notice. Knowing about > and having used "json_strict" previously, it will be more probable that such > users will noticeably feel something is missing if they drop the whole > "_strict" suffix. > > So, I'll toss out "json_strict" for my bikeshed contribution. > > David J. +1
On 11/17/2013 05:44 PM, Dimitri Fontaine wrote: > Andrew Dunstan <andrew@dunslane.net> writes: >> That would be one of the silliest and most short-sighted decisions we have >> made in many years, IMNSHO. The demand for strong JSON support is enormous. > One of the silliest and most short-sighted decisions we made recently > might have been to actually ship that json variant in 9.2, after all. The fact is that we had been going round and round on Json for a while, and Robert Haas rightly made a move to break the Gordian knot. We would not have done ourselves any service by not accepting it. Hindsight is always 20-20, but even with what we have today people get excited, and more people move to use Postgres every day because we have that support. > > The most popular PostgreSQL commands in 9.4 are going to be: > > $ sudo apt-get install postgresql-contrib-9.4 > > # create extension jstore; > # alter table foo alter column documents type jstore; > # create index on foo using gist(documents); > Umm, not if I have anything to do with it. cheers andrew
> Now, if it turns out that the new hstore is not dealing with json input > and output, we could have json, jstore and hstore. Jstore isn't the worst name suggestion I've heard on this thread. The reason I prefer JSONB though, is that a new user looking for a place to put JSON data will clearly realize that JSON and JSONB are alternatives and related in some way. They won't necessarily expect that "jstore" has anything to do with JSON, especially when there is another type called "JSON". Quite a few people are liable to think it's something to do with Java. Besides, we might get sued by these people: http://www.jstor.org/ ;-) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Nov 17, 2013, at 5:49 PM, Josh Berkus <josh@agliodbs.com> wrote: > Jstore isn't the worst name suggestion I've heard on this thread. The > reason I prefer JSONB though, is that a new user looking for a place to > put JSON data will clearly realize that JSON and JSONB are alternatives > and related in some way. They won't necessarily expect that "jstore" > has anything to do with JSON, especially when there is another type > called "JSON". Quite a few people are liable to think it's something to > do with Java. > > Besides, we might get sued by these people: http://www.jstor.org/ ;-) Okay, how about JDATE? ;-P David
On 18/11/13 14:51, David E. Wheeler wrote: > On Nov 17, 2013, at 5:49 PM, Josh Berkus <josh@agliodbs.com> wrote: > >> Jstore isn't the worst name suggestion I've heard on this thread. The >> reason I prefer JSONB though, is that a new user looking for a place to >> put JSON data will clearly realize that JSON and JSONB are alternatives >> and related in some way. They won't necessarily expect that "jstore" >> has anything to do with JSON, especially when there is another type >> called "JSON". Quite a few people are liable to think it's something to >> do with Java. >> >> Besides, we might get sued by these people: http://www.jstor.org/ ;-) > Okay, how about JDATE? ;-P > > David > > > I don't want a Japanese Date - would cause complications with my wife! :-) Cheers, Gavin
On 11/17/2013 08:49 PM, Josh Berkus wrote: >> Now, if it turns out that the new hstore is not dealing with json input >> and output, we could have json, jstore and hstore. > Jstore isn't the worst name suggestion I've heard on this thread. The > reason I prefer JSONB though, is that a new user looking for a place to > put JSON data will clearly realize that JSON and JSONB are alternatives > and related in some way. They won't necessarily expect that "jstore" > has anything to do with JSON, especially when there is another type > called "JSON". Quite a few people are liable to think it's something to > do with Java. > > Besides, we might get sued by these people: http://www.jstor.org/ ;-) > I don't think any name that doesn't begin with "json" is acceptable. I could live with "jsonb". It has the merit of brevity, but maybe it's a tad too close to "json" to be the right answer. cheers andrew
On Nov 17, 2013, at 8:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > I don't think any name that doesn't begin with "json" is acceptable. I could live with "jsonb". It has the merit of brevity,but maybe it's a tad too close to "json" to be the right answer. JSONFTW. David
On 11/18/2013 05:19 AM, Andrew Dunstan wrote: > > On 11/17/2013 08:49 PM, Josh Berkus wrote: >>> Now, if it turns out that the new hstore is not dealing with json input >>> and output, we could have json, jstore and hstore. >> Jstore isn't the worst name suggestion I've heard on this thread. The >> reason I prefer JSONB though, is that a new user looking for a place to >> put JSON data will clearly realize that JSON and JSONB are alternatives >> and related in some way. They won't necessarily expect that "jstore" >> has anything to do with JSON, especially when there is another type >> called "JSON". Quite a few people are liable to think it's something to >> do with Java. >> >> Besides, we might get sued by these people: http://www.jstor.org/ ;-) >> > > I don't think any name that doesn't begin with "json" is acceptable. > I could live with "jsonb". It has the merit of brevity, but maybe it's > a tad > too close to "json" to be the right answer. How about jsondoc or jsonobj? It is still reasonably 'json' but not too easy to confuse with the existing json when typing. And it perhaps hints better at the main difference from string-json, namely that it is an object and not textual source code / notation / processing info. Cheers -- Hannu Krosing PostgreSQL Consultant Performance, Scalability and High Availability 2ndQuadrant Nordic OÜ
On 11/15/13, 6:15 PM, Josh Berkus wrote: > Thing is, I'm not particularly concerned about *Merlin's* specific use > case, which there are ways around. What I am concerned about is that we > may have users who have years of data stored in JSON text fields which > won't survive an upgrade to binary JSON, because we will stop allowing > certain things (ordering, duplicate keys) which are currently allowed in > those columns. At the very least, if we're going to have that kind of > backwards compatibility break we'll want to call the new version 10.0. We could do something like SQL/XML and specify the level of "validity" in a typmod, e.g., json(loose), json(strict), etc.
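For illustration, the typmod idea might look something like this at the DDL level (hypothetical syntax: no such typmod exists today, and as discussed below it may not be workable):

    create table docs (
        body json(strict)  -- hypothetical: normalized, duplicate-key-free storage
    );
    -- switching validity levels would then be a column type change:
    alter table docs alter column body type json(loose);  -- hypothetical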
On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote: > I don't think any name that doesn't begin with "json" is acceptable. I could > live with "jsonb". It has the merit of brevity, but maybe it's a tad too > close to "json" to be the right answer. I think that seems right. Couple thoughts: *) Aside from the text in and out routines, how is 'jsonb' different from the coming 'nested hstore'? Enough to justify two code bases? *) How much of the existing json API has to be copied over to the jsonb type, and how exactly is that going to happen? For example, I figure we'd need a "record_to_jsonb" etc. for sure, but do we also need a jsonb_each()...can't we overload instead? Maybe we can cheat a little bit and overload the functions so that one of the existing APIs (hstore or json) can be recovered -- only adding what minimal functionality needs to be added to handle the type distinction (mostly on serialization routines and casts). What I'm driving at here is that it would be nice if the API were not strongly bifurcated: this would cause quite a bit of mindspace confusion. merlin
On 11/18/2013 09:38 AM, Merlin Moncure wrote: > On Sun, Nov 17, 2013 at 10:19 PM, Andrew Dunstan <andrew@dunslane.net> wrote: >> I don't think any name that doesn't begin with "json" is acceptable. I could >> live with "jsonb". It has the merit of brevity, but maybe it's a tad too >> close to "json" to be the right answer. > I think that seems right. Couple thoughts: > > *) Aside from the text in and out routines, how is 'jsonb' different > from the coming 'nested hstore'? Enough to justify two code bases? The discussion has been around making a common library that would be used for both. > > *) How much of the existing json API has to be copied over to the > jsonb type and how exactly is that going to happen? For example, I > figure we'd need a "record_to_jsonb" etc. for sure, but do we also > need a jsonb_each()...can't we overload instead? Overloading is what I was planning to do. cheers andrew
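To sketch what that overloading could look like: one function name, resolved by argument type, so callers don't have to care which variant they hit. The signatures below are simplified and hypothetical; the real catalog entries would differ.

    -- existing text-json version (simplified)
    create function json_each(doc json) returns table(key text, value json) as ...;
    -- a jsonb twin under the same name (hypothetical)
    create function json_each(doc jsonb) returns table(key text, value jsonb) as ...;
    -- select * from json_each(col) then works for a column of either type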
On 11/18/2013 06:13 AM, Peter Eisentraut wrote: > On 11/15/13, 6:15 PM, Josh Berkus wrote: >> Thing is, I'm not particularly concerned about *Merlin's* specific use >> case, which there are ways around. What I am concerned about is that we >> may have users who have years of data stored in JSON text fields which >> won't survive an upgrade to binary JSON, because we will stop allowing >> certain things (ordering, duplicate keys) which are currently allowed in >> those columns. At the very least, if we're going to have that kind of >> backwards compatibility break we'll want to call the new version 10.0. > > We could do something like SQL/XML and specify the level of "validity" > in a typmod, e.g., json(loose), json(strict), etc. Doesn't work; with XML, the underlying storage format didn't change. With JSONB, it will ... so changing the typmod would require a total rewrite of the table. That's a POLS violation if I ever saw one. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Merlin, > *) Aside from the text in and out routines, how is 'jsonb' different > from the coming 'nested hstore'? Enough to justify two code bases? In/out functions and defaults are all different. Otherwise, the two types will be accessing the same code base, so no duplication. Think of it as XML vs. TEXT. > Maybe we can cheat a little bit and overload the functions so that one of the > existing APIs (hstore or json) can be recovered -- only adding what > minimal functionality needs to be added to handle the type distinction > (mostly on serialization routines and casts). What I'm driving at > here is that it would be nice if the API were not strongly bifurcated: > this would cause quite a bit of mindspace confusion. I'll also note that for functions designed for output to the client, it doesn't make much of a difference whether the result is JSON or JSONB, since the string representation will be identical. However, it makes a difference if the data is going to be stored, since a double conversion on a large JSON value would be expensive. In other words, we need a version of each function which outputs JSONB, but that version doesn't have to be the default if users don't specify. Note that this raises the issue of "first alternate data type ambiguity" again for overloading builtin functions. We really need that method of "preferring" a specific version of the function ... -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Mon, Nov 18, 2013 at 12:10 PM, Josh Berkus <josh@agliodbs.com> wrote: > Merlin, > >> *) Aside from the text in and out routines, how is 'jsonb' different >> from the coming 'nested hstore'? Enough to justify two code bases? > > In/out functions and defaults are all different. Otherwise, the two > types will be accessing the same code base, so no duplication. Think of > it as XML vs. TEXT. > >> Maybe we can cheat a little bit and overload the functions so that one of the >> existing APIs (hstore or json) can be recovered -- only adding what >> minimal functionality needs to be added to handle the type distinction >> (mostly on serialization routines and casts). What I'm driving at >> here is that it would be nice if the API were not strongly bifurcated: >> this would cause quite a bit of mindspace confusion. > > I'll also note that for functions designed for output to the client, it > doesn't make much of a difference whether the result is JSON or JSONB, > since the string representation will be identical. However, it makes a > difference if the data is going to be stored, since a double conversion > on a large JSON value would be expensive. Hm, but it would matter, wouldn't it...the jsonb representation would give output with the record fields reordered, deduplicated, etc. Also, presumably, the jsonb serialization would be necessarily slower for exactly that reason, although perhaps not enough to matter much. > In other words, we need a version of each function which outputs JSONB, > but that version doesn't have to be the default if users don't specify. > > Note that this raises the issue of "first alternate data type ambiguity" > again for overloading builtin functions. We really need that method of > "preferring" a specific version of the function ... You'd need explicit jsonb-creating functions: record_to_jsonb, array_to_jsonb, etc. AFAIK, these functions would be the only ones that would have to explicitly reference the jsonb type, if you don't count casts. It's tempting to *not* make those functions, as that would only require the user to specify jsonb for table columns...you'd then go from json to jsonb with a cast, perhaps even an implicit one. The disadvantage there is that you'd then always get unsimplified json. Hm -- on that note, is it technically feasible to *not* duplicate the json API implementations, but just (ab)use implicit casting between the APIs? That way the text API would own all the serialization routines as it does now, but you'd run mutation and searching through jsonb... merlin
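Concretely, the cast-based route sketched above would let the existing text-json serialization API feed jsonb storage. Everything here assumes a json-to-jsonb cast is provided; both the type and the cast are hypothetical at this point.

    create table foo (doc jsonb);                            -- hypothetical type
    -- build with the existing text-json functions, cast on the way in
    insert into foo select row_to_json(t)::jsonb from bar t; -- hypothetical cast
    -- with an implicit cast, even the ::jsonb could be dropped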
On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: > I am sure you could also devise an json encoding scheme > where white space is significant ;) I don't even have to think hard. If you want your JSON to be human-readable, it's entirely possible that you want it stored using the same whitespace that was present on input. There is a valid use case for normalizing whitespace, too, of course. Everyone on this thread who thinks that there is Only One Right Way To Do It should take a chill pill. There is, in fact, more than one right way to do it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Nov 19, 2013, at 8:14 AM, Robert Haas <robertmhaas@gmail.com> wrote: > Everyone on this thread who thinks that there is Only One Right Way To > Do It should take a chill pill. There is, in fact, more than one > right way to do it. You shoulda been a Perl hacker, Robert. D
On Tue, Nov 19, 2013 at 12:27 PM, David E. Wheeler <david@justatheory.com> wrote: > On Nov 19, 2013, at 8:14 AM, Robert Haas <robertmhaas@gmail.com> wrote: >> Everyone on this thread who thinks that there is Only One Right Way To >> Do It should take a chill pill. There is, in fact, more than one >> right way to do it. > > You shoulda been a Perl hacker, Robert. I don't hack on Perl, but I spent about 10 years hacking *in* Perl, so the phrasing was not a coincidence. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/19/2013 08:14 AM, Robert Haas wrote: > On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I am sure you could also devise an json encoding scheme >> where white space is significant ;) > > I don't even have to think hard. If you want your JSON to be > human-readable, it's entirely possible that you want it stored using > the same whitespace that was present on input. There is a valid use > case for normalizing whitespace, too, of course. Given that JSON is a data interchange format, I suspect that there are an extremely large combination of factors which would result in an unimplementably large number of parser settings. For example, I personally would have use for a type which allowed the storage of JSON *fragments*. Therefore I am interested only in supporting two:

a) the legacy behavior from 9.2 and 9.3, so we don't destroy people's apps, and

b) the optimal behavior for Hstore2/JSONB.

(a) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* no reordering of keys
* duplicate keys allowed

(b) is defined as:
* complete documents only (no fragments)
* whitespace not significant
* reordering of keys
* duplicate keys prohibited

If people want other manglings of JSON, they can use TEXT fields and custom parsers written in PL/v8, the same way I do. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
On Tue, Nov 19, 2013 at 12:59 PM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/19/2013 08:14 AM, Robert Haas wrote: >> On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>> I am sure you could also devise an json encoding scheme >>> where white space is significant ;) >> >> I don't even have to think hard. If you want your JSON to be >> human-readable, it's entirely possible that you want it stored using >> the same whitespace that was present on input. There is a valid use >> case for normalizing whitespace, too, of course. > > Given that JSON is a data interchange format, I suspect that there are > an extremely large combination of factors which would result in an > unimplementably large number of parser settings. For example, I > personally would have use for a type which allowed the storage of JSON > *fragments*. Therefore I am interested only in supporting two: > > a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's > apps, and > > b) the optimal behavior for Hstore2/JSONB. > > (a) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * no reordering of keys > * duplicate keys allowed > > (b) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * reordering of keys > * duplicate keys prohibited > > If people want other manglings of JSON, they can use TEXT fields and > custom parsers written in PL/v8, the same way I do. For (a), I assume you mean "whitespace not significant, but preserved", because that is the current behavior, whereas for (b), I think we would want to say "whitespace neither significant nor preserved". Other than that, I completely agree. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
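The "preserved" point is easy to demonstrate with the existing text-based type, which hands back exactly the characters it was given:

    select '{"a":    1}'::json;  -- returns {"a":    1}, spacing intact
    -- a normalizing type along the lines of (b) would return {"a": 1} instead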
On 11/19/2013 12:59 PM, Josh Berkus wrote: > On 11/19/2013 08:14 AM, Robert Haas wrote: >> On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>> I am sure you could also devise an json encoding scheme >>> where white space is significant ;) >> I don't even have to think hard. If you want your JSON to be >> human-readable, it's entirely possible that you want it stored using >> the same whitespace that was present on input. There is a valid use >> case for normalizing whitespace, too, of course. > Given that JSON is a data interchange format, I suspect that there are > an extremely large combination of factors which would result in an > unimplementably large number of parser settings. For example, I > personally would have use for a type which allowed the storage of JSON > *fragments*. Therefore I am interested only in supporting two: > > a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's > apps, and > > b) the optimal behavior for Hstore2/JSONB. > > (a) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * no reordering of keys > * duplicate keys allowed > > (b) is defined as: > * complete documents only (no fragments) > * whitespace not significant > * reordering of keys > * duplicate keys prohibited > > If people want other manglings of JSON, they can use TEXT fields and > custom parsers written in PL/v8, the same way I do. > Fragments are currently allowed in a):

andrew=# select '"a"'::json;
 json
------
 "a"

Given that, I'm not sure we shouldn't permit them in b) either. I think I lost that argument back in the 9.2 dev cycle. I really don't want to get to a situation where foo::json::jsonb can produce an error. cheers andrew
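The concern, concretely: since the text type accepts scalar fragments today, a chain like the one below has to keep working if jsonb is to be a safe storage target (the jsonb type and the cast are still hypothetical here):

    select '"a"'::json::jsonb;  -- must not error if scalar roots are accepted in both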
Andrew Dunstan wrote > Given that, I'm not sure we shouldn't permit them in b) either. I think > I lost that argument back in the 9.2 dev cycle. I really don't want to > get to a situation where foo::json::jsonb can produce an error. So what do you propose happens when the input json has duplicate keys? IMO a reasonable default cast function should error if the json contents require anything more than a straight parse to be stored into jsonb. If the user still needs to make the conversion, we should have a standard and configurable parser function with json input and jsonb output. In this case the key-keep options would be "keep first encountered", "keep last encountered", or "fail on duplicate", the last of which would be the default. I have not really pondered storing scalars into jsonb, but before pondering usability, are there any technical concerns? If the goal is to share the backend with hstore, then current hstore does not allow for this, and so the json aspect would either transfer back over or it would need customized code. David J.
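A sketch of the configurable conversion proposed above, with entirely hypothetical function and option names:

    -- explicit, policy-driven conversion instead of a silently lossy cast
    select json_to_jsonb('{"a":1,"a":2}'::json, dup_keys := 'keep_last');
    -- with dup_keys := 'error' (the proposed default) this input would be rejected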
On 20/11/13 05:14, Robert Haas wrote: > On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >> I am sure you could also devise an json encoding scheme >> where white space is significant ;) > I don't even have to think hard. If you want your JSON to be > human-readable, it's entirely possible that you want it stored using > the same whitespace that was present on input. There is a valid use > case for normalizing whitespace, too, of course. > > Everyone on this thread who thinks that there is Only One Right Way To > Do It should take a chill pill. There is, in fact, more than one > right way to do it. > There is only one obvious 'Right Way', and that is 'My Way'! :-) More seriously, there are obviously variants in what people consider a useful human-readable form of JSON output, but it is probably inefficient to store white space. This suggests it might be useful to allow users to store rules so that the output can include the white space that they want. However, this is non-trivial - for example, Eclipse allows Java/XML source to be formatted in different ways (here the source files are stored with white space), but lacks a couple of options that I would like. Possibly, JSON output of white space would be less problematic? Cheers, Gavin
On 11/19/2013 01:43 PM, David Johnston wrote: > Andrew Dunstan wrote >> Given that, I'm not sure we shouldn't permit them in b) either. I think >> I lost that argument back in the 9.2 dev cycle. I really don't want to >> get to a situation where foo::json::jsonb can produce an error. > So what do you propose happens when the input json has duplicate keys? I propose that we do what V8 does, what hstore does (yes, I know it's not json, but consistency is good), what we do now with the json_get() functions, what almost every other JSON engine I know of does, namely that the last key wins. cheers andrew
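That behavior is visible with the existing accessors today, e.g. on 9.3:

    select '{"a": 1, "a": 2}'::json -> 'a';  -- yields 2: the last "a" wins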
Gavin Flower-2 wrote > More seriously, there are obviously variants in what people consider > a useful human-readable form of JSON output, but it is probably > inefficient to store white space. Enough to matter? Maybe the extra whitespace causes a marginal value to be toasted but, IIUC, for a value that is going to be toasted anyway, the compression factors for both speed and space are going to make whitespace considerations insignificant. David J.
On 11/19/2013 01:43 PM, David Johnston wrote: > > I have not really pondered storing scalars into jsonb, but before pondering > usability, are there any technical concerns? If the goal is to share the > backend with hstore, then current hstore does not allow for this, and so the > json aspect would either transfer back over or it would need customized > code. > Your premise here is simply wrong. The new hstore code does support scalar root elements. cheers andrew
On Tue, Nov 19, 2013 at 11:59 AM, Josh Berkus <josh@agliodbs.com> wrote: > On 11/19/2013 08:14 AM, Robert Haas wrote: >> On Thu, Nov 14, 2013 at 2:54 PM, Hannu Krosing <hannu@2ndquadrant.com> wrote: >>> I am sure you could also devise an json encoding scheme >>> where white space is significant ;) >> >> I don't even have to think hard. If you want your JSON to be >> human-readable, it's entirely possible that you want it stored using >> the same whitespace that was present on input. There is a valid use >> case for normalizing whitespace, too, of course. > > Given that JSON is a data interchange format, I suspect that there are > an extremely large combination of factors which would result in an > unimplementably large number of parser settings. For example, I > personally would have use for a type which allowed the storage of JSON > *fragments*. Therefore I am interested only in supporting two: > > a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's I'm uncomfortable with the word 'legacy'. This suggests the new type will essentially deprecate the old type. jsonb will likely be pessimal for large serializations. If you're not manipulating and searching the documents, why would you use it? It's going to take more space on disk and memory and should provide little benefit for present *as well as future code*. (Note, it will provide extreme benefits for nosql-type uses, which are of huge strategic importance for the project.) json and jsonb APIs should work together cleanly, and the documentation should explain how the types differ and which is better for which cases. merlin
On 11/19/2013 03:06 PM, Merlin Moncure wrote: >> Therefore I am interested only in supporting two: >> >> a) the legacy behavior from 9.2 and 9.3 so we don't destroy people's > I'm uncomfortable with the word 'legacy'. This suggests the new type > will essentially deprecate the old type. "Existing" might be a better word. > jsonb will likely be > pessimal for large serializations. If you're not manipulating and > searching the documents, why would you use it? It's going to take > more space on disk and memory and should provide little benefit for > present *as well as future code*. (Note, it will provide extreme > benefits for nosql-type uses, which are of huge strategic importance for > the project.) json and jsonb APIs should work together cleanly, and > the documentation should explain how the types differ and which is better > for which cases. I agree with most of this. cheers andrew
On Tue, Nov 19, 2013 at 1:43 PM, David Johnston <polobo@yahoo.com> wrote: > IMO a reasonable default cast function should error if the json contents > require anything more than a straight parse to be stored into jsonb. If the > user still needs to make the conversion, we should have a standard and > configurable parser function with json input and jsonb output. In this case > the key-keep options would be "keep first encountered", "keep last > encountered", or "fail on duplicate", the last of which would be the default. > > I have not really pondered storing scalars into jsonb, but before pondering > usability, are there any technical concerns? If the goal is to share the > backend with hstore, then current hstore does not allow for this, and so the > json aspect would either transfer back over or it would need customized > code. I confess to being a bit perplexed by why we want hstore and json to share a common binary format. hstore doesn't store hierarchical data; json does. If we design a binary format for json, doesn't that just obsolete hstore? Why go to a lot of trouble to extend our home-grown format if we've got a standard format to plug into? The thing that's really missing in all of these discussions (AFAICS) is the issue of creating index support for these types. If using some variant of the existing hstore format makes that easier, then I guess I understand the connection - but I'm not sure why or how that would be the case, and it would be nice to make the connection more explicit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 11/20/2013 07:52 AM, Robert Haas wrote: > On Tue, Nov 19, 2013 at 1:43 PM, David Johnston <polobo@yahoo.com> wrote: >> IMO a reasonable default cast function should error if the json contents >> require anything more than a straight parse to be stored into jsonb. If the >> user still needs to make the conversion, we should have a standard and >> configurable parser function with json input and jsonb output. In this case >> the key-keep options would be "keep first encountered", "keep last >> encountered", or "fail on duplicate", the last of which would be the default. >> >> I have not really pondered storing scalars into jsonb, but before pondering >> usability, are there any technical concerns? If the goal is to share the >> backend with hstore, then current hstore does not allow for this, and so the >> json aspect would either transfer back over or it would need customized >> code. > I confess to being a bit perplexed by why we want hstore and json to > share a common binary format. hstore doesn't store hierarchical data; > json does. If we design a binary format for json, doesn't that just > obsolete hstore? Why go to a lot of trouble to extend our home-grown > format if we've got a standard format to plug into? > > The thing that's really missing in all of these discussions (AFAICS) > is the issue of creating index support for these types. If using some > variant of the existing hstore format makes that easier, then I guess > I understand the connection - but I'm not sure why or how that would > be the case, and it would be nice to make the connection more > explicit. > Oleg and Teodor have done quite a lot of work on a version of hstore that supports nested structures. See their pgcon talk. With some additions it has become in effect a non-standard notation for json. Rather than repeat that work, my suggestion has been that they abstract the common parts into a library that can be used by jsonb, or whatever we end up calling it, as well as nested hstore. I understand Teodor is working on this. In general I share your feelings, though. cheers andrew
On 11/18/2013 06:49 PM, Josh Berkus wrote: > On 11/18/2013 06:13 AM, Peter Eisentraut wrote: >> On 11/15/13, 6:15 PM, Josh Berkus wrote: >>> Thing is, I'm not particularly concerned about *Merlin's* specific use >>> case, which there are ways around. What I am concerned about is that we >>> may have users who have years of data stored in JSON text fields which >>> won't survive an upgrade to binary JSON, because we will stop allowing >>> certain things (ordering, duplicate keys) which are currently allowed in >>> those columns. At the very least, if we're going to have that kind of >>> backwards compatibility break we'll want to call the new version 10.0. >> We could do something like SQL/XML and specify the level of "validity" >> in a typmod, e.g., json(loose), json(strict), etc. > Doesn't work; with XML, the underlying storage format didn't change. > With JSONB, it will ... so changing the typmod would require a total > rewrite of the table. That's a POLS violation if I ever saw one. We do rewrites on typmod changes already. To me, having json(string) and json(hstore) does not seem too bad. Cheers Hannu
Hannu Krosing-3 wrote > On 11/18/2013 06:49 PM, Josh Berkus wrote: >> On 11/18/2013 06:13 AM, Peter Eisentraut wrote: >>> On 11/15/13, 6:15 PM, Josh Berkus wrote: >>>> Thing is, I'm not particularly concerned about *Merlin's* specific use >>>> case, which there are ways around. What I am concerned about is that we >>>> may have users who have years of data stored in JSON text fields which >>>> won't survive an upgrade to binary JSON, because we will stop allowing >>>> certain things (ordering, duplicate keys) which are currently allowed >>>> in >>>> those columns. At the very least, if we're going to have that kind of >>>> backwards compatibility break we'll want to call the new version 10.0. >>> We could do something like SQL/XML and specify the level of "validity" >>> in a typmod, e.g., json(loose), json(strict), etc. >> Doesn't work; with XML, the underlying storage format didn't change. >> With JSONB, it will ... so changing the typmod would require a total >> rewrite of the table. That's a POLS violation if I ever saw one. > We do rewrites on typmod changes already. > > To me, having json(string) and json(hstore) does not seem too bad. Three things: 1) How would this work in the face of functions that erase typmod information? 2) json [no typmod] would have to effectively default to json(string)? 3) How would #1 and #2 interact? I pondered the general idea, but my (admittedly limited) gut feeling is that using typmod would possibly be technically untenable and, from an end-user perspective, would be even more confusing than having two types. David J.
On 11/20/2013 04:52 AM, Robert Haas wrote: > I confess to being a bit perplexed by why we want hstore and json to > share a common binary format. hstore doesn't store hierarchical data; > json does. If we design a binary format for json, doesn't that just > obsolete hstore? Why go to a lot of trouble to extend our home-grown > format if we've got a standard format to plug into? See the hstore2 patch from Teodor. That's what we're talking about, not hstore1, which as you point out is non-hierarchical. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
<div dir="ltr"><div class="gmail_extra"><br /><div class="gmail_quote">On Sat, Nov 16, 2013 at 12:32 AM, Josh Berkus <spandir="ltr"><<a href="mailto:josh@agliodbs.com" target="_blank">josh@agliodbs.com</a>></span> wrote:<br /><blockquoteclass="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On11/15/2013 04:00 PM, David Johnston wrote:<br /> > Looking at this a different way: could we just implementBSON and leave json<br /> > alone?<br /> ><br /> > <a href="http://bsonspec.org/" target="_blank">http://bsonspec.org/</a><br/><br /></div>In short? No.<br /><br /> For one thing, our storage format isdifferent from theirs (better,<br /> frankly), and as a result is not compliant with their "standard".</blockquote></div><br/></div><div class="gmail_extra">Not being super familiar with either BSON our JSONB whatadvantages are we gaining from the difference?<br /><br /></div><div class="gmail_extra">It might be interesting if wesupported the same binary representation so we could have a binary send/recv routines that don't need to do any serialization/deserialization.Especially since a standard format would potentially be skipping the serialization/deserializationon both the server and client.<br /><br /></div><div class="gmail_extra"><br clear="all" /><br/>-- <br />greg<br /></div></div>
On 11/20/2013 12:50 PM, Greg Stark wrote: > On Sat, Nov 16, 2013 at 12:32 AM, Josh Berkus <josh@agliodbs.com> wrote: > > On 11/15/2013 04:00 PM, David Johnston wrote: > > Looking at this a different way: could we just implement BSON > and leave json > > alone? > > > > http://bsonspec.org/ > > In short? No. > > For one thing, our storage format is different from theirs (better, > frankly), and as a result is not compliant with their "standard". > > > Not being super familiar with either BSON or JSONB, what advantages > are we gaining from the difference? > > It might be interesting if we supported the same binary representation > so we could have binary send/recv routines that don't need to do any > serialization/deserialization. Especially since a standard format > would potentially be skipping the serialization/deserialization on > both the server and client. > > > To start with, it doesn't support arbitrary precision numerics. That means that right off the bat it's only accepting a subset of what the JSON spec allows. 'Nuff said, I think. cheers andrew
Greg, > Not being super familiar with either BSON or JSONB, what advantages are we > gaining from the difference? We have the JSONB/Hstore2 format *now*, and it can go into 9.4. This makes it inherently superior to any theoretical format. So any further discussion (below) is strictly academic, for the archives. > It might be interesting if we supported the same binary representation so > we could have binary send/recv routines that don't need to do any > serialization/deserialization. Especially since a standard format would > potentially be skipping the serialization/deserialization on both the > server and client. Leaving aside that we don't want to implement 10gen's spec (because of major omissions like decimal numbers), the fundamental issue with binary-update-in-place is that nobody (certainly not 10gen) has devised a way to do it without having ginormous amounts of bloat in value storage. The way BSON allows for binary-update-in-place, as I understand it, is to pad all values with lots of zero bytes and to prohibit compression, either of which is a much larger loss for performance than serialization is. In other words, binary-update-in-place seems like a clear win for hierarchical data storage, but practically it's not. Of course, an investigation into this by someone with much more knowledge of low-level storage than me (most of this list) is welcome. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
David Johnston <polobo@yahoo.com> writes: >> On 11/18/2013 06:13 AM, Peter Eisentraut wrote: >>> We could do something like SQL/XML and specify the level of "validity" >>> in a typmod, e.g., json(loose), json(strict), etc. > Three things: > 1) How would this work in the face of functions that erase typemod > information? You'd have to make the data self-identifying (which I think was the plan already), and ensure that *every* function taking "json" could cope with both formats on input. The typmod could only be expected to be enforced when storing or explicitly casting to one subformat, much like operations on "numeric" pay little attention to the original precision/scale if any. I agree that this solution isn't terribly workable, mainly because it'd break any third-party C functions that take json today. regards, tom lane
On 11/20/2013 01:36 PM, Tom Lane wrote: > > You'd have to make the data self-identifying (which I think was the plan > already), and ensure that *every* function taking "json" could cope with > both formats on input. The typmod could only be expected to be enforced > when storing or explicitly casting to one subformat, much like operations > on "numeric" pay little attention to the original precision/scale if any. > > I agree that this solution isn't terribly workable, mainly because it'd > break any third-party C functions that take json today. > > Yeah, I had come to this conclusion. I don't think we can bolt on typmods after the event. I don't think having a separate "jsonb" type will be a tragedy. I'm already planning on overloading the existing json functions and operators. cheers andrew
On Wed, Nov 20, 2013 at 12:50 PM, Greg Stark <stark@mit.edu> wrote: >> For one thing, our storage format is different from theirs (better, >> frankly), and as a result is not compliant with their "standard". > > Not being super familiar with either BSON our JSONB what advantages are we > gaining from the difference? BSON assumes, for example, that all integers fit in 64-bits and all floating point values can be accurately represented as float8. So not all JSON objects can be represented as BSON without loss of information. BSON also adds a bunch of extra types that are not part of JSON, like timestamps, regular expressions, and embedded documents. So not all BSON objects can be represented as JSON without loss of information. While it's tempting to think that BSON is a serialization format for JSON, and the name is meant to suggest that, it really isn't. It's just a serialization format for approximately whatever the authors thought would be useful, which happens to be kinda like JSON. Sorta. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
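A concrete instance of the first limitation: the value below is valid JSON and round-trips verbatim through Postgres's text-based json type, but it fits neither a 64-bit integer nor an exactly-representable float8, so a BSON encoding would silently lose digits.

    select '123456789012345678901234567890'::json;  -- stored and returned exactly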
On 2013-11-20 10:01:28 -0800, Josh Berkus wrote: > Greg, > > > Not being super familiar with either BSON our JSONB what advantages are we > > gaining from the difference? > > We have the JSONB/Hstore2 format *now*, and it can go into 9.4. That patch needs a *fair* amount of work though. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services
Hi everyone
I used to work on a project storing large quantities of schema-less data, initially using MongoDB, then Postgres with JSON, and eventually I implemented BSON support for Postgres to get the best of both worlds: https://github.com/maciekgajewski/postgresbson
I don't think that JSONB is a good idea. There is a lot to learn from MongoDB's mistakes in this area.
1. As noted in this thread previously, JSON is a serialization format, not a document format.
2. Almost any structured data type, self-describing or not, can be serialized to/from JSON, but always using only a subset of it, and interpreting it in its own specific way.
3. JSON's greatest strength is interoperability. It is a great feature of Postgres that JSON is stored as text; it's basically 'text, but you can do something with it'. There are many JSON implementations out there, and one should make no assumptions about an application's expectations.
For instance: the JSON standard (RFC 4627) defines all numbers to be doubles. Yet I've seen applications storing 64-bit integers (which wouldn't fit in double precision), or even arbitrary precision integers. Most parsers are OK with that.
4. JSON's greatest weakness is performance. Because of 1., it needs to be parsed before any useful information can be extracted.
5. 1. and 3. are mutually exclusive; this is one of the most valuable takeaways I have from working with Mongo and BSON in particular. BSON is an attempt to create 'binary JSON', and a failed one. It is a poor serialization format: faster than JSON, but less flexible. Being binary, it is strongly typed, and it uses various gimmicks to preserve flexibility: implicit type casts, 3 different equality comparison functions etc. And it's not fully convertible to/from JSON; no binary format is.
It is a poor document format as well: it retains some of JSON's performance problems, namely its serial nature and inefficient storage.
6. Speed matters to some, and being able to generate binary data in the application and send it to the database without any serialization/parsing in between provides a great optimization opportunity. One thing that the Mongo guys got right is the fixed, well-defined binary representation. The application can use a provided library to generate objects, and doesn't need to worry about the server's version or endianness.
In the application I've mentioned before, switching from JSON to BSON (in Postgres 9.2, using postgresbson) increased throughput by an order of magnitude. It was an insert-heavy database with indexes on object fields. Both serializing in the application and deserializing in the server were ~10x faster.
7. It seems to me that JSONB is going to repeat all the mistakes of BSON; it's going to be 'neither'. If there is agreement that Postgres needs a 'document' format, why not acknowledge 5. and simply adopt one of the existing formats? Or even better: adopt none, provide many, provide binary send/recv and conversion to and from JSON, and let the user choose.
The world is full of self-describing binary formats: BSON, MessagePack (http://msgpack.org/), protobuf, hierarchical H-Store is coming along. Adding another one would create confusion, and a situation similar to this: http://xkcd.com/927/
And a side note:
Postgres' greatest and most under-advertised feature is its extensibility. People tend to notice only the features present in the core package, while there should be a huge banner on top of http://www.postgresql.org/: "Kids, we support all data types: we have XML, we have JSON, we have H-store, we have BSON, and all of it with built-in indexing, storage compression and full transaction support!"
Maciej G.
Maciej, Thanks for the feedback -- it's helpful to get a different perspective on this. > I used to work on a project storing large quantities of schema-less data, > initially using MongoDB, then Postgres with JSON, and eventually I > implemented BSON support for Postgres to get the best of both worlds: > https://github.com/maciekgajewski/postgresbson Huh, is that linked anywhere else? > > I don't think that JSONB is a good idea. There is a lot to learn from > MongoDB's mistakes in this area. > > 1. As noted in this thread previously, JSON is a serialization format, not > a document format. The momentum of the industry would argue against you. > 5. 1. and 3. are mutually exclusive; this is one of the most valuable > takeaways I have from working with Mongo and BSON in particular. BSON is an > attempt to create 'binary JSON', and a failed one. BSON was also developed part-time, by an inexperienced developer, as a side effect of a different project. And they froze the API too early. And they only made a halfhearted effort to be compatible with JSON. So the fact that BSON is a failure doesn't tell us any more than "don't do this in a half-assed way". > 7. It seems to me that JSONB is going to repeat all the mistakes of BSON; > it's going to be 'neither'. If there is agreement that Postgres needs a > 'document' format, why not acknowledge 5. and simply adopt one of the > existing formats? Or even better: adopt none, provide many, provide binary > send/recv and conversion to and from JSON, and let the user choose. Well, I think others have already documented why we don't want BSON. > The world is full of self-describing binary formats: BSON, MessagePack ( > http://msgpack.org/), protobuf, hierarchical H-Store is coming along. > Adding another one would create confusion, and a situation similar to this: > http://xkcd.com/927/ Apparently you missed that JSONB is just Hstore2 with different in/out functions? > Postgres' greatest and most under-advertised feature is its extensibility. I agree that we don't do enough to promote our extensions. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com