Further issues with jsonb semantics, documentation - Mailing list pgsql-hackers
From | Peter Geoghegan |
---|---|
Subject | Further issues with jsonb semantics, documentation |
Date | |
Msg-id | CAM3SWZQFSWMi2aVi-Lun_JBYh-RfHQ3-0fm8TXpW8OLc+v8ZnQ@mail.gmail.com Whole thread Raw |
Responses |
Re: Further issues with jsonb semantics, documentation
Re: Further issues with jsonb semantics, documentation |
List | pgsql-hackers |
I've noticed some more issues with the jsonb documentation, and the new jsonb stuff generally. I didn't set out to give Andrew feedback on the semantics weeks after feature freeze, but unfortunately this feels like another discussion that we need to have now rather than later. "operator jsonb - integer" =================== Summary: I think that this operator has a problem, but a problem that can easily be fixed. I think it was a bad idea to allow array-style removal of object key/value pairs. ISTM that it implies a level of stability in the ordering that doesn't make sense. Besides, is it really all that useful? Consider this case: postgres=# select '{"c":5, "a":6, "b":7}'::jsonb - 1; ?column? ------------------ {"a": 6, "c": 5} (1 row) Clearly anyone expecting the value "a" to be removed here would be in for a surprise. Moreover, it is inconsistent with the established behavior of the corresponding array-wise subscript operator: postgres=# select '{"c":5, "a":6, "b":7}'::jsonb -> 1; ?column? ---------- [null] (1 row) I suggest, given that this is conceptually a data-modifying operator, that the minus operator/jsonb_delete() case raise an error rather than matching "operator jsonb -> integer" and returning NULL. I say this as the person who successfully argued that the -> operator case above should return NULL during the 9.4 beta period; returning SQL NULL for the delete/minus operator feels like going too far in the direction of permissiveness, even for jsonb; my expression index argument does not apply here as it did for the "operator jsonb -> integer" case. "operator jsonb - text" ================ Summary: I think that this operator is fine. Documentation needs work, though. The "operator jsonb - text" operator ought to be documented as in the attached patch, which is closer to the equivalent hstore operator, and emphasizes the "operator jsonb ? text" definition of a key. It should emphasize its similarity to the established "operator jsonb ? text" operator, and in particular that array elements behave as keys *iff* they're strings. "operator jsonb - text[]" (and *nested* deletion more generally) =============================================== Summary: I think that this operator has many problems, and should be scraped (although only as an operator). IMV nested deletion should only be handled by functions, and the way that nested deletion works in general should be slightly adjusted. The new "operator jsonb - text[]" operator is confusingly inconsistent with: A) "operator jsonb text" and: B) the established "operator hstore - text[]" operator, since that operator deletes all key/value pairs that have keys that match any of the right operand text array values. In contrast, this new operator is passed as its right operand an array of text elements that constitute a "path" (so the order in the rhs text[] operand matters). If the text element in the rhs text[] operand happens to be what would pass for a Postgres integer literal, it can be used to traverse lhs array values through subscripting at that nesting level. Regarding nested deletion behavior more generally, consider this example of how this can work out badly: postgres=# select jsonb_delete(jsonb_set('["a"]', '{5}', '"b"'), '{5}') ; jsonb_delete -------------- ["a", "b"] (1 row) Here, we're adding and then deleting an array element at offset 5 (the string "b"). But the element is never deleted by the outer jsonb_delete(), because we can't rely on the element actually being stored at offset 5. Seems a bit fragile. More importantly, consider the inconsistency with "operator jsonb text" ("point A" above): postgres=# select '["a"]'::jsonb ?| '{a}'::text[]; -- historic/9.4 behavior ?column? ---------- t (1 row) postgres=# select '["a"]'::jsonb - '{a}'::text[]; -- new to 9.5 operator, does not delete! ?column? ---------- ["a"] (1 row) Perhaps most questionably of all, the non-array based minus/delete operator (which I like) *does* have the same idea of matching a key as the established "operator jsonb ?| text[]" operator (and "operator jsonb ? text", etc): postgres=# select '["a"]'::jsonb - 'a'::text; -- new to 9.5 operator, *does* delete! ?column? ---------- [] (1 row) This conceptual model for manipulating jsonb is entirely new and novel to this new operator "operator text[]" (and jsonb_set()). "operator jsonb - text[]" categorization/conceptual model ========================================== Operators like the established "operator jsonb -> integer" operator (a jsonb "array-wise" operator) always seemed okay to me because the rhs operand really was a Postgres integer, and because it's explicitly an array-wise operator (just like "operator -> text" is explicitly object-wise). But now, with these new operators, you've added a "shadow type" system to certain rhs text[] operands, consisting of types not explicitly delineated by JSON-style double quotes (for strings, say). So there is kind of a second shadow type system in play, similar to that of jsonb except that text[] "shadow types" cannot be distinguished -- '{0}'::text[] could be intended to affect an array or an object with the key value "0" in one of its pairs. Accordingly, I would vote for changing this, and making the nested deletion stuff only care about object key values and array *string* elements. This backs out of the idea of a new "shadow type" system for text arrays. I think that this is conceptually a lot cleaner, while actually not being significantly less useful for most use cases (nesting tends to involve only objects anyway). As noted in my summary of the previous section, I would also vote for scraping "operator jsonb - text[]" as a jsonb_delete() wrapper (see closing summary below for more on why). While I appreciate that Andrew wanted to make deletion as flexible as possible, these inconsistencies feel arbitrary to me. Closing summary ============= At the very least, some of these new jsonb operators need to decide if they're: 1) Specifically "array-wise" or "object-wise", like the existing operators "operator -> integer", or "operator -> text". or: 2) "Key" operators. Operators that share the "operator jsonb ? text" idea of a key, and operate on jsonb datums accordingly. "Keys" here include array elements that are strings. As I've said, I prefer option 2 for deletion, which is why I like "operator jsonb - text" but not "operator jsonb - text[]", but either way it needs to be *a lot* clearer than it is at the moment. You might also say that there is a category 3 jsonb operator, of which the containment operator ("operator @>") is the best example. It takes jsonb on the rhs, and so naturally cares about types including container types on the lhs. Clearly these new operators are far enough away from category 3 that we cannot really contemplate moving them into category 3, particularly post feature freeze (even though, as I said, I think that would be a superior approach). Some of what I've said here is just my opinion, but I feel pretty confident that for example we don't want to add a fourth "operator category" to jsonb, a weird hybrid between category 2 and category 3. Having 3 "operator categories" seems quite enough. And by making *nested* deletion only possible through a function call to jsonb_delete() (by scrapping "operator jsonb - text[]" as I suggest), you also avoid having an *operator* that is still somewhat not like other operators in category 2 (since the other operators in category 2 don't care about nesting). That feels cleaner -- IMV the oddness of walking a path based on a text[] value ought to be confined to a well named function. Plus you can then add a new "operator jsonb - text[]" that matches "operator hstore - text[]" and comports with the existing "operator jsonb - text". If you don't like my taxonomy for jsonb operators, then feel free to come up with your own. As things stand, it is hard to describe a taxonomy that makes the operators easy to understand and non-surprising -- there'd be more exceptions than rules. I feel we need to be disciplined about this stuff, or jsonb will become much harder to use than it needs to be. Opinions? -- Peter Geoghegan
Attachment
pgsql-hackers by date: