Re: Ltree syntax improvement - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Ltree syntax improvement |
Date | |
Msg-id | 31385.1579634002@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Ltree syntax improvement (Dmitry Belyavsky <beldmit@gmail.com>) |
Responses |
Re: Ltree syntax improvement
|
List | pgsql-hackers |
Dmitry Belyavsky <beldmit@gmail.com> writes: > If the C part will be reviewed and considered mergeable, I'll update the > plpython tests. I haven't looked at any of the code involved in this, but I did examine the "failing" plpython test, and I'm quite distressed about that change of behavior. Simply removing the test case is certainly not okay, and I do not think that just changing it to accept this new behavior is okay either. As Nikita said upthread: >> 7. ltree_plpython test does not fail now because Python list is converted to a >> text and then to a ltree, and the textual representation of a Python list has >> become a valid ltree text: >> >> SELECT $$['foo', 'bar', 'baz']$$::ltree; >> ltree >> ------------------------- >> "['foo', 'bar', 'baz']" >> (1 row) >> >> So Python lists can be now successfully converted to ltrees without a transform, >> but the result is not that is expected ('foo.bar.baz'::ltree). If this case doesn't throw an error, then we're going to have a compatibility problem whenever somebody finally gets around to implementing the python-to-ltree transform properly, because it would break any code that might be relying on this (wrong) behavior. In general, I think it's a mistake to allow unquoted punctuation to be taken as part of an ltree label, which is what this patch is evidently doing. By doing that, you'll make it impossible for anyone to ever again extend the ltree syntax, because if they want to assign special meaning to braces or whatever, they can't do so without breaking existing applications. For example, if the existing code allowed double-quote or backslash as a label character, we'd already have rejected this patch as being too big a compatibility break. So it's not very forward-thinking to close off future improvements like this. Thus, what I think you should do is require non-alphanumeric label characters to be quoted, either via double-quotes or backslashes (although it's questionable whether we really need two independent quoting mechanisms here). That would preserve extensibility, and it'd also preserve our freedom to fix ltree_plpython later, since the case of interest would still be an error for now. And it would mean that you don't have subtly different rules for what's data in ltree versus what's data in lquery or ltxtquery. BTW, the general rule in existing backend code that's doing string parsing is to allow non-ASCII (high-bit-set) characters to be taken as data without inquiring too closely as to what they are. This avoids a bunch of locale and encoding issues without much loss of flexibility. (If we do ever extend the ltree syntax again, we'd certainly choose ASCII punctuation characters for whatever special symbols we need, else the feature might not be available in all encodings.) So for instance in your examples involving "Ñ", it's fine to take that as a label character without concern for locale/encoding. I'm not sure what I think about the whitespace business. It looks like what you propose is to strip unquoted leading and trailing whitespace but allow embedded whitespace. There's precedent for that, certainly, but I wonder whether it isn't too confusing. In any case you didn't document that. regards, tom lane
pgsql-hackers by date: