Re: Ltree syntax improvement - Mailing list pgsql-hackers
From | Nikita Glukhov |
---|---|
Subject | Re: Ltree syntax improvement |
Date | |
Msg-id | 38c5bbeb-c3eb-a016-0b8f-907461bbd0fc@postgrespro.ru Whole thread Raw |
In response to | Re: Ltree syntax improvement (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: Ltree syntax improvement
|
List | pgsql-hackers |
Attached new version of the patch. I did major refactoring of ltree label parsing, extracting common parsing code for ltree, lquery, and ltxtquery. This greatly simplified state machines. On the advice of Tomas Vondra, I also extracted a preliminary patch with 'if' to 'switch' conversion.
On 21.01.2020 22:13, Tom Lane wrote:
Dmitry Belyavsky <beldmit@gmail.com> writes:If the C part will be reviewed and considered mergeable, I'll update the plpython tests.I haven't looked at any of the code involved in this, but I did examine the "failing" plpython test, and I'm quite distressed about that change of behavior. Simply removing the test case is certainly not okay, and I do not think that just changing it to accept this new behavior is okay either. As Nikita said upthread:7. ltree_plpython test does not fail now because Python list is converted to a text and then to a ltree, and the textual representation of a Python list has become a valid ltree text: SELECT $$['foo', 'bar', 'baz']$$::ltree; ltree ------------------------- "['foo', 'bar', 'baz']" (1 row) So Python lists can be now successfully converted to ltrees without a transform, but the result is not that is expected ('foo.bar.baz'::ltree).If this case doesn't throw an error, then we're going to have a compatibility problem whenever somebody finally gets around to implementing the python-to-ltree transform properly, because it would break any code that might be relying on this (wrong) behavior. In general, I think it's a mistake to allow unquoted punctuation to be taken as part of an ltree label, which is what this patch is evidently doing. By doing that, you'll make it impossible for anyone to ever again extend the ltree syntax, because if they want to assign special meaning to braces or whatever, they can't do so without breaking existing applications. For example, if the existing code allowed double-quote or backslash as a label character, we'd already have rejected this patch as being too big a compatibility break. So it's not very forward-thinking to close off future improvements like this. Thus, what I think you should do is require non-alphanumeric label characters to be quoted, either via double-quotes or backslashes (although it's questionable whether we really need two independent quoting mechanisms here). That would preserve extensibility, and it'd also preserve our freedom to fix ltree_plpython later, since the case of interest would still be an error for now. And it would mean that you don't have subtly different rules for what's data in ltree versus what's data in lquery or ltxtquery.
Now non-alphanumeric label characters should be escaped in ltree, lquery and ltxtquery. Plpython tests does not require changes now.
BTW, the general rule in existing backend code that's doing string parsing is to allow non-ASCII (high-bit-set) characters to be taken as data without inquiring too closely as to what they are. This avoids a bunch of locale and encoding issues without much loss of flexibility. (If we do ever extend the ltree syntax again, we'd certainly choose ASCII punctuation characters for whatever special symbols we need, else the feature might not be available in all encodings.) So for instance in your examples involving "Ñ", it's fine to take that as a label character without concern for locale/encoding.
I'm not sure what I think about the whitespace business. It looks like what you propose is to strip unquoted leading and trailing whitespace but allow embedded whitespace. There's precedent for that, certainly, but I wonder whether it isn't too confusing. In any case you didn't document that.
Embedded whitespace should also be escaped now. I'm also not sure about stripping unquoted leading and trailing whitespace.--
Attachment
pgsql-hackers by date: