Home > mailing lists

Re: Ltree syntax improvement - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Ltree syntax improvement
Date	January 21, 2020 19:13:22
Msg-id	31385.1579634002@sss.pgh.pa.us Whole thread Raw
In response to	Re: Ltree syntax improvement (Dmitry Belyavsky <beldmit@gmail.com>)
Responses	Re: Ltree syntax improvement
List	pgsql-hackers

Tree view

Dmitry Belyavsky <beldmit@gmail.com> writes:
> If the C part will be reviewed and considered mergeable, I'll update the
> plpython tests.

I haven't looked at any of the code involved in this, but I did examine
the "failing" plpython test, and I'm quite distressed about that change
of behavior.  Simply removing the test case is certainly not okay,
and I do not think that just changing it to accept this new behavior
is okay either.  As Nikita said upthread:

>> 7. ltree_plpython test does not fail now because Python list is converted to a
>> text and then to a ltree, and the textual representation of a Python list has
>> become a valid ltree text:
>>
>> SELECT $$['foo', 'bar', 'baz']$$::ltree;
>>            ltree
>> -------------------------
>>   "['foo', 'bar', 'baz']"
>> (1 row)
>>
>> So Python lists can be now successfully converted to ltrees without a transform,
>> but the result is not that is expected ('foo.bar.baz'::ltree).

If this case doesn't throw an error, then we're going to have a
compatibility problem whenever somebody finally gets around to
implementing the python-to-ltree transform properly, because it
would break any code that might be relying on this (wrong) behavior.

In general, I think it's a mistake to allow unquoted punctuation to be
taken as part of an ltree label, which is what this patch is evidently
doing.  By doing that, you'll make it impossible for anyone to ever
again extend the ltree syntax, because if they want to assign special
meaning to braces or whatever, they can't do so without breaking
existing applications.  For example, if the existing code allowed
double-quote or backslash as a label character, we'd already have
rejected this patch as being too big a compatibility break.  So it's
not very forward-thinking to close off future improvements like this.

Thus, what I think you should do is require non-alphanumeric label
characters to be quoted, either via double-quotes or backslashes
(although it's questionable whether we really need two independent
quoting mechanisms here).  That would preserve extensibility, and
it'd also preserve our freedom to fix ltree_plpython later, since
the case of interest would still be an error for now.  And it would
mean that you don't have subtly different rules for what's data in
ltree versus what's data in lquery or ltxtquery.

BTW, the general rule in existing backend code that's doing string
parsing is to allow non-ASCII (high-bit-set) characters to be taken as
data without inquiring too closely as to what they are.  This avoids a
bunch of locale and encoding issues without much loss of flexibility.
(If we do ever extend the ltree syntax again, we'd certainly choose
ASCII punctuation characters for whatever special symbols we need,
else the feature might not be available in all encodings.)  So for
instance in your examples involving "Ñ", it's fine to take that as a
label character without concern for locale/encoding.

I'm not sure what I think about the whitespace business.  It looks
like what you propose is to strip unquoted leading and trailing
whitespace but allow embedded whitespace.  There's precedent for that,
certainly, but I wonder whether it isn't too confusing.  In any case
you didn't document that.

            regards, tom lane

pgsql-hackers by date:

From: Pavel Stehule
Date: 21 January 2020, 18:51:17
Subject: Re: [Proposal] Global temporary tables

From: Tom Lane
Date: 21 January 2020, 19:17:53
Subject: Re: Protect syscache from bloating with negative cache entries

Re: Ltree syntax improvement - Mailing list pgsql-hackers

Previous

Next