
Re: Support LIKE with nondeterministic collations - Mailing list pgsql-hackers

From	jian he
Subject	Re: Support LIKE with nondeterministic collations
Date	November 15, 2024 07:26:24
Msg-id	CACJufxFeOuBbkHfp=0-0rwamydjYY4ky1A+CPr6s3WUABC9_Rg@mail.gmail.com
In response to	Re: Support LIKE with nondeterministic collations ("Daniel Verite" <daniel@manitou-mail.org>)
Responses	Re: Support LIKE with nondeterministic collations
List	pgsql-hackers


On Tue, Nov 12, 2024 at 3:45 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 11.11.24 14:25, Heikki Linnakangas wrote:
> > Sadly the algorithm is O(n^2) with non-deterministic collations. Is there
> > any way this could be optimized? We make no claims on how expensive any
> > functions or operators are, so I suppose a slow implementation is
> > nevertheless better than throwing an error.
>
> Yeah, maybe someone comes up with new ideas in the future.
>
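A rough count of where the O(n^2) comes from, under stated assumptions rather than a reading of the actual patch: suppose a '%' before a literal segment makes the matcher try each of the n start positions i = 0, ..., n-1 of an n-character text, and at start i the substring-growing loop compares lengths 0 through n-i without ever finding a match. The number of comparison calls is then

```latex
C(n) = \sum_{i=0}^{n-1} (n - i + 1) = \sum_{k=1}^{n} (k + 1) = \frac{n(n+1)}{2} + n = \frac{n(n+3)}{2} = O(n^2)
```

For n = 4 that is 5 + 4 + 3 + 2 = 14 calls.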

/*
 * Now build a substring of the text and try to match it against
 * the subpattern.  t is the start of the text, t1 is one past the
 * last byte.  We start with a zero-length string.
 */
t1 = t;
t1len = tlen;
for (;;)
{
    int         cmp;

    CHECK_FOR_INTERRUPTS();

    cmp = pg_strncoll(subpat, subpatlen, t, (t1 - t), locale);
    /* ... rest of loop body elided ... */
}

For

select '.foo.' LIKE '_oo' COLLATE ign_punct;

these are the first four arguments of each pg_strncoll call:

subpat  subpatlen  t       t1 - t
oo      2          foo.    0
oo      2          foo.    1
oo      2          foo.    2
oo      2          foo.    3
oo      2          foo.    4

It seems there could be a shortcut/optimization here: if subpat
contains no wildcards (percent sign or underscore), could we make
fewer pg_strncoll calls?

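A minimal case that triggers the error in GenericMatchText, since there
are no related tests (the two columns have conflicting implicit
collations, so no collation can be determined for LIKE):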
create table t1(a text collate case_insensitive, b text collate "C");
insert into t1 values ('a','a');
select a like b from t1;

In section 9.7.1 (LIKE), the term "wildcard" is never explained; it is
only mentioned in 9.7.2. Maybe we can add a sentence at the end of this
paragraph:
    <para>
     If <replaceable>pattern</replaceable> does not contain percent
     signs or underscores, then the pattern only represents the string
     itself; in that case <function>LIKE</function> acts like the
     equals operator.  An underscore (<literal>_</literal>) in
     <replaceable>pattern</replaceable> stands for (matches) any single
     character; a percent sign (<literal>%</literal>) matches any sequence
     of zero or more characters.
    </para>

saying that the underscore and percent sign are the wildcards in LIKE.
Other than that, the doc is clear to me.
