Re: fixing tsearch locale support - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: fixing tsearch locale support
Date
Msg-id 0c086b8e-9075-4e47-8336-3f9a71102df8@eisentraut.org
List pgsql-hackers
On 09.12.24 11:11, Peter Eisentraut wrote:
> lowerstr() and lowerstr_with_len() in ts_locale.c do the same thing as 
> str_tolower(), except that the former don't use the common locale 
> provider framework but instead use the global libc locale settings.
> 
> This patch replaces uses of lowerstr*() with str_tolower(..., 
> DEFAULT_COLLATION_OID).  For instances that use a libc locale globally, 
> this will result in exactly the same behavior.  For instances that use 
> other locale providers, you now get consistent behavior and are no 
> longer dependent on the libc locale settings.
> 
> Most uses of these functions are for processing dictionary and 
> configuration files.  In those cases, using the default collation seems 
> appropriate.  At least we don't have a more specific collation 
> available.  But the code in contrib/pg_trgm should really depend on the 
> collation of the columns being processed.  This is not done here; it 
> can be done in a separate patch.
> 
> (You can probably construct some edge cases where this change would 
> create some locale-related upgrade incompatibility, for example if 
> before you used a combination of ICU and a differently-behaving libc 
> locale.  We can document this in the release notes, but I don't think 
> there is anything more we can do about this.)
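
To illustrate what "use the global libc locale settings" means in the 
quoted text, here is a minimal standalone sketch.  It is not the 
ts_locale.c code: global_locale_lower() is a hypothetical name, and the 
sketch handles single-byte encodings only, without any of the multibyte 
handling a real implementation needs.  The point is only that the result 
depends on whatever LC_CTYPE the process set with setlocale(), not on a 
per-database locale provider.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, for illustration only (not PostgreSQL code).
 * Lowercases a NUL-terminated single-byte string with libc tolower(),
 * which consults the process-wide LC_CTYPE set via setlocale().
 * Returns a malloc'd copy that the caller must free, or NULL on OOM.
 */
static char *
global_locale_lower(const char *s)
{
    size_t  len;
    size_t  i;
    char   *out;

    assert(s != NULL);

    len = strlen(s);
    out = malloc(len + 1);
    if (out == NULL)
        return NULL;

    /* Cast through unsigned char: tolower() is undefined for negative ints. */
    for (i = 0; i < len; i++)
        out[i] = (char) tolower((unsigned char) s[i]);
    out[len] = '\0';

    return out;
}
```

After setlocale(LC_CTYPE, "C"), this maps only A-Z and leaves a byte such 
as 0xC4 untouched; under a single-byte locale that treats 0xC4 as an 
uppercase letter, the same call may return a different byte.  That 
dependence on process-global state is what goes away when the caller 
passes a collation explicitly, as str_tolower(..., DEFAULT_COLLATION_OID) 
does.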

There is a PG18 open item to document this possible upgrade incompatibility.

I think the following text could be added to the release notes:

"""
The locale implementation underlying full-text search was improved.  It 
now observes the locale provider configured for the database.  It was 
previously hardcoded to use the configured libc LC_CTYPE setting.  In 
database clusters that use a locale provider other than libc and where 
the locale configured through that locale provider behaves differently 
from the LC_CTYPE setting configured for the database, this could cause 
changes in behavior of some functions related to full-text search as 
well as the pg_trgm extension.  When upgrading such database clusters 
using pg_upgrade, it is recommended to reindex all indexes related to 
full-text search and pg_trgm after the upgrade.
"""

The commit reference is fb1a18810f0.

Thoughts?



