Re: new environment variable INITDB_LOCALE_PROVIDER - Mailing list pgsql-hackers

From Chao Li
Subject Re: new environment variable INITDB_LOCALE_PROVIDER
Date
Msg-id 77D14CC3-27E7-4EAE-811C-4B58C8C112A5@gmail.com
In response to Re: new environment variable INITDB_LOCALE_PROVIDER  (Jeff Davis <pgsql@j-davis.com>)
List pgsql-hackers


On Oct 11, 2025, at 10:06, Jeff Davis <pgsql@j-davis.com> wrote:

On Sat, 2025-10-11 at 08:30 +0800, Chao Li wrote:
* If we make that fail, I don’t think that would break existing
scripts. Because the default provider is libc and you are introducing
a new environment variable to set locale provider, thus a plain
initdb will not use the builtin provider. Maybe the provider can come from
PG_TEST_INITDB_EXTRA_OPTS; I'm OK with the test environment only
issuing warnings.

I would like it to be possible to change the initdb default in the
future to "builtin". See:

https://www.postgresql.org/message-id/e4ac16908dad3eddd3ed73c4862591375a3f0539.camel@j-davis.com

in that case, initdb should be able to succeed without other options.

Yes, if we decide to go down that path, then what I said would no longer be valid.


* I am thinking out loud. The builtin provider is more performant but has
certain limitations. Some production users may want to try the builtin
provider for better performance without being aware of those
limitations. Their environment contains the actual LC_CTYPE/LC_COLLATE
they want to use, and they set the new environment variable to
“builtin” for the provider. In this case, failing “initdb” would make the
user clearly realize the limitation of the builtin provider. Otherwise,
if the user also ignores the warning messages, the database
would be created with an unexpected ctype, which would lead to loss
(time, data, etc.).

What limitation and/or loss are you concerned about?


By the limitation of the builtin provider, I just meant that it supports fewer LC_CTYPE/LC_COLLATE values than the other two providers.

I wasn’t concerned about anything in particular; I was just imagining whether anything could be negatively affected.

Unless I'm mistaken, LC_CTYPE has very little practical effect when the
provider is builtin and the encoding is UTF-8.

The main effect that I'm aware of is that system errors from the OS
rely on LC_CTYPE for translation. Ordinary Postgres messages don't need
LC_CTYPE, so most of NLS still works even with LC_CTYPE=C; it's just
strerror() that depends on LC_CTYPE for the encoding.
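
As a rough illustration of the strerror() point above, here is a minimal Python sketch (not Postgres code) that fetches an OS error message while LC_CTYPE is temporarily set to a chosen locale. The helper name strerror_under is hypothetical, and the text you get back depends on the platform's C library and installed locales, so no particular output is assumed.

```python
import errno
import locale
import os


def strerror_under(ctype_locale: str, code: int = errno.ENOENT) -> str:
    """Return the OS error text for `code` with LC_CTYPE temporarily changed.

    Hypothetical helper for illustration only; it does not reproduce
    how Postgres itself handles strerror().
    """
    # Calling setlocale() with only a category queries the current setting.
    saved = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, ctype_locale)
        # os.strerror() wraps the C library's strerror().
        return os.strerror(code)
    finally:
        # Restore the previous LC_CTYPE so the caller is unaffected.
        locale.setlocale(locale.LC_CTYPE, saved)


if __name__ == "__main__":
    # The "C" locale is always available; other names may not be installed.
    print(strerror_under("C"))
```

Only LC_CTYPE is changed here; the message language is governed separately by LC_MESSAGES, which is why the two can disagree about encoding.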

LC_CTYPE also affects full text search parsing, but I'm fixing that as
part of another patch to use the database locale instead.

I think contrib/fuzzystrmatch may be affected.

Callers of pg_strcasecmp() could be affected, but it's mostly used to
compare with ASCII anyway.

If you are aware of other areas, please let me know.


Thanks for the explanation. I think I am good now. The latest v3 patch looks good to me.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/



