Re: Order changes in PG16 since ICU introduction - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Order changes in PG16 since ICU introduction
Date
Msg-id f8d09d8f3d53daa9cdb446d021fe33d6ff7f1ee3.camel@j-davis.com
Whole thread Raw
In response to Re: Order changes in PG16 since ICU introduction  ("Daniel Verite" <daniel@manitou-mail.org>)
Responses Re: Order changes in PG16 since ICU introduction
Re: Order changes in PG16 since ICU introduction
List pgsql-hackers
On Tue, 2023-06-06 at 15:09 +0200, Daniel Verite wrote:
> FWIW I don't quite see how 0001 improve things or what problem it's
> trying to solve.

The word "locale" is generic, so we need to make LOCALE/--locale apply
to whatever provider is being used. If "locale" only applies to libc,
using ICU will always be confusing and never be on the same level as
libc, let alone the preferred provider.

The locale "C" is a special case, documented as a non-locale. So, if
LOCALE/--locale apply to ICU, then either ICU needs to handle locale
"C" in the expected way (v8 patch series); or when we see locale "C" we
need to somehow change the provider into something that can handle it
(v6 patch series changes it to the "none" provider).

Please let me know if you disagree with the goal or the reasoning here.
If so, please explain where you think we should end up, because the
status quo does not seem great to me.

> 0001 creates exceptions throughout the code so that when an ICU
> collation has a locale name "C" or "POSIX" then it does not behave
> like an ICU collation, even though pg_collation.collprovider='i'
> To me it's neither desirable nor necessary that a collation that
> has collprovider='i' is diverted to non-ICU semantics.

It's not very principled, but it matches what libc does.

> Also in the current state, this diversion does not apply to initdb.
>
> "initdb --icu-locale=C" with 0001 applied reports this:
>
>    Using language tag "en-US-u-va-posix" for ICU locale "C".

Thank you. I fixed it by skipping the canonicalization for C/POSIX
locales in initdb.

> Could you elaborate a bit more on what 0001 is meant to achieve, from
> the point of view of the user?

It makes it so the user consistently (regardless of the provider) gets
the "no locale" behavior (as documented and historically expected) when
they specify the C or POSIX locales.

Then that enables us to change LOCALE/--locale to apply to ICU, which
means that a simple command like "initdb --locale=en_US" does a
sensible thing regardless of the default provider.

I understand you are skeptical of trying to apply an arbitrary locale
name to ICU, but if they don't specify the provider, what do you expect
to happen?


--
Jeff Davis
PostgreSQL Contributor Team - AWS



Attachment

pgsql-hackers by date:

Previous
From: Joe Conway
Date:
Subject: Re: Order changes in PG16 since ICU introduction
Next
From: Joe Conway
Date:
Subject: Re: Order changes in PG16 since ICU introduction