Re: Collation and primary keys - Mailing list pgsql-hackers

From Jeff Davis
Subject Re: Collation and primary keys
Date
Msg-id 9b259f4c532943e428e9665122f37c099bab250e.camel@j-davis.com
In response to Re: Collation and primary keys  ("Daniel Verite" <daniel@manitou-mail.org>)
List pgsql-hackers
On Wed, 2025-07-23 at 13:53 +0200, Daniel Verite wrote:
> > * The libc C.UTF-8 locale was a reasonable default (though not a
> > natural language collation). But now that we have C.UTF-8 available
> > from the builtin provider, then we should encourage that instead of
> > relying on the slower, platform-specific libc implementation.
>
> Yes. In particular, we should encourage the ecosystem to support
> the new collation features so that they're widely available to
> end users.

Then I propose that we change the initdb default to builtin C.UTF-8.
Patch attached.

To get the old initdb behavior, use --locale-provider=libc; all the
other defaults will then work as before.

The change would not disrupt upgrades (see commit 9637badd9f).

One annoyance: if your environment sets LC_CTYPE to a non-UTF-8
locale, then initdb forces LC_CTYPE=C and emits a warning.

I had previously tried, and failed, to change the default to ICU for
v16, so it's worth mentioning why I don't believe this proposal will
run into the same problems:

* ICU, while better than libc, didn't completely solve any of the
problems. This proposal completely solves the inconsistent primary key
problem, and is much faster than libc or ICU.

* In the version 16 change, we were still attempting to map environment
variables to ICU locales, which was never going to work very well. In
particular, as you pointed out, ICU has nothing to approximate the
C.UTF-8 locale. The current proposal doesn't attempt that kind of
cleverness.
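
As a rough illustration of the first point (a sketch in Python, not
PostgreSQL code; the function names and sample keys are made up for the
example): assuming the builtin C.UTF-8 collation compares strings by
code point value, the same order falls out of a plain byte-wise
comparison of the UTF-8 encoding, so the result does not depend on any
libc or ICU version.

```python
# Hypothetical sample keys; any valid Unicode strings would do.
KEYS = ["apple", "Zebra", "éclair", "zebra", "Apple", "日本"]


def code_point_sort(keys):
    """Sort by Unicode code point (Python compares str this way)."""
    return sorted(keys)


def utf8_byte_sort(keys):
    """Sort by the raw UTF-8 bytes, i.e. a memcmp-style comparison."""
    return sorted(keys, key=lambda s: s.encode("utf-8"))


if __name__ == "__main__":
    # UTF-8 preserves code point order, so both sorts agree.
    print(code_point_sort(KEYS) == utf8_byte_sort(KEYS))
    print(code_point_sort(KEYS))
```

Because the comparison needs nothing beyond the bytes themselves, an
index built on such keys cannot silently change order when a platform
library is upgraded. The trade-off is that the order is not a natural
language order: uppercase ASCII sorts before lowercase, for instance.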

Comments?

Regards,
    Jeff Davis


