On Fri, 2025-10-10 at 17:48 -0700, Jeff Davis wrote:
> -------
> Summary
> -------
>
> The libc collation provider is a bad default[1]. The builtin collation
> provider is a good default, so let's use that.
The attached patches implement a more modest proposal which does not
conflict with Peter's objection about the display order:

0001: If the encoding is unspecified, and cannot be determined from the
locale (i.e. the locale is C), then use UTF-8 rather than SQL_ASCII.

0002: If the provider is unspecified, and the locale is C or C.UTF-8,
then use the builtin provider.
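
To make the two rules concrete, here is a minimal Python sketch of the
selection logic. It is illustrative only and is not the initdb code:
choose_defaults and locale_encoding are hypothetical names, and
locale_encoding stands in for whatever encoding the platform reports
for the locale (None when it cannot be determined, as for "C"). The
fallback to libc for other locales reflects the existing default.

```python
from typing import Optional, Tuple


def choose_defaults(
    locale: str,
    locale_encoding: Optional[str] = None,
    encoding: Optional[str] = None,
    provider: Optional[str] = None,
) -> Tuple[str, str]:
    """Return (encoding, provider) under the rules proposed in 0001/0002.

    Hypothetical helper for illustration; names are not from the patches.
    """
    # 0001: encoding unspecified and not determinable from the locale
    # (i.e. the locale is C) -> UTF8 rather than SQL_ASCII.
    if encoding is None:
        encoding = locale_encoding if locale_encoding is not None else "UTF8"

    # 0002: provider unspecified and locale is C or C.UTF-8 -> builtin.
    # Any other locale keeps the existing default, libc.
    if provider is None:
        provider = "builtin" if locale in ("C", "C.UTF-8") else "libc"

    return encoding, provider


# Locale "C", nothing else specified (the "initdb --no-locale" case).
print(choose_defaults("C"))
# Locale C.UTF-8: the encoding comes from the locale, provider from 0002.
print(choose_defaults("C.UTF-8", locale_encoding="UTF8"))
# Any other locale is unaffected by either patch.
print(choose_defaults("en_US.UTF-8", locale_encoding="UTF8"))
```

An explicitly specified encoding or provider always wins in this
sketch; only the unspecified cases change.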
Motivation:
* UTF-8 seems safer than SQL_ASCII when the locale is compatible with
either.
* Whether the "C" locale uses the builtin provider or the libc provider
is mostly about the catalog representation, because the implementation
is the same. I don't have a strong motivation for this change; it just
clarifies that libc is not actually being used when the locale is "C".
* I think most users of the "C.UTF-8" locale would be better off with
the builtin provider, which benefits from important optimizations.
Note:
This would mean that "initdb --no-locale" would select UTF-8 and the
builtin provider with locale "C", whereas previously it would have
selected SQL_ASCII and the libc provider (though it didn't ever really
use libc internally). I'm not sure if others want this behavior or if
it would be surprising.
Regards,
Jeff Davis