Thread: CREATE DATABASE command for non-libc providers
From the discussion here:

https://www.postgresql.org/message-id/CAFCRh--rtqbOBpJYFDmPD9kYCYxsxKpLW7LHxYMYhHXa2XoStw@mail.gmail.com

the CREATE DATABASE command has a tendency to throw errors in confusing ways when using non-libc providers. I have attached a patch 0001 that fixes a misleading hint, but it's still not great.

When using ICU or the builtin provider, it still requires coming up with some valid locale name for LC_COLLATE and LC_CTYPE, even though those have little or no effect. And because LOCALE is the fallback when LC_COLLATE and/or LC_CTYPE are unspecified, it's confusing to the user because they aren't even trying to specify a libc locale name at all.

The solution, as I see it, is:

* Force the environment variables LC_COLLATE=C and LC_CTYPE=C unconditionally, and pg_perm_setlocale() them. This requires closing a few loose ends, but it should be doable[1]. Even the libc provider uses the "_l()" functions already, and no longer depends on setlocale().

* When datlocprovider<>'c', force datcollate and datctype to NULL.

* If the user specifies LC_CTYPE or LC_COLLATE to CREATE DATABASE, and the provider is not libc, then ignore LC_COLLATE/LC_CTYPE and emit a WARNING, rather than trying to set it based on LOCALE and getting an error.

Regards,
Jeff Davis

[1] https://www.postgresql.org/message-id/cd3517c7-ddb8-454e-9dd5-70e3d84ff6a2%40eisentraut.org
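A minimal C sketch of the second and third proposed rules, assuming a simplified view of the CREATE DATABASE options. The struct `CreateDbLocaleOpts`, the function `apply_proposed_locale_rules()`, and the warning text are hypothetical and are not PostgreSQL code; only the provider codes ('c' libc, 'i' ICU, 'b' builtin) follow pg_database.datlocprovider.

```c
#include <stdio.h>

/* Provider codes as stored in pg_database.datlocprovider. */
#define PROVIDER_LIBC    'c'
#define PROVIDER_ICU     'i'
#define PROVIDER_BUILTIN 'b'

/* Hypothetical subset of CREATE DATABASE options; NULL = not given. */
typedef struct
{
    char        provider;    /* becomes datlocprovider */
    const char *lc_collate;  /* becomes datcollate */
    const char *lc_ctype;    /* becomes datctype */
} CreateDbLocaleOpts;

/*
 * Hypothetical sketch of the proposed rules: for a non-libc provider,
 * warn about (and ignore) any user-supplied LC_COLLATE/LC_CTYPE, then
 * store NULL for datcollate/datctype.  Returns the number of warnings.
 */
static int
apply_proposed_locale_rules(CreateDbLocaleOpts *opts)
{
    int         nwarnings = 0;

    if (opts->provider == PROVIDER_LIBC)
        return 0;               /* libc still uses LC_COLLATE/LC_CTYPE */

    if (opts->lc_collate != NULL)
    {
        fprintf(stderr, "WARNING:  LC_COLLATE \"%s\" ignored for provider '%c'\n",
                opts->lc_collate, opts->provider);
        nwarnings++;
    }
    if (opts->lc_ctype != NULL)
    {
        fprintf(stderr, "WARNING:  LC_CTYPE \"%s\" ignored for provider '%c'\n",
                opts->lc_ctype, opts->provider);
        nwarnings++;
    }

    /* datcollate and datctype are forced to NULL for ICU and builtin. */
    opts->lc_collate = NULL;
    opts->lc_ctype = NULL;
    return nwarnings;
}
```

The point of the sketch is that a non-libc provider never needs a valid libc locale name: whatever the user passes is reported and dropped instead of being validated and failing.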
Jeff Davis wrote:

> I have attached a patch 0001 that
> fixes a misleading hint, but it's still not great.

+1 for the patch

> When using ICU or the builtin provider, it still requires coming up
> with some valid locale name for LC_COLLATE and LC_CTYPE

No, since the following invocation does work:

  CREATE DATABASE test template='template0' locale_provider='builtin' builtin_locale='C.UTF-8';

Here we let 'locale' (or 'lc_collate'/'lc_ctype', which is the same thing) default from the template database.

In the discussion you mentioned, the error comes from the OP using 'locale' instead of 'builtin_locale'. At least that's my understanding. This mistake is not surprising, because when you specify a locale provider followed by a locale, intuitively you'd expect this locale to refer to that locale provider. Yet that's not the case, mostly for backward compatibility reasons.

> * Force the environment variables LC_COLLATE=C and LC_CTYPE=C
> unconditionally, and pg_perm_setlocale() them

Currently that would be a regression for some people, because when LC_CTYPE=C, the FTS parser produces substandard results with characters beyond ASCII.

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
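The fallback Daniel describes can be sketched in C, assuming only the simplified precedence stated in this thread: an explicit LC_COLLATE/LC_CTYPE wins, otherwise LOCALE is used, otherwise the template database's value. All type and function names here are hypothetical, and BUILTIN_LOCALE is assumed not to be an input to this resolution (as Daniel's working example implies). Under those assumptions, `LOCALE='C.UTF-8'` flows into the libc settings, while omitting LOCALE leaves them defaulting from the template.

```c
#include <stddef.h>

/* Hypothetical: what the user passed to CREATE DATABASE (NULL = omitted). */
typedef struct
{
    const char *locale;      /* LOCALE */
    const char *lc_collate;  /* LC_COLLATE */
    const char *lc_ctype;    /* LC_CTYPE */
} CreateDbArgs;

/* Hypothetical: the template database's stored libc settings. */
typedef struct
{
    const char *datcollate;
    const char *datctype;
} TemplateDb;

/* Explicit LC_* option wins, then LOCALE, then the template's value. */
static const char *
pick(const char *explicit_val, const char *locale, const char *template_val)
{
    if (explicit_val != NULL)
        return explicit_val;
    if (locale != NULL)
        return locale;
    return template_val;
}

/* Resolve the libc LC_COLLATE/LC_CTYPE the new database would get. */
static void
resolve_libc_settings(const CreateDbArgs *args, const TemplateDb *tmpl,
                      const char **collate, const char **ctype)
{
    *collate = pick(args->lc_collate, args->locale, tmpl->datcollate);
    *ctype = pick(args->lc_ctype, args->locale, tmpl->datctype);
}
```

This is why a LOCALE value intended for a non-libc provider can still fail: under this precedence it also becomes the libc LC_COLLATE/LC_CTYPE and must name a valid libc locale.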
On Fri, 2025-06-06 at 22:03 +0200, Daniel Verite wrote:
> +1 for the patch

Thank you, committed.

> Here we let 'locale' (or 'lc_collate'/'lc_ctype', which is the same
> thing) default from the template database.

Right, in the normal case it's OK, but if anything goes wrong, it gets fairly confusing.

> > * Force the environment variables LC_COLLATE=C and LC_CTYPE=C
> > unconditionally, and pg_perm_setlocale() them
>
> Currently that would be a regression for some people, because
> when LC_CTYPE=C, the FTS parser produces substandard results with
> characters beyond ASCII.

In the other thread, I posted a patch:

https://www.postgresql.org/message-id/a1396f17f462ee6561820f755caaf2d12eb9fd15.camel%40j-davis.com

In that patch, the callers that rely on datctype (regardless of datlocprovider) access the locale_t through a global and use the "_l" variants. There should be no behavior change, and we still need to set LC_CTYPE, so you are right that it's not a solution yet.

I think it moves us in the right direction, though. If nothing else, it lets us easily identify the places whose behavior depends on datctype, which would have let me offer a clearer reply to the user.

Regards,
Jeff Davis
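The pattern of a global locale_t plus the "_l" variants might look like this in portable C, assuming POSIX.1-2008 `newlocale()`, `freelocale()`, `isalpha_l()` and `tolower_l()`. The global `hypothetical_db_ctype` and all helper functions are invented for illustration and are not taken from the patch; in PostgreSQL the process-wide step would go through pg_perm_setlocale(), which this sketch does not call.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>

/* Hypothetical global holding the database's ctype locale. */
static locale_t hypothetical_db_ctype = (locale_t) 0;

/* Build the locale_t once; false if the name is unavailable here. */
static bool
init_db_ctype(const char *name)
{
    locale_t    loc = newlocale(LC_CTYPE_MASK, name, (locale_t) 0);

    if (loc == (locale_t) 0)
        return false;
    if (hypothetical_db_ctype != (locale_t) 0)
        freelocale(hypothetical_db_ctype);
    hypothetical_db_ctype = loc;
    return true;
}

/* Pin the process-wide locale to "C", as the thread proposes. */
static bool
pin_process_locale_to_c(void)
{
    return setlocale(LC_COLLATE, "C") != NULL &&
        setlocale(LC_CTYPE, "C") != NULL;
}

/* Classification goes through the explicit locale_t, not setlocale(). */
static bool
db_isalpha(unsigned char c)
{
    return isalpha_l(c, hypothetical_db_ctype) != 0;
}

static int
db_tolower(unsigned char c)
{
    return tolower_l(c, hypothetical_db_ctype);
}
```

Because the helpers pass an explicit locale_t, the result of `db_isalpha()` does not depend on the process-wide setlocale() state. That independence is what would let LC_CTYPE be pinned to "C" once every datctype-dependent caller has been converted.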