Re: Initcap works differently with different locale providers - Mailing list pgsql-docs

From Oleg Tselebrovskiy
Subject Re: Initcap works differently with different locale providers
Date
Msg-id cdfa64230784d7e330c1a2a55237b94e@postgrespro.ru
In response to Re: Initcap works differently with different locale providers  (Jeff Davis <pgsql@j-davis.com>)
Responses Re: Initcap works differently with different locale providers
List pgsql-docs
Jeff Davis wrote at 2025-07-31 02:58:

Apologies for the late reply to the review.

> First, it doesn't mention the "builtin" provider, which uses the same
> word break rules as libc.

I completely forgot about the builtin provider in the first patch, my bad.

> Second, word boundaries can be complex, and I'm wondering if we should
> not be so precise about what ICU does or doesn't do. For instance, ICU
> has options like U_TITLECASE_ADJUST_TO_CASED,
> U_TITLECASE_NO_BREAK_ADJUSTMENT, etc., and I'm not sure exactly
> which one of those we use.

While [1] describes the default word boundary rules and could be useful
as a starting point, I agree that in reality it is probably more
complicated. I couldn't find any place where U_TITLECASE_ADJUST_TO_CASED
and similar options are set in non-test code, but
U_TITLECASE_ADJUST_TO_CASED was the default prior to ICU 60, so
initcap() will also behave differently depending on the ICU version.
> I'd prefer that we try to explain that INITCAP() is intended for
> convenient display, and the specific result should not be relied upon
> (at least for ICU; maybe for all providers). If you want specific word
> boundary rules, write your own function.

The first patch just adds this warning about not relying on the exact
result of initcap(). The second one is the same, but also removes the
"what is a word" part, since it could be moot: if we recommend writing
custom functions, understanding what a word is isn't strictly needed.
I'm still on the fence about which patch is better, though.
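
To illustrate the "write your own function" suggestion, here is a minimal
sketch in Python. It is not part of either patch and does not claim to
match what any locale provider does internally. The name initcap_alnum is
made up for this example, and it assumes one fixed rule: a word is a run of
alphanumeric characters, and everything else is a separator.

```python
def initcap_alnum(text: str) -> str:
    """Capitalize the first character of each alphanumeric run.

    Hypothetical helper for illustration only. The word rule is an
    assumption of this sketch: a word is a maximal run of characters
    for which str.isalnum() is true.
    """
    out = []
    in_word = False  # True while we are inside an alphanumeric run
    for ch in text:
        if ch.isalnum():
            # First character of a run is uppercased, the rest lowercased.
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            # Separators are copied through unchanged and end the word.
            out.append(ch)
            in_word = False
    return "".join(out)


if __name__ == "__main__":
    print(initcap_alnum("hello WORLD"))  # Hello World
    print(initcap_alnum("o'neil"))       # O'Neil
```

Because the rule is spelled out in the function itself, the result does not
depend on the provider or the ICU version, which is the point of
recommending a custom function.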

Thoughts?

[1]: https://www.unicode.org/reports/tr29/#Word_Boundaries

Regards, Oleg Tselebrovskiy
Attachment
