On 20.01.26 19:36, Matt Magoffin wrote:
> I am using Postgres 17 and trying to configure a collation that sorts upper case before lower case and includes
> numeric sorting:
>
> CREATE COLLATION testsort (provider = icu, locale = 'und-u-kf-upper-kn');
>
> These comparisons are working as I expected:
>
> SELECT 'id-45' < 'id-123' COLLATE testsort; -- true (45 before 123)
>
> SELECT 'id' < 'ID' COLLATE testsort; -- false (upper case before lower case)
>
> However combining them resulted in an unexpected result:
>
> SELECT 'id-45' < 'ID-123' COLLATE testsort; -- true
>
> I thought that last one would be false because “ID” would come before “id”. Is there a way to configure the collation
> to achieve that? I’m trying to match the sorting behaviour in external application code.
I suspect that this is because the effect of the numeric sorting is a
primary difference and the case difference is only a tertiary difference.
A tertiary difference is considered only if the strings are equal at
the primary and secondary levels.

In other words, imagine the numeric sorting pass replacing all numbers
by hypothetical letters corresponding to the numeric order, like

    'id-45'  -> 'id-X'
    'id-123' -> 'id-Z'
    'ID-123' -> 'ID-Z'

Then you would have

    'id-45' < 'ID-123'  =>
    'id-X'  < 'ID-Z'

which is true, because X < Z is already a primary difference, so the
case of "id" versus "ID" is never consulted.
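
One way to check this reading is to compare strings that differ only in case against strings that also differ in the number. This is a minimal sketch, assuming the testsort collation from the original message; the sample values in the VALUES list are made up for illustration.

```sql
-- Same collation as in the original message (kf-upper: upper case first,
-- kn: numeric ordering). IF NOT EXISTS makes the script re-runnable.
CREATE COLLATION IF NOT EXISTS testsort
    (provider = icu, locale = 'und-u-kf-upper-kn');

-- Numbers are equal here, so only the case difference can decide.
SELECT 'id-45' < 'ID-45' COLLATE testsort AS case_only;

-- Numbers differ here, so under the guess above the numeric (primary)
-- difference should decide before case is looked at.
SELECT 'id-45' < 'ID-123' COLLATE testsort AS number_and_case;

-- Full ordering of a small hypothetical sample.
SELECT s
FROM (VALUES ('id-45'), ('ID-45'), ('id-123'), ('ID-123')) AS t(s)
ORDER BY s COLLATE testsort;
```

If case decides the first comparison but has no influence on the second, and the ORDER BY groups rows by number first, that would support the explanation that case is only a tie-breaker after the numeric comparison.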
This is just my guess from the outside. The numeric sorting is not a
part of the Unicode Collation Algorithm standard; it is an extension by
ICU, so one would have to dig into the code or documentation there, but
I didn't find anything.
I don't know if there is a way to customize this further to get the
effect you want. Maybe you could reach out to an ICU support forum to
get more expert insights there.