Re: ICU integration - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: ICU integration |
Date | |
Msg-id | f0adba2a-0846-cbbc-06eb-0f938c119aa9@2ndquadrant.com |
In response to | Re: ICU integration (Doug Doole <ddoole@salesforce.com>) |
Responses | Re: ICU integration |
List | pgsql-hackers |
On 8/31/16 4:24 PM, Doug Doole wrote:
> ICU explicitly does not provide stability in their locales and collations. We pushed them hard to provide this, but between changes to the CLDR data and changes to the ICU code it just wasn’t feasible for them to provide version to version stability.
>
> What they do offer is a compile option when building ICU to version all their APIs. So instead of calling icu_foo() you’d call icu_foo46(). (Or something like this - it’s been a few years since I actually worked with the ICU code.) This ultimately allows you to load multiple versions of the ICU library into a single program and provide stability by calling the appropriate version of the library. (Unfortunately, the OS - at least my Linux box - only provides the generic version of ICU and not the version annotated APIs, which means a separate compile of ICU is needed.)
>
> The catch with this is that it means you likely want to expose the version information. In another note it was suggested to use something like fr_FR%icu. If you want to pin it to a specific version of ICU, you’ll likely need something like fr_FR%icu46. (There’s nothing wrong with supporting fr_FR%icu to give users an easy way of saying “give me the latest and greatest”, but you’d probably want to harden it to a specific ICU version internally.)

There are multiple things going on.

Collations in ICU are versioned. You can find out the version of the collation you are currently using via an API call. A collation version does not change during the life of a single version of ICU, but it might well change in the next version of ICU, as bugs are fixed and things are refined. There is no way in the API to ask for a collation of a specific version, since there is only one version of a collation in a specific installation of ICU.
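ICU reports a collator's version through ucol_getVersion() as a UVersionInfo, which is an array of four bytes. A minimal sketch of the store-and-compare idea, assuming those four bytes are packed most significant first into one 32-bit number (one plausible encoding that suits a 0x%08X-style display; the patch's actual encoding is not shown in this message). The names version_bytes, pack_version and check_collation_version are hypothetical, and the code deliberately does not call ICU so it builds without it.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for ICU's UVersionInfo: four version bytes. */
typedef uint8_t version_bytes[4];

/* Pack the four bytes, most significant first, into one 32-bit value that
   fits in a single catalog column and prints naturally as 0x%08X. */
uint32_t pack_version(const version_bytes v)
{
    return ((uint32_t) v[0] << 24) | ((uint32_t) v[1] << 16) |
           ((uint32_t) v[2] << 8) | (uint32_t) v[3];
}

/* Compare the version recorded when the collation was created with the
   version reported by the library loaded now.  Returns 1 on a match; on a
   mismatch it prints a warning to stderr and returns 0 (warn, don't fail). */
int check_collation_version(uint32_t stored, const version_bytes current)
{
    uint32_t numversion = pack_version(current);

    if (numversion != stored) {
        fprintf(stderr,
                "WARNING: collator version mismatch: stored 0x%08" PRIX32
                ", library provides 0x%08" PRIX32 "\n",
                stored, numversion);
        return 0;
    }
    return 1;
}
```

The packed value would be saved once at creation time, and check_collation_version() run whenever the collator is opened; a mismatch produces a warning rather than an error, so existing data stays reachable while the user decides what to rebuild.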
So my implementation is that we store the version of the collation in the catalog when we create the collation, and if we later find at run time that the collation is of a different version, we warn about it.

The ICU ABI (not API) is also versioned. The way this is done is that all functions are actually macros that expand to a versioned symbol. So ucol_open() is actually a macro that expands to, say, ucol_open_57() in ICU version 57. (They also got rid of a dot in their versions a while ago.) It's basically hand-crafted symbol versioning. That way, you can link with multiple versions of ICU at the same time. However, the purpose of that, as I understand it, is so that plugins can have a different version of ICU loaded than the main process or another plugin. In terms of postgres using the right version of ICU, it doesn't buy anything beyond what the soname mechanism does.

>> +    if (numversion != collform->collversion)
>> +        ereport(WARNING,
>> +                (errmsg("ICU collator version mismatch"),
>> +                 errdetail("The database was created using version 0x%08X, the library provides version 0x%08X.",
>> +                           (uint32) collform->collversion, (uint32) numversion),
>> +                 errhint("Rebuild affected indexes, or build PostgreSQL with the right version of ICU.")));
>>
>> So you still need to manage this carefully, but at least you have a
>> chance to learn about it.
>
> Indexes are the obvious place where collation comes into play, and are relatively easy to address. But consider all the places where string comparisons can be done. For example, check constraints and referential constraints can depend on string comparisons. If the collation rules change because of a new version of ICU, the database can become inconsistent and will need a lot more work than an index rebuild.

We can refine the guidance. But indexes are the most important issue, I think, because changing the sorting rules in the background makes data silently disappear.
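The macro-based renaming described above (ucol_open() becoming ucol_open_57()) can be imitated in a few lines of C. This is a toy sketch, not ICU's code: the library name mylib, the macros MYLIB_RENAME and MYLIB_VERSION_SUFFIX, and the function mylib_open are all hypothetical, and the real ICU headers use their own macro names. It compiles two "releases" into one translation unit to show that differently suffixed symbols can coexist.

```c
/* Two-level paste so that MYLIB_VERSION_SUFFIX is expanded before ## runs. */
#define MYLIB_PASTE2(name, suffix) name##suffix
#define MYLIB_PASTE(name, suffix) MYLIB_PASTE2(name, suffix)
#define MYLIB_RENAME(name) MYLIB_PASTE(name, MYLIB_VERSION_SUFFIX)

/* Header of (hypothetical) release 57: callers write mylib_open(), the
   preprocessor emits mylib_open_57.  The macro's own name is not expanded
   again inside its replacement, so this does not recurse. */
#define MYLIB_VERSION_SUFFIX _57
#define mylib_open MYLIB_RENAME(mylib_open)

int mylib_open(const char *locale)   /* compiles as mylib_open_57 */
{
    (void) locale;                   /* placeholder body */
    return 57;
}

/* Simulate a second copy of the library, release 58, in the same program. */
#undef mylib_open
#undef MYLIB_VERSION_SUFFIX
#define MYLIB_VERSION_SUFFIX _58
#define mylib_open MYLIB_RENAME(mylib_open)

int mylib_open(const char *locale)   /* compiles as mylib_open_58 */
{
    (void) locale;
    return 58;
}
```

Callers only ever write mylib_open(); which copy they reach is fixed by the header they were compiled against, while both mylib_open_57 and mylib_open_58 remain linkable side by side. For a single program that just needs the one correct library, the shared-library soname already provides that selection.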
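Why a changed sort order makes rows "silently disappear" can be shown with a toy example. Assumptions: the two comparison functions are made-up stand-ins (plain byte order for the "old" rules, ASCII case-insensitive order for the "new" ones), not real ICU behavior, and index_lookup is a plain binary search standing in for an index descent. Keys that were sorted under the old rules are then searched with the new rules.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

typedef int (*collate_fn)(const char *a, const char *b);

/* "Old" rules: byte order, so uppercase ASCII letters sort before lowercase. */
int collate_old(const char *a, const char *b)
{
    return strcmp(a, b);
}

/* "New" rules: compare ASCII letters without regard to case. */
int collate_new(const char *a, const char *b)
{
    for (;; a++, b++) {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);
        if (ca != cb || ca == '\0')
            return ca - cb;
    }
}

/* Binary search over keys assumed sorted by cmp; returns position or -1. */
int index_lookup(const char *const *keys, size_t n, const char *key,
                 collate_fn cmp)
{
    size_t lo = 0, hi = n;           /* search window is [lo, hi) */
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int c = cmp(key, keys[mid]);
        if (c == 0)
            return (int) mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Sequential scan: finds a key regardless of how the array is ordered. */
int table_scan(const char *const *keys, size_t n, const char *key,
               collate_fn cmp)
{
    for (size_t i = 0; i < n; i++)
        if (cmp(key, keys[i]) == 0)
            return (int) i;
    return -1;
}
```

With keys {"Banana", "apple", "cherry"} (sorted by collate_old), looking up "Banana" with collate_new returns -1 even though table_scan still finds it at position 0: the search steps right past "apple" and never revisits the front of the array. Re-sorting under the new rules, which is what an index rebuild does, makes the lookup agree again.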
--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services