Re: [HACKERS] ICU integration - Mailing list pgsql-hackers
From | Peter Eisentraut |
---|---|
Subject | Re: [HACKERS] ICU integration |
Date | |
Msg-id | 5291804b-169e-3ba9-fdaf-fae8e7d2d959@2ndquadrant.com Whole thread Raw |
In response to | Re: [HACKERS] ICU integration (Andreas Karlsson <andreas@proxel.se>) |
Responses |
Re: [HACKERS] ICU integration
|
List | pgsql-hackers |
Updated patch attached. On 3/14/17 22:15, Andreas Karlsson wrote: > I do not like the schema based solution since search_path already gives > us enough headaches. As for the naming I am fine with the current scheme. Yeah, it seems we're going to settle on the suffix idea. See below for an idea that builds on that. > - I get a test failures in the default test suite due to not having the > tr_TR locale installed. I would assume that this would be pretty common > for hackers. I have removed that test. It seems it's not possible to test that portably without major contortions. > - The code no longer compiles without HAVE_LOCALE_T. Fixed that. > - I do not like how it is not obvious which is the default version of > every locale. E.g. I believe that "sv%icu" is "sv_reformed%icu" and not > "sv_standard%icu" as one might expect. Is this inherent to ICU or > something we can work around? We get these keywords from ucol_getKeywordValuesForLocale(), which says "Given a key and a locale, returns an array of string values in a preferred order that would make a difference." So all those are supposedly different from each other. > - ICU talks about a new syntax for locale extensions (old: > de_DE@collation=phonebook, new: de_DE_u_co_phonebk) at this page > http://userguide.icu-project.org/collation/api. Is this something we > need to car about? It looks like we currently use the old format, and > while I personally prefer it I am not sure we should rely on an old syntax. Interesting. I hadn't see this before, and the documentation is sparse. But it seems that this refers to BCP 47 language tags, which seem like a good idea. So what I have done is change all the predefined ICU collations to be named after the BCP 47 scheme, with a "private use" -x-icu suffix (instead of %icu). The preserves the original idea but uses a standard naming scheme. I'm not terribly worried that they are going to remove the "old" locale naming, but just to be forward looking, I have changed it so that the collcollate entries are made using the "new" naming for ICU >=54. > - I get an error when creating a new locale. > > #CREATE COLLATION sv2 (LOCALE = 'sv'); > ERROR: could not create locale "sv": Success > > # CREATE COLLATION sv2 (LOCALE = 'sv'); > ERROR: could not create locale "sv": Resource temporarily unavailable > Time: 1.109 ms Hmm, that's pretty straightforward code. What is your operating system? What are the build options? Does it work without this patch? > - For the collprovider is it really correct to say that 'd' is the > default value as it does in catalogs.sgml? It doesn't say it's the default value, it says it uses the database default. This is all a bit confusing. We have a collation object named "default", which uses the locale set for the database. That's been that way for a while. Now when introducing the collation providers, that "default" collation gets its own collprovider category 'd'. That is not really used anywhere, but we have to put something there. > - I do not know what the policy for formatting the documentation is, but > some of the paragraphs are in need of re-wrapping. Hmm, I don't see anything terribly bad. > - Add a hint to "ERROR: conflicting or redundant options". The error > message is pretty unclear. I don't see that in my patch. Example? > - I am not a fan of this patch comparing the encoding with a -1 literal. > How about adding -1 as a value to the enum? See the example below for a > place where the patch compares with -1. That's existing practice. Not a great practice, probably, but widespread. > - The patch adds "FIXME" in the below. Is that a left-over from > development or something which still needs doing? > > /* > * Also forbid matching an any-encoding entry. This test of course > is not > * backed up by the unique index, but it's not a problem since we don't > - * support adding any-encoding entries after initdb. > + * support adding any-encoding entries after initdb. FIXME > */ I had mentioned that upthread. It technically needs "doing" as you say, but it's not clear how and it's not terribly important, arguably. > - Should functions like normalize_locale_name() be renamed to indicate > they relate to libc locales? I am leaning towards doing so but have not > looked closely at the task. Renamed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
pgsql-hackers by date: