Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values - Mailing list pgsql-bugs
From | Peter Geoghegan |
---|---|
Subject | Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values |
Date | |
Msg-id | CAH2-WzmeMSgjO-TocH7+v7vuwzv8C77eqGcTCTGSRattOK2ZLw@mail.gmail.com Whole thread Raw |
In response to | Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses |
Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Re: [BUGS] Crash report for some ICU-52 (debian8) COLLATE andwork_mem values |
List | pgsql-bugs |
On Mon, Aug 14, 2017 at 4:11 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Peter Geoghegan <pg@bowt.ie> writes: >> Do we really need to pass "if_not_exists = true", anyway? Why >> shouldn't initdb fail if there are apparent duplicate ICU collations? > > Because the villagers will come after you; see commit > 0b13b2a7712b6f91590b7589a314240a14520c2f. Fair enough. Changing it is actually unnecessary anyway. pg_import_system_collations() should simply use uloc_countAvailable() + uloc_getAvailable, rather than ucol_countAvailable() + ucol_getAvailable(). Just like my test program. FYI, without the keyword variants, but with all base locales, ICU 55 adds 684 base locales to pg_collation. Should I write a real patch to make the changes I deem necessary, or do you prefer to Peter E.? The comments that pg_import_system_collations() adds to each locale have an annoying tendency to exclude anything that isn't ascii-safe, which is much more noticeable when locales populate pg_collation instead of collations: ...foo │ fr-BE-x-icu │ fr-BE │ fr-BE │ icu │ French (Belgium)foo │ fr-BF-x-icu │ fr-BF │ fr-BF │ icu │ French (Burkina Faso)foo │ fr-BI-x-icu │ fr-BI │ fr-BI │ icu │ French (Burundi)foo │ fr-BJ-x-icu │ fr-BJ │ fr-BJ │ icu │ French (Benin)foo │ fr-BL-x-icu │ fr-BL │ fr-BL │ icu │foo │ fr-CA-x-icu │ fr-CA │ fr-CA │ icu │ French (Canada)foo │ fr-CD-x-icu │ fr-CD │ fr-CD │ icu │ French (Congo - Kinshasa)foo │ fr-CF-x-icu │ fr-CF │ fr-CF │ icu │ French (Central African Republic)foo │ fr-CG-x-icu │ fr-CG │ fr-CG │ icu │ French (Congo - Brazzaville)foo │ fr-CH-x-icu │ fr-CH │ fr-CH │ icu │ French (Switzerland)foo │ fr-CI-x-icu │ fr-CI │ fr-CI │ icu │foo │ fr-CM-x-icu │ fr-CM │ fr-CM │ icu │ French (Cameroon)foo │ fr-DJ-x-icu │ fr-DJ │ fr-DJ │ icu │ French (Djibouti) ... Notice the lack of descriptions here, just because they weren't ascii-safe. I now wonder if we should even keep the comment thing at all. We can eventually add an SQL function, that is called by \dOS+, that interrogates ICU for a description on the fly. No need to store anything. Another thing that I dislike about get_icu_locale_comment() is that the descriptions we show are always English display names, regardless of pg_database collation or anything else. We shouldn't hard-code the use of English for display name, on general principle. > It's possible that there's an argument for not de-duplicating ICU > collation names even though we must do so for libc collation names. > But I'm not exactly sure what that would be. Having initdb fail if > the ICU configuration is funny isn't going to endear us to anyone. BTW, get_icu_language_tag() has a buglet. ULOC_FULLNAME_CAPACITY does not bound the size of a language tag; those can be almost arbitrarily long, unlike locale identifiers. That should probably be fixed, if only to avoid setting a bad example. -- Peter Geoghegan -- Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-bugs
pgsql-bugs by date: