Re: OK, that's one LOCALE bug report too many... - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: OK, that's one LOCALE bug report too many... |
Date | |
Msg-id | 17693.975105090@sss.pgh.pa.us |
In response to | Re: OK, that's one LOCALE bug report too many... (Peter Eisentraut <peter_e@gmx.net>) |
Responses | Re: OK, that's one LOCALE bug report too many...; Re: OK, that's one LOCALE bug report too many... |
List | pgsql-hackers |
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> I propose, therefore, that in an --enable-locale installation, initdb
>> should save its values for LC_COLLATE and LC_CTYPE in pg_control, and
>> backend startup should restore these settings from pg_control.

> Note that when these are unset there might still be a "catch-all" locale
> value coming from the LANG env. var. (or LC_ALL on some systems).

Actually, what I intend to do while writing pg_control is read the current effective values via "setlocale(category, NULL)" --- then it shouldn't matter where they came from, no?

This brings up a question I had just come across while doing further research: backend/main/main.c does

    #ifdef USE_LOCALE
        setlocale(LC_CTYPE, "");    /* take locale information from an
                                     * environment */
        setlocale(LC_COLLATE, "");
        setlocale(LC_MONETARY, "");
    #endif

which seems a little odd --- why not setlocale(LC_ALL, "")? Karel Zak said in a thread around 8/15/00 that this is deliberate, but I don't quite see why.

>> Also, since "LC_COLLATE=en_US" seems to misbehave rather spectacularly
>> on recent RedHat releases, I propose that initdb change "en_US" to "C"
>> if it finds that setting. (Are there any platforms where there are
>> non-bogus differences between the two?)

> There *should* be differences and it is definitely not okay to mix them
> up.

I have now received positive proof that en_US sort order on RedHat is broken. For example, it asserts

    '/root/' < '/root0'

but

    '/root/t' > '/root0'

I defy you to find anyone in the US who will say that that is a reasonable definition of string collation. Of course, if you prefer the notion of disabling LIKE optimization on a default RedHat installation, we can go ahead and accept en_US. But I say it's broken and we shouldn't use it.

>> Finally, until we have a really bulletproof solution for LIKE indexing
>> optimization, I will disable that optimization if --enable-locale is
>> compiled *and* LC_COLLATE is not C.
>> Better to get "LIKE is slow" bug
>> reports than "LIKE gives wrong answers" bug reports.

> (C or POSIX)

Do you think there are cases where setlocale(,NULL) will give back "POSIX" rather than "C"? We can certainly test for either.

> I have a question about that optimization: If you have X LIKE 'foo%',
> wouldn't it be enough to use X >= 'foo' (which certainly works for any
> locale I've ever heard of)? Why do you need the X <= 'foo???' at all?

Because you need a two-sided index constraint, not a one-sided one. Otherwise you're probably better off doing a sequential scan --- scanning 50% of the table (on average) via an index will be slower than sequential.

>> Comments? Anyone think that initdb should lock down more categories
>> than just these two?

> Not sure whether LC_CTYPE is necessary.

I'm not either, but I'm afraid to leave it float...

regards, tom lane
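The two-sided range from the LIKE exchange, combined with the proposed rule of optimizing only when LC_COLLATE is "C" or "POSIX", can be sketched in C. This is a minimal illustration, not backend code. The helper names `collate_is_c`, `make_prefix_upper_bound` and `in_prefix_range` are hypothetical.

The sketch builds an exclusive upper bound by incrementing the last byte of the prefix. It does not reproduce the message's inclusive `X <= 'foo???'`. Incrementing the last byte is only valid under byte-wise (`strcmp`) ordering, which is why a collation like the RedHat en_US example above would break it.

```c
#include <assert.h>
#include <locale.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the current LC_COLLATE is "C" or "POSIX".
 * Passing NULL to setlocale() only queries the setting; it changes nothing. */
bool collate_is_c(void)
{
    const char *name = setlocale(LC_COLLATE, NULL);
    return name != NULL &&
           (strcmp(name, "C") == 0 || strcmp(name, "POSIX") == 0);
}

/* Hypothetical helper: build an exclusive upper bound for every string that
 * starts with `prefix`, assuming byte-wise (strcmp) ordering.  `out` must
 * hold strlen(prefix) + 1 bytes.  Returns false if no bound exists (empty
 * prefix, or a prefix made entirely of 0xFF bytes). */
bool make_prefix_upper_bound(const char *prefix, char *out)
{
    assert(prefix != NULL && out != NULL);
    size_t len = strlen(prefix);
    memcpy(out, prefix, len + 1);

    /* Increment the last byte.  A 0xFF byte cannot be incremented, so drop
     * it and carry into the byte before it. */
    while (len > 0) {
        unsigned char last = (unsigned char) out[len - 1];
        if (last < 0xFF) {
            out[len - 1] = (char) (last + 1);
            out[len] = '\0';
            return true;
        }
        len--;
    }
    return false;
}

/* X LIKE 'prefix%' rewritten as the two-sided range lower <= X < upper. */
bool in_prefix_range(const char *x, const char *lower, const char *upper)
{
    return strcmp(x, lower) >= 0 && strcmp(x, upper) < 0;
}
```

For the prefix "foo" the bound comes out as "fop", so `X LIKE 'foo%'` becomes `X >= 'foo' AND X < 'fop'`. A one-sided `X >= 'foo'` would scan from 'foo' to the end of the index instead. Under byte-wise ordering, '/root/t' sorts before '/root0', consistent with '/root/' < '/root0'.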