Thread: Missing Bug-Report #5904?
Hello,

I've written a bug report which got the ID #5904, but I'm not able to find it on the mailing list. Is it lost? Maybe there is a problem with the umlaut in my name?

Now for the problem: there is a problem with the translation of the English word "March" to the German "März". Instead of "März" I get "MäRz" (with an uppercase "r").

You can reproduce it as follows:

# SET lc_time = "de_DE.UTF-8";
# SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
  to_char
-----------
 MäRz 2011

I did not find the translation file for this, so I can't add a patch or check for other misspellings.

My system:
PostgreSQL 9.0.3
FreeBSD 8.1-RELEASE

Greetings from Germany,
Torsten
Hi,

On Friday, March 04, 2011 10:16:24 AM Torsten Zühlsdorff wrote:
> Now for the problem: there is a problem with the translation of the
> English word "March" to the German "März". Instead of "März" I get
> "MäRz" (with an uppercase "r").
>
> You can reproduce it as follows:
> # SET lc_time = "de_DE.UTF-8";
> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>   to_char
> -----------
>  MäRz 2011
>
> I did not find the translation file for this, so I can't add a patch or
> check for other misspellings.
>
> My system:
> PostgreSQL 9.0.3
> FreeBSD 8.1-RELEASE

That's very likely a problem with your operating system's locales. What spelling does the month have if you construct it with `date` or such?

Andres
Hello Andres,

>> Now for the problem: there is a problem with the translation of the
>> English word "March" to the German "März". Instead of "März" I get
>> "MäRz" (with an uppercase "r").
>>
>> You can reproduce it as follows:
>> # SET lc_time = "de_DE.UTF-8";
>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>   to_char
>> -----------
>>  MäRz 2011
>>
>> I did not find the translation file for this, so I can't add a patch or
>> check for other misspellings.
>>
>> My system:
>> PostgreSQL 9.0.3
>> FreeBSD 8.1-RELEASE
> That's very likely a problem with your operating system's locales. What spelling
> does the month have if you construct it with `date` or such?

Done directly in bash on the same system:

$ date +%B
March
$ export LC_TIME=de_DE.UTF-8
$ date +%B
März

And in PostgreSQL:

# SET lc_time = "de_DE.UTF-8";
SET
# SELECT to_char(current_date, 'TMMonth YYYY');
  to_char
-----------
 MäRz 2011

I can also reproduce this on FreeBSD 7.0-STABLE.

Greetings,
Torsten
On Fri, Mar 4, 2011 at 14:34, Torsten Zühlsdorff <foo@meisterderspiele.de> wrote:
> Hello Andres,
>
>>> Now for the problem: there is a problem with the translation of the
>>> English word "March" to the German "März". Instead of "März" I get "MäRz"
>>> (with an uppercase "r").
>>>
>>> You can reproduce it as follows:
>>> # SET lc_time = "de_DE.UTF-8";
>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>>   to_char
>>> -----------
>>>  MäRz 2011
>>>
>>> I did not find the translation file for this, so I can't add a patch or
>>> check for other misspellings.
>>>
>>> My system:
>>> PostgreSQL 9.0.3
>>> FreeBSD 8.1-RELEASE
>>
>> That's very likely a problem with your operating system's locales. What
>> spelling does the month have if you construct it with `date` or such?
>
> Done directly in bash on the same system:
> $ date +%B
> March
> $ export LC_TIME=de_DE.UTF-8
> $ date +%B
> März
>
> And in PostgreSQL:
> # SET lc_time = "de_DE.UTF-8";
> SET
> # SELECT to_char(current_date, 'TMMonth YYYY');
>   to_char
> -----------
>  MäRz 2011
>
> I can also reproduce this on FreeBSD 7.0-STABLE.

IIRC, the FreeBSD locales at least used to be pretty much broken for UTF8. Can you try and see if you get the same problem in a non-UTF8 locale?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
Hello,

>>>> Now for the problem: there is a problem with the translation of the
>>>> English word "March" to the German "März". Instead of "März" I get "MäRz"
>>>> (with an uppercase "r").
>>>>
>>>> You can reproduce it as follows:
>>>> # SET lc_time = "de_DE.UTF-8";
>>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>>>   to_char
>>>> -----------
>>>>  MäRz 2011
>>>>
>>>> I did not find the translation file for this, so I can't add a patch or
>>>> check for other misspellings.
>>>>
>>>> My system:
>>>> PostgreSQL 9.0.3
>>>> FreeBSD 8.1-RELEASE
>>> That's very likely a problem with your operating system's locales. What
>>> spelling does the month have if you construct it with `date` or such?
>> Done directly in bash on the same system:
>> $ date +%B
>> March
>> $ export LC_TIME=de_DE.UTF-8
>> $ date +%B
>> März
>>
>> And in PostgreSQL:
>> # SET lc_time = "de_DE.UTF-8";
>> SET
>> # SELECT to_char(current_date, 'TMMonth YYYY');
>>   to_char
>> -----------
>>  MäRz 2011
>>
>> I can also reproduce this on FreeBSD 7.0-STABLE.
>
> IIRC, the FreeBSD locales at least used to be pretty much broken for
> UTF8. Can you try and see if you get the same problem in a non-UTF8
> locale?

It doesn't work properly in my bash, even the dirty way:

$ export LC_ALL=de_DE.ISO8859-1
$ export LC_PAPER=de_DE.ISO8859-1
$ export LC_ADDRESS=de_DE.ISO8859-1
$ export LC_MONETARY=de_DE.ISO8859-1
$ export LC_NUMERIC=de_DE.ISO8859-1
$ export LC_TELEPHONE=de_DE.ISO8859-1
$ export LC_MESSAGES=de_DE.ISO8859-1
$ export LC_IDENTIFICATION=de_DE.ISO8859-1
$ export LC_COLLATE=de_DE.ISO8859-1
$ export LANG=de_DE.ISO8859-1
$ export LC_MEASUREMENT=de_DE.ISO8859-1
$ export XTERM_LOCALE=de_DE.ISO8859-1
$ export LANGUAGE=de_DE.ISO8859-1:de
$ export LC_CTYPE=de_DE.ISO8859-1
$ export LC_TIME=de_DE.ISO8859-1
$ export LC_NAME=de_DE.ISO8859-1
$ export LC_ALL=de_DE.ISO8859-1
$ date +%B
M�z

I can't figure out why the umlaut is not displayed correctly.

In PostgreSQL it looks interesting:

# SET lc_time = "de_DE.ISO8859-1";
SET
0.3.impos=# SELECT to_char(current_date, 'TMMonth YYYY');
 to_char
----------
 MRz 2011
(1 Zeile)

The missing umlaut could be an error in bash, but the uppercase "r" is still there.

Greetings,
Torsten
Torsten Zühlsdorff <foo@meisterderspiele.de> writes:
> Now for the problem: there is a problem with the translation of the
> English word "March" to the German "März". Instead of "März" I get
> "MäRz" (with an uppercase "r").

> You can reproduce it as follows:
> # SET lc_time = "de_DE.UTF-8";
> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>   to_char
> -----------
>  MäRz 2011

I can reproduce the above when the database encoding is not UTF8 or lc_ctype isn't a UTF8 locale. The reason is that TMMonth implies applying an initcap transformation to the month name retrieved from the locale library. The only way initcap will make the right choice of what to do with the "r" is if it thinks that ä is a letter. Which it won't if the encoding is wrong or lc_ctype isn't set to classify ä as a letter. This does not seem like a bug to me though, just misconfiguration.

regards, tom lane
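The initcap behaviour described above can be imitated outside the database. What follows is a minimal Python sketch, not PostgreSQL's actual code: it assumes that, with lc_ctype set to C, the month name is processed byte by byte and only ASCII bytes count as letters or digits. The function names `initcap_bytewise_c` and `initcap_unicode` are made up for this illustration.

```python
def initcap_bytewise_c(text: str) -> str:
    """Imitate initcap when only ASCII bytes are classified as alphanumeric.

    Assumption: the string is walked as UTF-8 bytes, as a C-locale
    classifier would see it. The two bytes of "ä" are not alphanumeric,
    so the byte after them is treated as the start of a new word.
    """
    out = bytearray()
    prev_alnum = False
    for b in text.encode("utf-8"):
        ch = bytes([b])
        # bytes.upper()/lower() change ASCII letters only.
        ch = ch.lower() if prev_alnum else ch.upper()
        out += ch
        # bytes.isalnum() is True for ASCII letters and digits only.
        prev_alnum = ch.isalnum()
    return out.decode("utf-8")


def initcap_unicode(text: str) -> str:
    """Same loop, but per character with Unicode-aware classification."""
    out = []
    prev_alnum = False
    for ch in text:
        out.append(ch.lower() if prev_alnum else ch.upper())
        prev_alnum = ch.isalnum()
    return "".join(out)


print(initcap_bytewise_c("März 2011"))  # MäRz 2011
print(initcap_unicode("März 2011"))     # März 2011
```

The only difference between the two functions is what counts as a letter, which is the role lc_ctype plays in the explanation above.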
Hello Tom,

>> Now for the problem: there is a problem with the translation of the
>> English word "March" to the German "März". Instead of "März" I get
>> "MäRz" (with an uppercase "r").
>
>> You can reproduce it as follows:
>> # SET lc_time = "de_DE.UTF-8";
>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>   to_char
>> -----------
>>  MäRz 2011
>
> I can reproduce the above when the database encoding is not UTF8 or
> lc_ctype isn't a UTF8 locale. The reason is that TMMonth implies
> applying an initcap transformation to the month name retrieved from
> the locale library. The only way initcap will make the right choice
> of what to do with the "r" is if it thinks that ä is a letter.
> Which it won't if the encoding is wrong or lc_ctype isn't set to
> classify ä as a letter. This does not seem like a bug to me
> though, just misconfiguration.

Hm... the encoding of the database is UTF8. The lc_ctype is 'C'. This may be a misconfiguration, but is there another way to get it to work right than recreating the complete database with another locale?

But doesn't that mean that the translation of the timestamp to languages with other umlauts should also be wrong? For example, to "fr_FR.UTF-8"?

Greetings from Germany,
Torsten
Torsten Zühlsdorff <foo@meisterderspiele.de> writes:
>>> # SET lc_time = "de_DE.UTF-8";
>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>>   to_char
>>> -----------
>>>  MäRz 2011

>> I can reproduce the above when the database encoding is not UTF8 or
>> lc_ctype isn't a UTF8 locale.

> Hm... the encoding of the database is UTF8. The lc_ctype is 'C'.

Right, that was the same case I checked. In C locale, ä is not a letter, so you get the above from the initcap transformation.

> But doesn't that mean that the translation of the timestamp to languages
> with other umlauts should also be wrong? For example, to "fr_FR.UTF-8"?

Possibly, I haven't checked. If they have any month names with non-ASCII characters in the middle, they'd see the same thing. You would certainly also get undesirable results from TMMONTH, since it wouldn't know how to uppercase ä. In my view none of this is a Postgres bug --- the correct fix is to use locale settings that correspond to the behavior you want.

regards, tom lane
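The uppercase case mentioned for TMMONTH can be illustrated the same way. This is a hedged Python sketch rather than anything taken from PostgreSQL: it assumes that C-locale uppercasing only maps ASCII letters and leaves every other byte alone. The helper name `upper_c_locale` is hypothetical, and the French month names are included only as examples of names with a non-ASCII character in the middle.

```python
def upper_c_locale(text: str) -> str:
    """Uppercase ASCII letters only, as an ASCII-only classifier would.

    Assumption: non-ASCII characters such as "ä" or "é" are passed
    through unchanged because the locale does not know their uppercase.
    """
    # bytes.upper() maps only the ASCII range a-z.
    return text.encode("utf-8").upper().decode("utf-8")


for name in ("März", "février", "août"):
    # Left: ASCII-only uppercasing. Right: Unicode-aware uppercasing.
    print(upper_c_locale(name), name.upper())
# MäRZ MÄRZ
# FéVRIER FÉVRIER
# AOûT AOÛT
```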
Hello,

>>>> # SET lc_time = "de_DE.UTF-8";
>>>> # SELECT to_char('2011-03-04 00:00:01'::date, 'TMMonth YYYY');
>>>>   to_char
>>>> -----------
>>>>  MäRz 2011
>
>>> I can reproduce the above when the database encoding is not UTF8 or
>>> lc_ctype isn't a UTF8 locale.
>
>> Hm... the encoding of the database is UTF8. The lc_ctype is 'C'.
>
> Right, that was the same case I checked. In C locale, ä is not a
> letter, so you get the above from the initcap transformation.
>
>> But doesn't that mean that the translation of the timestamp to languages
>> with other umlauts should also be wrong? For example, to "fr_FR.UTF-8"?
>
> Possibly, I haven't checked. If they have any month names with
> non-ASCII characters in the middle, they'd see the same thing.
> You would certainly also get undesirable results from TMMONTH, since
> it wouldn't know how to uppercase ä. In my view none of this is
> a Postgres bug --- the correct fix is to use locale settings that
> correspond to the behavior you want.

Hm... from my point of view it's a bug, but not necessarily a PG bug.

My desired result is correctly translated output in different languages. Now I know that this is not possible, because I have to use the correct lc_ctype for the entire database, which can't be changed after the database is created.

The only workaround seems to be to handle the translation myself. That's very ugly and makes the use of TMMonth pointless, if you have to take care of the result output before you use the database.

Thanks to all for your time and help,
Torsten
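For readers who end up with the workaround mentioned above, handling the translation in the application might look roughly like the following Python sketch. It is only one possible approach and was not proposed in the thread: the table `MONTH_NAMES` and the function `format_month_year` are hypothetical names, and the sketch assumes the database returns a plain date while the application picks the month name.

```python
from datetime import date

# Hypothetical lookup table, maintained by the application instead of
# relying on lc_time and TMMonth in the database.
MONTH_NAMES = {
    "de": ["Januar", "Februar", "März", "April", "Mai", "Juni",
           "Juli", "August", "September", "Oktober", "November", "Dezember"],
    "fr": ["janvier", "février", "mars", "avril", "mai", "juin",
           "juillet", "août", "septembre", "octobre", "novembre", "décembre"],
}


def format_month_year(d: date, lang: str) -> str:
    """Return e.g. 'März 2011' using the application-side table."""
    return f"{MONTH_NAMES[lang][d.month - 1]} {d.year}"


print(format_month_year(date(2011, 3, 4), "de"))  # März 2011
print(format_month_year(date(2011, 3, 4), "fr"))  # mars 2011
```

Because no case conversion is applied, the result does not depend on how any locale classifies ä.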