Re: More message encoding woes - Mailing list pgsql-hackers
From | Heikki Linnakangas |
---|---|
Subject | Re: More message encoding woes |
Date | |
Msg-id | 49DB2666.1050800@enterprisedb.com |
In response to | Re: More message encoding woes (Peter Eisentraut <peter_e@gmx.net>) |
Responses | Re: More message encoding woes |
List | pgsql-hackers |
Peter Eisentraut wrote:
> On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote:
>> Using the name for the latin1 encoding in the currently Windows-only
>> mapping table, "LATIN1", you get no translation because that name is not
>> recognized by the system. Using the other name "ISO-8859-1", it works.
>> "LATIN1" is not listed in the output of locale -m either.
>
> You are looking in the wrong place. What we need is for iconv to recognize
> the encoding name used by PostgreSQL. iconv --list is the primary hint for
> that.
>
> The locale names provided by the operating system are arbitrary and unrelated.

Oh, ok. I guess we can do the simple fix you proposed then.

Patch attached. Instead of checking for LC_CTYPE == C, I'm checking
"pg_get_encoding_from_locale(NULL) == encoding", which is closer to
what we actually want. The downside is that
pg_get_encoding_from_locale(NULL) isn't exactly free, but the upside is
that we don't need to keep this in sync with the rules we have in CREATE
DATABASE that enforce that locale matches encoding.

This doesn't include the cleanup to make the mapping table easier to
maintain that Magnus was going to have a look at before I started this
thread.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

*** a/src/backend/utils/mb/mbutils.c
--- b/src/backend/utils/mb/mbutils.c
***************
*** 890,896 **** cliplen(const char *str, int len, int limit)
  	return l;
  }
  
! #if defined(ENABLE_NLS) && defined(WIN32)
  static const struct codeset_map {
  	int	encoding;
  	const char *codeset;
--- 890,896 ----
  	return l;
  }
  
! #if defined(ENABLE_NLS)
  static const struct codeset_map {
  	int	encoding;
  	const char *codeset;
***************
*** 929,935 **** static const struct codeset_map {
  	{PG_EUC_TW, "EUC-TW"},
  	{PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* WIN32 */
  
  void
  SetDatabaseEncoding(int encoding)
--- 929,935 ----
  	{PG_EUC_TW, "EUC-TW"},
  	{PG_EUC_JIS_2004, "EUC-JP"}
  };
! #endif /* ENABLE_NLS */
  
  void
  SetDatabaseEncoding(int encoding)
***************
*** 946,960 **** SetDatabaseEncoding(int encoding)
  }
  
  /*
!  * On Windows, we need to explicitly bind gettext to the correct
!  * encoding, because gettext() tends to get confused.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS) && defined(WIN32)
  	int	i;
  
  	for (i = 0; i < lengthof(codeset_map_array); i++)
  	{
  		if (codeset_map_array[i].encoding == encoding)
--- 946,975 ----
  }
  
  /*
!  * Bind gettext to the correct encoding.
   */
  void
  pg_bind_textdomain_codeset(const char *domainname, int encoding)
  {
! #if defined(ENABLE_NLS)
  	int	i;
  
+ 	/*
+ 	 * gettext() uses the encoding specified by LC_CTYPE by default,
+ 	 * so if that matches the database encoding, we don't need to do
+ 	 * anything. This is not for performance, but because if
+ 	 * bind_textdomain_codeset() doesn't recognize the codeset name we
+ 	 * pass it, it will fall back to English and we don't want that to
+ 	 * happen unnecessarily.
+ 	 *
+ 	 * On Windows, though, gettext() tends to get confused so we always
+ 	 * bind it.
+ 	 */
+ #ifndef WIN32
+ 	if (pg_get_encoding_from_locale(NULL) == encoding)
+ 		return;
+ #endif
+ 
  	for (i = 0; i < lengthof(codeset_map_array); i++)
  	{
  		if (codeset_map_array[i].encoding == encoding)
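
For readers who want to see the decision logic of the patched pg_bind_textdomain_codeset() in isolation, here is a minimal standalone sketch. It is not the PostgreSQL code: the demo_* names, the small enum, and the three-entry table are hypothetical stand-ins for the real encoding IDs and codeset_map_array, and the always_bind flag merely models the WIN32 case. The locale encoding is passed in as a parameter instead of being obtained from pg_get_encoding_from_locale(NULL).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL's encoding IDs (not the real values). */
enum demo_encoding
{
	DEMO_LATIN1,
	DEMO_UTF8,
	DEMO_EUC_JP,
	DEMO_UNMAPPED				/* an encoding with no entry in the table */
};

struct demo_codeset_map
{
	enum demo_encoding encoding;
	const char *codeset;
};

/* Illustrative subset of a mapping table; the real one has many more entries. */
static const struct demo_codeset_map demo_codeset_map_array[] = {
	{DEMO_LATIN1, "LATIN1"},
	{DEMO_UTF8, "UTF-8"},
	{DEMO_EUC_JP, "EUC-JP"}
};

#define DEMO_LENGTHOF(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Decide which codeset name, if any, should be passed to
 * bind_textdomain_codeset().
 *
 * Returns NULL when no binding should be attempted: either the locale's
 * encoding already matches the database encoding (and always_bind is 0),
 * or the table has no entry for the database encoding.
 */
static const char *
demo_codeset_to_bind(enum demo_encoding locale_encoding,
					 enum demo_encoding db_encoding,
					 int always_bind)
{
	size_t		i;

	/* Skip the bind when gettext's default already matches. */
	if (!always_bind && locale_encoding == db_encoding)
		return NULL;

	for (i = 0; i < DEMO_LENGTHOF(demo_codeset_map_array); i++)
	{
		if (demo_codeset_map_array[i].encoding == db_encoding)
			return demo_codeset_map_array[i].codeset;
	}

	return NULL;
}
```

With these assumptions, demo_codeset_to_bind(DEMO_UTF8, DEMO_UTF8, 0) returns NULL because the locale already matches, while passing always_bind = 1 (the Windows behaviour described in the comment) returns "UTF-8" regardless. The early return is what keeps an unrecognized name such as "LATIN1" from being handed to gettext when the bind is unnecessary.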