Thread: [RFC] Localized literals
Hi, we have a customer who shot themselves in the foot by using table names with german accented characters in them. The client application on the popular OS is using a single-byte encoding (LATIN9), their dump of the original database is using the same but no "SET client_encoding = ..." line anywhere. We didn't know the previous conditions and have setup the default installed database cluster that was created during installation on Debian 4.0. Obviously we have to re-initialize the cluster with the original locale so the table names come out right after reloading the dump. But the question popped up whether PostgreSQL can be extended to allow localized literals and apply encoding conversion the same way as on string data. NAMEDATA can be replaced with regular TEXT and have the same conversion everywhere. This way the relation and field name limits are also eliminated. The conversion could be controlled by a compile-time option and/or a GUC variable. Tell me if I am crazy. Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
On Wed, Apr 23, 2008 at 10:02:37AM +0200, Zoltan Boszormenyi wrote: > But the question popped up whether PostgreSQL can be extended > to allow localized literals and apply encoding conversion the same > way as on string data. NAMEDATA can be replaced with regular TEXT > and have the same conversion everywhere. This way the relation and > field name limits are also eliminated. The conversion could be controlled > by a compile-time option and/or a GUC variable. Tell me if I am crazy. It does convert the table names also, since the encoding translation is applied to the whole query string, not just normal strings. A simple SET CLIENT_ENCODING='latin9' at the beginning of your dump should have worked. As for the other point, the reason NAMEDATA is fixed is because these records is mapped onto in memory structures in the backend. By changing it to a variable length type all structure accesses would become much more expensive. But none of this has anything ot do with encodings. Have a nice day, -- Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/ > Please line up in a tree and maintain the heap invariant while > boarding. Thank you for flying nlogn airlines.
Martijn van Oosterhout írta: > On Wed, Apr 23, 2008 at 10:02:37AM +0200, Zoltan Boszormenyi wrote: > >> But the question popped up whether PostgreSQL can be extended >> to allow localized literals and apply encoding conversion the same >> way as on string data. NAMEDATA can be replaced with regular TEXT >> and have the same conversion everywhere. This way the relation and >> field name limits are also eliminated. The conversion could be controlled >> by a compile-time option and/or a GUC variable. Tell me if I am crazy. >> > > It does convert the table names also, since the encoding translation is > applied to the whole query string, not just normal strings. Thanks for enlightenment, I didn't know or check it. > A simple > SET CLIENT_ENCODING='latin9' at the beginning of your dump should have > worked. > I thought so - the dump was off a 7.4 server with its pg_dump and there was no "SET client_encoding = ..." in the script. They insisted on continuing with LATIN9 but with the default UTF-8 server encoding createdb -E LATIN9 failed. > As for the other point, the reason NAMEDATA is fixed is because these > records is mapped onto in memory structures in the backend. By changing > it to a variable length type all structure accesses would become much > more expensive. > Yes, I guessed it would be slower. However, NAMEDATALEN doesn't translate to the varchar(NAMEDATALEN) if I use accented characters in literals with UTF-8 encoding. :-( > But none of this has anything ot do with encodings. > > Have a nice day, > Thanks, I have it. :-) Best regards, Zoltán Böszörményi -- ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/