Thread: [RFC] Localized literals

[RFC] Localized literals

From

Zoltan Boszormenyi

Date:

23 April 2008, 05:05:15

Hi,

we have a customer who shot themselves in the foot by using
table names with german accented characters in them.
The client application on the popular OS is using a single-byte
encoding (LATIN9), their dump of the original database is using
the same but no "SET client_encoding = ..." line anywhere.
We didn't know the previous conditions and have setup the
default installed database cluster that was created during installation
on Debian 4.0. Obviously we have to re-initialize the cluster
with the original locale so the table names come out right after
reloading the dump.

But the question popped up whether PostgreSQL can be extended
to allow localized literals and apply encoding conversion the same
way as on string data. NAMEDATA can be replaced with regular TEXT
and have the same conversion everywhere. This way the relation and
field name limits are also eliminated. The conversion could be controlled
by a compile-time option and/or a GUC variable. Tell me if I am crazy.

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/

Re: [RFC] Localized literals

From

Martijn van Oosterhout

Date:

23 April 2008, 05:46:57

On Wed, Apr 23, 2008 at 10:02:37AM +0200, Zoltan Boszormenyi wrote:
> But the question popped up whether PostgreSQL can be extended
> to allow localized literals and apply encoding conversion the same
> way as on string data. NAMEDATA can be replaced with regular TEXT
> and have the same conversion everywhere. This way the relation and
> field name limits are also eliminated. The conversion could be controlled
> by a compile-time option and/or a GUC variable. Tell me if I am crazy.

It does convert the table names also, since the encoding translation is
applied to the whole query string, not just normal strings. A simple
SET CLIENT_ENCODING='latin9' at the beginning of your dump should have
worked.

As for the other point, the reason NAMEDATA is fixed is because these
records is mapped onto in memory structures in the backend. By changing
it to a variable length type all structure accesses would become much
more expensive.

But none of this has anything ot do with encodings.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: [RFC] Localized literals

From

Zoltan Boszormenyi

Date:

23 April 2008, 06:43:03

Martijn van Oosterhout írta:
> On Wed, Apr 23, 2008 at 10:02:37AM +0200, Zoltan Boszormenyi wrote:
>
>> But the question popped up whether PostgreSQL can be extended
>> to allow localized literals and apply encoding conversion the same
>> way as on string data. NAMEDATA can be replaced with regular TEXT
>> and have the same conversion everywhere. This way the relation and
>> field name limits are also eliminated. The conversion could be controlled
>> by a compile-time option and/or a GUC variable. Tell me if I am crazy.
>>
>
> It does convert the table names also, since the encoding translation is
> applied to the whole query string, not just normal strings.

Thanks for enlightenment, I didn't know or check it.

>  A simple
> SET CLIENT_ENCODING='latin9' at the beginning of your dump should have
> worked.
>

I thought so - the dump was off a 7.4 server with its pg_dump
and there was no "SET client_encoding = ..." in the script.
They insisted on continuing with LATIN9 but with the default
UTF-8 server encoding createdb -E LATIN9 failed.

> As for the other point, the reason NAMEDATA is fixed is because these
> records is mapped onto in memory structures in the backend. By changing
> it to a variable length type all structure accesses would become much
> more expensive.
>

Yes, I guessed it would be slower. However, NAMEDATALEN
doesn't translate to the varchar(NAMEDATALEN) if I use accented
characters in literals with UTF-8 encoding. :-(

> But none of this has anything ot do with encodings.
>
> Have a nice day,
>

Thanks, I have it. :-)

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
http://www.postgresql.at/