Re: inserts bypass encoding conversion - Mailing list pgsql-admin

From Tom Lane
Subject Re: inserts bypass encoding conversion
Date
Msg-id 1727535.1692240032@sss.pgh.pa.us
In response to RE: inserts bypass encoding conversion  ("James Pang (chaolpan)" <chaolpan@cisco.com>)
List pgsql-admin
"James Pang (chaolpan)" <chaolpan@cisco.com> writes:
> So, insert into values(chr(226)||chr(128)||chr(166)) actually got stored in the database with LATIN1
> as a single-byte sequence, but when querying select * from testutf8, it got converted to a UTF8
> three-byte sequence first?

No LATIN1 character has a UTF8 representation longer than 2 bytes,
so no.
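
The 2-byte limit can be checked outside the database. The following is a minimal Python sketch, not PostgreSQL code. It assumes that Python's "latin-1" codec matches the server's LATIN1 encoding (ISO 8859-1); the variable names are illustrative only.

```python
# Every LATIN1 (ISO 8859-1) byte maps to the code point with the same value,
# U+0000 through U+00FF. Measure how many bytes UTF-8 needs for each of them.
utf8_lengths = {
    byte: len(bytes([byte]).decode("latin-1").encode("utf-8"))
    for byte in range(256)
}

# Longest UTF-8 representation among all 256 LATIN1 characters.
longest = max(utf8_lengths.values())
print(longest)
```

Since none of these characters needs three bytes, a three-byte UTF8 sequence cannot be the result of converting a single LATIN1 character.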

I think your fundamental misunderstanding is supposing that this:

    chr(226)||chr(128)||chr(166)

produces something equivalent to the UTF8 sequence 0xe2 0x80 0xa6.
It will not, no matter which server encoding you are dealing with.
It will produce something that is three separate characters
according to the server encoding.  In LATIN1, that could well be
the byte sequence 0xe2 0x80 0xa6, but *that byte sequence does not
mean the same thing that it would mean in UTF8 encoding*.
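
The difference can be illustrated at the byte level. This is a Python sketch rather than anything run against a server. It assumes that Python's chr() and its "latin-1" and "utf-8" codecs behave like the server's chr() under LATIN1 and UTF8 for these three values.

```python
# Three separate characters, as chr(226)||chr(128)||chr(166) would build them.
code_points = (226, 128, 166)
three_chars = "".join(chr(cp) for cp in code_points)

# The same three characters in each encoding.
latin1_bytes = three_chars.encode("latin-1")  # one byte per character
utf8_bytes = three_chars.encode("utf-8")      # two bytes per character

# The byte sequence 0xe2 0x80 0xa6 read as UTF8 is one character instead.
ellipsis = bytes([0xE2, 0x80, 0xA6]).decode("utf-8")

print(len(three_chars), latin1_bytes.hex(" "), utf8_bytes.hex(" "))
print(len(ellipsis), hex(ord(ellipsis)))
```

In LATIN1 the three characters happen to occupy the bytes 0xe2 0x80 0xa6, but read as UTF8 those same bytes are the single character U+2026 (horizontal ellipsis), which has no LATIN1 equivalent at all.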

You also seem not to grasp the fact that an encoding conversion
will happen between your client and the server if client_encoding
is different from server_encoding.  Because of that, the output of
a SELECT command doesn't prove much of anything here.
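
A rough model of that conversion, again as a Python sketch: send_to_client is a hypothetical helper, not a PostgreSQL function, and it only mimics re-encoding text from the server encoding into the client encoding.

```python
def send_to_client(stored: bytes, server_encoding: str, client_encoding: str) -> bytes:
    """Hypothetical helper: re-encode stored bytes for the client, as a
    server-to-client encoding conversion would."""
    if server_encoding == client_encoding:
        return stored  # no conversion needed
    return stored.decode(server_encoding).encode(client_encoding)

# Three LATIN1 characters as stored in a LATIN1 database.
stored = bytes([0xE2, 0x80, 0xA6])

# What clients with different client_encoding settings would receive.
seen_by_utf8_client = send_to_client(stored, "latin-1", "utf-8")
seen_by_latin1_client = send_to_client(stored, "latin-1", "latin-1")

print(seen_by_utf8_client.hex(" "))
print(seen_by_latin1_client.hex(" "))
```

The bytes that reach the client depend on client_encoding, so what a SELECT displays is not a reliable view of what is stored.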

            regards, tom lane


