May "PostgreSQL server side GB18030 character set support" reconsidered? - Mailing list pgsql-general
From | Han Parker |
---|---|
Subject | May "PostgreSQL server side GB18030 character set support" reconsidered? |
Date | |
Msg-id | ME2PR01MB2532E72B514DC46ED0E10F798A0C0@ME2PR01MB2532.ausprd01.prod.outlook.com |
In response to | Re: Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets? (Arjen Nienhuis <a.g.nienhuis@gmail.com>) |
Responses |
Re: May "PostgreSQL server side GB18030 character set support" reconsidered?
|
List | pgsql-general |
Hi,
Could "GB18030 server-side support" be reconsidered, now that about 15 years have passed since the release of GB18030-2005?
It may be one of the greenest features PostgreSQL could offer.
1. In this era of big data and mobile computing, in the world's most populous country, spending 50% more disk energy on Chinese characters (UTF-8 usually needs 3 bytes for a Chinese character, while GB18030 needs only 2) does real harm to carbon neutrality and adds to polar ice melting.
2. "Setting the client side to UTF-8, just like setting the server side to UTF-8", as suggested in the mail below, is not practical for most Chinese IT projects, especially publicly funded ones, because GB18030 compatibility is required by law in Mainland China.
The client-side encoding configuration, usually exposed through a GUI, is harder to hide from users, and most MS Windows users are familiar with GB18030.
MySQL has supported GB18030 on the server side since V5.7 in 2015. I am not sure how much this feature has contributed to MySQL's greater popularity in Mainland China.
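The byte counts behind point 1 can be checked outside the database. The following is only a rough sketch using Python's built-in `utf-8` and `gb18030` codecs; the helper name `encoded_sizes` and the sample string are assumptions made for illustration, and real on-disk usage also depends on row overhead, compression, and how much ASCII is mixed into the text.

```python
# Hypothetical sample: seven common Chinese characters.
SAMPLE = "中华人民共和国"


def encoded_sizes(text):
    """Return the encoded byte length of `text` in UTF-8 and in GB18030."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "gb18030": len(text.encode("gb18030")),
    }


if __name__ == "__main__":
    sizes = encoded_sizes(SAMPLE)
    # Ratio of UTF-8 size to GB18030 size for this particular sample.
    ratio = sizes["utf-8"] / sizes["gb18030"]
    print(sizes, f"utf-8/gb18030 = {ratio:.2f}")
```

For text made up of common Chinese characters the ratio comes out at 3 bytes to 2. ASCII takes 1 byte in both encodings, and characters outside the Basic Multilingual Plane, such as emoji, take 4 bytes in both, so the saving shrinks for mixed content.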
"If greenhouse gas emissions continue apace, Greenland and Antarctica’s ice sheets could together contribute more than 15 inches of global sea level rise by 2100" (www.nasa.gov)
Parker Han
From: pgsql-general-owner@postgresql.org <pgsql-general-owner@postgresql.org> on behalf of Arjen Nienhuis <a.g.nienhuis@gmail.com>
Sent: Saturday, March 7, 2015 8:18
To: lsliang <lsliang@pconline.com.cn>
Cc: Adrian Klaver <adrian.klaver@aklaver.com>; pgsql-general <pgsql-general@postgresql.org>
Subject: Re: Re: Re: [GENERAL] can postgresql supported utf8mb4 character sets?
On Fri, Mar 6, 2015 at 3:55 AM, lsliang <lsliang@pconline.com.cn> wrote:
2015-03-06
From: Adrian Klaver
Sent: 2015-03-05 21:31:39
To: lsliang; pgsql-general
Cc:
Subject: Re: [GENERAL] can postgresql supported utf8mb4 character sets?

On 03/05/2015 01:45 AM, lsliang wrote:
> can postgresql supported utf8mb4 character set?
> today mobile apps support 4-byte character and utf8 can only
> support 1-3 bytes character

The docs would seem to indicate otherwise:

> if load string to database which contain a 4-byte character
> will failed .

Have you actually tried to load strings in to Postgres?
If so and it failed what was the method you used and what was the error?

> mysql since 5.5.3 support utf8mb4 character sets
> I don't find some information about postgresql
> thanks

--
Adrian Klaver
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
thanks for your help .
postgresql can support 4-byte character

test=> select * from utf8mb4_test ;
ERROR: character with byte sequence 0xf0 0x9f 0x98 0x84 in encoding "UTF8" has no equivalent in encoding "GB18030"
test=> \encoding utf8
test=> select * from utf8mb4_test ;
content
---------
😄
😄
pcauto=>
UTF-8 support works fine. The 3 byte limit was something mysql invented. But it only works if your client encoding is UTF-8. In your example, your terminal is not set to UTF-8.
create table test (glyph text);
insert into test values ('A'), ('馬'), ('𐁀'), ('😄'), ('🇪🇸');
select glyph, convert_to(glyph, 'utf-8'), length(glyph) FROM test;
glyph | convert_to | length
-------+--------------------+--------
A | \x41 | 1
馬 | \xe9a6ac | 1
𐁀 | \xf0908180 | 1
😄 | \xf09f9884 | 1
🇪🇸 | \xf09f87aaf09f87b8 | 2
(5 rows)
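The same table can be reproduced without a database, which may help separate encoding questions from server or terminal configuration. This is a small sketch in Python, not part of the original test; the helper name `describe` is an assumption, and it relies on Python's `len()` counting code points, which is also how the `length` column above counts.

```python
# The five glyphs from the table above.
GLYPHS = ["A", "馬", "𐁀", "😄", "🇪🇸"]


def describe(glyph):
    """Return (UTF-8 bytes as hex, number of code points) for `glyph`."""
    return glyph.encode("utf-8").hex(), len(glyph)


if __name__ == "__main__":
    for glyph in GLYPHS:
        hex_bytes, code_points = describe(glyph)
        # Mimic the psql bytea notation with a leading \x.
        print(f"{glyph}\t\\x{hex_bytes}\t{code_points}")
```

The flag counts as 2 because it is built from two regional indicator code points, each of which takes 4 bytes in UTF-8.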
What doesn't work is GB18030:
select glyph, convert_to(glyph, 'GB18030'), length(glyph) FROM test;
ERROR: character with byte sequence 0xf0 0x90 0x81 0x80 in encoding "UTF8" has no equivalent in encoding "GB18030"
I think that is a bug.
Gr. Arjen
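One way to check whether the failure lies in the encoding itself or in the conversion step is to try the same characters with a different GB18030 implementation. The sketch below uses Python's built-in `gb18030` codec; the helper name `gb18030_roundtrip` is an assumption, and the result says nothing about PostgreSQL's own conversion tables in any particular release.

```python
def gb18030_roundtrip(text):
    """Encode `text` to GB18030, check it decodes back unchanged, return the bytes."""
    encoded = text.encode("gb18030")
    if encoded.decode("gb18030") != text:
        raise ValueError(f"GB18030 round trip changed {text!r}")
    return encoded


if __name__ == "__main__":
    # The characters from the test table, including the ones in the ERROR lines.
    for glyph in ["A", "馬", "𐁀", "😄"]:
        encoded = gb18030_roundtrip(glyph)
        print(glyph, len(encoded), "byte(s):", encoded.hex())
```

With this codec both characters named in the error messages round-trip as 4-byte GB18030 sequences, which is consistent with reading the PostgreSQL error as a gap in the conversion step and not as a limit of GB18030.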