Re: GB18030-2022 Support in PostgreSQL - Mailing list pgsql-hackers

From	Chao Li
Subject	Re: GB18030-2022 Support in PostgreSQL
Date	August 11 05:01:08
Msg-id	3f12e2ab-6a20-4363-b72f-42502d1c36d3@gmail.com
In response to	Re: GB18030-2022 Support in PostgreSQL (Chao Li <li.evan.chao@gmail.com>)
List	pgsql-hackers

I have created a patch entry at https://commitfest.postgresql.org/patch/5954/. The CommitFest requested a rebase, so I rebased the code and created the v2 patch.

BTW, I have tested all 66 new characters, the 9 no-longer-required characters, and the 18 changed characters, each in the following way:

evantest=# SELECT encode(convert_from(decode('82359632', 'hex'), 'GB18030')::bytea, 'hex');
encode
--------
e9bfab
(1 row)

All encoded correctly.
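
For readers checking the output above: `e9bfab` is the UTF-8 form of a single code point. The following is a small Python sketch, not part of the patch, that decodes those bytes and prints the code point; the variable names are only illustrative.

```python
# Decode the hex string printed by psql as UTF-8 and show its code point.
utf8_bytes = bytes.fromhex("e9bfab")
ch = utf8_bytes.decode("utf-8")   # a single character
code_point = ord(ch)

print(f"U+{code_point:04X}")      # prints U+9FEB
```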

Chao Li (Evan)

---------------------

HighGo Software Co., Ltd.
https://www.highgo.com/

On 2025/8/7 16:14, Chao Li wrote:

I did more research on the changes in 2022 relative to 2000; here is a summary:

* 66 new characters were added in 2022. All of them are 4-byte characters. The map files store only the 2-byte GB code mappings, and the 4-byte GB code mappings are calculated, so these characters can be properly encoded/decoded even without this patch; I tested that.
* 9 characters are no longer required by 2022, but an application may decide whether to retain them. As the ucm file (https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/gb18030-2022.ucm) retains them, we also retain them.
* The Unicode mappings for 18 characters have changed. Only these changes can cause backward-compatibility issues. However, half of them are rarely used punctuation marks, and the rest are glyphs that I, as a native Chinese speaker, cannot recognize. So these changes should not significantly impact most existing databases.
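
To illustrate what "calculated" means for 4-byte codes, here is a rough Python sketch. It is not the PostgreSQL implementation. It assumes the commonly documented GB18030 range entry in which GB code 0x82358F33 corresponds to U+9FA6 and later codes in that range follow linearly; that range start is an assumption of this example and is not taken from the patch. The function names are hypothetical, and the sketch does not model any exceptions inside the range.

```python
def linear_index(code: bytes) -> int:
    """Position of a 4-byte GB18030 code in the linear 4-byte code space."""
    if len(code) != 4:
        raise ValueError("expected a 4-byte GB18030 code")
    b1, b2, b3, b4 = code
    # Bytes 1 and 3 run over 0x81..0xFE (126 values),
    # bytes 2 and 4 run over 0x30..0x39 (10 values).
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("bytes out of range for a 4-byte GB18030 code")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Assumed range entry (not from the patch): 0x82358F33 <-> U+9FA6.
RANGE_START_GB = bytes.fromhex("82358F33")
RANGE_START_UCS = 0x9FA6


def range_to_code_point(code: bytes) -> int:
    """Map a 4-byte code to a code point by its offset from the range start."""
    return RANGE_START_UCS + linear_index(code) - linear_index(RANGE_START_GB)


cp = range_to_code_point(bytes.fromhex("82359632"))
print(f"U+{cp:04X}", chr(cp).encode("utf-8").hex())  # prints: U+9FEB e9bfab
```

Under that assumption, the code 82359632 used in the psql test lands on the same UTF-8 bytes, e9bfab, that the server returned, without any per-character table entry.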

I added a test case using a character whose mapping changed, and the test passes:

% make check
...
# All 229 tests passed.

For more details on the standard change, see https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132

I am attaching the patch file.

Chao Li (Evan)
---------------------
Highgo Software Co., Ltd.
https://www.highgo.com/

Attachment

v2-0001-Upgrade-GB18030-encoding-support-from-2000-to-202.patch
