I have created a CommitFest entry for the patch: https://commitfest.postgresql.org/patch/5954/. The CommitFest requested a rebase, so I rebased the code and created the v2 patch.
BTW, I have tested all 66 new characters, the 9 not-required characters and the 18 changed characters in this way:
evantest=# SELECT encode(convert_from(decode('82359632', 'hex'), 'GB18030')::bytea, 'hex');
encode
--------
e9bfab
(1 row)
All were encoded correctly.
Chao Li (Evan)
---------------------
On 2025/8/7 16:14, Chao Li wrote:
I did more research on the changes in GB18030-2022 relative to GB18030-2000; here is a summary:
* 66 new characters were added in 2022. All of them are 4-byte characters. Because the map files store only the 2-byte GB code mappings and the 4-byte GB code mappings are calculated, these characters can already be encoded/decoded properly without this patch; I tested that.
* The Unicode mappings for 18 characters have changed. Only these changes cause backward-compatibility issues. However, half of them are rarely used punctuation marks, and the rest are glyphs that I, as a native Chinese speaker, cannot recognize. So these changes should not significantly impact most existing databases.
I added a test case using a character whose mapping changed, and the test passes:
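To illustrate what "calculated" means in the first bullet, here is a minimal Python sketch of the 4-byte arithmetic. It is not the code in the patch or in PostgreSQL. The function names are made up for illustration, and the range anchor (GB 0x82358F33 for U+9FA6, running to GB 0x8336C738 for U+D7FF) is an assumption taken from commonly published GB18030 range tables. Any individual exceptions inside that range are not modeled.

```python
# Illustrative sketch only; names and the range anchor below are assumptions.

def gb18030_linear(code: bytes) -> int:
    """Return the position of a 4-byte GB18030 code in the linear 4-byte sequence."""
    if len(code) != 4:
        raise ValueError("expected a 4-byte GB18030 code")
    b1, b2, b3, b4 = code
    # Bytes 1 and 3 take 126 values (0x81..0xFE); bytes 2 and 4 take 10 (0x30..0x39).
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a valid 4-byte GB18030 code")
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Assumed contiguous range: GB 0x82358F33..0x8336C738 <-> U+9FA6..U+D7FF.
RANGE_START_GB = bytes.fromhex("82358F33")
RANGE_START_UCS = 0x9FA6
RANGE_END_UCS = 0xD7FF


def decode_in_range(code: bytes) -> str:
    """Decode a 4-byte code by its offset from the start of the assumed range."""
    offset = gb18030_linear(code) - gb18030_linear(RANGE_START_GB)
    ucs = RANGE_START_UCS + offset
    if offset < 0 or ucs > RANGE_END_UCS:
        raise ValueError("code is outside the range modeled by this sketch")
    return chr(ucs)


ch = decode_in_range(bytes.fromhex("82359632"))
print(f"U+{ord(ch):04X}", ch.encode("utf-8").hex())  # prints: U+9FEB e9bfab
```

No per-character map entry is needed for such a code, only the start of the range and the offset, which is why a newly assigned 4-byte character inside an existing range already converts.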
% make check
...
# All 229 tests passed.
I am attaching the patch file.
Chao Li (Evan)
---------------------