Re: GB18030-2022 Support in PostgreSQL - Mailing list pgsql-hackers

From John Naylor
Subject Re: GB18030-2022 Support in PostgreSQL
Date
Msg-id CANWCAZaHbby890qkVQkjwW991fmYzJKXmfKEVhQtOYw+uh8Vhw@mail.gmail.com
In response to GB18030-2022 Support in PostgreSQL  (JiaoShuntian <jiaoshuntian@highgo.com>)
Responses Re: GB18030-2022 Support in PostgreSQL
List pgsql-hackers
On Mon, Aug 11, 2025 at 9:01 AM Chao Li <li.evan.chao@gmail.com> wrote:
>
> I have created a patch https://commitfest.postgresql.org/patch/5954/. CommitFests requested a rebase, so I rebased
> the code and created the v2 patch.
>
> BTW, I have tested all 66 new characters, 9 not-required characters and 18 changed characters in a way as:

"9 characters are no longer required by the new standard, but are
retained in this patch for compatibility"

How is that done?

> I added a test case with a mapping changed char, and the test passes:
>
> % make check
> ...
> # All 229 tests passed.
>
> For more details on the standard change, see https://ken-lunde.medium.com/the-gb-18030-2022-standard-3d0ebaeb4132
>
> I am attaching the patch file.

Going from the old .xml file to the .ucm file makes it difficult to
see the relevant changes. Also, there are nearly 1000 non-user-visible
changes like this in the output file that are not explained:

-  /*** Three byte table, leaf: efa8xx - offset 0x07aba ***/
+  /*** Three byte table, leaf: efa8xx - offset 0x07b3a ***/

The 2000 version is available in the .ucm format, so maybe converting
to that first would be a good preparatory patch:

https://github.com/unicode-org/icu-data/blob/main/charset/data/ucm/gb-18030-2000.ucm

Looking at the history, it looks like that file has seen small
revisions, so it may take some research to get the exact equivalent to
the XML file we use. That will also tell us if anything will change
for us besides the actual 2022 revision.
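
As an illustration of the kind of comparison this would enable, here is a minimal sketch of diffing two .ucm files at the mapping level rather than as text. It assumes the usual ICU .ucm layout (mapping lines such as `<U00A4> \xA1\xE8 |0` between CHARMAP and END CHARMAP) and only looks at roundtrip (|0) entries. The names parse_ucm and diff_mappings are hypothetical helpers, not part of the PostgreSQL tree or the patch.

```python
import re

# One UCM mapping line: <Uxxxx> followed by \xNN bytes and an optional |flag.
# Assumption: the input follows the usual ICU .ucm layout.
UCM_LINE = re.compile(
    r"^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)(?:\s+\|(\d))?"
)


def parse_ucm(text):
    """Return {code point: bytes} for the roundtrip (|0) mappings in UCM text."""
    mapping = {}
    in_charmap = False
    for line in text.splitlines():
        line = line.strip()
        if line == "CHARMAP":
            in_charmap = True
            continue
        if line == "END CHARMAP":
            in_charmap = False
            continue
        if not in_charmap:
            continue
        m = UCM_LINE.match(line)
        if m is None:
            continue  # comment, blank line, or something this sketch ignores
        flag = m.group(3) or "0"  # a missing flag is treated as roundtrip
        if flag != "0":
            continue  # skip fallback-only entries
        code_point = int(m.group(1), 16)
        mapping[code_point] = bytes.fromhex(m.group(2).replace("\\x", ""))
    return mapping


def diff_mappings(old, new):
    """Return (added, removed, changed) between two parsed mappings."""
    added = {cp: new[cp] for cp in new.keys() - old.keys()}
    removed = {cp: old[cp] for cp in old.keys() - new.keys()}
    changed = {
        cp: (old[cp], new[cp])
        for cp in old.keys() & new.keys()
        if old[cp] != new[cp]
    }
    return added, removed, changed
```

Feeding it the contents of the 2000 and 2022 .ucm files (read with open(...).read()) would give three small dictionaries to check against the counts of new, dropped, and changed characters quoted above. It does not cover the ranges that GB18030 converters handle algorithmically, so it is a review aid only, not a replacement for the map generation scripts.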

--
John Naylor
Amazon Web Services


