Re: GB18030-2022 Support in PostgreSQL - Mailing list pgsql-hackers

From John Naylor
Subject Re: GB18030-2022 Support in PostgreSQL
Msg-id CANWCAZZh4jPYDF5gVy0V8Hs9VhDBJCznaD7g9CCY+Npn_7OfPg@mail.gmail.com
In response to Re: GB18030-2022 Support in PostgreSQL  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
On Mon, Aug 18, 2025 at 3:50 PM Chao Li <li.evan.chao@gmail.com> wrote:
> This is my first split patch. I was confused about the "0001" part in patch file names. Now I understood. I just
> recreated both patch files as v3:

I've attached v4, in which I made some cosmetic changes to the perl
script, mostly to make it resemble master more closely. These changes
are split out into their own patch for visibility, but will be
squashed in the final commit. Two things are worth calling out:

- The URL at the top currently points to a directory on GitHub, but v3
changed it to point to the actual file. A directory can be navigated
for inspection, so I used:

2000:
https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm

2022:
https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/

- I also made the regex a multiline regex for readability, even though
the previous one was not.
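
To illustrate what a multiline (extended) regex buys in readability,
here is a minimal sketch in Python rather than Perl. It is not the
regex from the patch: the pattern, the names UCM_LINE and
parse_ucm_line, and the sample line format (<Uxxxx> \xNN... |n, as I
understand UCM mapping lines to look) are assumptions made for
illustration only.

```python
import re

# Hypothetical pattern for a UCM-style mapping line such as
#   <U00A4> \xA1\xE8 |0
# re.VERBOSE plays the role of Perl's /x: whitespace and comments
# inside the pattern are ignored, so each piece can be annotated.
UCM_LINE = re.compile(
    r"""
    ^<U([0-9A-Fa-f]{4,6})>      # Unicode code point in hex
    \s+
    ((?:\\x[0-9A-Fa-f]{2})+)    # one or more \xNN byte escapes
    \s+
    \|(\d)                      # trailing flag digit
    """,
    re.VERBOSE,
)


def parse_ucm_line(line):
    """Return (code point, encoded bytes, flag), or None if no match."""
    m = UCM_LINE.match(line)
    if m is None:
        return None
    ucs = int(m.group(1), 16)
    code = bytes.fromhex(m.group(2).replace("\\x", ""))
    return ucs, code, int(m.group(3))
```

Lines that do not look like mappings (comments, headers) simply return
None, which mirrors the "skip unless the line matches" style of such
scripts.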

For the 2022 version, I think it would be good to run a test once to
verify that no mappings changed in ways we didn't expect. Perhaps the
tests here can be used:

https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi
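
Independently of those tests, the kind of check I have in mind could be
sketched roughly as below. This is only an illustration in Python: the
function names are hypothetical, the mappings are assumed to have
already been loaded into dicts of code point -> encoded bytes, and the
set of expected changes would have to be filled in by hand from the
standard's list of changes.

```python
def diff_mappings(old, new):
    """Compare two {code point: bytes} mappings.

    Returns (added, removed, changed), where changed maps a code point
    to its (old bytes, new bytes) pair.
    """
    added = {u: new[u] for u in new.keys() - old.keys()}
    removed = {u: old[u] for u in old.keys() - new.keys()}
    changed = {
        u: (old[u], new[u])
        for u in old.keys() & new.keys()
        if old[u] != new[u]
    }
    return added, removed, changed


def unexpected_changes(old, new, expected):
    """Return changed entries whose code point is not in `expected`."""
    _, _, changed = diff_mappings(old, new)
    return {u: pair for u, pair in changed.items() if u not in expected}
```

An empty result from unexpected_changes(), together with empty "added"
and "removed" sets (or ones that match what we anticipate), would be
the signal that nothing moved by accident.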

The upstream correction to the 2000 version is not present in our
mappings, so we should mention that, unless it was reverted in or
before 2022.

In the documentation (charset.sgml), do we want to mention the version,
e.g. as follows?

 <entry><literal>GB18030</literal></entry>
-<entry>National Standard</entry>
+<entry>National Standard, version 2022</entry>

I've whacked around the commit messages, so those should be reviewed
for accuracy.

Your draft commit message had "9 characters are no longer required by
the new standard, but are retained in this patch for compatibility"
...but those nine were introduced in the 2005 version, right? In which
case it doesn't affect us. Please confirm.

"Author: Zheng Tao <taoz@highgo.com>" -- I haven't seen any messages
from this address in this thread, so could you confirm this was
intentional?

--
John Naylor
Amazon Web Services
