Home > mailing lists

Re: GB18030-2022 Support in PostgreSQL - Mailing list pgsql-hackers

From	Chao Li
Subject	Re: GB18030-2022 Support in PostgreSQL
Date	September 10 14:54:08
Msg-id	CAEoWx2=EoJYa8HRXaWOnoaD58MonkwN8pTKkc_7=oj6Zjs0P=w@mail.gmail.com Whole thread Raw
In response to	Re: GB18030-2022 Support in PostgreSQL (John Naylor <johncnaylorls@gmail.com>)
List	pgsql-hackers

Tree view

Hi John,

Thank you very much for taking care of this patch.

John Naylor <johncnaylorls@gmail.com> 于2025年9月10日周三 14:38写道：

- The URL at the top currently points to a directory in Github, but v3
changed it to point to the actual file. A directory can be navigated
for inspection, so I used:

2000:
https://github.com/unicode-org/icu-data/tree/main/charset/data/ucm

2022:
https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/

Looks good.

- I also made the regex a multiline regex for readability, even though
the previous one was not.

Thank you very much for polishing the perl script. I am not an expert of perl. I can make the script working, but not perfect.

For 2022 version, I think it would be good to once run a test to
verify that no mappings changed that we didn't expect. Perhaps the
tests here can be used:

https://www.postgresql.org/message-id/b9e3167f-f84b-7aa4-5738-be578a4db924%40iki.fi

I have manually run tested I had done before, everything works as expected.

I downloaded the tests from the referenced mail, but I cannot make the tests to run. After extracting the 2 patch files, it added src/test/encodings, but "make check" seems to not run them. I tried to copy .out and .sql files to src/test/regress, but the tests still not running. Did I miss anything?

The upstream correction to the 2000 version is not present in our
mappings, so we should mention that, unless it was reverted in or
before 2022.

I think the upstream correction to the 2000 version is just a few not round-trip chars that are ignored by us. So I feel we don't need to mention them.

In the documentation (charset.sgml), do we want to mention the version
e.g. the following?

<entry><literal>GB18030</literal></entry>
-<entry>National Standard</entry>
+<entry>National Standard, version 2022</entry>

That's a good idea. I updated the sgml file:

I've whacked around the commit messages, so those should be reviewed
for accuracy.

Your draft commit message had "9 characters are no longer required by
the new standard, but are retained in this patch for compatibility"
...but those nine were introduced in the 2005 version, right? In which
case it doesn't affect us. Please confirm.

I don't find any hint about if the 9 characters were introduced in the 2005 version.

But without this patch, they can be properly converted:

```

evantest=# SELECT encode(convert_from(decode('FD9D', 'hex'), 'GB18030')::bytea, 'hex');
encode
--------
efa5b9
(1 row)

```

So they should be available in the version 2002 already.

"Author: Zheng Tao <taoz@highgo.com>" -- I haven't seen any messages
from this address in this thread, so could you confirm this was
intentional?

Yes, Zheng Tao is my colleague. He worked with me for this patch, so I want to credit him.

I am attaching v5 version. The only change is 0003, I added the SGML change.

Best regards,

Chao Li (Evan)

---------------------

HighGo Software Co., Ltd.
https://www.highgo.com/

Attachment

pgsql-hackers by date:

From: Alexander Kukushkin
Date: 10 September, 14:53:41
Subject: Re: issue with synchronized_standby_slots

From: Andrey Borodin
Date: 10 September, 14:59:56
Subject: Re: VM corruption on standby

Re: GB18030-2022 Support in PostgreSQL - Mailing list pgsql-hackers

Attachment

Previous

Next