Re: GB18030-2022 Support in PostgreSQL - Mailing list pgsql-hackers

From John Naylor
Subject Re: GB18030-2022 Support in PostgreSQL
Msg-id CANWCAZaH+37-avZkmEp1kYP1iw8zozY1nVMHHjfnH88QzaUQSg@mail.gmail.com
In response to Re: GB18030-2022 Support in PostgreSQL  (Chao Li <li.evan.chao@gmail.com>)
List pgsql-hackers
On Thu, Sep 11, 2025 at 4:09 PM Chao Li <li.evan.chao@gmail.com> wrote:
> Then I switched to the patch branch, and it got 21 different lines. After I updated the 18 known changes in the .out file,
> only 3 different lines remained:
>
> ```
> - \x8135f437   | \xe1b8bf
> + \x8135f437   | \xee9f87
>
> - \xa3a0       | \xee97a5
> + \xa3a0       | character with byte sequence 0xa3 0xa0 in encoding "GB18030" has no equivalent in encoding "UTF8"
>
> - \xa8bc       | \xee9f87
> + \xa8bc       | \xe1b8bf
> ```
>
> Here, \x8135f437 and \xa8bc correspond to the change described at the link above:
>
> \xA8BC used to map to Unicode U+E7C7. In version 2005, \x8135f437 was changed to map to U+E7C7, and \xA8BC was changed to map to U+1E3F.
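As a sanity check on that diff: the hex values in the second column are simply the UTF-8 encodings of the code points involved. The short Python sketch below confirms this; `bytea_hex` is a hypothetical helper that imitates PostgreSQL's hex bytea output (`\x` followed by lowercase hex digits), and the expected strings are copied from the quoted diff.

```python
# Code points from the \x8135f437 / \xA8BC swap, plus U+E5E5 from the
# \xA3A0 case, paired with the values shown in the quoted regression diff.
diff_values = {
    0x1E3F: "\\xe1b8bf",  # LATIN SMALL LETTER M WITH ACUTE
    0xE7C7: "\\xee9f87",  # Private Use Area
    0xE5E5: "\\xee97a5",  # Private Use Area
}


def bytea_hex(cp: int) -> str:
    """Render a code point's UTF-8 bytes like PostgreSQL's hex bytea output."""
    return "\\x" + chr(cp).encode("utf-8").hex()


for cp, shown in diff_values.items():
    # Each diff value should be exactly the UTF-8 encoding of its code point.
    assert bytea_hex(cp) == shown, (hex(cp), bytea_hex(cp), shown)
    print(f"U+{cp:04X} -> {bytea_hex(cp)}")
```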

Maybe we can phrase it like this:

```
There have been two corrections to the 2000 version that were carried
forward to later versions. The following mappings were previously
swapped:

U+E7C7 (Private Use Area) now maps to \x8135f437
U+1E3F (Latin Small Letter M with Acute) now maps to \xA8BC
```
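The "previously swapped" claim can also be checked mechanically. Here is a minimal Python sketch, assuming tables keyed by the GB18030 byte sequence written as an integer; `old_map`, `new_map`, and `find_swaps` are hypothetical names, and the two entries are only the ones discussed in this thread, not the full conversion tables.

```python
# Hypothetical two-entry excerpts of the GB18030 -> Unicode tables,
# keyed by the GB18030 byte sequence written as an integer.
old_map = {
    0xA8BC: 0xE7C7,      # 2000: \xA8BC -> U+E7C7 (Private Use Area)
    0x8135F437: 0x1E3F,  # 2000: \x8135f437 -> U+1E3F
}
new_map = {
    0xA8BC: 0x1E3F,      # 2005 and later: \xA8BC -> U+1E3F
    0x8135F437: 0xE7C7,  # 2005 and later: \x8135f437 -> U+E7C7
}


def find_swaps(old, new):
    """Return (a, b) pairs of byte sequences whose targets were exchanged."""
    changed = sorted(k for k in old if k in new and old[k] != new[k])
    swaps = []
    for i, a in enumerate(changed):
        for b in changed[i + 1:]:
            # A swap means each key now points where the other used to.
            if old[a] == new[b] and old[b] == new[a]:
                swaps.append((a, b))
    return swaps


# Invert the new table to read it in the Unicode -> GB18030 direction,
# which is how the proposed wording phrases the change.
new_from_unicode = {cp: gb for gb, cp in new_map.items()}

for a, b in find_swaps(old_map, new_map):
    print(f"swapped: \\x{a:x} <-> \\x{b:x}")
print(f"U+E7C7 -> \\x{new_from_unicode[0xE7C7]:x}")
print(f"U+1E3F -> \\x{new_from_unicode[0x1E3F]:x}")
```

Checking for an exact exchange, rather than just "changed", is what distinguishes this correction from the 18 other PUA remappings in the 2022 version.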

> For \xa3a0, in 2022.ucm, it is not a roundtrip mapping:
>
> ```
> <U3000> \xA3\xA0 |3
> <UE5E5> \xA3\xA0 |4
> ```
>
> So we ignored it. Then everything is clear.
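To make "we ignored it" concrete: in ICU's UCM format, `|0` marks a round-trip mapping, `|3` a reverse fallback (bytes to Unicode only), and `|4`, as far as I understand it, a one-way Unicode-to-bytes mapping. A generator that keeps only `|0` lines ends up with no entry at all for `\xA3\xA0`. The Python sketch below is illustrative only: `roundtrip_mappings` is a hypothetical helper, not the patch's actual generator script; it only handles lines with an explicit flag; and the `<U1E3F>` sample line is added for contrast, not copied from the file.

```python
import re

# One UCM mapping line: <Uxxxx>, one or more \xHH bytes, then a |N flag.
# Lines without an explicit flag are skipped by this sketch.
UCM_LINE = re.compile(
    r"<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s*\|([0-4])"
)


def roundtrip_mappings(lines):
    """Collect only round-trip (|0) mappings as {gb18030_bytes: code_point}."""
    table = {}
    for line in lines:
        m = UCM_LINE.match(line.strip())
        if not m:
            continue  # headers, comments, CHARMAP markers, etc.
        cp_hex, byte_text, flag = m.groups()
        if flag != "0":
            continue  # fallback or one-way entries cannot be used both ways
        raw = bytes.fromhex(byte_text.replace("\\x", ""))
        table[raw] = int(cp_hex, 16)
    return table


sample = [
    r"<U3000> \xA3\xA0 |3",  # from the quoted 2022.ucm excerpt
    r"<UE5E5> \xA3\xA0 |4",  # from the quoted 2022.ucm excerpt
    r"<U1E3F> \xA8\xBC |0",  # hypothetical line, added for contrast
]
table = roundtrip_mappings(sample)
print(table)
```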

Yes, I see this in the file, but it's not described in any of the
documents about the 2022 version, although they do mention other
Private Use Area cases. I'm not sure we need to worry about it too
much, but we do need to describe the behavior change, maybe like this:

```
Previously, U+E5E5 (Private Use Area) was mapped to \xA3A0. This code
point now maps to \x65356535. Attempting to convert \xA3A0 will now
raise an error.
```
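A toy model of the behavior described in that note: with the old round-trip entry, `\xA3A0` converts to U+E5E5, and with no round-trip entry it fails with the same message seen in the regression diff. `ConversionError` and `gb18030_char_to_utf8` are hypothetical names; PostgreSQL's real conversion is C code that also handles four-byte ranges algorithmically, which this sketch ignores.

```python
class ConversionError(Exception):
    """Stand-in for the error PostgreSQL reports for an unmappable character."""


def gb18030_char_to_utf8(char_bytes: bytes, table: dict) -> bytes:
    """Convert a single GB18030 character using a {bytes: code point} table."""
    cp = table.get(char_bytes)
    if cp is None:
        seq = " ".join(f"0x{b:02x}" for b in char_bytes)
        raise ConversionError(
            f'character with byte sequence {seq} in encoding "GB18030" '
            f'has no equivalent in encoding "UTF8"'
        )
    return chr(cp).encode("utf-8")


# Before: \xA3A0 had a round-trip entry for U+E5E5 (Private Use Area).
old_table = {b"\xa3\xa0": 0xE5E5}
# After: the only 2022 entries for \xA3\xA0 are one-way, so nothing is loaded.
new_table = {}

print(gb18030_char_to_utf8(b"\xa3\xa0", old_table).hex())  # ee97a5
try:
    gb18030_char_to_utf8(b"\xa3\xa0", new_table)
except ConversionError as err:
    print(err)
```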

I'm open to suggestions.

--
John Naylor
Amazon Web Services
