pgsql: Generate GB18030 mappings from the Unicode Consortium's UCM file - Mailing list pgsql-committers

From John Naylor
Subject pgsql: Generate GB18030 mappings from the Unicode Consortium's UCM file
Date
Msg-id E1uyS1G-000ywA-20@gemulon.postgresql.org
List pgsql-committers
Generate GB18030 mappings from the Unicode Consortium's UCM file

Previously we built the .map files for GB18030 (version 2000) from an
XML file. The 2022 version of this encoding is only available as a
Unicode Character Mapping (UCM) file, so as preparatory refactoring,
switch to UCM as the source for building version 2000.
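
For readers unfamiliar with UCM, the following is a minimal sketch of how one
mapping entry could be parsed. It is written in Python for illustration only;
the actual generator in this commit is the Perl script UCS_to_GB18030.pl, and
this is not its code. The function names and the sample lines are hypothetical.
The sketch assumes the common UCM entry layout "<UXXXX> \xHH\xHH |N", where N
is the precision flag and 0 marks a round-trip mapping.

```python
import re

# Assumed layout of one CHARMAP entry: <U00A4> \xA1\xE8 |0
UCM_ENTRY = re.compile(
    r'^<U([0-9A-Fa-f]{4,6})>\s+((?:\\x[0-9A-Fa-f]{2})+)\s+\|(\d)'
)


def parse_ucm_line(line):
    """Return (ucs, code, flag) for a mapping entry, or None for any other line."""
    m = UCM_ENTRY.match(line.strip())
    if m is None:
        # Header lines, comments, CHARMAP / END CHARMAP markers, blank lines.
        return None
    ucs = int(m.group(1), 16)
    # Join the \xHH bytes into a single integer, e.g. \xA1\xE8 -> 0xA1E8.
    code = int(m.group(2).replace('\\x', ''), 16)
    flag = int(m.group(3))
    return ucs, code, flag


def parse_ucm(lines):
    """Collect round-trip (flag 0) entries as a dict of code point -> byte code."""
    mapping = {}
    for line in lines:
        entry = parse_ucm_line(line)
        if entry is None:
            continue
        ucs, code, flag = entry
        if flag == 0:
            mapping[ucs] = code
    return mapping


# Illustrative input only; not copied from the upstream file.
sample = [
    '# comment line',
    'CHARMAP',
    '<U00A4> \\xA1\\xE8 |0',
    '<U0080> \\x81\\x30\\x81\\x30 |0',
    'END CHARMAP',
]
table = parse_ucm(sample)
```

Keeping only flag 0 entries is a choice made for this sketch; whether the real
script treats the other precision flags differently is not stated in the commit
message.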

As we do with most input files for the conversion mappings, download
the file on demand. To generate the same mappings we have now, we must
download from a previous upstream commit rather than the head, since
the latter contains a correction not present in our current .map
files.

The XML file is still used by EUC_CN, so we cannot delete it from our
repository. GB18030 is a superset of EUC_CN, so it may be possible to
build EUC_CN from the same UCM file, but that is left for future work.

Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/966d9fc.169.198741fe60b.Coremail.jiaoshuntian%40highgo.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/cfa6cd29271e67c43c1040e3420c1145fdcdceb7

Modified Files
--------------
src/backend/utils/mb/Unicode/Makefile              |  5 +++-
src/backend/utils/mb/Unicode/UCS_to_GB18030.pl     | 28 +++++++++++++++-------
.../utf8_and_gb18030/utf8_and_gb18030.c            |  7 +++++-
3 files changed, 29 insertions(+), 11 deletions(-)

