Re: About Unicode IVS - Mailing list pgsql-admin
From | Holger Jakobs |
---|---|
Subject | Re: About Unicode IVS |
Date | |
Msg-id | 15e893e2-1237-27f4-58ba-5ea59499fa3d@jakobs.com |
In response to | RE: About Unicode IVS (荒井元成 <n2029@ndensan.co.jp>) |
Responses | Re: About Unicode IVS |
List | pgsql-admin |
It's entirely correct that the two characters are still two characters: char_length() and length() count code points, not user-perceived characters.
You would have to normalize the string first, so that the combination becomes one character.
More information about this topic, which is in part beyond PostgreSQL:
- https://stackoverflow.com/questions/7931204/what-is-normalized-utf-8-all-about
- https://en.wikipedia.org/wiki/Unicode_equivalence
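PostgreSQL 13, the version reported by PG_VERSION later in this thread, has a built-in normalize(text, form) function, so the normalization suggestion can be tried directly in SQL. A minimal sketch, assuming a UTF8 server encoding (normalize() requires it): NFC composes a base letter plus a combining mark into one code point only when a precomposed form exists. Ideographic variation selectors such as U+E0102 have no precomposed equivalent, so NFC leaves 辺 (U+8FBA) followed by U+E0102 as two code points.

```sql
-- Assumes PostgreSQL 13 or later and a UTF8 database (required by normalize()).
-- "e" + COMBINING ACUTE ACCENT (U+0301) has a precomposed form, U+00E9.
SELECT char_length(U&'\0065\0301')                 AS before_nfc,  -- 2
       char_length(normalize(U&'\0065\0301', NFC)) AS after_nfc;   -- 1

-- An ideographic variation sequence has no precomposed form,
-- so NFC keeps U+8FBA and U+E0102 as separate code points.
SELECT char_length(normalize(U&'\+008FBA\+0E0102', NFC)) AS ivs_after_nfc;  -- 2
```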
Regards,
Holger
Thank you for your reply.
Changing the collation and ctype settings did not change the behavior.
   Name    |  Owner  | Encoding |   Collate   |    Ctype    |  Access privileges
-----------+---------+----------+-------------+-------------+---------------------
 D209007   | D209007 | UTF8     | C           | C           |
 postgres  | D209007 | UTF8     | C           | C           |
 template0 | D209007 | UTF8     | C           | C           | =c/D209007         +
           |         |          |             |             | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C           | C           | =c/D209007         +
           |         |          |             |             | D209007=CTc/D209007
 template2 | D209007 | UTF8     | ja_JP.UTF-8 | ja_JP.UTF-8 |
(5 rows)
D209007=# \c template2
You are now connected to database "template2" as user "D209007".
template2=# select char_length(U&'\+0000E6' || U&'\+000300');
 char_length
-------------
           2
(1 row)

template2=# select char_length(U&'\+008FBA' || U&'\+0E0102');
 char_length
-------------
           2
(1 row)

template2=# select length(U&'\+008FBA' || U&'\+0E0102');
 length
--------
      2
(1 row)
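Collation and ctype leave these counts unchanged because char_length() and length() count code points, and the string contains exactly two: the ideograph U+8FBA and the ideographic variation selector U+E0102. A small sketch for listing them, using only built-in functions and assuming a UTF8 database (where ascii() returns the Unicode code point of the first character of its argument):

```sql
-- Print each code point of the string together with its position.
SELECT i AS position,
       to_hex(ascii(substr(s, i, 1))) AS code_point
FROM (SELECT U&'\+008FBA\+0E0102' AS s) AS src,
     generate_series(1, char_length(s)) AS i
ORDER BY i;
```

This should return two rows, with the code points 8fba and e0102.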
Moto.
From: Michel SALAIS <msalais@msym.fr>
Sent: Tuesday, March 29, 2022 6:35 PM
To: '荒井元成' <n2029@ndensan.co.jp>; 'David G. Johnston' <david.g.johnston@gmail.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: RE: About Unicode IVS
Hi,
I think this has something to do with collation and ctype. As far as I can see, you have them set to “C” for all your databases (even if I don’t understand your column headings 😊).
Michel SALAIS
From: 荒井元成 <n2029@ndensan.co.jp>
Sent: Tuesday, March 29, 2022 06:35
To: 'David G. Johnston' <david.g.johnston@gmail.com>
Cc: pgsql-admin@lists.postgresql.org
Subject: RE: About Unicode IVS
Thank you for your reply.
char_length() also returns 2 characters.
select char_length(U&'\+008FBA' || U&'\+0E0102');
 char_length
-------------
           2
(1 row)
select length('辺󠄂');
 length
--------
      2
(1 row)
select char_length('辺󠄂');
 char_length
-------------
           2
(1 row)
$ psql -l
                           List of databases
   Name    |  Owner  | Encoding | Collate | Ctype |  Access privileges
-----------+---------+----------+---------+-------+---------------------
 D209007   | D209007 | UTF8     | C       | C     |
 postgres  | D209007 | UTF8     | C       | C     |
 template0 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
 template1 | D209007 | UTF8     | C       | C     | =c/D209007         +
           |         |          |         |       | D209007=CTc/D209007
(4 rows)
$ cat pgdata/PG_VERSION
13
Moto.
From: David G. Johnston <david.g.johnston@gmail.com>
Sent: Tuesday, March 29, 2022 12:38 PM
To: 荒井元成 <n2029@ndensan.co.jp>
Cc: pgsql-admin@lists.postgresql.org
Subject: Re: About Unicode IVS
On Monday, March 28, 2022, 荒井元成 <n2029@ndensan.co.jp> wrote:
Hi,
The length() function counts this as 2 characters, where I want it to count as 1.
Is it possible to handle this by changing settings, for example the collation setting, as in SQL Server?
Also, if you know another way to deal with it (e.g., by creating a custom function), I would appreciate as much information as you can provide.
Try char_length(text) instead.
David J.
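Since char_length() also counts code points, it returns 2 for this string too, as the replies in the thread show. If the goal is to count an ideograph together with its variation selector as one character, one option is a small SQL function of your own. The sketch below is only an illustration: the name char_length_ignoring_vs is hypothetical, and it assumes a UTF8 database and that only the variation selectors U+FE00 to U+FE0F and U+E0100 to U+E01EF should be skipped; other combining marks are still counted.

```sql
-- Hypothetical helper: count characters, skipping Unicode variation selectors.
-- Assumes a UTF8 database, where ascii() returns the Unicode code point.
CREATE OR REPLACE FUNCTION char_length_ignoring_vs(t text)
RETURNS integer
LANGUAGE sql IMMUTABLE STRICT
AS $$
  SELECT count(*)::integer
  FROM generate_series(1, char_length(t)) AS i
  WHERE ascii(substr(t, i, 1)) NOT BETWEEN 65024 AND 65039     -- U+FE00..U+FE0F (VS1-VS16)
    AND ascii(substr(t, i, 1)) NOT BETWEEN 917760 AND 917999;  -- U+E0100..U+E01EF (VS17-VS256)
$$;

SELECT char_length_ignoring_vs(U&'\+008FBA\+0E0102');  -- 1
```

An empty string yields 0 because generate_series(1, 0) produces no rows, and STRICT makes the function return NULL for NULL input.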
-- Holger Jakobs, Bergisch Gladbach, Tel. +49-178-9759012