Re: BUG #19354: JOHAB rejects valid byte sequences - Mailing list pgsql-bugs

From	Jeroen Vermeulen
Subject	Re: BUG #19354: JOHAB rejects valid byte sequences
Date	December 16, 2025 10:42:09
Msg-id	CA+zULE47EXZOp7qKYODd+mjSgDiR-WX5ZNBkwdKnj-Zc0FT58w@mail.gmail.com
In response to	Re: BUG #19354: JOHAB rejects valid byte sequences (VASUKI M <vasukianand0119@gmail.com>)
Responses	Re: BUG #19354: JOHAB rejects valid byte sequences
List	pgsql-bugs

My one worry is that Johab may be on the list because one important user needed it.

But even then that requirement may have gone away?

Jeroen

On Tue, Dec 16, 2025, 07:23 VASUKI M <vasukianand0119@gmail.com> wrote:

Thanks all. That analysis makes a lot of sense.

Given the lack of a clear spec, the existence of multiple JOHAB variants, and how long this has apparently been "working" without anyone noticing, IMHO desupporting it does seem like the least risky option. At this point, trying to fix JOHAB variants feels like opening a pretty big can of worms, especially with the potential for dump/reload surprises or subtle parsing/security issues.

I don't have additional data to add, but +1 on removal or deprecation being a reasonable outcome here, given how obscure and effectively dead the encoding is nowadays.

Thanks for digging into this.
Cheers,
Vasuki M

On Tue, Dec 16, 2025 at 11:46 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jeroen Vermeulen <jtvjtv@gmail.com> writes:
> This bit worries me: "Other, vendor-defined, Johab variants also exist" --
> such as an EBCDIC-based one and a stateful one!

Yeah. So what we have here is:

1. Our JOHAB implementation has apparently been wrong since day one.

2. Wrongness may be in the eye of the beholder, since there are
multiple versions of JOHAB.

3. Your complaint is the first, AFAIR.

4. That wikipedia page says "Following the introduction of Unified
Hangul Code by Microsoft in Windows 95, and Hangul Word Processor
abandoning Johab in favour of Unicode in 2000, Johab ceased to be
commonly used."

Given these things, I wonder if we shouldn't desupport JOHAB
rather than attempt to fix it. Fixing would likely be a significant
amount of work: if we don't even have the character lengths right,
how likely is it that our conversions to other character sets are
correct? I also worry that if different PG versions have different
ideas of the mapping, there could be room for dump/reload problems,
and maybe even security problems related to the backslash issue.

regards, tom lane
