
Re: Improve the performance of Unicode Normalization Forms. - Mailing list pgsql-hackers

From	Jeff Davis
Subject	Re: Improve the performance of Unicode Normalization Forms.
Date	June 19 20:41:57
Msg-id	4211ffd7fe154c4af693b98d78f4a3689ce8cc30.camel@j-davis.com
In response to	Improve the performance of Unicode Normalization Forms. (Alexander Borisov <lex.borisov@gmail.com>)
List	pgsql-hackers


On Tue, 2025-06-03 at 00:51 +0300, Alexander Borisov wrote:
> As promised, I am continuing to improve and speed up Unicode handling
> in Postgres. Last time, we improved the lower(), upper(), and
> casefold() functions. [1]
> Now it's time for Unicode Normalization Forms, specifically
> the normalize() function.

Did you compare against other implementations, such as ICU's
normalization functions? There's also a Rust crate here:

https://github.com/unicode-rs/unicode-normalization

which may already be well optimized.

In addition to the lookups themselves, there are other opportunities
for optimization as well, such as:

* reducing the need for palloc and extra buffers, perhaps by using
buffers on the stack for small strings

* operating more directly on UTF-8 data rather than decoding and
re-encoding the entire string
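
To make the two suggestions above a little more concrete, here is a rough sketch in C of what such a fast path might look like. It is hypothetical, not PostgreSQL's code: the names utf8_is_ascii, utf8_decode, normalize_prepare, norm_path and STACK_CODEPOINTS are invented for illustration, malloc/free stand in for palloc/pfree, and the input is assumed to be valid UTF-8. It inspects the UTF-8 bytes directly and skips decoding entirely for pure-ASCII input (ASCII text is unchanged by NFC, NFD, NFKC and NFKD), and for short non-ASCII strings it decodes into an on-stack buffer instead of allocating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical threshold: inputs up to this many bytes decode on the stack. */
#define STACK_CODEPOINTS 64

typedef enum { PATH_ASCII, PATH_STACK, PATH_HEAP, PATH_OOM } norm_path;

/* Scan the raw UTF-8 bytes; ASCII-only text is already in every
 * normalization form, so no decoding is needed. */
static int
utf8_is_ascii(const unsigned char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] & 0x80)
            return 0;
    return 1;
}

/* Decode valid UTF-8 into code points; returns the number written. */
static size_t
utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t n = 0, i = 0;

    while (i < len)
    {
        unsigned char c = s[i++];
        uint32_t cp;
        int extra;

        if (c < 0x80) { cp = c; extra = 0; }
        else if (c < 0xE0) { cp = c & 0x1F; extra = 1; }
        else if (c < 0xF0) { cp = c & 0x0F; extra = 2; }
        else { cp = c & 0x07; extra = 3; }

        /* Append 6 payload bits from each continuation byte. */
        for (int k = 0; k < extra && i < len; k++)
            cp = (cp << 6) | (s[i++] & 0x3F);
        out[n++] = cp;
    }
    return n;
}

/* Pick the ASCII fast path, a stack buffer, or a heap buffer, and report
 * how many code points would be handed to the normalizer. */
static norm_path
normalize_prepare(const unsigned char *s, size_t len, size_t *ncodepoints)
{
    uint32_t stackbuf[STACK_CODEPOINTS];
    uint32_t *buf = stackbuf;
    norm_path path = PATH_STACK;

    *ncodepoints = 0;
    if (utf8_is_ascii(s, len))
        return PATH_ASCII;      /* caller can return the input unchanged */

    /* A UTF-8 string never has more code points than bytes. */
    if (len > STACK_CODEPOINTS)
    {
        buf = malloc(len * sizeof(uint32_t));
        if (buf == NULL)
            return PATH_OOM;
        path = PATH_HEAP;
    }

    *ncodepoints = utf8_decode(s, len, buf);
    assert(*ncodepoints <= len);

    /* Decomposition, canonical reordering and recomposition would run on
     * buf here; they are omitted from this sketch. */

    if (buf != stackbuf)
        free(buf);
    return path;
}
```

A real implementation would also need room for the decomposed result, which can be longer than the input, so the stack/heap threshold would have to be based on the decomposed length rather than only on the byte length.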

Regards,
    Jeff Davis
