Thread: Unicode normalization test broken output

From: Peter Eisentraut
Date: 09 December 2019, 11:05:30

I was playing with the Unicode normalization test in
src/common/unicode/.  I think there is something wrong with how the test
program reports failures.  For example, if I manually edit
norm_test_table.h to force a failure, like

-    { 74, { 0x00A8, 0 }, { 0x0020, 0x0308, 0 } },
+    { 74, { 0x00A8, 0 }, { 0x0020, 0x0309, 0 } },

then the output from the test is

FAILURE (NormalizationTest.txt line 74):
input:    00
expected:    0003
got    0003

which doesn't make sense.

There appear to be several off-by-more-than-one errors in norm_test.c 
print_wchar_str().  Attached is a patch to fix this (and make the output 
a bit prettier).  Result afterwards:

FAILURE (NormalizationTest.txt line 74):
input:    U+00A8
expected: U+0020 U+0309
got:      U+0020 U+0308

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment

0001-Fix-output-of-Unicode-normalization-test.patch

Re: Unicode normalization test broken output

From: Tom Lane
Date: 09 December 2019, 22:22:39

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> There appear to be several off-by-more-than-one errors in norm_test.c 
> print_wchar_str().  Attached is a patch to fix this (and make the output 
> a bit prettier).  Result afterwards:

I concur that this looks broken and your patch improves it.
But I'm not very happy about the remaining assumption that
we don't have to worry about characters above U+FFFF.  I'd
rather see it allocate 11 bytes per allowed pg_wchar, and
manage the string contents with something like

    p += sprintf(p, "U+%04X ", *s);

An alternative fix would be to start using a PQExpBuffer, but
it's probably not quite worth that.

            regards, tom lane
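
To make the suggestion above concrete, here is a minimal sketch of a formatter built around that sprintf() idiom. It is not the committed patch: the function name format_wchar_str() is hypothetical, and pg_wchar is declared locally as a stand-in, assuming a 32-bit unsigned type. The 11 bytes per code point cover "U+", up to 8 hex digits, and one separator; one extra byte holds the terminating NUL.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for PostgreSQL's pg_wchar; assumed to be a 32-bit unsigned type. */
typedef unsigned int pg_wchar;

/*
 * Hypothetical helper: format a zero-terminated pg_wchar string as
 * "U+XXXX U+XXXX ...".  Returns a malloc'd buffer the caller must free,
 * or NULL if allocation fails.
 */
static char *
format_wchar_str(const pg_wchar *s)
{
	size_t		len = 0;
	char	   *buf;
	char	   *p;

	assert(s != NULL);

	/* Count code points up to the zero terminator. */
	while (s[len] != 0)
		len++;

	/* 11 bytes per code point ("U+" + 8 hex digits + space), plus NUL. */
	buf = malloc(len * 11 + 1);
	if (buf == NULL)
		return NULL;

	p = buf;
	*p = '\0';
	for (; *s != 0; s++)
		p += sprintf(p, "U+%04X ", *s);

	/* Drop the trailing separator, if anything was written. */
	if (p > buf)
		p[-1] = '\0';

	return buf;
}
```

A caller would pass a zero-terminated array such as { 0x0020, 0x0309, 0 }, print the returned string, and free() it. Because %04X is only a minimum width, code points above U+FFFF simply print with more digits, which is why the buffer is sized for 8 hex digits rather than 4.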

Re: Unicode normalization test broken output

From: Peter Eisentraut
Date: 10 December 2019, 12:18:31

On 2019-12-09 23:22, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> There appear to be several off-by-more-than-one errors in norm_test.c
>> print_wchar_str().  Attached is a patch to fix this (and make the output
>> a bit prettier).  Result afterwards:
> 
> I concur that this looks broken and your patch improves it.
> But I'm not very happy about the remaining assumption that
> we don't have to worry about characters above U+FFFF.  I'd
> rather see it allocate 11 bytes per allowed pg_wchar, and
> manage the string contents with something like
> 
>     p += sprintf(p, "U+%04X ", *s);

Good point.  Fixed in attached patch.

Attachment

v2-0001-Fix-output-of-Unicode-normalization-test.patch

Re: Unicode normalization test broken output

From: Tom Lane
Date: 10 December 2019, 16:16:49

Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Good point.  Fixed in attached patch.

This one LGTM.

            regards, tom lane

Re: Unicode normalization test broken output

From: Peter Eisentraut
Date: 11 December 2019, 07:45:26

On 2019-12-10 17:16, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Good point.  Fixed in attached patch.
> 
> This one LGTM.

Done, thanks.