Thread: Unicode normalization test broken output
I was playing with the Unicode normalization test in src/common/unicode/. I think there is something wrong with how the test program reports failures. For example, if I manually edit the norm_test_table.h to make a failure, like

-	{ 74, { 0x00A8, 0 }, { 0x0020, 0x0308, 0 } },
+	{ 74, { 0x00A8, 0 }, { 0x0020, 0x0309, 0 } },

then the output from the test is

FAILURE (NormalizationTest.txt line 74):
input: 00
expected: 0003
got 0003

which doesn't make sense.

There appear to be several off-by-more-than-one errors in norm_test.c print_wchar_str(). Attached is a patch to fix this (and make the output a bit prettier). Result afterwards:

FAILURE (NormalizationTest.txt line 74):
input: U+00A8
expected: U+0020 U+0309
got: U+0020 U+0308

--
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> There appear to be several off-by-more-than-one errors in norm_test.c
> print_wchar_str(). Attached is a patch to fix this (and make the output
> a bit prettier). Result afterwards:

I concur that this looks broken and your patch improves it.
But I'm not very happy about the remaining assumption that
we don't have to worry about characters above U+FFFF. I'd
rather see it allocate 11 bytes per allowed pg_wchar, and
manage the string contents with something like

	p += sprintf(p, "U+%04X ", *s);

An alternative fix would be to start using a PQExpBuffer, but it's probably not quite worth that.

			regards, tom lane
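The sizing rule suggested above can be sketched in C as follows. This is a minimal sketch, not the committed norm_test.c code. It assumes that pg_wchar holds one 32-bit code point (as in PostgreSQL) and that strings are zero-terminated like the entries in norm_test_table.h. The helper report_failure() is hypothetical, and dropping the trailing space is a choice made here. The 11-byte bound comes from "U+" (2 bytes), at most 8 hex digits, and one separating space; %04X pads to at least four digits but widens as needed above U+FFFF.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: pg_wchar holds one 32-bit code point, as in PostgreSQL. */
typedef unsigned int pg_wchar;

/*
 * Sketch of print_wchar_str(): render a zero-terminated pg_wchar array as
 * "U+XXXX U+XXXX ...".  Each code point needs at most 11 bytes: "U+",
 * up to 8 hex digits (%04X widens past 4 digits above U+FFFF), and one
 * separating space.  The caller must free() the result.
 */
static char *
print_wchar_str(const pg_wchar *s)
{
	size_t		n = 0;
	char	   *buf;
	char	   *p;

	/* Count code points up to the terminating 0. */
	while (s[n] != 0)
		n++;

	/* 11 bytes per code point, plus 1 for the final NUL. */
	buf = malloc(n * 11 + 1);
	if (buf == NULL)
	{
		fprintf(stderr, "out of memory\n");
		exit(1);
	}

	p = buf;
	*p = '\0';					/* empty input gives an empty string */
	for (; *s != 0; s++)
		p += sprintf(p, "U+%04X ", *s);

	/* Drop the trailing space left by the last sprintf(). */
	if (p > buf)
		p[-1] = '\0';

	assert(strlen(buf) <= n * 11);
	return buf;
}

/* Hypothetical helper that prints a failure in the post-patch format. */
static void
report_failure(int linenum, const pg_wchar *input,
			   const pg_wchar *expected, const pg_wchar *got)
{
	char	   *in_str = print_wchar_str(input);
	char	   *exp_str = print_wchar_str(expected);
	char	   *got_str = print_wchar_str(got);

	printf("FAILURE (NormalizationTest.txt line %d):\n", linenum);
	printf("input: %s\n", in_str);
	printf("expected: %s\n", exp_str);
	printf("got: %s\n", got_str);

	free(in_str);
	free(exp_str);
	free(got_str);
}
```

Calling report_failure(74, ...) with input { 0x00A8, 0 }, expected { 0x0020, 0x0309, 0 } and got { 0x0020, 0x0308, 0 } prints the same four lines as the patched output in the first message. A code point above U+FFFF, such as 0x1F600, comes out as "U+1F600" instead of being cut to four digits.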
On 2019-12-09 23:22, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> There appear to be several off-by-more-than-one errors in norm_test.c
>> print_wchar_str(). Attached is a patch to fix this (and make the output
>> a bit prettier). Result afterwards:
>
> I concur that this looks broken and your patch improves it.
> But I'm not very happy about the remaining assumption that
> we don't have to worry about characters above U+FFFF. I'd
> rather see it allocate 11 bytes per allowed pg_wchar, and
> manage the string contents with something like
>
> 	p += sprintf(p, "U+%04X ", *s);

Good point. Fixed in attached patch.

--
Peter Eisentraut
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> Good point. Fixed in attached patch.

This one LGTM.

			regards, tom lane
On 2019-12-10 17:16, Tom Lane wrote:
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
>> Good point. Fixed in attached patch.
>
> This one LGTM.

done, thanks

--
Peter Eisentraut