Re: [PATCH] Add roman support for to_number function - Mailing list pgsql-hackers

From Hunaid Sohail
Subject Re: [PATCH] Add roman support for to_number function
Date
Msg-id CAMWA6yaXq9=7rJdS9LVkmYB0d-AYt3pLC0hpBe9+stnM7W+SQw@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Add roman support for to_number function  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: [PATCH] Add roman support for to_number function
List pgsql-hackers
Hi Tom,

On Sat, Jan 18, 2025 at 5:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Hunaid Sohail <hunaidpgml@gmail.com> writes:
> I’ve attached a new patch that addresses comments 2, 3, and 4 from Tomas.

I'm still quite unhappy that this doesn't tolerate trailing
whitespace.  That's not how other format patterns work, and it makes
it impossible to round-trip the output of to_char(..., 'RN').
I think the core of the problem is that you tried to cram the logic
in where the existing "not implemented" error is thrown.  But on
inspection there is nothing sane about that entire "Roman correction"
stanza -- what it does is either useless (zeroing already-zeroed
fields) or in the wrong place (if we don't want to allow other flags
with IS_ROMAN, that should be done via an error in NUMDesc_prepare,
like other similar cases).  In the attached I moved the roman_to_int
call into NUM_processor's main loop so it's more like other format
patterns, and fixed it to not eat any more characters than it should.
This might allow putting back some format-combination capabilities,
but other than the whitespace angle I figure we can leave that for
people to request it.

Thank you for the tweaks to the patch. Overall, it looks great.

Initially, when I looked at the test case:
SELECT to_number('M CC', 'RN');

I was confused about whether it should be 'MCC'. After further inspection,
I realized that the output is 1000 for 'M'. The format of the input can be a
bit unclear. Perhaps we could add a note in the documentation
or issue a warning in roman_to_int function when input is truncated,
to clarify this behavior.

+       bool            truncated = false;
+
+       /*
+        * Collect and decode valid roman numerals, consuming at most
+        * MAX_ROMAN_LEN characters.  We do this in a separate loop to avoid
+        * repeated decoding and because the main loop needs to know when it's at
+        * the last numeral.
+        */
+       for (len = 0; len < MAX_ROMAN_LEN && !OVERLOAD_TEST; len++)
+       {
+               char            currChar = pg_ascii_toupper(*Np->inout_p);
+               int                     currValue = ROMAN_VAL(currChar);
+
+               if (currValue == 0)
+               {
+                       truncated = true;
+                       break;                          /* Not a valid roman numeral. */
+               }
+               romanChars[len] = currChar;
+               romanValues[len] = currValue;
+               Np->inout_p++;
+       }
+
+       if (len == 0)
+               return -1;                              /* No valid roman numerals. */
+
+       /* Check for truncation. */
+       if (truncated)
+       {
+               ereport(WARNING,
+                               (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+                                errmsg("Input truncated to \"%.*s\"",
+                                               len, romanChars)));
+       }

Output after change:

postgres=# SELECT to_number('MMXX  ', 'RN');
WARNING:  Input truncated to "MMXX"
 to_number
-----------
      2020
(1 row)

postgres=# SELECT to_number('  XIV  ', '  RN');
WARNING:  Input truncated to "XIV"
 to_number
-----------
        14
(1 row)

postgres=# SELECT to_number('M CC', 'RN');
WARNING:  Input truncated to "M"
 to_number
-----------
      1000
(1 row)

Also, some modifications to the test cases will be required
if we go with these changes.

Regards,
Hunaid Sohail 

pgsql-hackers by date:

Previous
From: Guillaume Lelarge
Date:
Subject: Re: improve DEBUG1 logging of parallel workers for CREATE INDEX?
Next
From: Mats Kindahl
Date:
Subject: Re: Coccinelle for PostgreSQL development [1/N]: coccicheck.py