Thread: Add pg_strtoupper and pg_strtolower functions
Hi,

I noticed that the pg_toupper and pg_tolower functions, which convert
a single character, are being used in loops to convert entire
null-terminated strings. The cost of calling these character-based
conversion functions (even though small) can be avoided if we have two
new functions pg_strtoupper and pg_strtolower.

Attaching a patch with these two new functions and their usage in most
of the possible places in the code.

Thoughts?

Regards,
Bharath Rupireddy.
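To make the proposal concrete, here is a minimal sketch of the two shapes being compared: a caller-side loop over a per-character converter, and a single whole-string helper. This is not the attached patch. The names sketch_toupper, upcase_with_loop, and sketch_strtoupper are hypothetical, and the conversion is restricted to ASCII letters as a simplifying assumption.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a per-character converter.
 * Assumption: only ASCII letters are converted; other bytes pass through.
 */
unsigned char
sketch_toupper(unsigned char ch)
{
	if (ch >= 'a' && ch <= 'z')
		ch += 'A' - 'a';
	return ch;
}

/* The existing pattern: each call site loops and converts one byte per call. */
void
upcase_with_loop(char *str)
{
	assert(str != NULL);
	for (char *p = str; *p; p++)
		*p = (char) sketch_toupper((unsigned char) *p);
}

/*
 * Hypothetical whole-string helper: converts the null-terminated string
 * in place with a single call and returns the same pointer.
 */
char *
sketch_strtoupper(char *str)
{
	assert(str != NULL);
	for (char *p = str; *p; p++)
	{
		unsigned char ch = (unsigned char) *p;

		if (ch >= 'a' && ch <= 'z')
			*p = (char) (ch - 'a' + 'A');
	}
	return str;
}
```

Both variants overwrite the input buffer, so they only make sense when the converted string is guaranteed to have the same byte length as the original.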
On Mon, May 2, 2022 at 6:21 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
>
> Hi,
>
> I noticed that the pg_toupper and pg_tolower functions, which convert
> a single character, are being used in loops to convert entire
> null-terminated strings. The cost of calling these character-based
> conversion functions (even though small) can be avoided if we have two
> new functions pg_strtoupper and pg_strtolower.

Have we measured the saving in cost? Let's say for a
million-character-long string?

> Attaching a patch with these two new functions and their usage in most
> of the possible places in the code.

Converting pg_toupper and pg_tolower to "inline" might save cost
similarly and also avoid code duplication?

--
Best Wishes,
Ashutosh Bapat
On Mon, May 2, 2022 at 6:43 PM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> On Mon, May 2, 2022 at 6:21 PM Bharath Rupireddy
> <bharath.rupireddyforpostgres@gmail.com> wrote:
> >
> > Hi,
> >
> > I noticed that the pg_toupper and pg_tolower functions, which convert
> > a single character, are being used in loops to convert entire
> > null-terminated strings. The cost of calling these character-based
> > conversion functions (even though small) can be avoided if we have two
> > new functions pg_strtoupper and pg_strtolower.
>
> Have we measured the saving in cost? Let's say for a
> million-character-long string?

I didn't spend time figuring out use cases that hit all the affected
code areas. Even if I did, the function call cost savings would
probably not be impressive most of the time, at which point the
argument about saving function call cost becomes pointless.

> > Attaching a patch with these two new functions and their usage in most
> > of the possible places in the code.
>
> Converting pg_toupper and pg_tolower to "inline" might save cost
> similarly and also avoid code duplication?

I think most modern compilers do inline small functions. But inlining
isn't always good, as it increases the size of the code. With the
proposed helper functions, the code looks cleaner (at least IMO; others
may have different opinions though).

Regards,
Bharath Rupireddy.
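For comparison, the "inline" alternative raised above might look roughly like the sketch below: the per-character converter is defined as static inline (as it would be in a header), and each call site keeps its own loop. The names sketch_tolower_inline and downcase_in_place are hypothetical, the conversion is ASCII-only by assumption, and whether the call is actually inlined remains up to the compiler and optimization level.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical header-style helper. "static inline" lets the compiler
 * expand the body at each call site, but it is a hint, not a guarantee.
 * Assumption: only ASCII letters are converted.
 */
static inline unsigned char
sketch_tolower_inline(unsigned char ch)
{
	if (ch >= 'A' && ch <= 'Z')
		ch += 'a' - 'A';
	return ch;
}

/* A call site keeps its explicit loop; no whole-string helper is added. */
void
downcase_in_place(char *str)
{
	assert(str != NULL);
	for (char *p = str; *p; p++)
		*p = (char) sketch_tolower_inline((unsigned char) *p);
}
```

This keeps a single definition of the conversion rule, at the cost of leaving the loop boilerplate at every call site.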
On 2022-May-02, Bharath Rupireddy wrote:

> Hi,
>
> I noticed that the pg_toupper and pg_tolower functions, which convert
> a single character, are being used in loops to convert entire
> null-terminated strings. The cost of calling these character-based
> conversion functions (even though small) can be avoided if we have two
> new functions pg_strtoupper and pg_strtolower.

Currently, pg_toupper/pg_tolower are used in very limited situations.
Are they really always safe enough to run in arbitrary situations,
enough to create this new layer on top of them? Reading the comment on
pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I
worry that we might create security holes, either now or in future
callsites that use these new functions.

Consider that in the Turkish locale you lowercase an I (single-byte
ASCII character) to a dotless i (two bytes). So overwriting the input
string is not a great solution.

--
Álvaro Herrera        PostgreSQL Developer - https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)
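The length problem with in-place conversion can be illustrated with a small sketch. It assumes UTF-8 output, where dotless i (U+0131) is encoded as the two bytes 0xC4 0xB1, and it hard-codes only the Turkish rule for ASCII "I"; a real implementation would consult the locale. The function name sketch_tolower_tr_utf8 and its snprintf-like return convention are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical illustration: lowercase ASCII input under the Turkish rule
 * for 'I', writing UTF-8 into a separate buffer.
 * Returns the number of bytes the full result needs (excluding the
 * terminator), so callers can detect truncation when it is >= dstsize.
 */
size_t
sketch_tolower_tr_utf8(const char *src, char *dst, size_t dstsize)
{
	size_t		needed = 0;		/* bytes the complete result requires */
	size_t		written = 0;	/* bytes actually stored in dst */

	assert(src != NULL && dst != NULL && dstsize > 0);

	for (; *src; src++)
	{
		unsigned char ch = (unsigned char) *src;
		char		out[2];
		size_t		outlen;

		if (ch == 'I')
		{
			/* U+0131 LATIN SMALL LETTER DOTLESS I in UTF-8 */
			out[0] = (char) 0xC4;
			out[1] = (char) 0xB1;
			outlen = 2;
		}
		else
		{
			if (ch >= 'A' && ch <= 'Z')
				ch += 'a' - 'A';
			out[0] = (char) ch;
			outlen = 1;
		}

		/* Store only while nothing has been dropped and there is room. */
		if (needed == written && written + outlen < dstsize)
		{
			memcpy(dst + written, out, outlen);
			written += outlen;
		}
		needed += outlen;
	}
	dst[written] = '\0';
	return needed;
}
```

With the input "TITLE" (5 bytes), the result needs 6 bytes, so a helper that overwrites its argument would have nowhere to put the extra byte.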
Alvaro Herrera <alvherre@alvh.no-ip.org> writes:
> Currently, pg_toupper/pg_tolower are used in very limited situations.
> Are they really always safe enough to run in arbitrary situations,
> enough to create this new layer on top of them?

They are not, and we should absolutely not be encouraging additional
uses of them. The existing multi-character str_toupper/str_tolower
functions should be used instead. (Perhaps those should be relocated
to someplace more prominent?)

> Reading the comment on
> pg_tolower, "the whole thing is a bit bogus for multibyte charsets", I
> worry that we might create security holes, either now or in future
> callsites that use these new functions.

I doubt that they are security holes, but they do give unexpected
answers in some locales.

regards, tom lane