Thread: replace strtok()

replace strtok()

From
Peter Eisentraut
Date:
Under the topic of getting rid of thread-unsafe functions in the backend 
[0], here is a patch series to deal with strtok().

Of course, strtok() is famously not thread-safe and can be replaced by 
strtok_r().  But it also has the wrong semantics in some cases, because 
it considers adjacent delimiters to be one delimiter.  So if you parse

     SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>

with strtok(), then

     SCRAM-SHA-256$$<iterations>::<salt>$$<storedkey>::<serverkey>

parses just the same.  In many cases, this is arguably wrong and could 
hide mistakes.

So I'm suggesting to use strsep() in those places.  strsep() is 
nonstandard but widely available.

There are a few places where strtok() has the right semantics, such as 
parsing tokens separated by whitespace.  For those, I'm using strtok_r().

A reviewer job here would be to check whether I made that distinction 
correctly in each case.

On the portability side, I'm including a port/ replacement for strsep() 
and some workaround to get strtok_r() for Windows.  I have included 
these here as separate patches for clarity.


[0]: 
https://www.postgresql.org/message-id/856e5ec3-879f-42ee-8258-8bcc6ec9bdea@eisentraut.org
Attachment

Re: replace strtok()

From
Ranier Vilela
Date:
Em ter., 18 de jun. de 2024 às 04:18, Peter Eisentraut <peter@eisentraut.org> escreveu:
Under the topic of getting rid of thread-unsafe functions in the backend
[0], here is a patch series to deal with strtok().

Of course, strtok() is famously not thread-safe and can be replaced by
strtok_r().  But it also has the wrong semantics in some cases, because
it considers adjacent delimiters to be one delimiter.  So if you parse

     SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>

with strtok(), then

     SCRAM-SHA-256$$<iterations>::<salt>$$<storedkey>::<serverkey>

parses just the same.  In many cases, this is arguably wrong and could
hide mistakes.

So I'm suggesting to use strsep() in those places.  strsep() is
nonstandard but widely available.

There are a few places where strtok() has the right semantics, such as
parsing tokens separated by whitespace.  For those, I'm using strtok_r().

A reviewer job here would be to check whether I made that distinction
correctly in each case.

On the portability side, I'm including a port/ replacement for strsep()
and some workaround to get strtok_r() for Windows.  I have included
these here as separate patches for clarity.
+1 For making the code thread-safe.
But I would like to see more const char * where this is possible.

For example, in pg_locale.c
IMO, the token variable can be const char *.

At least strchr expects a const char * as the first parameter.

I found another implementation of strsep, it seems lighter to me.
I will attach it for consideration, however, I have not done any testing. 

best regards,
Ranier Vilela
Attachment

Re: replace strtok()

From
Kyotaro Horiguchi
Date:
At Tue, 18 Jun 2024 09:18:28 +0200, Peter Eisentraut <peter@eisentraut.org> wrote in 
> Under the topic of getting rid of thread-unsafe functions in the
> backend [0], here is a patch series to deal with strtok().
> 
> Of course, strtok() is famously not thread-safe and can be replaced by
> strtok_r().  But it also has the wrong semantics in some cases,
> because it considers adjacent delimiters to be one delimiter.  So if
> you parse
> 
>     SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>
> 
> with strtok(), then
> 
>     SCRAM-SHA-256$$<iterations>::<salt>$$<storedkey>::<serverkey>
> 
> parses just the same.  In many cases, this is arguably wrong and could
> hide mistakes.
> 
> So I'm suggesting to use strsep() in those places.  strsep() is
> nonstandard but widely available.
> 
> There are a few places where strtok() has the right semantics, such as
> parsing tokens separated by whitespace.  For those, I'm using
> strtok_r().

I agree with the distinction.

> A reviewer job here would be to check whether I made that distinction
> correctly in each case.

0001 and 0002 look correct to me regarding that distinction. They
applied correctly to the master HEAD and all tests passed on Linux.

> On the portability side, I'm including a port/ replacement for
> strsep() and some workaround to get strtok_r() for Windows.  I have
> included these here as separate patches for clarity.

0003 looks fine and successfully built and seems working on an MSVC
build.

About 0004, Cygwin seems to have its own strtok_r, but I haven't
checked how that fact affects the build.

> [0]:
> https://www.postgresql.org/message-id/856e5ec3-879f-42ee-8258-8bcc6ec9bdea@eisentraut.org

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: replace strtok()

From
Peter Eisentraut
Date:
On 18.06.24 13:43, Ranier Vilela wrote:
> But I would like to see more const char * where this is possible.
> 
> For example, in pg_locale.c
> IMO, the token variable can be const char *.
> 
> At least strchr expects a const char * as the first parameter.

This would not be future-proof.  In C23, if you pass a const char * into 
strchr(), you also get a const char * as a result.  And in this case, we 
do write into the area pointed to by the result.  So with a const char 
*token, this whole thing would not compile cleanly under C23.

> I found another implementation of strsep, it seems lighter to me.
> I will attach it for consideration, however, I have not done any testing.

Yeah, surely there are many possible implementations.  I'm thinking, 
since we already took other str*() functions from OpenBSD, it makes 
sense to do this here as well, so we have only one source to deal with.




Re: replace strtok()

From
Tom Lane
Date:
Peter Eisentraut <peter@eisentraut.org> writes:
> On 18.06.24 13:43, Ranier Vilela wrote:
>> I found another implementation of strsep, it seems lighter to me.
>> I will attach it for consideration, however, I have not done any testing.

> Yeah, surely there are many possible implementations.  I'm thinking, 
> since we already took other str*() functions from OpenBSD, it makes 
> sense to do this here as well, so we have only one source to deal with.

Why not use strpbrk?  That's equally thread-safe, it's been there
since C89, and it doesn't have the problem that you can't find out
which of the delimiter characters was found.

            regards, tom lane



Re: replace strtok()

From
Michael Paquier
Date:
On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:
> Peter Eisentraut <peter@eisentraut.org> writes:
> > On 18.06.24 13:43, Ranier Vilela wrote:
> >> I found another implementation of strsep, it seems lighter to me.
> >> I will attach it for consideration, however, I have not done any testing.
>
> > Yeah, surely there are many possible implementations.  I'm thinking,
> > since we already took other str*() functions from OpenBSD, it makes
> > sense to do this here as well, so we have only one source to deal with.
>
> Why not use strpbrk?  That's equally thread-safe, it's been there
> since C89, and it doesn't have the problem that you can't find out
> which of the delimiter characters was found.

Yeah, strpbrk() has been used in the tree as far as 2003 without any
port/ implementation.
--
Michael

Attachment

Re: replace strtok()

From
Peter Eisentraut
Date:
On 24.06.24 02:34, Michael Paquier wrote:
> On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:
>> Peter Eisentraut <peter@eisentraut.org> writes:
>>> On 18.06.24 13:43, Ranier Vilela wrote:
>>>> I found another implementation of strsep, it seems lighter to me.
>>>> I will attach it for consideration, however, I have not done any testing.
>>
>>> Yeah, surely there are many possible implementations.  I'm thinking,
>>> since we already took other str*() functions from OpenBSD, it makes
>>> sense to do this here as well, so we have only one source to deal with.
>>
>> Why not use strpbrk?  That's equally thread-safe, it's been there
>> since C89, and it doesn't have the problem that you can't find out
>> which of the delimiter characters was found.
> 
> Yeah, strpbrk() has been used in the tree as far as 2003 without any
> port/ implementation.

The existing uses of strpbrk() are really just checking whether some 
characters exist in a string, more like an enhanced strchr().  I don't 
see any uses for tokenizing a string like strtok() or strsep() would do. 
  I think that would look quite cumbersome.  So I think a simpler and 
more convenient abstraction like strsep() would still be worthwhile.



Re: replace strtok()

From
David Steele
Date:
On 6/24/24 19:57, Peter Eisentraut wrote:
> On 24.06.24 02:34, Michael Paquier wrote:
>> On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:
>>> Peter Eisentraut <peter@eisentraut.org> writes:
>>>> On 18.06.24 13:43, Ranier Vilela wrote:
>>>>> I found another implementation of strsep, it seems lighter to me.
>>>>> I will attach it for consideration, however, I have not done any 
>>>>> testing.
>>>
>>>> Yeah, surely there are many possible implementations.  I'm thinking,
>>>> since we already took other str*() functions from OpenBSD, it makes
>>>> sense to do this here as well, so we have only one source to deal with.
>>>
>>> Why not use strpbrk?  That's equally thread-safe, it's been there
>>> since C89, and it doesn't have the problem that you can't find out
>>> which of the delimiter characters was found.
>>
>> Yeah, strpbrk() has been used in the tree as far as 2003 without any
>> port/ implementation.
> 
> The existing uses of strpbrk() are really just checking whether some 
> characters exist in a string, more like an enhanced strchr().  I don't 
> see any uses for tokenizing a string like strtok() or strsep() would do. 
>   I think that would look quite cumbersome.  So I think a simpler and 
> more convenient abstraction like strsep() would still be worthwhile.

I agree that using strsep() in these cases seems more natural. Since 
this patch provides a default implementation compatibility does not seem 
like a big issue.

I've also reviewed the rest of the patch and it looks good to me.

Regards,
-David



Re: replace strtok()

From
Peter Eisentraut
Date:
On 08.07.24 07:45, David Steele wrote:
> On 6/24/24 19:57, Peter Eisentraut wrote:
>> On 24.06.24 02:34, Michael Paquier wrote:
>>> On Sat, Jun 22, 2024 at 11:48:21AM -0400, Tom Lane wrote:
>>>> Peter Eisentraut <peter@eisentraut.org> writes:
>>>>> On 18.06.24 13:43, Ranier Vilela wrote:
>>>>>> I found another implementation of strsep, it seems lighter to me.
>>>>>> I will attach it for consideration, however, I have not done any 
>>>>>> testing.
>>>>
>>>>> Yeah, surely there are many possible implementations.  I'm thinking,
>>>>> since we already took other str*() functions from OpenBSD, it makes
>>>>> sense to do this here as well, so we have only one source to deal 
>>>>> with.
>>>>
>>>> Why not use strpbrk?  That's equally thread-safe, it's been there
>>>> since C89, and it doesn't have the problem that you can't find out
>>>> which of the delimiter characters was found.
>>>
>>> Yeah, strpbrk() has been used in the tree as far as 2003 without any
>>> port/ implementation.
>>
>> The existing uses of strpbrk() are really just checking whether some 
>> characters exist in a string, more like an enhanced strchr().  I don't 
>> see any uses for tokenizing a string like strtok() or strsep() would 
>> do.   I think that would look quite cumbersome.  So I think a simpler 
>> and more convenient abstraction like strsep() would still be worthwhile.
> 
> I agree that using strsep() in these cases seems more natural. Since 
> this patch provides a default implementation compatibility does not seem 
> like a big issue.
> 
> I've also reviewed the rest of the patch and it looks good to me.

This has been committed.  Thanks.




Re: replace strtok()

From
Alexander Lakhin
Date:
Hello Peter,

23.07.2024 15:38, Peter Eisentraut wrote:
> This has been committed.  Thanks.

Please look at the SCRAM secret, which breaks parse_scram_secret(),
perhaps because strsep() doesn't return NULL where strtok() did:

CREATE ROLE r PASSWORD 

'SCRAM-SHA-256$4096:hpFyHTUsSWcR7O9P$LgZFIt6Oqdo27ZFKbZ2nV+=vtnYM995pDh9ca6WSi120qVV5NeluNfUPkwm7Vqat25RjSPLkGeoZBQs6wVv+um4=';

Core was generated by `postgres: law regression [local] CREATE ROLE                                  '.
Program terminated with signal SIGSEGV, Segmentation fault.

#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
(gdb) bt
#0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
#1  0x0000563625e9e5b0 in parse_scram_secret (...) at auth-scram.c:655

Best regards,
Alexander



Re: replace strtok()

From
Ranier Vilela
Date:


Em ter., 15 de out. de 2024 às 03:45, Peter Eisentraut <peter@eisentraut.org> escreveu:
On 10.10.24 14:59, Ranier Vilela wrote:
>     Please look at the SCRAM secret, which breaks parse_scram_secret(),
>     perhaps because strsep() doesn't return NULL where strtok() did:
>
>     CREATE ROLE r PASSWORD
>     'SCRAM-
>     SHA-256$4096:hpFyHTUsSWcR7O9P$LgZFIt6Oqdo27ZFKbZ2nV+=vtnYM995pDh9ca6WSi120qVV5NeluNfUPkwm7Vqat25RjSPLkGeoZBQs6wVv+um4=';
>
>     Core was generated by `postgres: law regression [local] CREATE
>     ROLE                                  '.
>     Program terminated with signal SIGSEGV, Segmentation fault.
>
>     #0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
>     (gdb) bt
>     #0  __strlen_avx2 () at ../sysdeps/x86_64/multiarch/strlen-avx2.S:74
>     #1  0x0000563625e9e5b0 in parse_scram_secret (...) at auth-scram.c:655
>
> Thanks for the report.
>
> It seems to me that it could be due to incorrect use of the strsep function.
> See:
> https://man7.org/linux/man-pages/man3/strsep.3.html <https://man7.org/
> linux/man-pages/man3/strsep.3.html>
> "
>
> In case no delimiter was found, the token
>         is taken to be the entire string/*stringp/, and/*stringp/ is made
>         NULL.
>
> "
> So, it is necessary to check the *stringp* against NULL too.

Thanks for the analysis.  I think moreover we *only* need to check the
"stringp" for NULL, not the return value of strsep(), which would never
be NULL in our case.  So I propose the attached patch as a variant of yours.
I'm not 100% sure, but the contrib passwordcheck uses and It's not very safe.
The checks of NULL return are cheap, and will protect unwary users.

So I'm neutral here.

Notice that the report is only by Alexander Lakhin.
I'm at most a reviewer here.

best regards,
Ranier Vilela

Re: replace strtok()

From
Alexander Lakhin
Date:
Hello Ranier and Peter,

15.10.2024 14:44, Ranier Vilela wrote:


Em ter., 15 de out. de 2024 às 03:45, Peter Eisentraut <peter@eisentraut.org> escreveu:

Thanks for the analysis.  I think moreover we *only* need to check the
"stringp" for NULL, not the return value of strsep(), which would never
be NULL in our case.  So I propose the attached patch as a variant of yours.
I'm not 100% sure, but the contrib passwordcheck uses and It's not very safe.
The checks of NULL return are cheap, and will protect unwary users.

So I'm neutral here.

I also wonder, if other places touched by 5d2e1cc11 need corrections too.
I played with
PG_COLOR=always PG_COLORS="error=01;31" .../initdb

and it looks like this free() call in pg_logging_init():
            char       *colors = strdup(pg_colors_env);

            if (colors)
            {
...
                while ((token = strsep(&colors, ":")))
                {
...
                }

                free(colors);
            }
gets null in colors.

Best regards,
Alexander

Re: replace strtok()

From
Ranier Vilela
Date:


Em ter., 15 de out. de 2024 às 09:00, Alexander Lakhin <exclusion@gmail.com> escreveu:
Hello Ranier and Peter,

15.10.2024 14:44, Ranier Vilela wrote:


Em ter., 15 de out. de 2024 às 03:45, Peter Eisentraut <peter@eisentraut.org> escreveu:

Thanks for the analysis.  I think moreover we *only* need to check the
"stringp" for NULL, not the return value of strsep(), which would never
be NULL in our case.  So I propose the attached patch as a variant of yours.
I'm not 100% sure, but the contrib passwordcheck uses and It's not very safe.
The checks of NULL return are cheap, and will protect unwary users.

So I'm neutral here.

I also wonder, if other places touched by 5d2e1cc11 need corrections too.
I played with
PG_COLOR=always PG_COLORS="error=01;31" .../initdb

and it looks like this free() call in pg_logging_init():
            char       *colors = strdup(pg_colors_env);

            if (colors)
            {
...
                while ((token = strsep(&colors, ":")))
                {
...
                }

                free(colors);
            }
gets null in colors.
Yeah, I also saw this usage, but I was waiting for a definition for the first report.
The solution IMO, would be the same.

diff --git a/src/common/logging.c b/src/common/logging.c
index aedd1ae2d8..45b5316d48 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -121,7 +121,7 @@ pg_logging_init(const char *argv0)
  {
  char   *token;
 
- while ((token = strsep(&colors, ":")))
+ while ((token = strsep(&colors, ":")) != NULL && colors != NULL)
  {
  char   *e = strchr(token, '=');
 
The advantage of this change is that it would avoid processing unnecessary tokens.

best regards,
Ranier Vilela

Re: replace strtok()

From
Peter Eisentraut
Date:
On 15.10.24 14:07, Ranier Vilela wrote:
>     I also wonder, if other places touched by 5d2e1cc11 need corrections
>     too.
>     I played with
>     PG_COLOR=always PG_COLORS="error=01;31" .../initdb
> 
>     and it looks like this free() call in pg_logging_init():
>                  char       *colors = strdup(pg_colors_env);
> 
>                  if (colors)
>                  {
>     ...
>                      while ((token = strsep(&colors, ":")))
>                      {
>     ...
>                      }
> 
>                      free(colors);
>                  }
>     gets null in colors.
> 
> Yeah, I also saw this usage, but I was waiting for a definition for the 
> first report.
> The solution IMO, would be the same.
> 
> diff --git a/src/common/logging.c b/src/common/logging.c
> index aedd1ae2d8..45b5316d48 100644
> --- a/src/common/logging.c
> +++ b/src/common/logging.c
> @@ -121,7 +121,7 @@ pg_logging_init(const char *argv0)
>    {
>    char   *token;
> 
> - while ((token = strsep(&colors, ":")))
> + while ((token = strsep(&colors, ":")) != NULL && colors != NULL)
>    {
>    char   *e = strchr(token, '=');
> The advantage of this change is that it would avoid processing 
> unnecessary tokens.

This wouldn't fix anything, I think.  If colors is NULL, then strsep() 
already returns NULL, so the added code does nothing.




Re: replace strtok()

From
Ranier Vilela
Date:
Em qua., 16 de out. de 2024 às 04:45, Peter Eisentraut <peter@eisentraut.org> escreveu:
On 15.10.24 14:07, Ranier Vilela wrote:
>     I also wonder, if other places touched by 5d2e1cc11 need corrections
>     too.
>     I played with
>     PG_COLOR=always PG_COLORS="error=01;31" .../initdb
>
>     and it looks like this free() call in pg_logging_init():
>                  char       *colors = strdup(pg_colors_env);
>
>                  if (colors)
>                  {
>     ...
>                      while ((token = strsep(&colors, ":")))
>                      {
>     ...
>                      }
>
>                      free(colors);
>                  }
>     gets null in colors.
>
> Yeah, I also saw this usage, but I was waiting for a definition for the
> first report.
> The solution IMO, would be the same.
>
> diff --git a/src/common/logging.c b/src/common/logging.c
> index aedd1ae2d8..45b5316d48 100644
> --- a/src/common/logging.c
> +++ b/src/common/logging.c
> @@ -121,7 +121,7 @@ pg_logging_init(const char *argv0)
>    {
>    char   *token;
>
> - while ((token = strsep(&colors, ":")))
> + while ((token = strsep(&colors, ":")) != NULL && colors != NULL)
>    {
>    char   *e = strchr(token, '=');
> The advantage of this change is that it would avoid processing
> unnecessary tokens.

This wouldn't fix anything, I think.  If colors is NULL, then strsep()
already returns NULL, so the added code does nothing.
If *colors* is NULL, then the delimiter is not found and strsep will return the entire
string *stringp, so the token becomes invalid.

IMO, I think it must be necessary to check if *colors* are NULL too.

best regards,
Ranier Vilela

Re: replace strtok()

From
Peter Eisentraut
Date:
On 16.10.24 09:42, Peter Eisentraut wrote:
> On 15.10.24 14:00, Alexander Lakhin wrote:
>> I also wonder, if other places touched by 5d2e1cc11 need corrections too.
>> I played with
>> PG_COLOR=always PG_COLORS="error=01;31" .../initdb
>>
>> and it looks like this free() call in pg_logging_init():
>>              char       *colors = strdup(pg_colors_env);
>>
>>              if (colors)
>>              {
>> ...
>>                  while ((token = strsep(&colors, ":")))
>>                  {
>> ...
>>                  }
>>
>>                  free(colors);
>>              }
>> gets null in colors.
> 
> Yes, this is indeed incorrect.  We need to keep a separate pointer to 
> the start of the string to free later.  This matches the example on the 
> strsep man page (https://man.freebsd.org/cgi/man.cgi?strsep(3)).  Patch 
> attached.

I have committed both fixes mentioned in this thread in the last couple 
of days.




Re: replace strtok()

From
Ranier Vilela
Date:
Em sex., 18 de out. de 2024 às 06:41, Peter Eisentraut <peter@eisentraut.org> escreveu:
On 16.10.24 09:42, Peter Eisentraut wrote:
> On 15.10.24 14:00, Alexander Lakhin wrote:
>> I also wonder, if other places touched by 5d2e1cc11 need corrections too.
>> I played with
>> PG_COLOR=always PG_COLORS="error=01;31" .../initdb
>>
>> and it looks like this free() call in pg_logging_init():
>>              char       *colors = strdup(pg_colors_env);
>>
>>              if (colors)
>>              {
>> ...
>>                  while ((token = strsep(&colors, ":")))
>>                  {
>> ...
>>                  }
>>
>>                  free(colors);
>>              }
>> gets null in colors.
>
> Yes, this is indeed incorrect.  We need to keep a separate pointer to
> the start of the string to free later.  This matches the example on the
> strsep man page (https://man.freebsd.org/cgi/man.cgi?strsep(3)).  Patch
> attached.

I have committed both fixes mentioned in this thread in the last couple
of days.
Thanks.
 
Thanks Alexander, for the hard work.

best regards,
Ranier Vilela