Thread: -Wformat-signedness

-Wformat-signedness

From

Thomas Munro

Date:

29 October 2020, 21:37:07

Hi hackers,

They're probably mostly harmless, being mostly error and debug
messages and the like, and considering that eg OID parsing tolerates
negative numbers when reading them back in, but for what it's worth:
GCC complains about many %d vs %u type mixups if you build with
$SUBJECT.

Re: -Wformat-signedness

From

Peter Eisentraut

Date:

09 November 2020, 15:02:01

On 2020-10-29 22:37, Thomas Munro wrote:
> They're probably mostly harmless, being mostly error and debug
> messages and the like, and considering that eg OID parsing tolerates
> negative numbers when reading them back in, but for what it's worth:
> GCC complains about many %d vs %u type mixups if you build with
> $SUBJECT.

I had looked into this some time ago.  I have dusted off my patch again. 
The attached version fixes all warnings for me.

The following are the main categories of issues:

1. enums are unsigned by default in gcc, so all those internal error 
messages "unrecognized blah kind: %d" need to be changed to %u.

I have split that into its own patch since it's easily separable.  All 
the remaining issues are in one patch.

2. Various trickery at the boundary of internal counters that are 
unsigned and external functions or views using signed types.  These need 
another look.

3. Various messages print signed values using %x formats, which need to 
be unsigned.  These might also need another look.

4. Issues with constants being signed by default.  For example, things 
like elog(ERROR, "foo is %u but should be %u", somevar, 55) warns 
because of the constant.  Should be changed to something like 55U for 
symmetry, or change the %u to %d.  This also reaches into genbki 
territory with all the OID constants being generated.

5. Some "surprising" but correct C behavior.  For example, unsigned 
short is promoted to int (not unsigned int) in variable arguments, so 
needs a %d format.

6. Finally, a bunch of uses were just plain wrong and should be corrected.

I haven't found anything that is a really serious bug, but I imagine you 
could run into trouble in various ways when you exceed the INT_MAX 
value.  But then again, if you use up INT_MAX WAL timelines, you 
probably have other problems. ;-)

Attachment

Re: -Wformat-signedness

From

Tom Lane

Date:

09 November 2020, 15:25:24

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> 1. enums are unsigned by default in gcc, so all those internal error 
> messages "unrecognized blah kind: %d" need to be changed to %u.

Do we have reason to think that that is true in every C compiler?
My own preference for this would be to leave the messages as-is
and add explicit "(int)" casts to the arguments.  There are some
fraction of these that are like that already.

            regards, tom lane

Re: -Wformat-signedness

From

Thomas Munro

Date:

09 November 2020, 22:43:26

On Tue, Nov 10, 2020 at 4:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> > 1. enums are unsigned by default in gcc, so all those internal error
> > messages "unrecognized blah kind: %d" need to be changed to %u.
>
> Do we have reason to think that that is true in every C compiler?
> My own preference for this would be to leave the messages as-is
> and add explicit "(int)" casts to the arguments.  There are some
> fraction of these that are like that already.

From experimentation, it seems that GCC enumerator constants are int,
but enum variables are unsigned int or int depending on whether any
negative values were defined.  Valid values have to be representable
as int anyway regardless of what size and signedness a compiler
chooses to use, so yeah, +1 for casting to int.

Re: -Wformat-signedness

From

Andy Fan

Date:

27 October 2024, 06:59:51

Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:

Hi,

> On 2020-10-29 22:37, Thomas Munro wrote:
>> They're probably mostly harmless, being mostly error and debug
>> messages and the like, and considering that eg OID parsing tolerates
>> negative numbers when reading them back in, but for what it's worth:
>> GCC complains about many %d vs %u type mixups if you build with
>> $SUBJECT.
>
> I had looked into this some time ago.  I have dusted off my patch
> again. The attached version fixes all warnings for me.

When Dean pointed me to this thread [0], I was thinking we should add
"-Wformat-signedness" and fix all the existing warnings.  After some
research, it turns out not to be that easy, and it seems we need some
agreement first if we want to fix them.

> The following are the main categories of issues:
>
> 1. enums are unsigned by default in gcc, so all those internal error
> messages "unrecognized blah kind: %d" need to be changed to %u.

IIUC, we have agreed that we should cast enums to int and continue to use
"%d".  At least Tom suggested this, Thomas agreed [1], and Peter
didn't raise any opposition.

> 2. Various trickery at the boundary of internal counters that are
> unsigned and external functions or views using signed types.  These need
> another look.

I also noticed that we lack unsigned INT32/64 SQL types.  Changing the
counters to signed doesn't look good to me either.  This one doesn't
seem to have an agreement yet.

> 3. Various messages print signed values using %x formats, which need to
> be unsigned.  These might also need another look.
>
> 4. Issues with constants being signed by default.  For example, things
> like elog(ERROR, "foo is %u but should be %u", somevar, 55) warns
> because of the constant.  Should be changed to something like 55U for
> symmetry, or change the %u to %d.  This also reaches into genbki
> territory with all the OID constants being generated.
>
> 5. Some "surprising" but correct C behavior.  For example, unsigned
> short is promoted to int (not unsigned int) in variable arguments, so
> needs a %d format.
>
> 6. Finally, a bunch of uses were just plain wrong and should be corrected.

7. __LINE__ in gcc is 'int', but we elog() it with "%u".  Should we
change it to "%d"?

> I haven't found anything that is a really serious bug, but I imagine you
> could run into trouble in various ways when you exceed the INT_MAX
> value.  But then again, if you use up INT_MAX WAL timelines, you
> probably have other problems. ;-)

Me too, I just want some clean code :)  But FWIW, "-Wformat-signedness"
is not supported by clang so far, so people using clang still can't
benefit from these changes.  My solution (I use clang every day) is
adding a "gcc-checker" for my C files; if I make such a mistake, it
reminds me directly.

[0] https://www.postgresql.org/message-id/874j4yl4cj.fsf%40163.com 
[1] https://www.postgresql.org/message-id/CA%2BhUKGJNUk434tcsVbs5YUGsujZbveo43QcZeWbv0xPzg9us-A%40mail.gmail.com

-- 
Best Regards
Andy Fan

Re: -Wformat-signedness

From

Peter Eisentraut

Date:

29 October 2024, 09:38:36

On 27.10.24 04:59, Andy Fan wrote:
>> I haven't found anything that is a really serious bug, but I imagine you
>> could run into trouble in various ways when you exceed the INT_MAX
>> value.  But then again, if you use up INT_MAX WAL timelines, you
>> probably have other problems. ;-)
> Me too, I just want some clean code :)  But FWIW, "-Wformat-signedness"
> is not supported by clang so far, so people using clang still can't
> benefit from these changes.

clang 19 supports it now.

> My solution (I use clang
> every day) is adding a "gcc-checker" for my C files; if I make such a
> mistake, it reminds me directly.

I think it could be useful to set up some better test coverage for 
various things overflowing signed integer maximums.  For example, maybe 
you could hack initdb to advance the OID counter to INT32_MAX+1 or 
thereabouts and run the test suites from there.  That would also catch 
things like inappropriate uses of atoi(), things beyond just the format 
strings.

Re: -Wformat-signedness

From

Peter Eisentraut

Date:

29 October 2024, 11:33:54

On 29.10.24 07:51, Michael Paquier wrote:
> On Tue, Oct 29, 2024 at 07:38:36AM +0100, Peter Eisentraut wrote:
>> I think it could be useful to set up some better test coverage for various
>> things overflowing signed integer maximums.  For example, maybe you could
>> hack initdb to advance the OID counter to INT32_MAX+1 or thereabouts and run
>> the test suites from there.  That would also catch things like inappropriate
>> uses of atoi(), things beyond just the format strings.
> 
> Fun.  One way to be fancy here would be to force a pg_resetwal
> --next-oid in some of the test paths (Cluster.pm and/or pg_regress)
> with an environment variable to force the command to trigger across
> the board for all the clusters created in the tests.  initdb cannot be
> used here as the TAP tests reuse a cluster already initdb'd to save
> time.  No need to touch pg_regress, either, as we could count on the
> pg_regress runs in 002_pg_upgrade.pl and 027_stream_regress.pl.

I was thinking just compiling with a patch like this:

-#define FirstNormalObjectId        16384
+#define FirstNormalObjectId        ((Oid) INT_MAX + 1)

Already found one bug:  pg_checksums --filenode only accepts files up to 
INT_MAX.