Re: custom function for converting human readable sizes to bytes - Mailing list pgsql-hackers
From | Pavel Stehule |
---|---|
Subject | Re: custom function for converting human readable sizes to bytes |
Date | |
Msg-id | CAFj8pRA5_JQ+2ytwkFefeenWG8v=1+jo_MmYc86BUr_t3vwtmA@mail.gmail.com |
In response to | Re: custom function for converting human readable sizes to bytes (Vitaly Burovoy <vitaly.burovoy@gmail.com>) |
List | pgsql-hackers |
2016-01-19 4:45 GMT+01:00 Vitaly Burovoy <vitaly.burovoy@gmail.com>:
On 1/4/16, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> 2016-01-04 18:13 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de> :
>> On Mon, Jan 4, 2016 at 6:03 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
>> > 2016-01-04 17:48 GMT+01:00 Shulgin, Oleksandr <oleksandr.shulgin@zalando.de>:
>> >> On Mon, Jan 4, 2016 at 4:51 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> >>
>> >> I'm also inclined to drop that explicit check for an empty string
>> >> below and let numeric_in() error out on that. Does this look OK, or can
>> >> it confuse someone:
>> >> postgres=# select pg_size_bytes('');
>> >> ERROR: invalid input syntax for type numeric: ""
>> >
>> > both fixed
>>
>> Hm...
>>
>> > + switch (*strptr)
>> > + {
>> > + /* ignore plus symbol */
>> > + case '+':
>> > + case '-':
>> > + *bufptr++ = *strptr++;
>> > + break;
>> > + }
>>
>> Well, to that extent you don't need any special checks. I'm just not sure
>> whether the reported error message would be misleading if we let numeric_in()
>> handle all the errors. At least it can cope with the leading spaces, +/- and
>> empty input quite well.
>>
>
> I don't want to catch an exception from numeric_in, so I try to handle some
> simple situations myself, where I can raise a possibly better error message.
There are several cases where your behavior gives strange errors (see below).
Next batch of notes:
src/include/catalog/pg_proc.h:
---
+ DATA(insert OID = 3317 ( pg_size_bytes...
now oid 3317 is used (by pg_stat_get_wal_receiver), 3318 is free
fixed
---
+ DESCR("convert a human readable text with size units to big int bytes");
Maybe the best way is to copy the first sentence from the doc?
("convert a size in human-readable format with size units into bytes")
fixed
====
src/backend/utils/adt/dbsize.c:
+ text *arg = PG_GETARG_TEXT_PP(0);
+ char *str = text_to_cstring(arg);
...
+ /* working buffer cannot be longer than original string */
+ buffer = (char *) palloc(VARSIZE_ANY_EXHDR(arg) + 1);
Is there any reason to get TEXT for only converting it to cstring and
call VARSIZE_ANY_EXHDR instead of strlen?
Performance, but these strings should be short, so I can use strlen.
fixed
---
+ text *arg = PG_GETARG_TEXT_PP(0);
+ char *str = text_to_cstring(arg);
+ char *strptr = str;
+ char *buffer;
The variable names are misaligned after their types (throughout the
body of the "pg_size_bytes" function).
See variable declarations in nearby functions (
>> "make the new code look like the existing code around it"
http://www.postgresql.org/docs/devel/static/source-format.html
)
fixed
---
+ errmsg("\"%s\" is not number", str)));
s/is not number/is not a number/
(the second version can be found in a couple places besides translations)
fixed
But this message can be a little unintuitive; a better text is "is not a valid number".
---
+ if (*strptr != '\0')
...
+ while (*strptr && !isspace(*strptr))
Sometimes it explicitly compares to '\0', sometimes implicitly.
Common use is explicit comparison and it is preferred due to different
compilers (their conversions to boolean).
fixed
---
+ /* Skip leading spaces */
...
+ /* ignore plus symbol */
...
+ /* copy digits to working buffer */
...
+ /* allow whitespace between integer and unit */
I'm also inclined to drop the explicit skipping of spaces and the check
for +/- symbols, and instead copy all digits, spaces, dots and '+-' symbols
and let numeric_in() error out on that.
This is difficult: you have to divide the string into two parts, where the first word is the number and the second word is the unit.
For example, "+912+ kB" is a correct number +912 and a broken unit "+ kB".
That would give correct error messages for something like:
postgres=# select pg_size_bytes('.+912');
ERROR: invalid input syntax for type numeric: ".+912"
postgres=# select pg_size_bytes('+912+ kB');
ERROR: invalid input syntax for type numeric: "+912+ "
postgres=# select pg_size_bytes('++123 kB');
ERROR: invalid input syntax for type numeric: "++123 "
instead of current:
postgres=# select pg_size_bytes('.+912');
ERROR: invalid input syntax for type numeric: "."
postgres=# select pg_size_bytes('+912+ kB');
ERROR: invalid unit: "+ kB"
postgres=# select pg_size_bytes('++123 kB');
ERROR: invalid input syntax for type numeric: "+"
I redesigned this check. Now it is:
postgres=# select pg_size_bytes('.+912');
ERROR: 22023: ".+912" is not a valid number
postgres=# select pg_size_bytes('++123 kB');
ERROR: 22023: "++123 kB" is not a valid number
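For readers following the number/unit split discussed above, here is a minimal standalone sketch in plain C. It is not the code from the attached patch. The names split_size_input and put_char are hypothetical, and the accepted grammar (one optional sign, digits, an optional fractional part, then optional whitespace before the unit) is an assumption made for illustration only.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: append c to num, keeping room for the terminator. */
static bool
put_char(char *num, size_t numsz, size_t *n, char c)
{
    if (*n + 1 >= numsz)
        return false;
    num[(*n)++] = c;
    return true;
}

/*
 * Hypothetical sketch: copy the numeric prefix of str into num and point
 * *unit at whatever follows it (after optional whitespace).
 * Returns false when no digits are found or when num is too small;
 * num and *unit must not be used in that case.
 */
static bool
split_size_input(const char *str, char *num, size_t numsz, const char **unit)
{
    size_t      n = 0;
    bool        have_digit = false;

    assert(str != NULL && num != NULL && unit != NULL && numsz > 0);

    /* skip leading whitespace */
    while (isspace((unsigned char) *str))
        str++;

    /* at most one sign is accepted, so "++123" fails the digit check below */
    if (*str == '+' || *str == '-')
    {
        if (!put_char(num, numsz, &n, *str))
            return false;
        str++;
    }

    /* integer digits */
    while (isdigit((unsigned char) *str))
    {
        if (!put_char(num, numsz, &n, *str))
            return false;
        str++;
        have_digit = true;
    }

    /* optional fractional part */
    if (*str == '.')
    {
        if (!put_char(num, numsz, &n, *str))
            return false;
        str++;
        while (isdigit((unsigned char) *str))
        {
            if (!put_char(num, numsz, &n, *str))
                return false;
            str++;
            have_digit = true;
        }
    }
    num[n] = '\0';

    if (!have_digit)
        return false;

    /* allow whitespace between the number and the unit */
    while (isspace((unsigned char) *str))
        str++;
    *unit = str;
    return true;
}
```

With this sketch, ".+912" and "++123 kB" are rejected as a whole because no digits follow the optional sign, while "+912+ kB" splits into the number "+912" and the (broken) unit "+ kB", which a caller could then report as an invalid unit.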
---
+ while (isspace((unsigned char) *strptr))
...
+ while (isspace(*strptr))
...
+ while (*strptr && !isspace(*strptr))
...
+ while (isspace(*strptr))
The first occurrence of isspace's parameter is cast to "unsigned
char" whereas the others are not.
Note:
"The behavior is undefined if the value of ch is not representable as
unsigned char and is not equal to EOF"
Proof:
http://en.cppreference.com/w/c/string/byte/isspace
fixed
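As an illustration of the isspace() note above, a minimal sketch in plain C (not taken from the patch): the helper name skip_spaces is hypothetical. On platforms where plain char is signed, a byte above 0x7F becomes a negative int when passed directly, which is exactly the undefined case the quoted text describes. Casting to unsigned char first avoids it.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper: return a pointer to the first non-whitespace
 * character of s.  Every byte is cast to unsigned char before it reaches
 * isspace(), so high-bit bytes are never passed as negative values.
 */
static const char *
skip_spaces(const char *s)
{
    assert(s != NULL);

    while (*s != '\0' && isspace((unsigned char) *s))
        s++;
    return s;
}
```

The same cast applies to the other loops quoted above, for example the one that scans the unit up to the next whitespace character.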
---
+ pfree(buffer);
+ pfree(str);
pfree-s here are not necessary. See:
http://www.neilconway.org/talks/hacking/hack_slides.pdf (page 17)
Automatic memory deallocation doesn't cover all possible situations where the function can be used (for example, DirectFunctionCall), so explicit deallocation can decrease memory requirements when you call these functions from C.
New version is attached
Regards
Pavel
--
Best regards,
Vitaly Burovoy