Re: [PATCHES] to_date() validation - Mailing list pgsql-hackers
From | Brendan Jurd |
---|---|
Subject | Re: [PATCHES] to_date() validation |
Date | |
Msg-id | 37ed240d0809090404w2490d375l60b0fc972f899cec@mail.gmail.com Whole thread Raw |
In response to | Re: [PATCHES] to_date() validation (Martijn van Oosterhout <kleptog@svana.org>) |
Responses |
Re: [PATCHES] to_date() validation
|
List | pgsql-hackers |
On Tue, Sep 9, 2008 at 7:29 PM, Martijn van Oosterhout <kleptog@svana.org> wrote: > I actually had a look at this patch also, though not as thoroughly as > Alex. There was one part that I had some thoughts about in from_char_parse_int_len: > Hi Martijn. Thanks for your comments. > The use of palloc/pfree in this routine seems excessive. Does len have > upper bound? If so a simple array will do it. > There isn't any *theoretical* upper bound on len. However, in practice, with the current set of formatting nodes, the largest len that will ever be passed to rom_char_parse_int_len() is 6 (for the microsecond 'US' node). I suppose I could define a constant FORMATNODE_MAX_LEN, make it something like 10 and just use that for copying the string, rather than palloc(). I'll give it a try. > ! if (strlen(first) < len) > > Here you check the length of the remaining string and complain if it's > too short, but: > > <snip> > ! result = strtol(first, &last, 10); > ! *src += (last - first); > > Here you do not note if we didn't convert the entire string. So it > seems you are allowed to provide too few characters, as long as it's > not the last field? That's true. The only way to hit that condition would be to provide a semi-bogus value in a string with no separators between the numbers. For example: postgres=# SELECT to_date('20081o13', 'YYYYMMDD'); ERROR: invalid value for "DD" in source string DETAIL: Value must be an integer. What happens here is that strtol() happily consumes the '1' for the format node MM, and as you point out it does not complain that it expected to consume two characters and only got one. On the next node (DD), the function tries to start parsing an integer, but the first character it encounters is 'o', so it freaks out. Certainly the error message here should be more apropos, and tell the user that the problem is in the MM node. Blaming the DD node is pure red herring. However, if the mistake is at the end of the string, there is no error at all: postgres=# SELECT to_date('2008101!', 'YYYYMMDD'); to_date ------------2008-10-01 This is because the end of the string counts as a "separator", so we just run an unbounded strtol() on whatever characters remain in the string. As discussed in my response to Alex's review, I made the end of the string a separator so that things like 'DD-MM-YYYY' would actually work for years with more than four digits. Now I'm wondering if that was the wrong way to go. The case for years with more than four digits is extremely narrow, and if somebody really wanted to parse '01-01-20008' as 1 Jan 20,008 they could always use the 'FM' modifier to get the behaviour they want ('DD-MM-FMYYYY'). I'll send in a new version which addresses these issues. Cheers, BJ
pgsql-hackers by date: