Thread: to_date_valid()
Hello, we have customers complaining that to_date() accepts invalid dates, and returns a different date instead. This is a known issue: http://sql-info.de/postgresql/notes/to_date-to_timestamp-gotchas.html On the other hand this leads to wrong dates when loading dates into the database, because the database happily accepts invalid dates and ends up writing something completely different into the table. The attached patch adds a new function "to_date_valid()" which will validate the date and return an error if the input and output date do not match. Tests included, documentation update as well. -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
Attachment
On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote: > The attached patch adds a new function "to_date_valid()" which will > validate the date and return an error if the input and output date do > not match. Tests included, documentation update as well. > Why don't you add a third parameter (say, validate = true | false) instead of creating another function? The new parameter could default to false to not break compatibility. -- Euler Taveira Timbira - http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte24x7 e Treinamento
<p dir="ltr"><p dir="ltr">El 2/7/2016 20:33, "Euler Taveira" <<a href="mailto:euler@timbira.com.br">euler@timbira.com.br</a>>escribió:<br /> ><br /> > On 02-07-2016 22:04, Andreas'ads' Scherbaum wrote:<br /> > > The attached patch adds a new function "to_date_valid()" which will<br /> >> validate the date and return an error if the input and output date do<br /> > > not match. Tests included,documentation update as well.<br /> > ><br /> > Why don't you add a third parameter (say, validate = true| false)<br /> > instead of creating another function? The new parameter could default to<br /> > false to notbreak compatibility.<br /> ><p dir="ltr">Shouldn't we fix this instead? Sounds like a bug to me. We don't usually wantto be bug compatible so it doesn't matter if we break something. <p dir="ltr">--<br /> Jaime Casanova <ahref="http://www.2ndQuadrant.com/">http://www.2ndQuadrant.com/</a><br /> PostgreSQL Development, 24x7 Support, Training& Services
On 03.07.2016 07:05, Jaime Casanova wrote: > El 2/7/2016 20:33, "Euler Taveira" <euler@timbira.com.br > <mailto:euler@timbira.com.br>> escribió: > > > > On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote: > > > The attached patch adds a new function "to_date_valid()" which will > > > validate the date and return an error if the input and output date do > > > not match. Tests included, documentation update as well. > > > > > Why don't you add a third parameter (say, validate = true | false) > > instead of creating another function? The new parameter could default to > > false to not break compatibility. > > > > Shouldn't we fix this instead? Sounds like a bug to me. We don't usually > want to be bug compatible so it doesn't matter if we break something. There are previous discussions about such a change, and this was rejected: https://www.postgresql.org/message-id/lbjf1v%24a2v%241%40ger.gmane.org https://www.postgresql.org/message-id/A737B7A37273E048B164557ADEF4A58B17C9140E%40ntex2010i.host.magwien.gv.at Hence the new function, which does not collide with the existing implementation. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br> wrote:
-- On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:
> The attached patch adds a new function "to_date_valid()" which will
> validate the date and return an error if the input and output date do
> not match. Tests included, documentation update as well.
>
Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.
because
SELECT to_date('blah', 'pattern', true)
is less clear to read than
SELECT to_date_valid('blah', 'pattern')
and offers no advantage. It's likely faster to use a separate function too.
2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:
On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br> wrote:On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:
> The attached patch adds a new function "to_date_valid()" which will
> validate the date and return an error if the input and output date do
> not match. Tests included, documentation update as well.
>
Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.becauseSELECT to_date('blah', 'pattern', true)is less clear to read thanSELECT to_date_valid('blah', 'pattern')and offers no advantage. It's likely faster to use a separate function too.
personally I prefer first variant - this is same function with stronger check.
The name to_date_valid sounds little bit strange - maybe to_date_strict should be better.
Regards
Pavel
--
On 04/07/16 15:19, Pavel Stehule wrote: > > > 2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com > <mailto:craig@2ndquadrant.com>>: > > On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br > <mailto:euler@timbira.com.br>> wrote: > > On 02-07-2016 22 <tel:02-07-2016%2022>:04, Andreas 'ads' > Scherbaum wrote: > > The attached patch adds a new function "to_date_valid()" > which will > > validate the date and return an error if the input and > output date do > > not match. Tests included, documentation update as well. > > > Why don't you add a third parameter (say, validate = true | false) > instead of creating another function? The new parameter could > default to > false to not break compatibility. > > > because > > > SELECT to_date('blah', 'pattern', true) > > is less clear to read than > > SELECT to_date_valid('blah', 'pattern') > > and offers no advantage. It's likely faster to use a separate > function too. > > > personally I prefer first variant - this is same function with > stronger check. > > The name to_date_valid sounds little bit strange - maybe > to_date_strict should be better. > > Regards > > Pavel > > -- > Craig Ringer http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Training & Services > > Yeah, my feeling too, is that 'to_date_strict' would be better! Cheers, Gavin
2016-07-04 5:19 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com>:
2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com>:On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br> wrote:On 02-07-2016 22:04, Andreas 'ads' Scherbaum wrote:
> The attached patch adds a new function "to_date_valid()" which will
> validate the date and return an error if the input and output date do
> not match. Tests included, documentation update as well.
>
Why don't you add a third parameter (say, validate = true | false)
instead of creating another function? The new parameter could default to
false to not break compatibility.becauseSELECT to_date('blah', 'pattern', true)is less clear to read thanSELECT to_date_valid('blah', 'pattern')and offers no advantage. It's likely faster to use a separate function too.personally I prefer first variant - this is same function with stronger check.
Currently probably we have not two similar function - one fault tolerant and second stricter. There is only one example of similar behave - parse_ident with "strict" option.
The three parameters are ok still - so I don't see a reason why we have to implement new function. If you need to emphasize the fact so behave should be strict, you can use named parameters
select to_date('blah', 'patter', strict => true)
Regards
Pavel
The name to_date_valid sounds little bit strange - maybe to_date_strict should be better.RegardsPavel
--
On Sun, Jul 3, 2016 at 6:34 AM, Andreas 'ads' Scherbaum <adsmail@wars-nicht.de> wrote: > > Hello, > > we have customers complaining that to_date() accepts invalid dates, and > returns a different date instead. This is a known issue: > > http://sql-info.de/postgresql/notes/to_date-to_timestamp-gotchas.html > > On the other hand this leads to wrong dates when loading dates into the > database, because the database happily accepts invalid dates and ends up > writing something completely different into the table. > > The attached patch adds a new function "to_date_valid()" which will validate > the date and return an error if the input and output date do not match. > Tests included, documentation update as well. > It seems that you are calling many additional function calls (date_out, timestamp_in, etc.) to validate the date. Won't the additional function calls make to_date much costlier than its current implementation? I don't know if there is a better way, but I think it is worth to consider, if we can find a cheaper way to detect validity of date. Note - Your patch is small (~13KB) enough that it doesn't need to zipped. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
On 04.07.2016 16:33, Amit Kapila wrote: > On Sun, Jul 3, 2016 at 6:34 AM, Andreas 'ads' Scherbaum > <adsmail@wars-nicht.de> wrote: >> >> Hello, >> >> we have customers complaining that to_date() accepts invalid dates, and >> returns a different date instead. This is a known issue: >> >> http://sql-info.de/postgresql/notes/to_date-to_timestamp-gotchas.html >> >> On the other hand this leads to wrong dates when loading dates into the >> database, because the database happily accepts invalid dates and ends up >> writing something completely different into the table. >> >> The attached patch adds a new function "to_date_valid()" which will validate >> the date and return an error if the input and output date do not match. >> Tests included, documentation update as well. >> > > It seems that you are calling many additional function calls > (date_out, timestamp_in, etc.) to validate the date. Won't the > additional function calls make to_date much costlier than its current > implementation? I don't know if there is a better way, but I think it > is worth to consider, if we can find a cheaper way to detect validity > of date. It certainly is costlier, and I'm open to suggestions how to improve the performance. That is one of the reasons why I considered a separate function, instead of adding this to to_date() and add even more overhead there. > Note - Your patch is small (~13KB) enough that it doesn't need to zipped. It's not about the small size, it's about websites like Gmail which mingle up the linebreaks and then git fails. Ran into this problem on the pgAdminIII list a while ago, ever since I just zip it and avoid the trouble. -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 04.07.2016 05:51, Pavel Stehule wrote: > > > 2016-07-04 5:19 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com > <mailto:pavel.stehule@gmail.com>>: > > > > 2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com > <mailto:craig@2ndquadrant.com>>: > > On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br > <mailto:euler@timbira.com.br>> wrote: > > On 02-07-2016 22 <tel:02-07-2016%2022>:04, Andreas 'ads' > Scherbaum wrote: > > The attached patch adds a new function "to_date_valid()" which will > > validate the date and return an error if the input and output date do > > not match. Tests included, documentation update as well. > > > Why don't you add a third parameter (say, validate = true | > false) > instead of creating another function? The new parameter > could default to > false to not break compatibility. > > > because > > > SELECT to_date('blah', 'pattern', true) > > is less clear to read than > > SELECT to_date_valid('blah', 'pattern') > > and offers no advantage. It's likely faster to use a separate > function too. > > > personally I prefer first variant - this is same function with > stronger check. > > > Currently probably we have not two similar function - one fault > tolerant and second stricter. There is only one example of similar > behave - parse_ident with "strict" option. > > The three parameters are ok still - so I don't see a reason why we have > to implement new function. If you need to emphasize the fact so behave > should be strict, you can use named parameters > > select to_date('blah', 'patter', strict => true) The new function is not "strict", it just adds a validation step: postgres=# select to_date_valid(NULL, NULL); to_date_valid --------------- (1 row) -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
2016-07-04 18:24 GMT+02:00 Andreas 'ads' Scherbaum <adsmail@wars-nicht.de>:
On 04.07.2016 05:51, Pavel Stehule wrote:
2016-07-04 5:19 GMT+02:00 Pavel Stehule <pavel.stehule@gmail.com
<mailto:pavel.stehule@gmail.com>>:
2016-07-04 4:25 GMT+02:00 Craig Ringer <craig@2ndquadrant.com
<mailto:craig@2ndquadrant.com>>:
On 3 July 2016 at 09:32, Euler Taveira <euler@timbira.com.br
<mailto:euler@timbira.com.br>> wrote:
On 02-07-2016 22 <tel:02-07-2016%2022>:04, Andreas 'ads'
Scherbaum wrote:
> The attached patch adds a new function "to_date_valid()" which will
> validate the date and return an error if the input and output date do
> not match. Tests included, documentation update as well.
>
Why don't you add a third parameter (say, validate = true |
false)
instead of creating another function? The new parameter
could default to
false to not break compatibility.
because
SELECT to_date('blah', 'pattern', true)
is less clear to read than
SELECT to_date_valid('blah', 'pattern')
and offers no advantage. It's likely faster to use a separate
function too.
personally I prefer first variant - this is same function with
stronger check.
Currently probably we have not two similar function - one fault
tolerant and second stricter. There is only one example of similar
behave - parse_ident with "strict" option.
The three parameters are ok still - so I don't see a reason why we have
to implement new function. If you need to emphasize the fact so behave
should be strict, you can use named parameters
select to_date('blah', 'patter', strict => true)
The new function is not "strict", it just adds a validation step:
I understand - I know, so this has zero relation to function flag STRICT
I don't know if the name "strict" is best, but the name "validate" is not good too. Current to_date does some validations too.
Regards
Pavel
postgres=# select to_date_valid(NULL, NULL);
to_date_valid
---------------
(1 row)
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
On 07/03/2016 12:36 PM, Andreas 'ads' Scherbaum wrote: > On 03.07.2016 07:05, Jaime Casanova wrote: >> Shouldn't we fix this instead? Sounds like a bug to me. We don't usually >> want to be bug compatible so it doesn't matter if we break something. > > There are previous discussions about such a change, and this was rejected: > > https://www.postgresql.org/message-id/lbjf1v%24a2v%241%40ger.gmane.org > https://www.postgresql.org/message-id/A737B7A37273E048B164557ADEF4A58B17C9140E%40ntex2010i.host.magwien.gv.at > > > Hence the new function, which does not collide with the existing > implementation. I do not see a clear conclusion in the linked threads. For example Bruce calls it a bug in one of the emails (https://www.postgresql.org/message-id/201107200103.p6K13ix10517%40momjian.us). I think we should fix to_date() to throw an error. Personally I would be happy if my code broke due to this kind of change since the exception would reveal an old bug which has been there a long time eating my data. I cannot see a case where I would have wanted the current behavior. If there is any legitimate use for the current behavior then we can add it back as another function. Andreas
2016-07-04 22:15 GMT+02:00 Andreas Karlsson <andreas@proxel.se>:
On 07/03/2016 12:36 PM, Andreas 'ads' Scherbaum wrote:On 03.07.2016 07:05, Jaime Casanova wrote:Shouldn't we fix this instead? Sounds like a bug to me. We don't usually
want to be bug compatible so it doesn't matter if we break something.
There are previous discussions about such a change, and this was rejected:
https://www.postgresql.org/message-id/lbjf1v%24a2v%241%40ger.gmane.org
https://www.postgresql.org/message-id/A737B7A37273E048B164557ADEF4A58B17C9140E%40ntex2010i.host.magwien.gv.at
Hence the new function, which does not collide with the existing
implementation.
I do not see a clear conclusion in the linked threads. For example Bruce calls it a bug in one of the emails (https://www.postgresql.org/message-id/201107200103.p6K13ix10517%40momjian.us).
I think we should fix to_date() to throw an error. Personally I would be happy if my code broke due to this kind of change since the exception would reveal an old bug which has been there a long time eating my data. I cannot see a case where I would have wanted the current behavior.
If I remember, this implementation is based on Oracle's behave. It is pretty old and documented - so it is hard to speak about it like the bug. I understand, so the behave is strange, but it was designed in different time. You can enter not 100% valid string, but you get correct date
postgres=# select to_date('2016-12-40','YYYY-MM-DD');
┌────────────┐
│ to_date │
╞════════════╡
│ 2017-01-09 │
└────────────┘
(1 row)
postgres=# select to_date('2016-12-40','YYYY-MM-DD');
┌────────────┐
│ to_date │
╞════════════╡
│ 2017-01-09 │
└────────────┘
(1 row)
It can be used for some date calculations. In old age the applications was designed in this style. I am against to change of default behave. We can introduce new new different function with different name with better designed format and rules, or we can add new options to this function, or we can live with current state.
Now, to_date function should not be used - functions make_date, make_timestamp are faster and safe.
postgres=# select make_date(2017,01,40);
ERROR: date field value out of range: 2017-01-40
postgres=# select make_date(2017,01,40);
ERROR: date field value out of range: 2017-01-40
Regards
Pavel
If there is any legitimate use for the current behavior then we can add it back as another function.
Andreas
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
On 07/04/2016 10:55 PM, Pavel Stehule wrote: > 2016-07-04 22:15 GMT+02:00 Andreas Karlsson <andreas@proxel.se > <mailto:andreas@proxel.se>>: > I do not see a clear conclusion in the linked threads. For example > Bruce calls it a bug in one of the emails > (https://www.postgresql.org/message-id/201107200103.p6K13ix10517%40momjian.us). > > I think we should fix to_date() to throw an error. Personally I > would be happy if my code broke due to this kind of change since the > exception would reveal an old bug which has been there a long time > eating my data. I cannot see a case where I would have wanted the > current behavior. > > > If I remember, this implementation is based on Oracle's behave. In the thread I linked above they claim that Oracle (at least 10g) does not work like this. Thomas Kellerer wrote:> Oracle throws an error for the above example:>> SQL> select to_date('20110231', 'YYYYMMDD') fromdual;> select to_date('20110231', 'YYYYMMDD') from dual> *> ERROR at line 1:> ORA-01839: date not validfor month specified I do not have access to an Oracle installation so I cannot confirm this myself. > It is > pretty old and documented - so it is hard to speak about it like the > bug. I understand, so the behave is strange, but it was designed in > different time. You can enter not 100% valid string, but you get correct > date > > postgres=# select to_date('2016-12-40','YYYY-MM-DD'); > > ┌────────────┐ > │ to_date │ > ╞════════════╡ > │ 2017-01-09 │ > └────────────┘ > (1 row) > > > It can be used for some date calculations. In old age the applications > was designed in this style. I am against to change of default behave. We > can introduce new new different function with different name with better > designed format and rules, or we can add new options to this function, > or we can live with current state. While clever, I think this behavior is a violation of the principle of least surprise. It is not obvious to someone reading a query that to_date() would behave like this. Especially when Oracle's to_date() works differently. > Now, to_date function should not be used - functions make_date,> make_timestamp are faster and safe. Yeah, I personally know of this behavior and therefore would never use to_date(), but I am far from the average PostgreSQL user. Andreas
On 04.07.2016 18:37, Pavel Stehule wrote: > > I don't know if the name "strict" is best, but the name "validate" is > not good too. Current to_date does some validations too. Obviously not enough, because it allows invalid dates. I'd say that the current to_date() merely validates the input format for string parsing, and that the date is in range. But there is not much validation on the date itself. So the name can't be "strict" because of the conflict with "NULL" handling, and you don't like "valid" - what other options do you offer? -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 04.07.2016 18:37, Pavel Stehule wrote:
I don't know if the name "strict" is best, but the name "validate" is
not good too. Current to_date does some validations too.
Obviously not enough, because it allows invalid dates. I'd say that the current to_date() merely validates the input format for string parsing, and that the date is in range. But there is not much validation on the date itself.
So the name can't be "strict" because of the conflict with "NULL" handling, and you don't like "valid" - what other options do you offer?
We don't have to change the name...we could do something like how RegularExpressions work - like (?i) - and just add a new modifier code.
'~YYYY-MI-DD' --that's a leading tilde, could be anything - even something like "HMYYYY-MI-DD" for "historical mode"
Also, see this thread of a few weeks ago for related material:
It seems that fixing it is back on the table, possibly even for 9.6 since this is such a hideous bug - one that closely resembles a cockroach ;)
WRT to the 2014 "reject out-of-range dates" thread, I'm kinda surprised that we didn't just set the date to be the minimum or maximum allowable date back in 9.2 instead of rejecting it...
I'd be more inclined to add another argument if it was basically an enum.
to_date(text, format, format_style)
This at least opens the door for future enhancements that are associated with named styles. I could imagine an "exact" format that also allows for something like regexp character classes so that one can write: "YYYY[:SEP:]MM[:SEP:]DD" to keep exact matches but accommodate variations on what people type as a separator e.g. [.-/]
format_style=>historical would invoke today's behaviors
David J.
2016-07-05 4:33 GMT+02:00 David G. Johnston <david.g.johnston@gmail.com>:
On 04.07.2016 18:37, Pavel Stehule wrote:
I don't know if the name "strict" is best, but the name "validate" is
not good too. Current to_date does some validations too.
Obviously not enough, because it allows invalid dates. I'd say that the current to_date() merely validates the input format for string parsing, and that the date is in range. But there is not much validation on the date itself.
So the name can't be "strict" because of the conflict with "NULL" handling, and you don't like "valid" - what other options do you offer?We don't have to change the name...we could do something like how RegularExpressions work - like (?i) - and just add a new modifier code.'~YYYY-MI-DD' --that's a leading tilde, could be anything - even something like "HMYYYY-MI-DD" for "historical mode"Also, see this thread of a few weeks ago for related material:It seems that fixing it is back on the table, possibly even for 9.6 since this is such a hideous bug - one that closely resembles a cockroach ;)WRT to the 2014 "reject out-of-range dates" thread, I'm kinda surprised that we didn't just set the date to be the minimum or maximum allowable date back in 9.2 instead of rejecting it...I'd be more inclined to add another argument if it was basically an enum.to_date(text, format, format_style)This at least opens the door for future enhancements that are associated with named styles. I could imagine an "exact" format that also allows for something like regexp character classes so that one can write: "YYYY[:SEP:]MM[:SEP:]DD" to keep exact matches but accommodate variations on what people type as a separator e.g. [.-/]format_style=>historical would invoke today's behaviors
this is compatibility break - the question is "can we break it?"
Pavel
David J.
2016-07-05 2:39 GMT+02:00 Andreas 'ads' Scherbaum <adsmail@wars-nicht.de>:
On 04.07.2016 18:37, Pavel Stehule wrote:
I don't know if the name "strict" is best, but the name "validate" is
not good too. Current to_date does some validations too.
Obviously not enough, because it allows invalid dates. I'd say that the current to_date() merely validates the input format for string parsing, and that the date is in range. But there is not much validation on the date itself.
So the name can't be "strict" because of the conflict with "NULL" handling, and you don't like "valid" - what other options do you offer?
I have not - so third option looks best for me - it can be long name "only_correct_date", "only_valid_date", "only_valid_date_on_input" ...
Pavel
--
Andreas 'ads' Scherbaum
German PostgreSQL User Group
European PostgreSQL User Group - Board of Directors
Volunteer Regional Contact, Germany - PostgreSQL Project
Andreas Karlsson wrote: > On 07/04/2016 10:55 PM, Pavel Stehule wrote: >> 2016-07-04 22:15 GMT+02:00 Andreas Karlsson wrote: >>> I do not see a clear conclusion in the linked threads. For example >>> Bruce calls it a bug in one of the emails >>> (https://www.postgresql.org/message-id/201107200103.p6K13ix10517%40momjian.us). >>> >>> I think we should fix to_date() to throw an error. Personally I >>> would be happy if my code broke due to this kind of change since the >>> exception would reveal an old bug which has been there a long time >>> eating my data. I cannot see a case where I would have wanted the >>> current behavior. >> >> If I remember, this implementation is based on Oracle's behave. > > In the thread I linked above they claim that Oracle (at least 10g) does > not work like this. [...] > I do not have access to an Oracle installation so I cannot confirm this > myself. Oracle 12.1: SQL> SELECT to_date('2016-12-40','YYYY-MM-DD') FROM dual; SELECT to_date('2016-12-40','YYYY-MM-DD') FROM dual * ERROR at line 1: ORA-01847: day of month must be between 1 and last day of month SQL> SELECT to_date('2017-02-29','YYYY-MM-DD') FROM dual; SELECT to_date('2017-02-29','YYYY-MM-DD') FROM dual * ERROR at line 1: ORA-01839: date not valid for month specified So no, compatibility with Oracle is certainly not the reason to leave it as it is. But notwithstanding your feeling that you would like your application to break if it makes use of this behaviour, it is a change that might make some people pretty unhappy - nobody can tell how many. Yours, Laurenz Albe
On 05.07.2016 06:05, Pavel Stehule wrote: > > > 2016-07-05 2:39 GMT+02:00 Andreas 'ads' Scherbaum <adsmail@wars-nicht.de > <mailto:adsmail@wars-nicht.de>>: > > On 04.07.2016 18:37, Pavel Stehule wrote: > > > I don't know if the name "strict" is best, but the name > "validate" is > not good too. Current to_date does some validations too. > > > Obviously not enough, because it allows invalid dates. I'd say that > the current to_date() merely validates the input format for string > parsing, and that the date is in range. But there is not much > validation on the date itself. > > So the name can't be "strict" because of the conflict with "NULL" > handling, and you don't like "valid" - what other options do you offer? > > > I have not - so third option looks best for me - it can be long name > "only_correct_date", "only_valid_date", "only_valid_date_on_input" ... Then you don't have "to_date" in the function name, but still use "valid" in the name. How is that useful to remember the function? Where "to_date_valid" already gives you the idea that it is "to_date" with an additional "valid"ator. Don't make it overly complicated. -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 05.07.2016 04:33, David G. Johnston wrote: > On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum > <adsmail@wars-nicht.de <mailto:adsmail@wars-nicht.de>>wrote: > > On 04.07.2016 18:37, Pavel Stehule wrote: > > > I don't know if the name "strict" is best, but the name > "validate" is > not good too. Current to_date does some validations too. > > > Obviously not enough, because it allows invalid dates. I'd say that > the current to_date() merely validates the input format for string > parsing, and that the date is in range. But there is not much > validation on the date itself. > > So the name can't be "strict" because of the conflict with "NULL" > handling, and you don't like "valid" - what other options do you offer? > > > We don't have to change the name...we could do something like how > RegularExpressions work - like (?i) - and just add a new modifier code. > > '~YYYY-MI-DD' --that's a leading tilde, could be anything - even > something like "HMYYYY-MI-DD" for "historical mode" Where to_timestamp() already uses HH for the hour? If you add another "H", that surely is confusing. > It seems that fixing it is back on the table, possibly even for 9.6 > since this is such a hideous bug - one that closely resembles a cockroach ;) 9.6 is already in Beta, people are testing their applications against it. This would be a huge break, plus an API change - something you don't add in a Beta. -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 05.07.2016 04:33, David G. Johnston wrote:On Mon, Jul 4, 2016 at 8:39 PM, Andreas 'ads' Scherbaum
<adsmail@wars-nicht.de <mailto:adsmail@wars-nicht.de>>wrote:
On 04.07.2016 18:37, Pavel Stehule wrote:
I don't know if the name "strict" is best, but the name
"validate" is
not good too. Current to_date does some validations too.
Obviously not enough, because it allows invalid dates. I'd say that
the current to_date() merely validates the input format for string
parsing, and that the date is in range. But there is not much
validation on the date itself.
So the name can't be "strict" because of the conflict with "NULL"
handling, and you don't like "valid" - what other options do you offer?
We don't have to change the name...we could do something like how
RegularExpressions work - like (?i) - and just add a new modifier code.
'~YYYY-MI-DD' --that's a leading tilde, could be anything - even
something like "HMYYYY-MI-DD" for "historical mode"
Where to_timestamp() already uses HH for the hour? If you add another "H", that surely is confusing.
I don't really try to pick final names for things until the idea has taken hold...
It seems that fixing it is back on the table, possibly even for 9.6
since this is such a hideous bug - one that closely resembles a cockroach ;)
9.6 is already in Beta, people are testing their applications against it. This would be a huge break, plus an API change - something you don't add in a Beta.
Surely these beta testers would test against the RC before putting it into production so I don't see an issue. I tend to agree generally but the point of beta is to find bugs and solicit suggestions for improvement to features. Having found a bug it doesn't make sense to avoid patching our current unstable release. This applies moreso because of our annual release cycle. On the topic of whether this becomes an exception to the rule I'm largely ambivalent.
David J.
On Tue, Jul 5, 2016 at 07:41:15AM -0400, David G. Johnston wrote: > Surely these beta testers would test against the RC before putting it into > production so I don't see an issue. I tend to agree generally but the point of > beta is to find bugs and solicit suggestions for improvement to features. > Having found a bug it doesn't make sense to avoid patching our current unstable > release. This applies moreso because of our annual release cycle. On the > topic of whether this becomes an exception to the rule I'm largely ambivalent. We don't assume users re-test everything when the RC1 arrives --- we assume the testing is cumulative from when we started beta. -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
On 7/5/16 4:24 AM, Albe Laurenz wrote: > But notwithstanding your feeling that you would like your application > to break if it makes use of this behaviour, it is a change that might > make some people pretty unhappy - nobody can tell how many. What is the use of the existing behavior? You get back an arbitrary implementation dependent value. We don't even guarantee what the value will be. If we changed it to return a different implementation dependent value, would users get upset? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 07/26/2016 06:25 PM, Peter Eisentraut wrote: > On 7/5/16 4:24 AM, Albe Laurenz wrote: >> But notwithstanding your feeling that you would like your application >> to break if it makes use of this behaviour, it is a change that might >> make some people pretty unhappy - nobody can tell how many. > > What is the use of the existing behavior? You get back an arbitrary > implementation dependent value. We don't even guarantee what the value > will be. If we changed it to return a different implementation > dependent value, would users get upset? No they would not get upset because they wouldn't know. Can we just do the right thing? JD -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. Unless otherwise stated, opinions are my own.
On 27.07.2016 05:00, Joshua D. Drake wrote: > On 07/26/2016 06:25 PM, Peter Eisentraut wrote: >> On 7/5/16 4:24 AM, Albe Laurenz wrote: >>> But notwithstanding your feeling that you would like your application >>> to break if it makes use of this behaviour, it is a change that might >>> make some people pretty unhappy - nobody can tell how many. >> >> What is the use of the existing behavior? You get back an arbitrary >> implementation dependent value. We don't even guarantee what the value >> will be. If we changed it to return a different implementation >> dependent value, would users get upset? > > No they would not get upset because they wouldn't know. > > Can we just do the right thing? I'm in favour of fixing this, and update the documentation. But given the discussions in the past, it seemed like people actually depend on this behaviour. Hence the additional function. if this is fixed, it's too late for the current beta. But it's a good time to add a note in the release notes, and advise people that it will be changed in the next release. A workaround can be to rename the current function to something like "to_date_legacy", or "to_date_oracle". And implement the checks in to_date. -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 7/29/16 1:33 PM, Andreas 'ads' Scherbaum wrote: > On 27.07.2016 05:00, Joshua D. Drake wrote: >> On 07/26/2016 06:25 PM, Peter Eisentraut wrote: >>> On 7/5/16 4:24 AM, Albe Laurenz wrote: >>>> But notwithstanding your feeling that you would like your application >>>> to break if it makes use of this behaviour, it is a change that might >>>> make some people pretty unhappy - nobody can tell how many. >>> >>> What is the use of the existing behavior? You get back an arbitrary >>> implementation dependent value. We don't even guarantee what the value >>> will be. If we changed it to return a different implementation >>> dependent value, would users get upset? >> >> No they would not get upset because they wouldn't know. >> >> Can we just do the right thing? > > I'm in favour of fixing this, and update the documentation. +1. I'd say that if users complain we can always create an extension (on PGXN) that offers the old behavior. Users could even put that function before pg_catalog in search_path and get the old behavior back. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461
On 27.07.2016 05:00, Joshua D. Drake wrote: > On 07/26/2016 06:25 PM, Peter Eisentraut wrote: >> On 7/5/16 4:24 AM, Albe Laurenz wrote: >>> But notwithstanding your feeling that you would like your application >>> to break if it makes use of this behaviour, it is a change that might >>> make some people pretty unhappy - nobody can tell how many. >> >> What is the use of the existing behavior? You get back an arbitrary >> implementation dependent value. We don't even guarantee what the value >> will be. If we changed it to return a different implementation >> dependent value, would users get upset? > > No they would not get upset because they wouldn't know. > > Can we just do the right thing? Attached is a patch to "do the right thing". The verification is in "to_date()" now, the extra function is removed. Regression tests are updated - two or three of them returned a wrong date before, and still passed. They fail now. Documentation is also updated. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
Attachment
On 14.08.2016 01:52, Andreas 'ads' Scherbaum wrote: > > Attached is a patch to "do the right thing". The verification is in > "to_date()" now, the extra function is removed. Regression tests are > updated - two or three of them returned a wrong date before, and still > passed. They fail now. Documentation is also updated. > > > Regards, > Is it right and "true" way to validate date by extra transforming and comparison? Maybe validate date by using ValidateDate(). Attached sample patch. -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
Attachment
On 15.08.2016 10:24, Artur Zakirov wrote: > On 14.08.2016 01:52, Andreas 'ads' Scherbaum wrote: >> >> Attached is a patch to "do the right thing". The verification is in >> "to_date()" now, the extra function is removed. Regression tests are >> updated - two or three of them returned a wrong date before, and still >> passed. They fail now. Documentation is also updated. >> >> >> Regards, >> > > Is it right and "true" way to validate date by extra transforming and > comparison? > > Maybe validate date by using ValidateDate(). Attached sample patch. This does not solve the problem at hand, and let's wrong dates/formats slip through: ./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make --run-install --no-clean-at-all --patch 'https://www.postgresql.org/message-id/95738e12-6ed6-daf5-9dcf-6336072e6b15%40postgrespro.ru' postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ 2011-12-08 (1 row) That is from the regression tests, and obviously handles the date transformation wrong. My attempt catches this, because I compare the date with the input date, and do not rely on a valid date only. -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 15.08.2016 14:33, Andreas 'ads' Scherbaum wrote: >> Is it right and "true" way to validate date by extra transforming and >> comparison? >> >> Maybe validate date by using ValidateDate(). Attached sample patch. > > This does not solve the problem at hand, and let's wrong dates/formats > slip through: > > ./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make > --run-install --no-clean-at-all --patch > 'https://www.postgresql.org/message-id/95738e12-6ed6-daf5-9dcf-6336072e6b15%40postgrespro.ru' > > > > postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD'); > to_date > ------------ > 2011-12-08 > (1 row) > > > That is from the regression tests, and obviously handles the date > transformation wrong. My attempt catches this, because I compare the > date with the input date, and do not rely on a valid date only. I suppose that your sample query is an another issue, not date validate task. I sent the patch to the thread https://www.postgresql.org/message-id/b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru . It fixes formatting issues. I thought that it is better to distinguish this issues to: - validation of input date/timestmap string and input format string - result date/timestamp validation -- Artur Zakirov Postgres Professional: http://www.postgrespro.com Russian Postgres Company
On 15.08.2016 13:44, Artur Zakirov wrote: > On 15.08.2016 14:33, Andreas 'ads' Scherbaum wrote: >>> Is it right and "true" way to validate date by extra transforming and >>> comparison? >>> >>> Maybe validate date by using ValidateDate(). Attached sample patch. >> >> This does not solve the problem at hand, and let's wrong dates/formats >> slip through: >> >> ./buildclient.py -v -c demo-config-pg.yaml --run-configure --run-make >> --run-install --no-clean-at-all --patch >> 'https://www.postgresql.org/message-id/95738e12-6ed6-daf5-9dcf-6336072e6b15%40postgrespro.ru' >> >> >> >> >> postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD'); >> to_date >> ------------ >> 2011-12-08 >> (1 row) >> >> >> That is from the regression tests, and obviously handles the date >> transformation wrong. My attempt catches this, because I compare the >> date with the input date, and do not rely on a valid date only. > > I suppose that your sample query is an another issue, not date validate > task. I sent the patch to the thread > https://www.postgresql.org/message-id/b2a39359-3282-b402-f4a3-057aae500ee7@postgrespro.ru > . It fixes formatting issues. > > I thought that it is better to distinguish this issues to: > - validation of input date/timestmap string and input format string > - result date/timestamp validation I was a bit confused when I've seen that you modified another function in your patch, but tried it out nevertheless. After all, you answered one of my emails, and I let my tool download your patch directly from the website. The other thread you are mentioning is a different topic. Anyway, what I'm trying to solve is validate that to_date() produces valid and correct output. It does not, as of today. Regards, -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 8/15/16 7:33 AM, Andreas 'ads' Scherbaum wrote: > postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD'); > to_date > ------------ > 2011-12-08 > (1 row) > > > That is from the regression tests, and obviously handles the date > transformation wrong. My attempt catches this, because I compare the > date with the input date, and do not rely on a valid date only. It's debatable what is correct here. Using to_number, the behavior appears to be that a space in the pattern ignores one character. For example: test=# select to_number('123 456', '999 999');to_number ----------- 123456 test=# select to_number('123 456', '999 999');to_number ----------- 12356 Considering that, the above to_date result is not incorrect. So just squashing the spaces and converting the value back is not a correct approach to detecting overflow. I think using ValidateDate() was the right idea. That is what we use for checking date validity everywhere else. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 08.09.2016 17:31, Peter Eisentraut wrote: > On 8/15/16 7:33 AM, Andreas 'ads' Scherbaum wrote: >> postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD'); >> to_date >> ------------ >> 2011-12-08 >> (1 row) >> >> >> That is from the regression tests, and obviously handles the date >> transformation wrong. My attempt catches this, because I compare the >> date with the input date, and do not rely on a valid date only. > > It's debatable what is correct here. > > Using to_number, the behavior appears to be that a space in the pattern > ignores one character. For example: > > test=# select to_number('123 456', '999 999'); > to_number > ----------- > 123456 > > test=# select to_number('123 456', '999 999'); > to_number > ----------- > 12356 > > Considering that, the above to_date result is not incorrect. > > So just squashing the spaces and converting the value back is not a > correct approach to detecting overflow. > > I think using ValidateDate() was the right idea. That is what we use > for checking date validity everywhere else. ValidateDate() will tell you if it's a valid date. But not if the transformation was correct: postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD'); to_date ------------ 2011-12-08 (1 row) (with the patch from Artur) Any idea how to solve this problem? -- Andreas 'ads' Scherbaum German PostgreSQL User Group European PostgreSQL User Group - Board of Directors Volunteer Regional Contact, Germany - PostgreSQL Project
On 9/9/16 4:48 AM, Andreas 'ads' Scherbaum wrote: > ValidateDate() will tell you if it's a valid date. But not if the > transformation was correct: > > postgres=# SELECT to_date('2011 12 18', 'YYYY MM DD'); > to_date > ------------ > 2011-12-08 > (1 row) > > (with the patch from Artur) > > > Any idea how to solve this problem? I think we need to do overflow checking as we read the various values, in do_to_timestamp() and functions called from it. This would be very similar to overflow checking in input functions. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes: > I think using ValidateDate() was the right idea. That is what we use > for checking date validity everywhere else. Note that we've got two different CF entries, and threads, covering fundamentally the same territory here, ie making to_timestamp et al behave more sanely. The other thread is https://commitfest.postgresql.org/10/713/ https://www.postgresql.org/message-id/flat/1873520224.1784572.1465833145330.JavaMail.yahoo%40mail.yahoo.com Robert pointed out several times in the other thread that to_timestamp and friends were intended to be Oracle-compatible, which I agree with. It was also pointed out that Oracle does range validity checks on the timestamp component fields, which I verified just now using http://rextester.com/l/oracle_online_compiler (which is what I found by googling after discovering that sqlfiddle isn't working very well, sigh). I got these results: select banner as "oracle version" from v$version Oracle Database 11g Express Edition Release 11.2.0.2.0 - 64bit Production PL/SQL Release 11.2.0.2.0 - Production CORE 11.2.0.2.0 Production TNS for 64-bit Windows: Version 11.2.0.2.0 - Production NLSRTL Version 11.2.0.2.0 - Production SELECT TO_TIMESTAMP('2016-06-13 99:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual ORA-01850: hour must be between 0 and 23 SELECT TO_TIMESTAMP('2016-06-13 19:99:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual ORA-01851: minutes must be between 0 and 59 SELECT TO_TIMESTAMP('2016-06-13 19:59:99', 'YYYY-MM-DD HH24:MI:SS') FROM dual ORA-01852: seconds must be between 0 and 59 SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual 13.06.2016 19:59:59 SELECT TO_TIMESTAMP('2016-06-13 19:59:59', 'YYYY-MM-DD HH:MI:SS') FROM dual ORA-01849: hour must be between 1 and 12 SELECT TO_TIMESTAMP('2016-02-30 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual ORA-01839: date not valid for month specified SELECT TO_TIMESTAMP('2016-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual 29.02.2016 19:59:59 SELECT TO_TIMESTAMP('2015-02-29 19:59:59', 'YYYY-MM-DD HH24:MI:SS') FROM dual ORA-01839: date not valid for month specified I think this is sufficient evidence to show that we ought to change to_timestamp et al. to apply range checking on the component fields. And while I've not attempted to characterize exactly what Oracle does with extra spaces, non-matching punctuation, etc, it's also clear to me that their behavior is different and probably saner than ours. So I vote for fixing these functions to behave more like Oracle, and forgetting about having a separate family of to_date_valid() functions or optional parameters or anything of the sort. I might've been in favor of that, until I saw how far down the rabbit hole this thread had gotten. Let's just call it a bug and fix it. Having range checking after the field scanning phase also changes the terms of discussion about what we need to do in the scanning phase, since it would catch many (of course not all) of the problems that arise from field boundary issues. So I think we should get the range changing committed first and then fool with the scanner. The 0002 patch that Artur sent for this purpose in https://www.postgresql.org/message-id/22dbe4e0-b550-ca86-8634-adcda0faa159@postgrespro.ru seems like the right approach to me, though I've not read it in detail yet. I propose to work through that and commit it. I'm much less sold on his 0001 patch, but I tend to agree that what we want is to adjust the scanner behavior. I do not like the reverse-conversion wrapper that Andreas proposes here; quite aside from micro-efficiency issues, it seems to me that that just begs the question of how you know what the input is supposed to mean. In short, I think we should reject this thread + CF entry altogether and instead move forward with the approach in the other thread. It's probably too late in the September CF to close out the other submission entirely, but we could get the range checking committed and then RWF the other part for reconsideration later. regards, tom lane