Thread: bug in date_part() function in 6.5.2, 7.0.2
Alex Karpov (analyst@sibinet.ru) reports a bug with a severity of 1 The lower the number the more severe it is. Short Description bug in date_part() function in 6.5.2, 7.0.2 Long Description Checked on Postgres version 6.5.2 and 7.0.2: Incorrect DOW and DAY parts of date calculation. See exanple code. I`ve not tested this function for another time interval, only for march 2000. ?column? | date_part | date_part ------------+-----------+----------- 24.03.2000 | 24 | 5 25.03.2000 | 25 | 6 26.03.2000 | 25 | 6 27.03.2000 | 27 | 1 (4 rows) Sample Code create table oops (date date); insert into oops (date) values (to_date('24.03.2000','dd.mm.yyyy')); insert into oops (date) values (to_date('25.03.2000','dd.mm.yyyy')); insert into oops (date) values (to_date('26.03.2000','dd.mm.yyyy')); insert into oops (date) values (to_date('27.03.2000','dd.mm.yyyy')); select date::date ,date_part('day',date::date) ,date_part('dow',date::date) from oops order by date; ?column? | date_part | date_part ------------+-----------+----------- 24.03.2000 | 24 | 5 25.03.2000 | 25 | 6 26.03.2000 | 25 | 6 27.03.2000 | 27 | 1 (4 rows) No file was uploaded with this report
> Sample Code > create table oops (date date); > > insert into oops (date) values (to_date('24.03.2000','dd.mm.yyyy')); > insert into oops (date) values (to_date('25.03.2000','dd.mm.yyyy')); > insert into oops (date) values (to_date('26.03.2000','dd.mm.yyyy')); > insert into oops (date) values (to_date('27.03.2000','dd.mm.yyyy')); > > select > date::date > ,date_part('day',date::date) > ,date_part('dow',date::date) > from oops > order by date; > > ?column? | date_part | date_part > ------------+-----------+----------- > 24.03.2000 | 24 | 5 > 25.03.2000 | 25 | 6 > 26.03.2000 | 25 | 6 > 27.03.2000 | 27 | 1 > (4 rows) Interesting... test=# select date::timestamp, date_part('day', date), to_char(date, 'DD') from oops; ?column? | date_part | to_char ------------------------+-----------+--------- 2000-03-24 00:00:00+01 | 24 | 24 2000-03-25 00:00:00+01 | 25 | 25 2000-03-25 23:00:00+01 | 25 | 25 2000-03-27 00:00:00+02 | 27 | 27 (4 rows) to_char() and date_part() has different code but some result, a problem can't be in date_part()... test=# select date::timestamp from oops; ?column? ------------------------ 2000-03-24 00:00:00+01 2000-03-25 00:00:00+01 2000-03-25 23:00:00+01 ^^^^ ???? 2000-03-27 00:00:00+02 (4 rows) ....it's not date_part() bug, it's to_date() bug: test=# select to_date('26.03.2000','dd.mm.yyyy')::timestamp; ?column? ------------------------ 2000-03-25 23:00:00+01 ^^^ ! Bug ! test=# select to_timestamp('26.03.2000','dd.mm.yyyy'); to_timestamp ------------------------ 2000-03-26 00:00:00+01 ^^^ ! correct ! But to_date() call to_timestamp() only: Datum to_date(PG_FUNCTION_ARGS) { /* Quick hack: since our inputs are just like to_timestamp, * hand over the whole input info struct... */ return DirectFunctionCall1(timestamp_date, to_timestamp(fcinfo)); } Comments, some idea? Karel
Karel Zak <zakkr@zf.jcu.cz> writes: > ....it's not date_part() bug, it's to_date() bug: > test=# select to_date('26.03.2000','dd.mm.yyyy')::timestamp; > ?column? > ------------------------ > 2000-03-25 23:00:00+01 > ^^^ > ! Bug ! > test=# select to_timestamp('26.03.2000','dd.mm.yyyy'); > to_timestamp > ------------------------ > 2000-03-26 00:00:00+01 > ^^^ > ! correct ! Looks like it's a DST-transition issue. Here in EST5EDT, where this year's EST->EDT transition occurred at 2000-04-02 02:00, I get regression=# select to_date('26.03.2000','dd.mm.yyyy')::timestamp; ?column? ------------------------ 2000-03-26 00:00:00-05 (1 row) regression=# select to_timestamp('26.03.2000','dd.mm.yyyy'); to_timestamp ------------------------ 2000-03-26 00:00:00-05 (1 row) regression=# select to_timestamp('02.04.2000','dd.mm.yyyy'); to_timestamp ------------------------ 2000-04-02 00:00:00-05 (1 row) regression=# select to_date('02.04.2000','dd.mm.yyyy')::timestamp; ?column? ------------------------ 2000-04-01 23:00:00-05 (1 row) I presume 26 March was the transition date where you live? > But to_date() call to_timestamp() only: > Datum > to_date(PG_FUNCTION_ARGS) > { > /* Quick hack: since our inputs are just like to_timestamp, > * hand over the whole input info struct... > */ > return DirectFunctionCall1(timestamp_date, to_timestamp(fcinfo)); > } > Comments, some idea? There's nothing wrong in the above code. What's broken is the date-to- timestamp type conversion for a DST transition date: regression=# select to_date('02.04.2000','dd.mm.yyyy'); to_date ------------ 2000-04-02 (1 row) regression=# select '2000-04-02'::date::timestamp; ?column? ------------------------ 2000-04-01 23:00:00-05 (1 row) regression=# select '2000-10-29'::date::timestamp; ?column? ------------------------ 2000-10-29 01:00:00-04 (1 row) Looks to me like an off-by-one kind of problem in deciding which timezone applies to midnight of a transition day. regards, tom lane
> > Looks to me like an off-by-one kind of problem in deciding which > timezone applies to midnight of a transition day. > > regards, tom lane Hmm ... really cool solution. You are right! I was sure that in the my to_xxxx() code can't be bug :-)) Thanks. Karel
> > ....it's not date_part() bug, it's to_date() bug: > Looks to me like an off-by-one kind of problem in deciding which > timezone applies to midnight of a transition day. Probably a bit worse (but no problem to solve ;): you need to make sure that you rotate the date type to the correct time zone when you do the conversion to timestamp. If you just blast the yy/mm/dd into the time structure, leaving the time zone fields zero'd out, then you will be doing the conversion in UTC. When the timestamp is read back out it is converted to your local time zone. The date->timestamp conversion code gets this right, so you might want to look at that. From my testing, the only annoying case is for 02:00 on the day of a DST transition, when you either skip or get an extra hour. If you still have trouble, I'd be happy to look at your code... - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> Looks to me like an off-by-one kind of problem in deciding which >> timezone applies to midnight of a transition day. > The date->timestamp conversion code gets this right, so you might want > to look at that. Au contraire: the cited examples appear to prove that the date->timestamp conversion code gets this *wrong*. Or did you miss the point of regression=# select '2000-04-02'::date::timestamp; ?column? ------------------------ 2000-04-01 23:00:00-05 (1 row) regards, tom lane
> >> Looks to me like an off-by-one kind of problem in deciding which > >> timezone applies to midnight of a transition day. > > The date->timestamp conversion code gets this right, so you might want > > to look at that. > Au contraire: the cited examples appear to prove that the > date->timestamp conversion code gets this *wrong*. Or did > you miss the point of > regression=# select '2000-04-02'::date::timestamp; > ?column? > ------------------------ > 2000-04-01 23:00:00-05 > (1 row) *sigh* Well, of course I missed the point. Thanks for clarifying that ;) OK, the situation is coming back to me now. The conversion is done in steps: 1) Check that the date is likely to be within the Unix system time range. Otherwise, do all conversions/calculations in UTC. 2) Convert to an integer "Unix system time". 3) Rotate by 12 hours (to UTC noon!). This is supposed to ensure that we stay in the correct day after conversion to local time *no matter what time zone we are actually in*, but is likely the problem in this edge case. 4) Call localtime() to fill in the fields of a tm structure. This is how I get ahold of the time zone (which is not known before this step). For this DST edge case, the time zone is off by one hour :( 5) Copy the fields from the result of the call to localtime() into a new tm structure, with zeros for time fields. 6) Call tm2timestamp(), which recalculates an internal UTC-based timestamp value based on the current contents of the tm structure. It *may* be possible to run through some of these steps *twice*, to verify that the conversion to/from the tm structure including time zone info gives a result on an even day boundary. But it sounds expensive. Comments and suggestions are appreciated... - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >>>>> Looks to me like an off-by-one kind of problem in deciding which >>>>> timezone applies to midnight of a transition day. > 2) Convert to an integer "Unix system time". > 3) Rotate by 12 hours (to UTC noon!). This is supposed to ensure that we > stay in the correct day after conversion to local time *no matter what > time zone we are actually in*, but is likely the problem in this edge > case. > 4) Call localtime() to fill in the fields of a tm structure. This is how > I get ahold of the time zone (which is not known before this step). For > this DST edge case, the time zone is off by one hour :( > 5) Copy the fields from the result of the call to localtime() into a new > tm structure, with zeros for time fields. Seems like you could just skip step 3 and call localtime() with fields indicating midnight of the specified date. Then use the complete localtime result (don't discard any fields) and you should be OK, no? regards, tom lane
> Seems like you could just skip step 3 and call localtime() with fields > indicating midnight of the specified date. Then use the complete > localtime result (don't discard any fields) and you should be OK, no? Pretty sure this won't work, since the complete localtime result will not be local midnight, which is the expected result. The problem is that I don't know what the actual time zone (and DST status) is until *after* converting from UTC to local. And when the input type is "date" I have no time zone context to use as a hint. Rotating to UTC noon solves some problems since converting from UTC to local time will always stay within the same local calendar day. But it falls down on the DST transition days, as we've noticed. I started playing with mktime(), which would seem to be the right thing, but it isn't clear from my docs that this routine will work without the time zone fields filled in. Well, no matter what the docs say it seems to interact badly with my code and I get segfaults a few lines farther along. I'm poking at it to track down the problem. - Thomas
Thomas Lockhart <lockhart@alumni.caltech.edu> writes: >> Seems like you could just skip step 3 and call localtime() with fields >> indicating midnight of the specified date. Then use the complete >> localtime result (don't discard any fields) and you should be OK, no? > Pretty sure this won't work, since the complete localtime result will > not be local midnight, which is the expected result. Actually, now that I go back and reread the man pages, the correct recipe is * fill a struct tm with y/m/d, h = m = s = 0, isdst = -1 * call mktime() to produce a time_t * either use the updated struct tm (mktime() side effect), or convert the time_t directly to timestamp. mktime() is an ANSI C requirement, so even though it's not as old as localtime(), it's probably safe enough. regards, tom lane
> Seems like you could just skip step 3 and call localtime() with fields > indicating midnight of the specified date. Then use the complete > localtime result (don't discard any fields) and you should be OK, no? OK, I have a solution which involves mktime(). As a side effect, I've stripped some code, so date and date,time to timestamp conversions no longer have to run through a complete conversion via a tm structure. Things should be faster (unless mktime() is substantially slower than the other support routines). btw, the date/timetz to timestamp conversion routine has shrunk down to one line since it turns out all required fields were already available :) Regression tests pass (which I guess tells me that they need some added cases, since they passed before too); will commit sometime soon. - Thomas