Re: truncating timestamps on arbitrary intervals - Mailing list pgsql-hackers
From | John Naylor |
---|---|
Subject | Re: truncating timestamps on arbitrary intervals |
Date | |
Msg-id | CAFBsxsGUDP2jky8SLk_zyWXFzj_A0G6BLR7R2kVjOLc5vNROQw@mail.gmail.com Whole thread Raw |
In response to | Re: truncating timestamps on arbitrary intervals (Peter Eisentraut <peter.eisentraut@enterprisedb.com>) |
Responses |
Re: truncating timestamps on arbitrary intervals
|
List | pgsql-hackers |
On Thu, Nov 12, 2020 at 9:56 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
>
> On 2020-06-30 06:34, John Naylor wrote:
> > In v9, I've simplified the patch somewhat to make it easier for future
> > work to build on.
> >
> > - When truncating on month-or-greater intervals, require the origin to
> > align on month. This removes the need to handle weird corner cases
> > that have no straightforward behavior.
> > - Remove hackish and possibly broken code to allow origin to be after
> > the input timestamp. The default origin is Jan 1, 1 AD, so only AD
> > dates will behave correctly by default. This is not enforced for now,
> > since it may be desirable to find a way to get this to work in a nicer
> > way.
> > - Rebase docs over PG13 formatting changes.
>
> This looks pretty solid now. Are there any more corner cases and other
> areas with unclear behavior that you are aware of?
Hi Peter,
Thanks for taking a look!
I believe there are no known corner cases aside from not throwing an error if origin > input, but I'll revisit that when we are more firm on what features we want support.
> A couple of thoughts:
>
> - After reading the discussion a few times, I'm not so sure anymore
> whether making this a cousin of date_trunc is the right way to go. As
> you mentioned, there are some behaviors specific to date_trunc that
> don't really make sense in date_trunc_interval, and maybe we'll have
> more of those.
As far as the behaviors, I'm not sure exactly what you what you were thinking of, but here are two issues off the top of my head:
- If the new functions are considered variants of date_trunc(), there is the expectation that the options work the same way, in particular the timezone parameter. You asked specifically about that below, so I'll address that separately.
- In the "week" case, the boundary position depends on the origin, since a week-long interval is just 7 days.
> Also, date_trunc_interval isn't exactly a handy name.
> Maybe something to think about. It's obviously fairly straightforward
> to change it.
Effectively, it puts timestamps into bins, so maybe date_bin() or something like that?
> - There were various issues with the stride interval having months and
> years. I'm not sure we even need that. It could be omitted unless you
> are confident that your implementation is now sufficient.
Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millenia.
> - Also, negative intervals could be prohibited, but I suppose that
> matters less.
Good for the sake of completeness. I think they happen to work in v9 by accident, but it would be better not to expose that.
> - I'm curious about the origin being set to 0001-01-01. This seems to
> work correctly in that it sets the origin to a Monday, which is what we
> wanted, but according to Google that day was a Saturday. Something to
> do with Julian vs. Gregorian calendar?
Right, working backwards from our calendar today, it's Monday, but at the time it would theoretically be Saturday, barring leap year miscalculations.
> Maybe we should choose a date
> that is a bit more recent and easier to reason with.
2001-01-01 would also be a Monday aligned with centuries and millenia, so that would be my next suggestion. If we don't care to match with date_trunc() on those larger units, we could also use 1900-01-01, so the vast majority of dates in databases are after the origin.
> - Then again, I'm thinking that maybe we should make the origin
> mandatory. Otherwise, the default answers when having strides larger
> than a day are entirely arbitrary, e.g.,
>
> => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
> 0190-01-01 00:00:00 BC
>
> => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
> 0191-01-01 00:00:00
Right. In the first case, the default origin is also after the input, and crosses the AD/BC boundary. Tricky to get right.
> Perhaps the origin could be defaulted if the interval is less than a day
> or something like that.
If we didn't allow months and years to be units, it seems the default would always make sense?
> - I'm wondering whether we need the date_trunc_interval(interval,
> timestamptz, timezone) variant. Isn't that the same as
> date_trunc_interval(foo AT ZONE xyz, value)?
I based this on 600b04d6b5ef6 for date_trunc(), whose message states:
date_trunc(field, timestamptz, zone_name)
is the same as
date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name
so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output.
--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
>
> On 2020-06-30 06:34, John Naylor wrote:
> > In v9, I've simplified the patch somewhat to make it easier for future
> > work to build on.
> >
> > - When truncating on month-or-greater intervals, require the origin to
> > align on month. This removes the need to handle weird corner cases
> > that have no straightforward behavior.
> > - Remove hackish and possibly broken code to allow origin to be after
> > the input timestamp. The default origin is Jan 1, 1 AD, so only AD
> > dates will behave correctly by default. This is not enforced for now,
> > since it may be desirable to find a way to get this to work in a nicer
> > way.
> > - Rebase docs over PG13 formatting changes.
>
> This looks pretty solid now. Are there any more corner cases and other
> areas with unclear behavior that you are aware of?
Hi Peter,
Thanks for taking a look!
I believe there are no known corner cases aside from not throwing an error if origin > input, but I'll revisit that when we are more firm on what features we want support.
> A couple of thoughts:
>
> - After reading the discussion a few times, I'm not so sure anymore
> whether making this a cousin of date_trunc is the right way to go. As
> you mentioned, there are some behaviors specific to date_trunc that
> don't really make sense in date_trunc_interval, and maybe we'll have
> more of those.
As far as the behaviors, I'm not sure exactly what you what you were thinking of, but here are two issues off the top of my head:
- If the new functions are considered variants of date_trunc(), there is the expectation that the options work the same way, in particular the timezone parameter. You asked specifically about that below, so I'll address that separately.
- In the "week" case, the boundary position depends on the origin, since a week-long interval is just 7 days.
> Also, date_trunc_interval isn't exactly a handy name.
> Maybe something to think about. It's obviously fairly straightforward
> to change it.
Effectively, it puts timestamps into bins, so maybe date_bin() or something like that?
> - There were various issues with the stride interval having months and
> years. I'm not sure we even need that. It could be omitted unless you
> are confident that your implementation is now sufficient.
Months and years were a bit tricky, so I'd be happy to leave that out if there is not much demand for it. date_trunc() already has quarters, decades, centuries, and millenia.
> - Also, negative intervals could be prohibited, but I suppose that
> matters less.
Good for the sake of completeness. I think they happen to work in v9 by accident, but it would be better not to expose that.
> - I'm curious about the origin being set to 0001-01-01. This seems to
> work correctly in that it sets the origin to a Monday, which is what we
> wanted, but according to Google that day was a Saturday. Something to
> do with Julian vs. Gregorian calendar?
Right, working backwards from our calendar today, it's Monday, but at the time it would theoretically be Saturday, barring leap year miscalculations.
> Maybe we should choose a date
> that is a bit more recent and easier to reason with.
2001-01-01 would also be a Monday aligned with centuries and millenia, so that would be my next suggestion. If we don't care to match with date_trunc() on those larger units, we could also use 1900-01-01, so the vast majority of dates in databases are after the origin.
> - Then again, I'm thinking that maybe we should make the origin
> mandatory. Otherwise, the default answers when having strides larger
> than a day are entirely arbitrary, e.g.,
>
> => select date_trunc_interval('10 year', '0196-05-20 BC'::timestamp);
> 0190-01-01 00:00:00 BC
>
> => select date_trunc_interval('10 year', '0196-05-20 AD'::timestamp);
> 0191-01-01 00:00:00
Right. In the first case, the default origin is also after the input, and crosses the AD/BC boundary. Tricky to get right.
> Perhaps the origin could be defaulted if the interval is less than a day
> or something like that.
If we didn't allow months and years to be units, it seems the default would always make sense?
> - I'm wondering whether we need the date_trunc_interval(interval,
> timestamptz, timezone) variant. Isn't that the same as
> date_trunc_interval(foo AT ZONE xyz, value)?
I based this on 600b04d6b5ef6 for date_trunc(), whose message states:
date_trunc(field, timestamptz, zone_name)
is the same as
date_trunc(field, timestamptz at time zone zone_name) at time zone zone_name
so without the shorthand, you need to specify the timezone twice, once for the calculation, and once for the output.
--
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
pgsql-hackers by date: