Re: Tracking timezone abbreviation removals in the IANA tz database - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Tracking timezone abbreviation removals in the IANA tz database |
Date | |
Msg-id | 9104.1472824610@sss.pgh.pa.us |
In response to | Tracking timezone abbreviation removals in the IANA tz database (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: Tracking timezone abbreviation removals in the IANA tz database |
List | pgsql-hackers |
I wrote:
> So I'm leaning to the just-remove-it answer for any deleted abbreviation
> that relies on a dynamic definition. Names that never had more than one
> UTC offset can remain in the tznames list.

After a bit more thought and consumption of caffeine, I am thinking that that won't really be good enough. It's clear that the IANA crowd intend to continue removing made-up abbreviations, in fact there's a pile of that in their queue right now:

http://mm.icann.org/pipermail/tz/2016-August/023941.html

The trouble from our perspective is that a lot of those abbreviations have shifted meaning over the years (if they had any real-world usage maybe they'd have stayed more stable...), which means that we are using dynamic abbreviations for them, which means breakage as soon as the tznames list diverges from the IANA database. Which would be manageable if we only shipped those filesets together, but in installations built with --with-system-tzdata (which at least ought to include most vendor distributions of PG), they don't come from the same place. This means that even if we fix removals when we make new releases, other abbreviations may break the next time the vendor updates their tzdata package.

We could maybe tolerate individual abbreviations failing like this, but as noted in bug #14307, the pg_timezone_abbrevs view fails altogether if there are any broken abbreviations in the active timezone_abbreviations list. So that makes this a problem for all users whether or not they care about the specific abbreviations affected.

This leads me to think that we need to redefine the dynamic abbreviations feature so it's a bit more robust in the face of this type of situation.

A really simple change (at least logically, haven't looked at the code) would be to say that a dynamic abbreviation is just a macro for the referenced zone name, that is if we have

    NOVT    Asia/Novosibirsk

in the timezone_abbreviations list then writing NOVT in a timezone value is exactly equivalent to writing Asia/Novosibirsk.

However, that breaks backwards compatibility (at least for non-broken abbreviations). Our convention up to now has been that if you write a standard-time zone abbreviation then what it means is your local standard-time UTC offset, even if DST is currently prevailing in your zone. For example, DST is currently in force in the USA, so writing "America/New_York" means UTC-4, but "EST" means UTC-5 regardless. Likewise "EDT" means UTC-4 and will still mean that when winter comes.

So the idea I'm toying with (again, haven't tried to code this) is to say that *if* we can match the abbreviation to something in the referenced zone then we'll use that, but otherwise we fall back to treating the abbreviation as a macro for the zone name. This would ensure that updates to the IANA data could not break existing timezone_abbreviations entries (at least, not unless IANA were to remove a zone name altogether, but they have never done that AFAIR). An update could cause an abbreviation's meaning to change, but that's true already, in fact it's kind of the whole point.

If we were to do that, then perhaps we would not need to remove existing timezone_abbreviations entries even if IANA deems them obsolete. I'd still be inclined to remove them from our sample data files, but users would easily be able to put them back if they wanted.

Thoughts?

			regards, tom lane
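For illustration only, the fallback rule floated above might be sketched roughly as follows. This is not PostgreSQL code and not a tested design: the zone histories, the zone names (Example/City, Example/Renamed), the abbreviations (XST, XDT, RMT) and all function names are hypothetical, and the "most recent use at or before the given time" matching rule is an assumption made for the sketch.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


def utc(year: int, month: int, day: int) -> datetime:
    """Convenience constructor for a UTC timestamp."""
    return datetime(year, month, day, tzinfo=timezone.utc)


@dataclass(frozen=True)
class Period:
    """From `start` onward, the zone used `abbrev` with offset `utc_offset`."""
    start: datetime
    abbrev: str
    utc_offset: timedelta


# Hypothetical hand-written zone histories, in chronological order.
# These stand in for tzdata; they are NOT real IANA data.
ZONES = {
    # A zone that alternates between standard (XST) and daylight (XDT) time.
    "Example/City": [
        Period(utc(2015, 11, 1), "XST", timedelta(hours=-5)),
        Period(utc(2016, 3, 13), "XDT", timedelta(hours=-4)),
        Period(utc(2016, 11, 6), "XST", timedelta(hours=-5)),
    ],
    # A zone whose invented abbreviation was dropped upstream; only numeric
    # abbreviations remain in its history.
    "Example/Renamed": [
        Period(utc(1970, 1, 1), "+06", timedelta(hours=6)),
        Period(utc(2016, 7, 24), "+07", timedelta(hours=7)),
    ],
}

# Dynamic entries in the style of timezone_abbreviations: abbreviation -> zone.
DYNAMIC_ABBREVS = {"XST": "Example/City", "RMT": "Example/Renamed"}


def zone_offset_at(zone: str, when: datetime) -> timedelta:
    """Offset in force in `zone` at `when` (what writing the zone name means)."""
    periods = ZONES[zone]
    current = periods[0]
    for period in periods:
        if period.start <= when:
            current = period
    return current.utc_offset


def abbrev_offset_at(zone: str, abbrev: str, when: datetime) -> Optional[timedelta]:
    """Most recent meaning of `abbrev` in `zone` at or before `when`.

    If the abbreviation is only used later, take its earliest use.
    Returns None when the zone never uses the abbreviation at all.
    """
    match = None
    for period in ZONES[zone]:
        if period.abbrev == abbrev and (period.start <= when or match is None):
            match = period
    return match.utc_offset if match is not None else None


def resolve_dynamic_abbrev(abbrev: str, when: datetime) -> timedelta:
    """Use the abbreviation's own meaning if the zone has one; otherwise
    treat the abbreviation as a macro for the zone name."""
    zone = DYNAMIC_ABBREVS[abbrev]
    offset = abbrev_offset_at(zone, abbrev, when)
    if offset is not None:
        return offset
    return zone_offset_at(zone, when)
```

With these made-up tables, on 2016-09-02 the zone name Example/City resolves to UTC-4 while XST still resolves to UTC-5, mirroring the EST/EDT convention described above. RMT, which no longer appears anywhere in its zone's history, resolves to whatever Example/Renamed means at that moment instead of raising an error.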