Re: Fix overflow in DecodeInterval - Mailing list pgsql-hackers
From | Joseph Koshakow |
---|---|
Subject | Re: Fix overflow in DecodeInterval |
Date | |
Msg-id | CAAvxfHd+8GAZSPxcuJa+Z3TG2WoPUmJ-ER9NxjAhKwQaDzgdKg@mail.gmail.com Whole thread Raw |
In response to | Re: Fix overflow in DecodeInterval (Joseph Koshakow <koshy44@gmail.com>) |
Responses |
Re: Fix overflow in DecodeInterval
Re: Fix overflow in DecodeInterval Re: Fix overflow in DecodeInterval |
List | pgsql-hackers |
Hi All, Sorry for the delay in the new patch, I've attached my most recent patch to this email. I ended up reworking a good portion of my previous patch so below I've included some reasons why, notes on my current approach, and some pro/cons to the approach. * The main reason for the rework had to do with double conversions and shared code. * The existing code for rounding had a lot of int to double casting and vice versa. I *think* that doubles are able to completely represent the range of ints. However doubles are not able to represent the full range of int64. After making the change I started noticing a lot of lossy behavior. One thought I had was to change the doubles to long doubles, but I wasn't able to figure out if long doubles could completely represent the range of int64. Especially since their size varies depending on the architecture. Does anyone know the answer to this? * I ended up creating two intermediate data structures for Intervals. One for decoding and one for everything else. I'll go into more detail below. * One common benefit was that they both contain a usec field which means that the Interval methods no longer need to carry around a separate fsec argument. * The obvious con here is that Intervals require two unique intermediate data structures, while all other date/time types can share a single intermediate data structure. I find this to be a bit clunky. * pg_itm_in is the struct used for Interval decoding. It's very similar to pg_tm, except all of the time related fields are collapsed into a single `int64 usec` field. * The biggest benefit of this was that all int64-double conversions are limited to a single function, AdjustFractMicroseconds. Instead of fractional units flowing down over every single time field, they only need to flow down into the single `int64 usec` field. * Overflows are caught much earlier in the decoding process which helps avoid wasted work. * I found that the decoding code became simpler for time fields, though this is a bit subjective. * pg_itm is the struct used for all other Interval functionality. It's very similar to pg_tm, except the tm_hour field is converted from int to int64 and an `int tm_usec` field was added. * When encoding and working with Intervals, we almost always want to break the time field out into hours, min, sec, usec. So it's helpful to have a common place to do this, instead of every function duplicating this code. * When breaking the time fields out, a single field will never contain a value greater than could have fit in the next unit higher. Meaning that minutes will never be greater than 60, seconds will be never greater than 60, and usec will never be greater than 1,000. So hours is actually the only field that needs to be int64 and the rest can be an int. * This also helps limit the impact to shared code (see below). * There's some shared code between Intervals and other date/time types. Specifically the DecodeTime function and the datetime_to_char_body function. These functions take in a `struct pg_tm` and a `fsec_t fsec` (fsec_t is just an alias for int32) which allows them to be re-used by all date/time types. The only difference now between pg_tm and pg_itm is the tm_hour field size (the tm_usec field in pg_itm can be used as the fsec). So to get around this I changed the function signatures to take a `struct pg_tm`, `fsec_t fsec`, and an `int64 hour` argument. It's up to the caller to provide to correct hour field. Intervals can easily convert pg_itm to a pg_tm, fsec, and hour. It's honestly a bit error-prone since those functions have to explicitly ignore the pg_tm->tm_hour field and use the provided hour argument instead, but I couldn't think of a better less intrusive solution. If anyone has a better idea, please don't hesitate to bring it up. * This partly existed in the previous patch, but I just wanted to restate it. All modifications to pg_itm_in during decoding is done via helper functions that check for overflow. All invocations of these functions either return an error on overflow or explicitly state why an overflow is impossible. * I completely rewrote the ParseISO8601Number function to try and avoid double to int64 conversions. I tried to model it after the parsing done in DecodeInterval, though I would appreciate extra scrutiny here. - Joe Koshakow
Attachment
pgsql-hackers by date: