Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server - Mailing list pgsql-hackers
From | Shachar Shemesh
---|---
Subject | Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Date |
Msg-id | 4653C0EE.1010006@shemesh.biz
In response to | Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
List | pgsql-hackers
Please note - I'm not trying to pick a fight.

Tom Lane wrote:
> Your source appears fairly ignorant of things-float.

That is possible, and even likely, however

> If they really are using decimal FP, it's easy to demonstrate that a
> lossless conversion to/from binary representation of similar size is
> impossible. The set of exactly representable values is simply different.

When I originally read this statement my initial response was *d'oh*. After sleeping on it, however, I'm no longer as certain as I was. Before you explode at me (again :), I'm not arguing that you can do binary-based calculations on decimal numbers without rounding errors coming back to bite you. I know you can't. What I'm saying is that we have two cases to consider: in one of them the above is irrelevant, and in the other I'm not so sure it's true.

The first case is that of the client getting a number from the server and doing calculations on it. Since the client works in base 2, the inaccuracies are built into the model no matter what we do or how we export the actual number, so I don't think we need to worry about it. If the client also works in base 10, see the second case.

The second case is that of a number being exported from the server, stored in binary (excuse the pun) format on the client, and then sent back to the server, where it is translated from base 2 back to base 10. Notice that no actual calculation is performed on the number while it is in base 2. The only question is whether a number translated to base 2 and then back to base 10 is guaranteed to keep its original value.

I don't have a definite answer to that, but I did work out the difference in representation. A 64-bit IEEE floating point number has 1 sign bit, 52 mantissa bits and 11 exponent bits. For non-denormalized numbers the mantissa is effectively 53 bits, because there is an implied leading "1". I'm going to assume, however, that all binary numbers are denormalized and use only 52 bits. I'm allowed to assume that for two reasons. The first is that it decreases the accuracy of the base-2 representation and thus makes my own argument harder to prove; if I can prove it under this assumption, it obviously still holds with an extra bit of accuracy. The second is that I don't see how a base-10 representation can have "normalized" numbers in the same sense. The implied "1" exists because a base-2 number must have a leading "1" somewhere, and placing it at the start gives the best accuracy; the moment the leading digit can be anything from 1 to 9, it can no longer be assumed, so a base-10 representation loses that bit. Since this reasoning may be wrong, I am "penalizing" the base-2 representation as well, to compensate.

To recap, then: with base 2 we have 52 bits of mantissa, or 2^52 = 4,503,599,627,370,496 possible mantissas. These come with an exponent range (not including denormalized numbers) of 2,048 different values, which can get us (assuming no fractions in either base) as high as 2^2048, or about 616.51 decimal digits. With a decimal representation, where each 4 bits encode one digit, the same 52 bits hold 13 digits, giving 10,000,000,000,000 possible mantissas, with an 11-bit exponent that, being a power of 10, covers a range of 2,048 decimal digits.
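A quick sanity check of those raw counts, before I refine them below (just a sketch recomputing the figures above; nothing here is PostgreSQL-specific):

/* Recompute the raw counts: 52 mantissa bits in binary vs. 13 BCD digits
 * in the same bits, and how many decimal digits each exponent width can
 * span.  Build with -lm. */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double binary_mantissas  = pow(2.0, 52);   /* 4,503,599,627,370,496 */
    double decimal_mantissas = pow(10.0, 13);  /* 10,000,000,000,000 */
    double exponent_values   = pow(2.0, 11);   /* 2,048 distinct exponents */

    printf("base-2 mantissas : %.0f\n", binary_mantissas);
    printf("base-10 mantissas: %.0f\n", decimal_mantissas);

    /* 2^2048 spans about 2048 * log10(2) ~ 616.51 decimal digits, while a
     * base-10 exponent of the same width spans 2,048 decimal digits. */
    printf("binary exponent reach : ~%.2f decimal digits\n",
           exponent_values * log10(2.0));
    printf("decimal exponent reach: %.0f decimal digits\n", exponent_values);
    return 0;
}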
Of course, we have no use for such a huge exponent range with such a small mantissa, so we are likely to move bits from the exponent to the mantissa. Since we have no use for fractions of a decimal digit, we will move the bits in multiples of 4.

I'm now going to make a deliberately extreme assumption: we move 8 bits from the exponent to the mantissa. This leaves us with only three bits of exponent, which covers just 8 decimal digits, but gives us 60 bits, or 15 decimal digits, in the mantissa - a range of 1,000,000,000,000,000 numbers. Note that the base-2 representation can still represent 4.5 times as many mantissas using only 52 bits.

So what have we got so far? A 64-bit decimal-based floating point can give up almost all of its exponent in order to create a mantissa with roughly the same range as the base-2 one, and is still outnumbered by 2.17 bits' worth, ASSUMING WE DON'T USE THE IMPLIED BIT IN THE BASE 2 REPRESENTATION. Now, I suggest that even with "just" 2.17 bits extra, the binary representation will be accurate enough to hold the approximation of the decimal number to such precision that the back-and-forth translation will reliably reproduce the original number. Of course, if we do use the extra bit, it's 3.17 bits extra. If we give up not 8 but only 4 bits of the exponent, we have 6.49 bits extra (5.49 under the above assumption), while still having an exponent range of 128 decimal digits (as opposed to 616 with IEEE).

Now, I am by no means as knowledgeable about these things as Tom, so it is possible that rounding considerations will STILL cause us to lose precision. I'm just claiming that the safety margins we have are quite wide.

Shachar
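As a concrete illustration of the round trip in the second case (a minimal C sketch; the value is arbitrary, and one example is an illustration rather than a proof - DBL_DIG is the standard 15-significant-digit guarantee for IEEE doubles):

#include <float.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    /* An arbitrary decimal value with 15 significant digits. */
    const char *original = "123456.789012345";
    char        roundtrip[64];

    /* Decimal text -> IEEE 754 binary double, as on the way to the client. */
    double d = strtod(original, NULL);

    /* Binary double -> decimal text again, as on the way back to the server.
     * DBL_DIG is 15 for IEEE doubles: any decimal number of up to that many
     * significant digits survives this round trip with its digits intact. */
    snprintf(roundtrip, sizeof roundtrip, "%.*g", DBL_DIG, d);

    printf("in : %s\nout: %s -> %s\n", original, roundtrip,
           strcmp(original, roundtrip) == 0 ? "identical" : "changed");
    return 0;
}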