Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server - Mailing list pgsql-hackers
From | Shachar Shemesh |
---|---|
Subject | Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server |
Date | |
Msg-id | 46512869.6000108@shemesh.biz |
In response to | Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server (Greg Smith <gsmith@gregsmith.com>) |
Responses |
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server
Re: Re: [Oledb-dev] double precision error with pg linux server, but not with windows pg server |
List | pgsql-hackers |
Greg Smith wrote:
> On Sun, 20 May 2007, Shachar Shemesh wrote:
>
>> This is not data given to store. It's data being exported.
>
> Data being exported has a funny way of turning around and being stored
> in the database again. It's kind of nice to know the damage done
> during that round trip is minimized.

I agree. All I'm asking, and have not received an answer yet, is whether assuring that we don't have any SEMANTIC damage is enough. In other words, if I can assure that data exported and then imported will always, under all circumstances, compare the same to the original, would that be enough of a requirement? Put differently, if I offer a format that is assured of preserving both mantissa and exponent precision and range, as well as all extra attributes (+/-Infinity and NaN), but does not guarantee that semantically identical constructs are told apart (+0 vs. -0, and the different NaNs), would that format be acceptable?

>
>> Tom seems to think this is not a goal (though, aside from his disbelief
>> that such a goal is attainable, I have heard no arguments against it).
>
> If Tom thinks it's not attainable, the best way to convince him
> otherwise would be demonstrate that it's not.

Granted. That's why I've been quiet. I'm pulling my sources for the ARM FP format details, to make sure what I have in mind would work.

> One reason people use text formats for cross-platform exchanges is
> that getting portable binary compatibility for things like floating
> point numbers is much harder than you seem to think it is.

I'll just point out that none of the things that Tom seems to be concerned about are preserved over text format.

>
> Stepping back for a second, your fundamental argument seem to be based
> on the idea that doing conversions to text is such a performance issue
> in a driver that it's worth going through these considerable
> contortions to avoid it.
Converting to text adds a CPU overhead in both client and server, as well as a network transmission overhead. Even if it's not detrimental to performance, I'm wondering why we should insist on paying it.

You are right that I offered no concrete implementation. I'll do it now, but it depends on an important question: what is the range of the ARM floating point format? Since I have neither an ARM to test on nor the floating point specs, it may be that a simpler implementation is possible. I offer this implementation because I see people think I'm talking out of my ass.

A 64 bit IEEE float can distinguish between almost all 2^64 distinct floats. It loses two combinations for + and - infinity, one combination for the dual zero notation, and we also lose all of the NaNs, which means 2^(mantissa+1) - 2 combinations (every nonzero mantissa with an all-ones exponent, under either sign). Overall, an n bit IEEE float with m bits of mantissa will be able to represent 2^n - 2^(m+1) - 1 actual floating point numbers.

That means that if we take a general signed floating point number, about whose representation we know nothing except that it is n bits wide and has a mantissa and an exponent, and we want to encode it as an IEEE number of the same width with mantissa size m and exponent size e=n-m-1, we will have at most 2^(m+1)+1 unrepresentable numbers.

In a nutshell, what I suggest is that we export floating points in binary form in IEEE format, and add a status word to it. The status word will dictate how many bits of mantissa there are in the IEEE format and what the exponent bias is, and will also add between one and two bits to the actual number, in case the number of floats the exporting platform has is larger than the number of floats that can be represented in IEEE with the same word length.

The nice thing about this format is that exporting from an IEEE platform is as easy as exporting the binary image of the float, plus a status word that is a constant. Virtually no overhead.
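As a rough illustration of the IEEE-to-IEEE case, here is a minimal Python sketch. The status word layout (`STATUS_FMT`, with one byte for mantissa bits, one byte for extra value bits, and two bytes for the exponent bias) and the names `export_double`, `import_double` and `count_finite_half` are assumptions made up for this example; the proposal does not fix any of them. The last function brute-forces the counting argument on the 16 bit IEEE format, where enumerating every bit pattern is cheap.

```python
import math
import struct

# Hypothetical status word layout (not specified in the proposal):
# mantissa bits (uint8), extra value bits (uint8), exponent bias (uint16).
STATUS_FMT = "<BBH"
IEEE_DOUBLE_STATUS = struct.pack(STATUS_FMT, 52, 0, 1023)
STATUS_SIZE = struct.calcsize(STATUS_FMT)


def export_double(x: float) -> bytes:
    """Export on an IEEE platform: constant status word plus raw binary image."""
    return IEEE_DOUBLE_STATUS + struct.pack("<d", x)


def import_double(payload: bytes) -> float:
    """Import on an IEEE platform: if the status word matches, copy the bits."""
    status, image = payload[:STATUS_SIZE], payload[STATUS_SIZE:]
    if status != IEEE_DOUBLE_STATUS:
        # A peer with a different float format needs a real conversion,
        # which this sketch does not attempt.
        raise NotImplementedError("conversion from a foreign float format")
    return struct.unpack("<d", image)[0]


def count_finite_half() -> int:
    """Count distinct finite values among all 2^16 IEEE binary16 patterns."""
    values = set()
    for bits in range(1 << 16):
        x = struct.unpack("<e", struct.pack("<H", bits))[0]
        if math.isfinite(x):
            values.add(x)  # +0.0 and -0.0 compare equal, so they merge
    return len(values)
```

For binary16 (n=16, m=10) the brute-force count is 2^16 - 2^11 - 1: NaNs occur under either sign, so the all-ones exponent consumes 2^(m+1) patterns, and the dual zero accounts for the final one.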
Importing from an IEEE platform to an IEEE platform is, likewise, as easy as comparing the status word to your own constant, and if they match, just copying the binary. This maintains all of Tom's strict round trip requirements. In fact, for export/import on the same IEEE platform no data conversion of any kind takes place at all.

There are questions that need to be answered. For example, what happens if you try to import a NaN into a platform that has no such concept? You'd have to put in a NULL or something similar. Similarly, how do you import Infinity? These, however, are questions that should be answered the same way for text imports, so there is nothing binary specific here.

I hope that, at least, presents a workable plan. As I said before, I'm waiting for the specs for ARM's floating point before I can move forward. If, as I suspect, ARM's range is even more limited, then I may try to suggest a more compact export representation, pending the question of whether we have any other platform that is non-IEEE, and what the situation is there.

Shachar