Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types - Mailing list pgsql-hackers
From | Kyotaro HORIGUCHI |
---|---|
Subject | Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types |
Date | |
Msg-id | 20170118.173612.13506164.horiguchi.kyotaro@lab.ntt.co.jp Whole thread Raw |
In response to | Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types (Emre Hasegeli <emre@hasegeli.com>) |
Responses |
Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types
|
List | pgsql-hackers |
Hello. (I apologize in advance for possible inaccurate wording on maths, which might cause confusion..) At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in <CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw@mail.gmail.com> > > - Floating point comparisons for gemetric types > > > > Comparison related to geometric types is performed by FPeq > > macro and similars defined in geo_decls.h. This intends to give > > tolerance to the comparisons. > > > > A > > FPeq: |<=e-|-e=>| (<= means inclusive, e = epsilon = tolerance) > > FPne: ->| e | e |<- (<- means exclusive) > > FPlt: | e |<- > > FPle: |<=e | > > FPgt: ->| e | > > FPge: | e=>| > > > > These seems reasonable ignoring the tolerance amount issue. > > I tried to explain below why I don't find this method reasonable. Thank you very much for the explanation. > > At least for the point type, (bitmap) index scan is consistent > > with sequential scan. I remember that the issue was > > "inconsistency between indexed and non-indexed scans over > > geometric types" but they seem consistent with each other. > > You can see on the Git history that it was a lot of trouble to make > the indexes consistent with the operators, and this is limiting > improving indexing of the geometric types. Yeah. geo_ops.c has the following comment from its first version in the current repository. | * Relational operators for Points. | * Since we do have a sense of coordinates being | * "equal" to a given accuracy (point_vert, point_horiz), So, we take it as a requirement for geometric types. All the difficulties come from the "nature". > > You mentioned b-tree, which don't have predefined opclass for > > geometric types. So the "inconsistency" may be mentioning the > > difference between geometric values and combinations of plain > > (first-class) values. But the two are different things and > > apparently using different operators (~= and = for equality) so > > the issue is not fit for this. More specifically, "p ~= p0" and > > "x = x0 and y = y0" are completely different. > > Yes, the default btree operator class doesn't have to implement > standard equality and basic comparison operator symbols. We can > implement it with different symbols. Perhaps I don't understand it. Many opclass are defined for btree. But since (ordinary) btree handles only one-dimentional, totally-orderable sets, geometric types even point don't fit it. Even if we relaxed it by removing EPSILON, multidimentional data still doesn't fit. Still we can implement equality mechanism *over* multiple btree indexes, but it cannot be a opclass for btree. > > Could you let me (or any other guys on this ml) have more precise > > description on the problem and/or what you want to do with this > > patch? > > I will try to itemize the current problems with the epsilon method: > > = Operator Inconsistencies > > My current patches address all of them. No. It just removes tolerans that is a "requirement" for coordinates of geometric types. I think we don't have enough reason to remove it. We could newly define geometric types with no-tolerance as 'exact_point" or something but it still doesn't fit btree. > - Only some operators are using the epsilon. > > I included the list of the ones which doesn't use the epsilon on > initial post of this thread. This makes the operators not compatible > with each other. For example, you cannot say "if box_a @> box_b and > box_b @> point then box_a @> box_b". (It must be "then box_a @> point".) Both of the operators don't seem to use EPSILON so the transitivity holds, but perhaps we should change them to more appropriate shape, that is, to use EPSILON. Even having the tolerance, we can define the operators to keep the transitivity. > - The application of epsilon is implementation dependent. > > Epsilon works between two numbers. There is not always a single way > of implementing geometric operators. This is surprising to the users. > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <= > epsilon". Maybe it should be like "if box_a ~= box_b && box_b @< box_a then ..". I'm not sure off-hand. But I don't see the relationship is significant. > This is also creating a high barrier on developing those operators. > Fixing bugs and performance regressions are very likely to change the > results. I agree to you. The effect of the EPSILON is quite arbitrary and difficult to see their chemistry. > - Some operators are just wrong: > > Line operators are not well maintained. The following are giving > wrong results when neither A or B of lines are not exactly 1: > > * line # line > * line <-> line > * point ## line > * point ## lseg These are not comparison operators so EPSILON is unrelated. And I agree that they needs fix. > They are broken independently from the epsilon, though it is not easy > to fix them without changing the results of the cases that are > currently working. Although I'm not sure they are using EPSILON, I think they don't need it. > - Some operators are violating commutative property. > > For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a". Whether EPSILON is introduced or not, commutativity cannot be assured for it from calculation error, I suppose. > - Some operators are violating transitive property. > > For example, you cannot say "if point_a ~= point_b and point_b ~= > point_c then point_a ~= point_c". It holds only in ideal (or mathematical) world, or for similars of integer (which are strictly implemented mathematica world on computer). Without the EPSILON, two points hardly match by floating point error, as I mentioned. I don't have an evidence though (sorry), mere equality among three floating point numbers is not assured. Sorry, I have no more time today, I'll continue tomorrow. > - The operators with epsilon are not compatible with float operators. > > This is also surprising for the users. For example, you cannot say > "if point_a << point_b then point_a[0] < point_b[0]". > > - The operators are not checking for underflow or overflow. > - NaN values are not treated same as the float datatype. > > Those are independent problems, though checking them with epsilon > would create additional set of inconsistencies. Epsilon creates > especially more problems around 0. > > = Missing Bits on the Operator Classes > > I want to address those in the future, if my patches are accepted. > > - There is no GiST or SP-GiST support for cross datatype operators > other than containment. > > We can develop nice cross-datatype operator families to support more > operators. It would have been easier, if we could rely on some > operators to implement others. Geometric types serve the purpose of > demonstrating our indexing capabilities. We should make them good > examples, not full of special cases with hidden bugs. > > - There is no BRIN support for types other than the box. > > BRIN inclusion operator class is datatype agnostic. It uses some > operators to implement others which doesn't work for anything more the > box vs box operators, because of the inconsistencies. > > - GROUP BY and DISTINCT are not working. > - ORDER BY is not working. > > There are regular user complaints about this on the mailing list. We > can easily provide default hash and btree operator classes that would > fix the problem, if we haven't had the epsilon. > -- Kyotaro Horiguchi NTT Open Source Software Center
pgsql-hackers by date: