Re: GiST support for inet datatypes - Mailing list pgsql-hackers
From | Emre Hasegeli |
---|---|
Subject | Re: GiST support for inet datatypes |
Date | |
Msg-id | CAE2gYzxc0FXEwe59sFdUZNQ24c+fRBDMGXwwvbYvmeaNateidw@mail.gmail.com Whole thread Raw |
In response to | Re: GiST support for inet datatypes (Andreas Karlsson <andreas@proxel.se>) |
Responses |
Re: GiST support for inet datatypes
Re: GiST support for inet datatypes |
List | pgsql-hackers |
Third versions of the patches attached. They are rebased to the HEAD. In this versions, the bitncommon function is changed. <sys/socket.h> included to network_gist.c to be able to compile it on FreeBSD. Geometric mean calculation for partial bucket match on the function inet_hist_inclusion_selectivity reverted back. It was something I changed without enough testing on the second revision of the patch. This version uses the maximum divider calculated from the boundaries of the bucket, like the first version. It is simpler and more reliable. 2014-01-31 3:58, Andreas Karlsson <andreas@proxel.se>: > I agree that the new operator class should be the default one, it is more > useful and also not buggy. No idea about the naming. I have misread the name, rename was not necessary. I removed the DEFAULT keywords for the inet and the cidr operator classes from btree_gist--1.0.sql on the inet-gist patch. > A separate concern: we still have not come to a decision about operators for > inet here. I do not like the fact that the operators differ between ranges > and inet. But I feel this might be out of scope for this patch. I attached a separate optional patch to rename the inclusion operators. It leaves the old names, changes the GiST supported operators and the operator names in the documentation. It would be better to apply it with the GiST support, in my opinion. > I am not convinced of your approach to calculating the selectivity from the > histogram. The thing I have the problem with is the clever trickery involved > with how you handle different operator types. I prefer the clearer code of > the range types with how calc_hist_selectivity_scalar is used. Is there any > reason for why that approach would not work here or result in worse code? Currently we do not have histogram of the lower and upper bounds as the range types. Current histogram can be used nicely as the lower bound, but not the upper bound because the comparison is first on the common bits of the network part, then on the length of the network part. For example, 10.0/16 is defined as greater than 10/8. Using the histogram as the lower bounds of the networks is not enough to calculate selectivity for any of these operators. Using it also as the upper bounds is still not enough for the inclusion operators. The lengths of the network parts should taken into consideration in a way and it is what this patch does. Using separate histograms for the lower bounds, the upper bounds and the lengths of the network parts can solve all of these problems, but it is a lot of work. > I have attached a patch where I improved the language of your comment > describing the workings of the selectivity estimation. Could you please > check it so I did not change the meaning of any of it? Thank you. I only added the "for" to the sentence "This ratio can be big enough to not disregard for addresses with small masklens." I merged the changes to this version of the patch. > I see from the tests that you still are missing selectivity functions for > operators, what is your plan for this? This was because the join selectivity estimation functions. I set the geo_selfuncs for the missing ones. All tests pass with them. I want to develop the join selectivity function too, but not for this commit fest.
Attachment
pgsql-hackers by date: