Re: GiST support for inet datatypes - Mailing list pgsql-hackers
From | Andreas Karlsson |
---|---|
Subject | Re: GiST support for inet datatypes |
Date | |
Msg-id | 52EB0331.5090500@proxel.se Whole thread Raw |
In response to | Re: GiST support for inet datatypes (Emre Hasegeli <emre@hasegeli.com>) |
Responses |
Re: GiST support for inet datatypes
|
List | pgsql-hackers |
On 01/23/2014 11:22 PM, Emre Hasegeli wrote: > inet-gist > --------- > I realized that create extension btree_gist fails with the patch. > > ERROR: could not make operator class "gist_inet_ops" be default for type inet > DETAIL: Operator class "inet_ops" already is the default. > > I think making the new operator class default makes more sense. Name conflict > can be a bigger problem. Should I rename the new operator class? I agree that the new operator class should be the default one, it is more useful and also not buggy. No idea about the naming. >> The only thing is that I think your index would do poorly on the case where >> all values share a prefix before the netmask but have different values after >> the netmask, since gist_union and gist_penalty does not care about the bits >> after the netmask. Am I correct? Have you thought anything about this? > > Yes, I have tried to construct the index with ip_maxbits() instead of ip_bits() > but I could not make it work with the cidr type. It can be done by adding > operator as a parameter for union, penalty and picksplit functions on the > GiST framework. I am not sure it worths the trouble. It would only help > the basic comparison operators and it would slow down other operators because > network length comparison before network address would not be possible for > the intermediate nodes. I mentioned this behavior of the operator class on > the paragraph added to the documentation. I think this is fine since it does not seem like a very useful case to support to me. Would be worth doing only if it is easy to do. A separate concern: we still have not come to a decision about operators for inet here. I do not like the fact that the operators differ between ranges and inet. But I feel this might be out of scope for this patch. Does any third party want to chime in with an opinion? Current inet/cidr << is contained within <<= is contained within or equals >> contains >>= contains or equals Range types @> contains range <@ range is contained by && overlap (have points in common) << strictly left of >> strictly right of >> inet-selfuncs >> ------------- >> >> Here I have to honestly admit I am not sure if I can tell if your >> selectivity function is correct. Your explanation sounds convincing and the >> general idea of looking at the histogram is correct. I am just have no idea >> if the details are right. > > I tried to improve it in this version. I hope it is more readable now. I used > the word inclusion instead of overlap, made some changes on the algorithm > to make it more suitable to the other operators. > > Using the histogram for inclusion which is originally for basic comparison, > is definitely not correct. It is an empirical approach. I have tried several > versions of it, tested them with different data sets and thought it is better > than nothing. Thanks for the updates. The code in this version is cleaner and easier to follow. I am not convinced of your approach to calculating the selectivity from the histogram. The thing I have the problem with is the clever trickery involved with how you handle different operator types. I prefer the clearer code of the range types with how calc_hist_selectivity_scalar is used. Is there any reason for why that approach would not work here or result in worse code? I have attached a patch where I improved the language of your comment describing the workings of the selectivity estimation. Could you please check it so I did not change the meaning of any of it? I see from the tests that you still are missing selectivity functions for operators, what is your plan for this? -- Andreas Karlsson
Attachment
pgsql-hackers by date: