Re: knngist - 0.8 - Mailing list pgsql-hackers
From | Teodor Sigaev |
---|---|
Subject | Re: knngist - 0.8 |
Date | |
Msg-id | 4C865216.1090305@sigaev.ru Whole thread Raw |
In response to | Re: knngist - 0.8 (Robert Haas <robertmhaas@gmail.com>) |
Responses |
Re: knngist - 0.8
|
List | pgsql-hackers |
>> http://www.sigaev.ru/misc/builtin_knngist_core-0.8.gz >> http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.gz >> http://www.sigaev.ru/misc/builtin_knngist_proc-0.8.gz >> http://www.sigaev.ru/misc/builtin_knngist_contrib_pg_trgm-0.8.gz >> http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.gz New version, synced with CVS HEAD http://www.sigaev.ru/misc/builtin_knngist_itself-0.8.2.gz http://www.sigaev.ru/misc/builtin_knngist_contrib_btree_gist-0.8.1.gz > AFAICS, these patches include no documentation. That's pretty much a > fatal flaw for a feature of this magnitude. At an absolute minimum, > you need to update the system catalog documentation and the > documentation on CREATE / ALTER OPERATOR CLASS. There might be some > other places that need to be touched, also. Oleg promised to do that > + if (opform->oprresult == BOOLOID) > + ereport(ERROR, > + > (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), > + errmsg("index ordering > operators must not return boolean"))); > > My first thought was that this code was there to prevent people from > doing the wrong thing by accident. But I have a niggling feeling that > you're actually relying on this for the correctness of the system. I > hope I'm wrong, because I don't think that would be a very good idea. This play is around do we really want to have support of boolean-distance in GiST? I think no, because it's a strange idea to measure distance in true/false measurement units. I can't imagine such real-life distance definition and never heard about that. Next, pg_amop_opr_fam_index requires uniqueness of operation in operation family and a lot of places in planner believes in that. Suppose, changing that requires a lot of work which has the single aim to support boolean distance in ORDER BY clause. > The GIST code code use more comments; and perhaps the names of some of > the functions and structures could be chosen to be more descriptive. > I think that what used to be called GISTSearchStack has apparently > been replaced with DataPointer; it's not obvious to me that it's good > to change the name, but if it is I don't think DataPointer is a good GISTSearchStack is replaced by RBTree (GISTScanOpaqueData->stack), tree's nodes contain a StackElem struct which represents list of pointers at the same distance. Each pointer could be a pointer to the inner index's page or to the heap's tuple and this struct is a DataPointer. Note, list of DataPointer in StackElem struct is organized by non-obvious way: we keep pointer to the head of list and pointer to the middle of list. New pointer-to-heap is inserted in the beginning of list, pointers-to-index-page - in the middle. That's done because we would like to: 1) pop pointers-to-heap as fast as possible, before any pointers-to-index-page 2) pop pointers-to-index-page to deep page (which is closer to leaf pages) first. That's good for KNN performance and emulatesclassical first-depth search in ordinary search. > choice. gistindex_keytest has been replaced (sort of) by> processIndexTuple, which again seems more generic than whatit> replaced. Renamed, comments are improved > Minor nit: the word "shoould" is mis-spelled. fixed BTW, now consistentFn is able to "manage" tree traversal - even for for ordinary search, GiST will choose child page with minimal distance. -- Teodor Sigaev E-mail: teodor@sigaev.ru WWW: http://www.sigaev.ru/
pgsql-hackers by date: