Re: Updated tsearch documentation - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Updated tsearch documentation |
Date | |
Msg-id | Pine.LNX.4.64.0706210029410.1881@sn.sai.msu.ru |
In response to | Re: Updated tsearch documentation (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Updated tsearch documentation |
List | pgsql-hackers |
On Wed, 20 Jun 2007, Bruce Momjian wrote:

> Oleg Bartunov wrote:
>> On Sun, 17 Jun 2007, Bruce Momjian wrote:
>>
>>> I have completed my first pass over the tsearch documentation:
>>>
>>> http://momjian.us/expire/fulltext/HTML/sql.html
>>>
>>> They are from section 14 and following.
>>>
>>> I have come up with a number of questions that I placed in SGML comments
>>> in these files:
>>>
>>> http://momjian.us/expire/fulltext/SGML/
>>>
>>> Teodor/Oleg, let me know when you want to go over my questions.
>>
>> Below are my answers (marked as )
>
> OK.
>>
>> Comments to editorial work of Bruce Momjian.
>>
>> fulltext-intro.sgml:
>>
>> it is useful to have a predefined list of lexemes.
>>
>> Bruce, here should be list of types of lexemes !
>
> Agreed. Are the list of lexemes parser-specific?
>

Yes, it is the parser which defines the types of lexemes.

>> fulltext-opfunc.sgml:
>>
>> All of the following functions that accept a configuration argument can
>> use either an integer <!-- why an integer --> or a textual configuration
>> name to select a configuration.
>>
>> originally it was integer id, probably better use <type>oid</type>
>
> Uh, my question is why are you allowing specification as an integer/oid
> when the name works just fine. I don't see the value in allowing
> numbers here.

For compatibility reasons. Hmm, indeed, I don't recall where oids could be
important.

>
>> This returns the query used for searching an index. It can be used to test
>> for an empty query. The <command>SELECT</> below returns <literal>'T'</>,
>> <!-- lowercase? --> which corresponds to an empty query since GIN indexes
>> do not support negate queries (a full index scan is inefficient):
>>
>>> capital case. This looks cumbersome, probably querytree() should
>>> just return NULL.
>
> Agreed.
>
>> The integer option controls several behaviors which is done using bit-wise
>> fields and <literal>|</literal> (for example, <literal>2|4</literal>):
>> <!-- why so complex? -->
>>
>>> to avoid 2 arguments
>
> But I don't see why you would want to set two of those values --- they
> seem mutually exclusive, e.g.
>
>     1 divides the rank by the 1 + logarithm of the document length
>     2 divides the rank by the length itself
>
> I assume you do either one, not both.

But what about the other variants? What I missed is the definition of extent.
From http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking:
an extent is a shortest, non-nested sequence of words which satisfies a query.

>
>> its <replaceable>id</replaceable> or <replaceable>ts_name</replaceable>; <!-- n
>> if none is specified that the current configuration is used.
>>
>>> I don't understand this question
>
> Same issue as above --- why allow a number here when the name works just
> fine. We don't allow tables to be specified by number, so why
> configurations?
>
>> <para>
>> <!-- why? -->
>> Note that the cascade dropping of the <function>headline</function> function
>> cause dropping of the <literal>parser</literal> used in fulltext configuration
>> <replaceable>tsname</replaceable>.
>> </para>
>>
>>> hmm, probably it should be reversed - cascade dropping of the parser cause
>>> dropping of the headline function.
>
> Agreed.
>
>>
>> In example below, <literal>fulltext_idx</literal> is
>> a GIN index:<!-- why isn't this automatic -->
>>
>>> It's explained above. The problem is that current index api doesn't allow
>>> to say if search was lossy or exact, so to preserve performance of
>>> GIN index we had to introduce @@@ operator, which is the same as @@, but
>>> lossy.
>
> Well, then we have to fix the API. Telling users to use a different
> operator based on what index is defined is just bad style.

This was raised by Heikki and we discussed it a bit in Ottawa, but it's
unclear if it's doable for 8.3. The @@@ operator is rarely used, so we could
say it will be improved in future versions.

>
>> only the <token>lword</token> lexeme, then a <acronym>TZ</acronym>
>> definition like ' one 1:11' will not work since lexeme type
>> <token>digit</token> is not assigned to the <acronym>TZ</acronym>.
>> <!-- what do these numbers mean? -->
>> </para>
>
> OK, I changed it to be clearer.
>
>>> nothing special, just numbers for example.
>>
>> <function>ts_debug</> displays information about every token of
>> <replaceable class="PARAMETER">document</replaceable> as produced by the
>> parser and processed by the configured dictionaries using the configuration
>> specified by <replaceable class="PARAMETER">cfgname</replaceable> or
>> <replaceable class="PARAMETER">oid</replaceable>. <!-- no need for oid
>>
>>> don't understand this comment. ts_debug accepts cfgname or its oid
>
> Again, no need for oid.

We need to decide whether we need oids as a user-visible argument. I don't see
any value, though Teodor probably thinks otherwise.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
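
Editorial note on the bit-wise normalization option discussed in the message: the thread only states that 1 divides the rank by 1 + the logarithm of the document length and that 2 divides it by the length itself. The sketch below is a rough illustration of why a single integer mask can carry both choices at once. The function name, the constant names, and the order in which the two divisions are applied are assumptions made for illustration; this is not the tsearch implementation.

```python
from math import log

# Hypothetical names; only the values 1 and 2 and their meanings come from the thread.
NORM_LOG_LENGTH = 1  # divide the rank by 1 + log(document length)
NORM_LENGTH = 2      # divide the rank by the document length itself


def normalize_rank(rank: float, doc_length: int, flags: int) -> float:
    """Apply every normalization whose bit is set in `flags`.

    Assumption: when several bits are set, they are applied in increasing
    bit order. An empty document is left unnormalized to avoid dividing by zero.
    """
    if doc_length <= 0:
        return rank
    if flags & NORM_LOG_LENGTH:
        rank /= 1.0 + log(doc_length)
    if flags & NORM_LENGTH:
        rank /= doc_length
    return rank


if __name__ == "__main__":
    # One option at a time, then both combined with `|` as in the quoted docs.
    print(normalize_rank(1.0, 10, NORM_LOG_LENGTH))
    print(normalize_rank(1.0, 10, NORM_LENGTH))
    print(normalize_rank(1.0, 10, NORM_LOG_LENGTH | NORM_LENGTH))
```

With a mask, the caller passes one argument however many behaviors are enabled, which matches the "to avoid 2 arguments" rationale given above. Whether combining 1 and 2 is ever useful is exactly the open question in the thread.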