Re: Updated tsearch documentation - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Updated tsearch documentation |
Date | |
Msg-id | 200706202024.l5KKOBK11446@momjian.us |
In response to | Re: Updated tsearch documentation (Oleg Bartunov <oleg@sai.msu.su>) |
Responses | Re: Updated tsearch documentation |
List | pgsql-hackers |
Oleg Bartunov wrote:
> On Sun, 17 Jun 2007, Bruce Momjian wrote:
>
> > I have completed my first pass over the tsearch documentation:
> >
> >     http://momjian.us/expire/fulltext/HTML/sql.html
> >
> > They are from section 14 and following.
> >
> > I have come up with a number of questions that I placed in SGML comments
> > in these files:
> >
> >     http://momjian.us/expire/fulltext/SGML/
> >
> > Teodor/Oleg, let me know when you want to go over my questions.
>
> Below are my answers (marked as )

OK.

> Comments to editorial work of Bruce Momjian.
>
> fulltext-intro.sgml:
>
> it is useful to have a predefined list of lexemes.
> >Bruce, here should be list of types of lexemes !

Agreed. Are the list of lexemes parser-specific?

> </para></listitem>
>
> <!--
> SEEMS UNNECESSARY
> It useless to attempt normalize <type>email address</type> using
> morphological dictionary of russian language, but looks reasonable to pick
> out <type>domain name</type> and be able to search for <type>domain
> name</type>.
> -->
> > I dont' understand where did you get this para :)

Uh, it was in the SGML. I have removed it.

> fulltext-opfunc.sgml:
>
> All of the following functions that accept a configuration argument can
> use either an integer <!-- why an integer --> or a textual configuration
> name to select a configuration.
> > originally it was integer id, probably better use <type>oid</type>

Uh, my question is why are you allowing specification as an integer/oid
when the name works just fine. I don't see the value in allowing
numbers here.

> This returns the query used for searching an index. It can be used to test
> for an empty query. The <command>SELECT</> below returns <literal>'T'</>,
> <!-- lowercase? --> which corresponds to an empty query since GIN indexes
> do not support negate queries (a full index scan is inefficient):
>
> > capital case. This looks cumbersome, probably querytree() should
> > just return NULL.

Agreed.

> The integer option controls several behaviors which is done using bit-wise
> fields and <literal>|</literal> (for example, <literal>2|4</literal>):
> <!-- why so complex? -->
>
> > to avoid 2 arguments

But I don't see why you would want to set two of those values --- they
seem mutually exclusive, e.g.

    1 divides the rank by the 1 + logarithm of the document length
    2 divides the rank by the length itself

I assume you do either one, not both.

> its <replaceable>id</replaceable> or <replaceable>ts_name</replaceable>; <!-- n
> if none is specified that the current configuration is used.
>
> > I don't understand this question

Same issue as above --- why allow a number here when the name works just
fine. We don't allow tables to be specified by number, so why
configurations?

> <para>
> <!-- why? -->
> Note that the cascade dropping of the <function>headline</function> function
> cause dropping of the <literal>parser</literal> used in fulltext configuration
> <replaceable>tsname</replaceable>.
> </para>
>
> > hmm, probably it should be reversed - cascade dropping of the parser cause
> > dropping of the headline function.

Agreed.

> In example below, <literal>fulltext_idx</literal> is
> a GIN index:<!-- why isn't this automatic -->
>
> > It's explained above. The problem is that current index api doesn't allow
> > to say if search was lossy or exact, so to preserve performance of
> > GIN index we had to introduce @@@ operator, which is the same as @@, but
> > lossy.

Well, then we have to fix the API. Telling users to use a different
operator based on what index is defined is just bad style.

> nly the <token>lword</token> lexeme, then a <acronym>TZ</acronym>
> definition like ' one 1:11' will not work since lexeme type
> <token>digit</token> is not assigned to the <acronym>TZ</acronym>.
> <!-- what do these numbers mean? -->
> </para>

OK, I changed it to be clearer.

> > nothing special, just numbers for example.
>
> <function>ts_debug</> displays information about every token of
> <replaceable class="PARAMETER">document</replaceable> as produced by the
> parser and processed by the configured dictionaries using the configuration
> specified by <replaceable class="PARAMETER">cfgname</replaceable> or
> <replaceable class="PARAMETER">oid</replaceable>. <!-- no need for oid
>
> > don't understand this comment. ts_debug accepts cfgname or its oid

Again, no need for oid.

-- 
  Bruce Momjian  <bruce@momjian.us>          http://momjian.us
  EnterpriseDB                               http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +