Thread: How to create dictionaries for tsearch
Hi all:

I have read the documentation for the tsearch module, specifically the part about creating custom dictionaries for different languages using the "makedict.pl" script. What I don't understand, though, is where do I get the lists of stopwords and endings for each language. Do I have to write them myself? Is there some reference website where I can get that kind of information for a given language?

Paulo Jan.
DDnet.
On Thu, 3 Oct 2002, Paulo Jan wrote:

> Hi all:
>
> I have read the documentation for the tsearch module, specifically the
> part about creating custom dictionaries for different languages using
> the "makedict.pl" script. What I don't understand, though, is where do I
> get the lists of stopwords and endings for each language. Do I have to

Which languages?

> write them myself? Is there some reference website where I can get that
> kind of information for a given language?
>

Google is your friend.

I'd recommend using OpenFTS (openfts.sourceforge.net) for full text searching,
which has support for ispell dictionaries and snowball stemmers,
which have support for Spanish.

>
> Paulo Jan.
> DDnet.
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote:
>
> On Thu, 3 Oct 2002, Paulo Jan wrote:
>
> > Hi all:
> >
> > I have read the documentation for the tsearch module, specifically the
> > part about creating custom dictionaries for different languages using
> > the "makedict.pl" script. What I don't understand, though, is where do I
> > get the lists of stopwords and endings for each language. Do I have to
>
> Which languages?
>

Spanish.

> > write them myself? Is there some reference website where I can get that
> > kind of information for a given language?
> >
> Google is your friend.
>

Oh, okay. And not only that, but now that I've paid more attention to the OpenFTS site, I have seen the link to the snowball stemmers too, including the Spanish one. However...

> I'd recommend using OpenFTS (openfts.sourceforge.net) for full text searching,
> which has support for ispell dictionaries and snowball stemmers,
> which have support for Spanish.
>

Can I use OpenFTS to index and search databases that are not "pure text", but only have some text fields? From what I see, I have the impression that OpenFTS is designed to store and search text documents (newspaper articles, papers, etc.) using a Postgres backend, while in my case, I'm storing information (photographs and data associated with them) that has some text fields that need to be indexed and other "normal" fields (numeric, etc.) that don't need to be, and I need to search by both of them; in other words, I need to do something like:

    SELECT * FROM photos WHERE captionidx @@ 'angelina' AND resolution='high' AND photodate > '01-01-2002'

Can I use OpenFTS for this kind of mixed search? From what I have read, I have the impression that it's a bit cumbersome to do so.

Alternatively, can you use the snowball stemmer only with tsearch, without installing OpenFTS?

Paulo Jan.
DDnet.
On Thu, 3 Oct 2002, Paulo Jan wrote:

>
> Can I use OpenFTS to index and search databases that are not "pure
> text", but only have some text fields? From what I see, I have the
> impression that OpenFTS is designed to store and search text documents
> (newspaper articles, papers, etc.) using a Postgres backend, while in my
> case, I'm storing information (photographs and data associated with them)
> that has some text fields that need to be indexed and other "normal"
> fields (numeric, etc.) that don't need to be, and I need to search by
> both of them; in other words, I need to do something like "SELECT * FROM
> photos WHERE captionidx @@ 'angelina' AND resolution='high' AND
> photodate > '01-01-2002'". Can I use OpenFTS for this kind of mixed
> search? From what I have read, I have the impression that it's a bit
> cumbersome to do so.

OpenFTS is an *engine* and was specially designed to be embedded into an application. It has several methods which can be used to construct queries like the one you need. For example, get_sql, from perldoc Search::OpenFTS:

    get_sql( \@ARRAY_WORD );
    get_sql( $STRING );
    get_sql( \$STRING );
    get_sql( *, %opt );

    %opt - as in the constructor (see above), plus a key
    dict_opt => {}, transmitted to dictionaries

    Returns parts of SQL: ($out, $condition, $order)

    Here is how they can be combined in an SQL statement:

        SELECT $opt{txttid}$out FROM table WHERE $condition $order;

As a bonus you'll get relevance ranking, dictionary support and more control.

> Alternatively, can you use the snowball stemmer only with tsearch,
> without installing OpenFTS?
>

Not at the moment. It's easy to implement, but we're very busy.

>
> Paulo Jan.
> DDnet.
>

Regards,
Oleg
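
A minimal sketch of the combination pattern quoted from the perldoc above, applied to the mixed search asked about earlier in the thread. It is written in Python rather than Perl so that it runs without OpenFTS installed, and it only does string assembly. It assumes the three fragments returned by get_sql are plain strings; the function name combine_fts_sql, the fragment values, and the photos.id column are hypothetical stand-ins, not real OpenFTS output. The real FROM clause may also need whatever tables the returned condition refers to.

```python
def combine_fts_sql(out, condition, order, table, id_column, extra_conditions=()):
    """Build one SELECT from the ($out, $condition, $order) fragments.

    Follows the documented pattern
        SELECT $opt{txttid}$out FROM table WHERE $condition $order;
    and ANDs any additional, non-text filters onto the WHERE clause.
    """
    # Parenthesise every condition so that AND/OR inside one fragment
    # cannot change the meaning of the others.
    clauses = [f"({condition})"] + [f"({c})" for c in extra_conditions]
    where = " AND ".join(clauses)
    # `out` is appended directly after the id column, as in the pattern above.
    return f"SELECT {id_column}{out} FROM {table} WHERE {where} {order}".strip()


if __name__ == "__main__":
    # Hypothetical fragments standing in for what get_sql would return.
    out, condition, order = ", 1 AS score", "fts_match", "ORDER BY score DESC"
    sql = combine_fts_sql(
        out, condition, order,
        table="photos",
        id_column="photos.id",
        # Placeholders only; the values are meant to be bound by the DB driver.
        extra_conditions=["resolution = %s", "photodate > %s"],
    )
    print(sql)
```

The extra filters are kept as placeholders so that values such as 'high' and the date are passed as bound parameters instead of being pasted into the SQL string.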