Re: Fulltext search configuration - Mailing list pgsql-general
From:           Oleg Bartunov
Subject:        Re: Fulltext search configuration
Date:
Msg-id:         Pine.LNX.4.64.0902022108080.4158@sn.sai.msu.ru
In response to: Re: Fulltext search configuration (Mohamed <mohamed5432154321@gmail.com>)
Responses:      Re: Fulltext search configuration
List:           pgsql-general
Mohamed, please try to read the docs and think a bit first.

On Mon, 2 Feb 2009, Mohamed wrote:

> On Mon, Feb 2, 2009 at 4:34 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>
>> On Mon, 2 Feb 2009, Oleg Bartunov wrote:
>>
>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>
>>>> Hehe, ok..
>>>> I don't know either, but I took some lines from Al-Jazeera :
>>>> http://aljazeera.net/portal
>>>>
>>>> I just made the change you said, created it successfully and tried
>>>> this :
>>>>
>>>> select ts_lexize('ayaspell', '?????? ??????? ????? ????? ?? ????
>>>> ????????? ?????')
>>>>
>>>> but I got nothing... :(

You did it wrong! ts_lexize expects a word, not a phrase!

>>> Mohamed, what did you expect from ts_lexize ? Please provide us
>>> valuable information, else we can't help you.
>
> What I expected was something to be returned. After all, they are valid
> words taken from an article. (Perhaps you don't see the words, but only
> ???...) Am I wrong to expect something ? Should I go for setting up the
> configuration completely first?

You should definitely read the documentation:
http://www.postgresql.org/docs/8.3/static/textsearch-debugging.html#TEXTSEARCH-DICTIONARY-TESTING
Period.

> SELECT ts_lexize('norwegian_ispell',
>                  'overbuljongterningpakkmesterassistent');
> {over,buljong,terning,pakk,mester,assistent}
>
> Check out this article if you need a sample :
> http://www.aljazeera.net/NR/exeres/103CFC06-0195-47FD-A29F-2C84B5A15DD0.htm

>>>> Is there a way of making sure that words not recognized also get
>>>> indexed/searched for ? (Not that I think this is the problem)
>>>
>>> yes
>>
>> Read
>> http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html
>>
>> "A text search configuration binds a parser together with a set of
>> dictionaries to process the parser's output tokens. For each token type
>> that the parser can return, a separate list of dictionaries is specified
>> by the configuration.
>> When a token of that type is found by the parser, each dictionary in the
>> list is consulted in turn, until some dictionary recognizes it as a known
>> word. If it is identified as a stop word, or if no dictionary recognizes
>> the token, it will be discarded and not indexed or searched for. The
>> general rule for configuring a list of dictionaries is to place first the
>> most narrow, most specific dictionary, then the more general
>> dictionaries, finishing with a very general dictionary, like a Snowball
>> stemmer or simple, which recognizes everything."
>
> Ok, but I don't have a thesaurus or a Snowball stemmer to fall back on.
> So words that are valid words but for some reason are not recognized
> "will be discarded and not indexed or searched for", which I consider a
> problem, since I don't trust my configuration to cover everything.
>
> Is this not a valid concern?

>> A quick example:
>>
>> CREATE TEXT SEARCH CONFIGURATION arabic (
>>     COPY = english
>> );
>>
>> =# \dF+ arabic
>> Text search configuration "public.arabic"
>> Parser: "pg_catalog.default"
>>       Token      | Dictionaries
>> -----------------+--------------
>>  asciihword      | english_stem
>>  asciiword       | english_stem
>>  email           | simple
>>  file            | simple
>>  float           | simple
>>  host            | simple
>>  hword           | english_stem
>>  hword_asciipart | english_stem
>>  hword_numpart   | simple
>>  hword_part      | english_stem
>>  int             | simple
>>  numhword        | simple
>>  numword         | simple
>>  sfloat          | simple
>>  uint            | simple
>>  url             | simple
>>  url_path        | simple
>>  version         | simple
>>  word            | english_stem
>>
>> Then you can alter this configuration.
>
> Yes, I figured that's the next step, but I thought I should get
> ts_lexize to work first? What do you think?
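The documentation passage quoted above suggests a direct answer to the concern about unrecognized words: finish each dictionary list with simple, which accepts every token. A minimal sketch (the configuration name arabic follows Oleg's example; the Arabic dictionary names are the ones Mohamed mentions and may differ in your installation):

```sql
-- Append 'simple' as the LAST dictionary in the list: any token that
-- ar_ispell and ar_stem fail to recognize is then indexed as-is by
-- 'simple' instead of being discarded.
ALTER TEXT SEARCH CONFIGURATION arabic
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH ar_ispell, ar_stem, simple;
```

Note that simple lowercases tokens but otherwise keeps them verbatim, so no valid word is silently dropped from the index.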
> Just a thought, say I have this :
>
> ALTER TEXT SEARCH CONFIGURATION pg
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                       word, hword, hword_part
>     WITH pga_ardict, ar_ispell, ar_stem;
>
> Is it possible to keep adding dictionaries, to get both Arabic and
> English matches on the same column (Arabic speakers tend to mix the
> two), like this :
>
> ALTER TEXT SEARCH CONFIGURATION pg
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                       word, hword, hword_part
>     WITH pga_ardict, ar_ispell, ar_stem, pg_english_dict,
>          english_ispell, english_stem;
>
> Will something like that work ?
>
> / Moe

Regards,
    Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
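On the mapping question above: a token-type mapping may list any number of dictionaries, consulted left to right until one recognizes the token, so chaining Arabic dictionaries before English ones is a workable approach to mixed-language text. A sketch, not a tested recipe: the dictionary names are taken from the thread and assumed to be installed, and simple is appended as a catch-all.

```sql
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pga_ardict, ar_ispell, ar_stem,
         english_ispell, english_stem, simple;

-- ts_lexize tests ONE dictionary against ONE word (the word below is a
-- placeholder, not a real test case):
SELECT ts_lexize('ar_ispell', 'someword');

-- ts_debug shows, token by token, which dictionary in the chain
-- recognized each word of a whole phrase:
SELECT token, dictionaries, lexemes
FROM ts_debug('pg', 'a mixed arabic and english phrase');
```

One caveat worth checking with ts_debug: a Snowball stemmer such as english_stem recognizes essentially every word it is given, so any dictionary placed after it in the list will never be consulted for those token types.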