Re: Fulltext search configuration - Mailing list pgsql-general
From:           Oleg Bartunov
Subject:        Re: Fulltext search configuration
Date:
Msg-id:         Pine.LNX.4.64.0902022108080.4158@sn.sai.msu.ru
In response to: Re: Fulltext search configuration (Mohamed <mohamed5432154321@gmail.com>)
Responses:      Re: Fulltext search configuration
List:           pgsql-general
Mohamed, please try to read the docs and think a bit first.

On Mon, 2 Feb 2009, Mohamed wrote:

> On Mon, Feb 2, 2009 at 4:34 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>
>> On Mon, 2 Feb 2009, Oleg Bartunov wrote:
>>
>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>
>>>> Hehe, ok..
>>>> I don't know either, but I took some lines from Al-Jazeera :
>>>> http://aljazeera.net/portal
>>>>
>>>> I just made the change you said, created it successfully and tried
>>>> this :
>>>>
>>>> select ts_lexize('ayaspell', '?????? ??????? ????? ????? ?? ????
>>>> ????????? ?????')
>>>>
>>>> but I got nothing... :(

You did it wrong! ts_lexize expects a word, not a phrase!

>>> Mohamed, what did you expect from ts_lexize ? Please provide us
>>> valuable information, else we can't help you.
>
> What I expected was something to be returned. After all, they are valid
> words taken from an article. (Perhaps you don't see the words, but only
> ???...) Am I wrong to expect something ? Should I go for setting up the
> configuration completely first?

You should definitely read the documentation:
http://www.postgresql.org/docs/8.3/static/textsearch-debugging.html#TEXTSEARCH-DICTIONARY-TESTING
Period.

> SELECT ts_lexize('norwegian_ispell',
>                  'overbuljongterningpakkmesterassistent');
> {over,buljong,terning,pakk,mester,assistent}
>
> Check out this article if you need a sample :
> http://www.aljazeera.net/NR/exeres/103CFC06-0195-47FD-A29F-2C84B5A15DD0.htm

>>>> Is there a way of making sure that words not recognized also get
>>>> indexed/searched for ? (Not that I think this is the problem)
>>>
>>> yes
>>
>> Read
>> http://www.postgresql.org/docs/8.3/static/textsearch-dictionaries.html
>>
>> "A text search configuration binds a parser together with a set of
>> dictionaries to process the parser's output tokens. For each token type
>> that the parser can return, a separate list of dictionaries is specified
>> by the configuration.
>> When a token of that type is found by the parser, each dictionary in the
>> list is consulted in turn, until some dictionary recognizes it as a known
>> word. If it is identified as a stop word, or if no dictionary recognizes
>> the token, it will be discarded and not indexed or searched for. The
>> general rule for configuring a list of dictionaries is to place first the
>> most narrow, most specific dictionary, then the more general
>> dictionaries, finishing with a very general dictionary, like a Snowball
>> stemmer or simple, which recognizes everything."
>
> Ok, but I don't have a thesaurus or a Snowball stemmer to fall back on.
> So words that are valid words but for some reason are not recognized
> "will be discarded and not indexed or searched for", which I consider a
> problem, since I don't trust my configuration to cover everything.
>
> Is this not a valid concern?

>> A quick example:
>>
>> CREATE TEXT SEARCH CONFIGURATION arabic (
>>     COPY = english
>> );
>>
>> =# \dF+ arabic
>> Text search configuration "public.arabic"
>> Parser: "pg_catalog.default"
>>       Token      | Dictionaries
>> -----------------+--------------
>>  asciihword      | english_stem
>>  asciiword       | english_stem
>>  email           | simple
>>  file            | simple
>>  float           | simple
>>  host            | simple
>>  hword           | english_stem
>>  hword_asciipart | english_stem
>>  hword_numpart   | simple
>>  hword_part      | english_stem
>>  int             | simple
>>  numhword        | simple
>>  numword         | simple
>>  sfloat          | simple
>>  uint            | simple
>>  url             | simple
>>  url_path        | simple
>>  version         | simple
>>  word            | english_stem
>>
>> Then you can alter this configuration.
>
> Yes, I figured that's the next step, but I thought I should get
> ts_lexize to work first? What do you think?
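The documentation passage quoted above suggests a direct answer to the concern about unrecognized words: finish each dictionary list with simple, which accepts every token. A minimal sketch (the configuration name arabic follows Oleg's example; the Arabic dictionary names are the ones Mohamed mentions and may differ in your installation):

```sql
-- Append 'simple' as the LAST dictionary in the list: any token that
-- ar_ispell and ar_stem fail to recognize is then indexed as-is by
-- 'simple' instead of being discarded.
ALTER TEXT SEARCH CONFIGURATION arabic
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH ar_ispell, ar_stem, simple;
```

Note that simple lowercases tokens but otherwise keeps them verbatim, so no valid word is silently dropped from the index.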
> Just a thought, say I have this :
>
> ALTER TEXT SEARCH CONFIGURATION pg
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                       word, hword, hword_part
>     WITH pga_ardict, ar_ispell, ar_stem;
>
> Is it possible to keep adding dictionaries, to get both Arabic and
> English matches on the same column (Arabic speakers tend to mix the
> two), like this :
>
> ALTER TEXT SEARCH CONFIGURATION pg
>     ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
>                       word, hword, hword_part
>     WITH pga_ardict, ar_ispell, ar_stem, pg_english_dict,
>          english_ispell, english_stem;
>
> Will something like that work ?
>
> / Moe

Regards,
    Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
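On the mapping question above: a token-type mapping may list any number of dictionaries, consulted left to right until one recognizes the token, so chaining Arabic dictionaries before English ones is a workable approach to mixed-language text. A sketch, not a tested recipe: the dictionary names are taken from the thread and assumed to be installed, and simple is appended as a catch-all.

```sql
ALTER TEXT SEARCH CONFIGURATION pg
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH pga_ardict, ar_ispell, ar_stem,
         english_ispell, english_stem, simple;

-- ts_lexize tests ONE dictionary against ONE word (the word below is a
-- placeholder, not a real test case):
SELECT ts_lexize('ar_ispell', 'someword');

-- ts_debug shows, token by token, which dictionary in the chain
-- recognized each word of a whole phrase:
SELECT token, dictionaries, lexemes
FROM ts_debug('pg', 'a mixed arabic and english phrase');
```

One caveat worth checking with ts_debug: a Snowball stemmer such as english_stem recognizes essentially every word it is given, so any dictionary placed after it in the list will never be consulted for those token types.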