Re: Fulltext search configuration - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Fulltext search configuration |
Date | |
Msg-id | Pine.LNX.4.64.0902021813530.4158@sn.sai.msu.ru |
In response to | Re: Fulltext search configuration (Mohamed <mohamed5432154321@gmail.com>) |
Responses | Re: Fulltext search configuration |
List | pgsql-general |
On Mon, 2 Feb 2009, Mohamed wrote:

> Hehe, ok..
> I don't know either but I took some lines from Al-Jazeera :
> http://aljazeera.net/portal
>
> just made the change you said and created it successfully and tried this :
>
> select ts_lexize('ayaspell', '?????? ??????? ????? ????? ?? ???? ????????? ?????')
>
> but I got nothing... :(

Mohamed, what did you expect from ts_lexize ? Please, provide us valuable
information, else we can't help you.

>
> Is there a way of making sure that words not recognized also gets
> indexed/searched for ? (Not that I think this is the problem)

yes

>
> / Moe
>
> On Mon, Feb 2, 2009 at 3:50 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>
>> Mohamed,
>>
>> comment line in ar.affix
>> #FLAG long
>> and creation of ispell dictionary will work. This is temp, solution. Teodor
>> is working on fixing affix autorecognizing.
>>
>> I can't say anything about testing, since somebody should provide
>> first test case. I don't know how to type arabic :)
>>
>> Oleg
>>
>> On Mon, 2 Feb 2009, Mohamed wrote:
>>
>>> Oleg, like I mentioned earlier. I have a different .affix file that I got
>>> from Andrew with the stop file and I get no errors creating the dictionary
>>> using that one but I get nothing out from ts_lexize.
>>> The size on that one is : 406,219 bytes
>>> And the size on the hunspell one (first) : 406,229 bytes
>>>
>>> Little too close, don't you think ?
>>>
>>> It might be that the arabic hunspell (ayaspell) affix file is damaged on
>>> some lines and I got the fixed one from Andrew.
>>>
>>> Just wanted to let you know.
>>>
>>> / Moe
>>>
>>> On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321@gmail.com>
>>> wrote:
>>>
>>>> Ok, thank you Oleg.
>>>> I have another dictionary package which is a conversion to hunspell
>>>> as well:
>>>>
>>>> http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
>>>> (Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08
>>>>
>>>> And running that gives me this error : (again the affix file)
>>>>
>>>> ERROR: wrong affix file format for flag
>>>> CONTEXT: line 560 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013 Y 6
>>>> "
>>>>
>>>> / Moe
>>>>
>>>> On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>>>
>>>>> Mohamed,
>>>>>
>>>>> We are looking on the problem.
>>>>>
>>>>> Oleg
>>>>>
>>>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>>>
>>>>>> No, I don't. But the ts_lexize don't return anything so I figured there must
>>>>>> be an error somehow.
>>>>>> I think we are using the same dictionary + that I am using the stopwords
>>>>>> file and a different affix file, because using the hunspell (ayaspell) .aff
>>>>>> gives me this error :
>>>>>>
>>>>>> ERROR: wrong affix file format for flag
>>>>>> CONTEXT: line 42 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40
>>>>>>
>>>>>> / Moe
>>>>>>
>>>>>> On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello <
>>>>>> daniel.chiaramello@golog.net> wrote:
>>>>>>
>>>>>>> Hi Mohamed.
>>>>>>>
>>>>>>> I don't know where you get the dictionary - I unsuccessfully tried the
>>>>>>> OpenOffice one by myself (the Ayaspell one), and I had no arabic stopwords
>>>>>>> file.
>>>>>>>
>>>>>>> Renaming the file is supposed to be enough (I did it successfully for
>>>>>>> Thailandese dictionary) - the ".aff" file becoming the ".affix" one.
>>>>>>> When I tried to create the dictionary:
>>>>>>>
>>>>>>> CREATE TEXT SEARCH DICTIONARY ar_ispell (
>>>>>>>     TEMPLATE = ispell,
>>>>>>>     DictFile = ar_utf8,
>>>>>>>     AffFile = ar_utf8,
>>>>>>>     StopWords = english
>>>>>>> );
>>>>>>>
>>>>>>> I had an error:
>>>>>>>
>>>>>>> ERREUR: mauvais format de fichier affixe pour le drapeau
>>>>>>> CONTEXTE : ligne 42 du fichier de configuration « /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa Y 40
>>>>>>>
>>>>>>> (which means Bad format of Affix file for flag, line 42 of configuration
>>>>>>> file)
>>>>>>>
>>>>>>> Do you have an error when creating your dictionary?
>>>>>>>
>>>>>>> Daniel
>>>>>>>
>>>>>>> Mohamed wrote:
>>>>>>>
>>>>>>> I have run into some problems here.
>>>>>>> I am trying to implement arabic fulltext search on three columns.
>>>>>>>
>>>>>>> To create a dictionary I have a hunspell dictionary and an arabic stop
>>>>>>> file.
>>>>>>>
>>>>>>> CREATE TEXT SEARCH DICTIONARY hunspell_dic (
>>>>>>>     TEMPLATE = ispell,
>>>>>>>     DictFile = hunarabic,
>>>>>>>     AffFile = hunarabic,
>>>>>>>     StopWords = arabic
>>>>>>> );
>>>>>>>
>>>>>>> 1) The problem is that the hunspell contains a .dic and a .aff file but
>>>>>>> the configuration requires a .dict and .affix file. I have tried to change
>>>>>>> the endings but with no success.
>>>>>>>
>>>>>>> 2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing
>>>>>>>
>>>>>>> 3) How can I convert my .dic and .aff to valid .dict and .affix ?
>>>>>>>
>>>>>>> 4) I have read that when using dictionaries, if a word is not recognized by
>>>>>>> any dictionary it will not be indexed. I find that troublesome. I would like
>>>>>>> everything but the stop words to be indexed. I guess this might be a step
>>>>>>> that I am not ready for yet, but just wanted to put it out there.
>>>>>>>
>>>>>>> Also I would like to know how the process of the fulltext search
>>>>>>> implementation looks like, from config to search.
>>>>>>>
>>>>>>> Create dictionary, then a text configuration, add dic to configuration,
>>>>>>> index columns with gin or gist ...
>>>>>>>
>>>>>>> How does a search look like? Does it match against the gin/gist index.
>>>>>>> Have that index been built up using the dictionary/configuration, or is the
>>>>>>> dictionary only used on search phrases?
>>>>>>>
>>>>>>> / Moe
>>>>>>>
>>>>>
>>>>> Regards,
>>>>> Oleg
>>>>> _____________________________________________________________
>>>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>>>> Sternberg Astronomical Institute, Moscow University, Russia
>>>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>>>> phone: +007(495)939-16-83, +007(495)939-23-83
>>>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
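
For readers following the thread: the questions about indexing unrecognized words and about the path "from config to search" can be illustrated with a rough sketch. It is not taken from any message above. The dictionary files (`hunarabic`, `arabic`) are the ones Mohamed names, and the sketch assumes they load without the affix error discussed in the thread. The names `ar_ispell`, `arabic_cfg`, `docs`, `title` and `body` are hypothetical.

```sql
-- Hypothetical names throughout: ar_ispell, arabic_cfg, docs(title, body).
-- Assumes hunarabic.dict, hunarabic.affix and arabic.stop already exist in
-- the server's tsearch_data directory and load without errors.
CREATE TEXT SEARCH DICTIONARY ar_ispell (
    TEMPLATE  = ispell,
    DictFile  = hunarabic,
    AffFile   = hunarabic,
    StopWords = arabic
);

-- Start from the built-in "simple" configuration.
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = simple);

-- Dictionaries are consulted in the order listed. The ispell dictionary
-- returns NULL for a word it does not know, so the token falls through to
-- "simple", which accepts every word. Unrecognized words therefore still
-- get indexed.
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH ar_ispell, simple;

-- ts_lexize tests a dictionary on ONE token: NULL means "unknown word",
-- an empty array means "stop word".
SELECT ts_lexize('ar_ispell', 'single_word');

-- ts_debug tests the whole configuration on a piece of text.
SELECT * FROM ts_debug('arabic_cfg', 'a few words of text');

-- The index stores lexemes produced by the configuration at indexing time.
CREATE INDEX docs_fts_idx ON docs
    USING gin (to_tsvector('arabic_cfg',
                           coalesce(title, '') || ' ' || coalesce(body, '')));

-- The query text goes through the same configuration, and the expression
-- matches the indexed one so the GIN index can be used.
SELECT *
FROM docs
WHERE to_tsvector('arabic_cfg',
                  coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ plainto_tsquery('arabic_cfg', 'search words');
```

Note that `ts_lexize` expects a single token, not a sentence, so passing several words at once, as in the test above, may by itself explain an empty result. The configuration is applied both when the index is built and when the query is parsed, which is why the same configuration name appears on both sides of `@@`.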