Re: Fulltext search configuration - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: Fulltext search configuration |
Date | |
Msg-id | Pine.LNX.4.64.0902021746080.4158@sn.sai.msu.ru |
In response to | Re: Fulltext search configuration (Mohamed <mohamed5432154321@gmail.com>) |
Responses | Re: Fulltext search configuration |
List | pgsql-general |
Mohamed,

comment out the line in ar.affix (#FLAG long) and creation of the ispell dictionary will work. This is a temporary solution. Teodor is working on fixing affix autorecognition.

I can't say anything about testing, since somebody should first provide a test case. I don't know how to type Arabic :)

Oleg

On Mon, 2 Feb 2009, Mohamed wrote:

> Oleg, like I mentioned earlier, I have a different .affix file that I got
> from Andrew with the stop file, and I get no errors creating the dictionary
> using that one, but I get nothing out of ts_lexize.
> The size of that one is: 406,219 bytes
> And the size of the hunspell one (first): 406,229 bytes
>
> A little too close, don't you think?
>
> It might be that the Arabic hunspell (ayaspell) affix file is damaged on
> some lines and I got the fixed one from Andrew.
>
> Just wanted to let you know.
>
> / Moe
>
> On Mon, Feb 2, 2009 at 3:25 PM, Mohamed <mohamed5432154321@gmail.com> wrote:
>
>> Ok, thank you Oleg.
>> I have another dictionary package which is a conversion to hunspell
>> as well:
>>
>> http://wiki.services.openoffice.org/wiki/Dictionaries#Arabic_.28North_Africa_and_Middle_East.29
>> (Conversion of Buckwalter's Arabic morphological analyser) 2006-02-08
>>
>> And running that gives me this error (again the affix file):
>>
>> ERROR: wrong affix file format for flag
>> CONTEXT: line 560 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/arabic_utf8_alias.affix": "PFX 1013 Y 6
>> "
>>
>> / Moe
>>
>> On Mon, Feb 2, 2009 at 2:41 PM, Oleg Bartunov <oleg@sai.msu.su> wrote:
>>
>>> Mohamed,
>>>
>>> We are looking into the problem.
>>>
>>> Oleg
>>>
>>> On Mon, 2 Feb 2009, Mohamed wrote:
>>>
>>>> No, I don't. But ts_lexize doesn't return anything, so I figured there
>>>> must be an error somehow.
>>>> I think we are using the same dictionary, except that I am using the
>>>> stopwords file and a different affix file, because using the hunspell
>>>> (ayaspell) .aff gives me this error:
>>>>
>>>> ERROR: wrong affix file format for flag
>>>> CONTEXT: line 42 of configuration file "C:/Program Files/PostgreSQL/8.3/share/tsearch_data/hunarabic.affix": "PFX Aa Y 40
>>>>
>>>> / Moe
>>>>
>>>> On Mon, Feb 2, 2009 at 12:13 PM, Daniel Chiaramello
>>>> <daniel.chiaramello@golog.net> wrote:
>>>>
>>>>> Hi Mohamed.
>>>>>
>>>>> I don't know where you got the dictionary. I tried the OpenOffice one
>>>>> myself (the Ayaspell one) without success, and I had no Arabic
>>>>> stopwords file.
>>>>>
>>>>> Renaming the file is supposed to be enough (I did it successfully for
>>>>> the Thai dictionary): the ".aff" file becomes the ".affix" one.
>>>>> When I tried to create the dictionary:
>>>>>
>>>>> CREATE TEXT SEARCH DICTIONARY ar_ispell (
>>>>>     TEMPLATE = ispell,
>>>>>     DictFile = ar_utf8,
>>>>>     AffFile = ar_utf8,
>>>>>     StopWords = english
>>>>> );
>>>>>
>>>>> I had an error:
>>>>>
>>>>> ERREUR: mauvais format de fichier affixe pour le drapeau
>>>>> CONTEXTE : ligne 42 du fichier de configuration « /usr/share/pgsql/tsearch_data/ar_utf8.affix » : « PFX Aa Y 40
>>>>>
>>>>> (which means "wrong affix file format for flag", line 42 of the
>>>>> configuration file)
>>>>>
>>>>> Do you have an error when creating your dictionary?
>>>>>
>>>>> Daniel
>>>>>
>>>>> Mohamed wrote:
>>>>>
>>>>> I have run into some problems here.
>>>>> I am trying to implement Arabic fulltext search on three columns.
>>>>>
>>>>> To create a dictionary I have a hunspell dictionary and an Arabic stop
>>>>> file.
>>>>>
>>>>> CREATE TEXT SEARCH DICTIONARY hunspell_dic (
>>>>>     TEMPLATE = ispell,
>>>>>     DictFile = hunarabic,
>>>>>     AffFile = hunarabic,
>>>>>     StopWords = arabic
>>>>> );
>>>>>
>>>>> 1) The problem is that the hunspell package contains a .dic and a .aff
>>>>> file, but the configuration requires a .dict and a .affix file. I have
>>>>> tried to change the endings, but with no success.
>>>>>
>>>>> 2) ts_lexize('hunspell_dic', 'ARABIC WORD') returns nothing.
>>>>>
>>>>> 3) How can I convert my .dic and .aff to valid .dict and .affix files?
>>>>>
>>>>> 4) I have read that when using dictionaries, if a word is not recognized
>>>>> by any dictionary it will not be indexed. I find that troublesome. I
>>>>> would like everything but the stop words to be indexed. I guess this
>>>>> might be a step that I am not ready for yet, but I just wanted to put it
>>>>> out there.
>>>>>
>>>>> Also, I would like to know what the process of the fulltext search
>>>>> implementation looks like, from config to search.
>>>>>
>>>>> Create a dictionary, then a text search configuration, add the
>>>>> dictionary to the configuration, index columns with gin or gist ...
>>>>>
>>>>> What does a search look like? Does it match against the gin/gist index?
>>>>> Has that index been built using the dictionary/configuration, or is the
>>>>> dictionary only used on search phrases?
>>>>>
>>>>> / Moe
>>>
>>> Regards,
>>> Oleg
>>> _____________________________________________________________
>>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>>> Sternberg Astronomical Institute, Moscow University, Russia
>>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>>> phone: +007(495)939-16-83, +007(495)939-23-83

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
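
The config-to-search sequence asked about in the quoted message (create a dictionary, create a text search configuration, map the dictionary, index with GIN, then search) might look roughly like the sketch below. It assumes PostgreSQL 8.3 or later and that the dictionary files load without the affix error discussed in this thread. The configuration name `arabic_cfg`, the table `docs` with its columns, and the index name `docs_fts_idx` are hypothetical and only serve the illustration. Listing `simple` after the ispell dictionary is one way to keep words the ispell dictionary does not recognize, which is the concern raised in point 4.

```sql
-- Dictionary as given in the thread; expects hunarabic.dict, hunarabic.affix
-- and arabic.stop in $SHAREDIR/tsearch_data.
CREATE TEXT SEARCH DICTIONARY hunspell_dic (
    TEMPLATE  = ispell,
    DictFile  = hunarabic,
    AffFile   = hunarabic,
    StopWords = arabic
);

-- Hypothetical configuration name; starts as a copy of the built-in "simple".
CREATE TEXT SEARCH CONFIGURATION arabic_cfg (COPY = simple);

-- Non-ASCII words go to the ispell dictionary first. Words it does not
-- recognize fall through to "simple", so they are still indexed.
-- Stop words are dropped by hunspell_dic and never reach "simple".
ALTER TEXT SEARCH CONFIGURATION arabic_cfg
    ALTER MAPPING FOR word, hword, hword_part
    WITH hunspell_dic, simple;

-- Hypothetical table, for illustration only.
CREATE TABLE docs (
    id    serial PRIMARY KEY,
    title text,
    body  text
);

-- Expression index: the tsvector is built with the same configuration
-- that the query below uses.
CREATE INDEX docs_fts_idx ON docs
    USING gin (to_tsvector('arabic_cfg',
                           coalesce(title, '') || ' ' || coalesce(body, '')));

-- The WHERE expression matches the indexed expression, so the planner
-- can use docs_fts_idx.
SELECT id, title
FROM docs
WHERE to_tsvector('arabic_cfg',
                  coalesce(title, '') || ' ' || coalesce(body, ''))
      @@ plainto_tsquery('arabic_cfg', 'ARABIC WORD');
```

In this sketch the configuration is applied on both sides: `to_tsvector` normalizes the column text when the index is built, and `plainto_tsquery` normalizes the search phrase, so the dictionary is not used on search phrases only.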