Re: fulltext search and hunspell - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: fulltext search and hunspell |
Date | |
Msg-id | Pine.LNX.4.64.1102081333380.31836@sn.sai.msu.ru |
In response to | Re: fulltext search and hunspell (Jens Sauer <jsauer65@googlemail.com>) |
Responses | Re: fulltext search and hunspell |
List | pgsql-general |
Jens,

have you tried the German compound dictionary from
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/ ?

Oleg

On Tue, 8 Feb 2011, Jens Sauer wrote:

> Hey,
>
> thanks for your answer.
>
> First I checked the links in the tsearch_data directory:
> de_de.affix and de_de.dict are symlinks to the corresponding files in
> /var/cache/postgresql/dicts/
> Then I recreated them by using pg_updatedicts.
>
> This is an extract of the de_de.affix file:
>
> # this is the affix file of the de_DE Hunspell dictionary
> # derived from the igerman98 dictionary
> #
> # Version: 20091006 (build 20100127)
> #
> # Copyright (C) 1998-2009 Bjoern Jacke <bjoern@j3e.de>
> #
> # License: GPLv2, GPLv3 or OASIS distribution license agreement
> # There should be a copy of both of this licenses included
> # with every distribution of this dictionary. Modified
> # versions using the GPL may only include the GPL
>
> SET ISO8859-1
> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
>
> PFX U Y 1
> PFX U 0 un .
>
> PFX V Y 1
> PFX V 0 ver .
>
> SFX F Y 35
> [...]
>
> I cannot find "compoundwords controlled z" there, so I added it manually:
>
> [...]
> # versions using the GPL may only include the GPL
>
> compoundwords controlled z
>
> SET ISO8859-1
> TRY esijanrtolcdugmphbyfvkwqxz??????????ESIJANRTOLCDUGMPHBYFVKWQXZ????-.
> [...]
>
> Then I restarted PostgreSQL.
>
> Now I get an error:
> SELECT * FROM ts_debug('Schokoladenfabrik');
> FEHLER: falsches Affixdateiformat für Flag
> CONTEXT: Zeile 18 in Konfigurationsdatei »/usr/share/postgresql/8.4/tsearch_data/de_de.affix«: »PFX U Y 1
> «
> SQL-Funktion »ts_debug« Anweisung 1
> SQL-Funktion »ts_debug« Anweisung 1
>
> Which means:
> ERROR: wrong affix file format for flag
> CONTEXT: line 18 in configuration file ...
>
> If I add
> COMPOUNDFLAG Z
> ONLYINCOMPOUND L
>
> instead of "compoundwords controlled z", I don't get an error:
>
> SELECT * FROM ts_debug('Schokoladenfabrik');
>    alias   |   description   |       token       |         dictionaries          | dictionary  |      lexemes
> -----------+-----------------+-------------------+-------------------------------+-------------+-------------------
>  asciiword | Word, all ASCII | Schokoladenfabrik | {german_hunspell,german_stem} | german_stem | {schokoladenfabr}
> (1 row)
>
> But it seems that the hunspell dictionary is not working for compound words.
>
> Maybe pg_updatedicts has a bug and generates affix files in the wrong format?
>
> Jens
>
> 2011/2/7 Oleg Bartunov <oleg@sai.msu.su>:
>> Jens,
>>
>> could you check the affix file for
>> compoundwords controlled z
>>
>> Also, can you provide a link to the dictionary files, so we can check whether
>> they are supported, since we have only rudimentary support for hunspell.
>> BTW, it'd be nice to have the output of ts_debug() to make sure the
>> dictionaries are actually used.
>>
>> Oleg

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
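For reference, a minimal sketch of the kind of setup discussed in this thread, plus a direct check of the Hunspell dictionary. It assumes the dictionary is registered as german_hunspell from the de_de.dict/de_de.affix files in tsearch_data and chained before german_stem, as the ts_debug output suggests; the configuration name german_compound and the exact statements are hypothetical, since the thread does not show how they were created. In ts_debug output the dictionary column names the dictionary that produced the lexemes, so german_stem there indicates that german_hunspell did not recognize Schokoladenfabrik.

```sql
-- Hypothetical setup: register the Hunspell files from tsearch_data
-- (de_de.dict / de_de.affix) through the ispell template.
CREATE TEXT SEARCH DICTIONARY german_hunspell (
    TEMPLATE  = ispell,
    DictFile  = de_de,
    AffFile   = de_de,
    StopWords = german
);

-- Hypothetical configuration (a copy, so the built-in german one is untouched):
-- try Hunspell first, then fall back to the Snowball stemmer.
CREATE TEXT SEARCH CONFIGURATION german_compound (COPY = pg_catalog.german);
ALTER TEXT SEARCH CONFIGURATION german_compound
    ALTER MAPPING FOR asciiword, word, hword, hword_part, hword_asciipart, asciihword
    WITH german_hunspell, german_stem;

-- Ask the dictionary directly: NULL means it did not recognize the token;
-- an array means it did (with the compound parts, if splitting works).
SELECT ts_lexize('german_hunspell', 'Schokoladenfabrik');

-- Show which dictionary in the chain actually handled the token.
SELECT alias, token, dictionary, lexemes
FROM ts_debug('german_compound', 'Schokoladenfabrik');
```

If ts_lexize returns NULL, the Hunspell dictionary rejected the word on its own, which points at the affix file's compound settings rather than at the configuration mapping.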