Re: Tsearch vector not stored by update/set - Mailing list pgsql-general
From | Justin L. Kennedy |
---|---|
Subject | Re: Tsearch vector not stored by update/set |
Date | |
Msg-id | Pine.SOL.4.33.0503211626030.16947-100000@acmez.gatech.edu Whole thread Raw |
In response to | Tsearch vector not stored by update/set ("Justin L. Kennedy" <jk289@prism.gatech.edu>) |
Responses |
Re: Tsearch vector not stored by update/set
|
List | pgsql-general |
On Sun, 20 Mar 2005, Andrew J. Kopciuch wrote: > On Thursday 17 March 2005 17:55, you wrote: > > The short question is why does this: > > > > select to_tsvector('default', coalesce(name, '') ||' '|| > > coalesce(description, '') ||' '|| coalesce(keywords,'')) from link_items; > > > > give different results than this: > > > > update link_items set linksfti=to_tsvector('default', coalesce(name, '') > > > > ||' '|| coalesce(description, '') ||' '|| coalesce(keywords,'')); > > > > select linksfti from link_items; > > > > > It shouldn't. I just tested with some of your data on my machine, and the > results were fine. (PG 7.4.6). I can not see why they would be any > different for 7.3.2. > > Your data looks rather strange. (re "'60':1 '000':2"). Is that really all > that was inserted? Or have you just left some out for your email? About 95% of the 1900 insertions ended up with empty strings (not NULLs), the other 5% looked like that above. Either just numbers, or occasionally words (defined by consecutive non-whitespace characters separated by whitespace) that had numbers or symbols in them. Like: "U.S. Senate" was transformed in such a way that "Senate" was dropped completely and "U.S." became lowercased "u.s.". Another example was a URL that happened to be in the description column of one was captured, but the rest of the text was not. Another had a name of "World T.E.A.M. Sports" and all that was stored in the vector was "t.e.a.m." It seems to be selective of only numbers, words with numbers in them, words with '.' or '/' characters. It completely ignores any other words or text in any of the 3 fields. > I could see this being a configuration issue possibly. What do your pg_ts > tables look like? Have you made modifications there? This morning, I decided to remove the following trigger from the link_items table: CREATE TRIGGER updateprodtbl BEFORE INSERT OR UPDATE ON link_items FOR EACH ROW EXECUTE PROCEDURE tsearch2('linksfti', 'name', 'description', 'keywords'); Now the UPDATE command I listed above works, so apparently there is something about this trigger that is blocking the search vector from being stored. This trigger was copied and pasted (with only changes to the column names) from http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/tsearch-V2-intro.html in the INDEXING FIELDS IN A TABLE section. This does fix the immediate problem of getting the search results for the live website we are running, but now we have the problem of keeping it up to date. I could run a nightly cronjob to update all the rows, but it seems inefficient, and I would really like to know why the trigger would keep it from working on 7.3, but not 8.0. You requested the pg_ts_* tables: On the Linux-redhat, pg7.3.2 pg_ts_cfg: (4 rows) oid ts_name prs_name locale 106407 "default" "default" "C" 988004 "default_english" "default" "en_US" 106408 "default_russian" "default" "ru_RU.KOI8-R" 106409 "simple" "default" NULL pg_ts_dict: (5 rows) oid dict_name dict_initoption dict_init dict_lexize 106356 en_stem "" snb_en_init snb_lexize 106361 ispell_template NULL spell_init spell_lexize 106358 ru_stem "/usr/local/pgsql/share/contrib/russion.stop" snb_ru_init snb_lexize 106353 simple NULL dex_init dex_lexize 106364 synonym NULL syn_init syn_lexize pg_ts_parser: (1 row) oid prs_name prs_start prs_nexttoken prs_end prs_headline prs_lextype 106389 "default" prsd_start prsd_getlexeme prsd_end prsd_headline prsd_lextype pg_ts_cfgmap(73 rows) ts_name tok_alias dict_name "default" "lword" "{en_stem}" "default" "nlword" "{simple}" "default" "word" "{simple}" "default" "email" "{simple}" "default" "url" "{simple}" "default" "host" "{simple}" "default" "sfloat" "{simple}" "default" "version" "{simple}" "default" "part_hword" "{simple}" "default" "nlpart_hword" "{simple}" "default" "lpart_hword" "{en_stem}" "default" "hword" "{simple}" "default" "lhword" "{en_stem}" "default" "nlhword" "{simple}" "default" "uri" "{simple}" "default" "file" "{simple}" "default" "float" "{simple}" "default" "int" "{simple}" "default" "uint" "{simple}" "default_russian" "lword" "{en_stem}" "default_russian" "nlword" "{ru_stem}" "default_russian" "word" "{ru_stem}" "default_russian" "email" "{simple}" "default_russian" "url" "{simple}" "default_russian" "host" "{simple}" "default_russian" "sfloat" "{simple}" "default_russian" "version" "{simple}" "default_russian" "part_hword" "{simple}" "default_russian" "nlpart_hword" "{ru_stem}" "default_russian" "lpart_hword" "{en_stem}" "default_russian" "hword" "{ru_stem}" "default_russian" "lhword" "{en_stem}" "default_russian" "nlhword" "{ru_stem}" "default_russian" "uri" "{simple}" "default_russian" "file" "{simple}" "default_russian" "float" "{simple}" "default_russian" "int" "{simple}" "default_russian" "uint" "{simple}" "simple" "lword" "{simple}" "simple" "nlword" "{simple}" "simple" "word" "{simple}" "simple" "email" "{simple}" "simple" "url" "{simple}" "simple" "host" "{simple}" "simple" "sfloat" "{simple}" "simple" "version" "{simple}" "simple" "part_hword" "{simple}" "simple" "nlpart_hword" "{simple}" "simple" "lpart_hword" "{simple}" "simple" "hword" "{simple}" "simple" "lhword" "{simple}" "simple" "nlhword" "{simple}" "simple" "uri" "{simple}" "simple" "file" "{simple}" "simple" "float" "{simple}" "simple" "int" "{simple}" "simple" "uint" "{simple}" "default_english" "url" "{simple}" "default_english" "host" "{simple}" "default_english" "sfloat" "{simple}" "default_english" "uri" "{simple}" "default_english" "int" "{simple}" "default_english" "float" "{simple}" "default_english" "email" "{simple}" "default_english" "word" "{simple}" "default_english" "hword" "{simple}" "default_english" "nlword" "{simple}" "default_english" "nlpart_hword" "{simple}" "default_english" "part_hword" "{simple}" "default_english" "nlhword" "{simple}" "default_english" "file" "{simple}" "default_english" "uint" "{simple}" "default_english" "version" "{simple}" On the 8.0.0-beta5 Windows 2000 machine: pg_ts_cfg: (4 rows, identical to 7.3.2) oid ts_name prs_name locale 370162 "default" "default" "C" 370165 "default_english" "default" "en_US" 370163 "default_russian" "default" "ru_RU.KOI8-R" 370164 "simple" "default" NULL pg_ts_cfgmap (73 rows, identical to 7.3.2) Not listed again, all identical except for OIDs pg_ts_dict (5 rows, identical to 7.3.2 except for OIDS) pg_ts_parser (1 row, identical to 7.3.2 except for OIDs) I have made a single change to it from its default installation. When I was working with the rank_cd() function on the 8.0.0 machine, it had errors due to a non-existant english stop file, so I changed pg_ts_dict.dict_initoption = '' where dict_name = 'en_stem'. The indexing system was working fine both before and after the change to the pg_ts_dict table. I also propagated the change to the 7.3.2 machine even though it didn't have the error message (the stop file didn't exist on that computer either, but it never gave an error message about it).
pgsql-general by date: