Thread: text search synonym dictionary anomaly with numbers
I am working with street address data in which 'first st' has been entered as '1 st' and so on. So I have created a text search dictionary with entries:

first 1
1st 1

And initially it seems to be working properly:

SELECT ts_lexize('rwg_synonym','first');
 ts_lexize
-----------
 {1}

SELECT ts_lexize('rwg_synonym','1st');
 ts_lexize
-----------
 {1}

But my queries on '1st' are not returning the expected results:

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1');
 count
-------
   403   <- this is what I want

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('first');
 count
-------
   403   <- this is also good

SELECT count(*) FROM parcel_attrib WHERE txtsrch @@ to_tsquery('1st');
 count
-------
     4   <- this is not good. There are 4 records that do have '1st', but why am I not getting 403 records?

Thanks for reading,
Rich

--
Richard Greenwood
richard.greenwood@gmail.com
www.greenwoodmap.com
Richard,

you should check your mapping - '1st' belongs to 'numword' and may be processed in a different way than 'first' or '1'.

Oleg

On Sat, 26 Nov 2011, Richard Greenwood wrote:
> I am working with street address data in which 'first st' has been
> entered as '1 st' and so on.
> [...]

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
Oleg,

Thank you. I am sure that you have identified my problem.

\dF+ english (output below) lists my dictionary, which is named 'rwg_synonym', before numword, so I would have thought that my dictionary would have normalized '1st' to '1' before the numword dictionary was reached. Maybe this question belongs in a new thread, but I do thank you for helping me to look in the correct place.

Best regards,
Rich

fremontwy=# \dF+ english
Text search configuration "pg_catalog.english"
Parser: "pg_catalog.default"
      Token      |       Dictionaries
-----------------+--------------------------
 asciihword      | english_stem
 asciiword       | rwg_synonym,english_stem
 email           | simple
 file            | simple
 float           | simple
 host            | simple
 hword           | english_stem
 hword_asciipart | english_stem
 hword_numpart   | simple
 hword_part      | english_stem
 int             | simple
 numhword        | simple
 numword         | simple
 sfloat          | simple
 uint            | simple
 url             | simple
 url_path        | simple
 version         | simple
 word            | english_stem

On Sun, Nov 27, 2011 at 7:29 AM, Oleg Bartunov <oleg@sai.msu.su> wrote:
> Richard,
>
> you should check your mapping - '1st' belongs to 'numword' and may be
> processed in a different way than 'first' or '1'.
>
> Oleg
> [...]
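One way to check which token type the default parser assigns to each term, and therefore which row of the \dF+ mapping applies to it, is ts_debug. The query below is a minimal sketch that assumes the 'english' configuration shown in this thread; nothing else is invented.

```sql
-- Show, for each token, the type assigned by the parser (alias),
-- the dictionaries mapped to that type, and the resulting lexemes.
SELECT alias, token, dictionaries, lexemes
FROM ts_debug('english', 'first 1st 1');
```

Per Oleg's reply, the row for '1st' should show the alias numword. Only the dictionaries listed for that token type are consulted, so a dictionary that appears only on the asciiword row is never reached for '1st'.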
To answer my own question: my synonym dictionary was not being applied to '1st' because '1st' is a numword, not an asciiword, and my synonym dictionary was not mapped to numword. To map a dictionary to a token class:

ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR numword
    WITH my_synonym_dictionary, simple;

The dictionary must already have been created with CREATE TEXT SEARCH DICTIONARY.

Rich

On Sun, Nov 27, 2011 at 9:57 AM, Richard Greenwood <richard.greenwood@gmail.com> wrote:
> Oleg,
>
> Thank you. I am sure that you have identified my problem.
> [...]
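Put together, the fix described above might look like the following sketch. It reuses the dictionary name rwg_synonym from the earlier messages in place of the my_synonym_dictionary placeholder. The synonym file name and location are assumptions: the thread does not say where the file lives, so the sketch supposes a file rwg_synonym.syn in the server's tsearch_data directory.

```sql
-- Create the synonym dictionary (skip this if it already exists).
-- ASSUMPTION: the entries "first 1" and "1st 1" are stored in
-- $SHAREDIR/tsearch_data/rwg_synonym.syn
CREATE TEXT SEARCH DICTIONARY rwg_synonym (
    TEMPLATE = synonym,
    SYNONYMS = rwg_synonym
);

-- Map the dictionary to the numword token type, falling back to simple
-- for numwords that have no synonym entry.
ALTER TEXT SEARCH CONFIGURATION english
    ALTER MAPPING FOR numword
    WITH rwg_synonym, simple;

-- Check the result: '1st' should now be normalized through rwg_synonym.
SELECT to_tsquery('english', '1st');
```

Stored tsvector values are not recomputed when a configuration changes, so the txtsrch column would presumably need to be rebuilt before the rows that contain '1st' are indexed under '1'.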