Thread: Help with tokenization of age-ranges in full text search
development=# \dF+ public.user_search
Text search configuration "public.user_search"
Parser: "pg_catalog.default"
Token | Dictionaries
-----------------+-----------------------
asciihword | simple_nostem_no_stop
asciiword | simple_nostem_no_stop
blank | simple
email | simple_nostem_no_stop
file | simple
float | simple
host | simple
hword | simple_nostem_no_stop
hword_asciipart | simple_nostem_no_stop
hword_numpart | simple_nostem_no_stop
hword_part | simple_nostem_no_stop
int | simple
numhword | simple_nostem_no_stop
numword | simple_nostem_no_stop
sfloat | simple
uint | simple
url | simple
url_path | simple
version | simple
word | simple_nostem_no_stop
development=# select alias, token from ts_debug('public.user_search', 'Boys 9-10');
alias | token
-----------+-------
asciiword | Boys
blank |
uint | 9
int | -10
(4 rows)
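The output above shows the default parser reading "-10" as a signed integer, so the text never yields a plain "10" token. One possible workaround, which is an assumption here and not something proposed in the thread, is to replace a hyphen that sits between two digits with a space before the text reaches the parser. The sketch below uses the built-in "simple" configuration as a stand-in for public.user_search, and relies on standard_conforming_strings being on (the default since 9.1) so that the backslashes in the pattern need no doubling.

```sql
-- Sketch only: rewrite "9-10" as "9 10" so the parser sees two unsigned
-- integers instead of the uint "9" followed by the int "-10".
-- The lookahead (?=\d) leaves the digit after the hyphen unconsumed,
-- so chained ranges such as "9-1-2" are handled as well.
SELECT to_tsvector(
         'simple',
         regexp_replace('Boys 9-10', '(\d)-(?=\d)', '\1 ', 'g')
       );

-- The same rewrite would have to be applied to the search terms,
-- otherwise a query for "9-10" is still parsed as 9 and -10.
SELECT plainto_tsquery(
         'simple',
         regexp_replace('9-10', '(\d)-(?=\d)', '\1 ', 'g')
       );
```

With this rewrite in place on both sides, searches for "9", "10" and "9-10" should all have matching tokens to work with, provided the configuration in use keeps a dictionary mapping for the uint token type.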
development=# select alias, token from ts_debug('public.user_search', 'Boys x9-y10');
alias | token
---------------+--------
asciiword | Boys
blank |
numhword | x9-y10
hword_numpart | x9
blank | -
hword_numpart | y10
(6 rows)
development=# \dF+ public.user_search
Text search configuration "public.user_search"
Parser: "pg_catalog.default"
Token | Dictionaries
-----------------+-----------------------
asciihword | simple_nostem_no_stop
asciiword | simple_nostem_no_stop
email | simple_nostem_no_stop
hword | simple_nostem_no_stop
hword_asciipart | simple_nostem_no_stop
hword_numpart | simple_nostem_no_stop
hword_part | simple_nostem_no_stop
numhword | simple_nostem_no_stop
numword | simple_nostem_no_stop
word | simple_nostem_no_stop
development=# select alias, token, lexemes from ts_debug('public.user_search', 'Boys 9-10');
alias | token | lexemes
-----------+-------+---------
asciiword | Boys | {boys}
blank | |
uint | 9 |
int | -10 |
(4 rows)
Mason Hale wrote:
> Hello, I've got a 9.3 database hosted at Heroku.
>
> I'm using full text search to search for "group names" in part of my application,
> and some of my group names are the names of youth sports age groups like
> "Boys 9-10" or "Girls 11-12".
>
> I would like for a search for the terms "Boys", "Boys 9-10", "9", "10" or
> "9-10" to match "Boys 9-10".

Hm, so if there's a sport for Boys 8-10, what will you do when it doesn't
match a query for "9"? Does this matter? I mean, maybe tokenization is
not the most appropriate thing to do in this case.

> So my question is -- can I get the tokenization that I want out of a
> configuration of the stock available token types?

The tokenizer stuff is not the most configurable part of the FTS stuff,
sadly.

--
Álvaro Herrera http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
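
The "Boys 8-10" case raised above can be made concrete. One way to read the suggestion that tokenization may not be the right tool is to store the age range as data next to the name and test containment directly. The following is a minimal sketch under that assumption: the table age_groups and its columns are hypothetical and do not come from the thread. Range types are available from PostgreSQL 9.2 onward, so this should apply to the 9.3 database mentioned.

```sql
-- Hypothetical table: the name and columns are illustrative only.
CREATE TABLE age_groups (
    name text      NOT NULL,
    ages int4range NOT NULL
);

-- '[]' makes both bounds inclusive, so 8-10 covers 8, 9 and 10.
INSERT INTO age_groups (name, ages) VALUES
    ('Boys 8-10',   int4range(8, 10, '[]')),
    ('Girls 11-12', int4range(11, 12, '[]'));

-- Find every group whose range contains age 9.
-- This matches "Boys 8-10" even though "9" never appears in the name.
SELECT name
FROM age_groups
WHERE ages @> 9;
```

The name column could still be indexed with full text search for terms like "Boys", with the numeric part of a query routed to the range test instead.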