Thread: suitable text search configuration
Hi,

Is initdb supposed to pick up reasonable TS configurations in general? If so, it's failing for me:

initdb: could not find suitable text search configuration for locale fr_CA.UTF-8
The default text search configuration will be set to "simple".

It fails for es_CL as well.

... oh, I see there's a table in initdb.c

Are we supposed to add entries to it, one for each country? I'm wondering if we should try to match the part before the _ using just the language, if the complete match fails. (i.e. match "es_CL" using just "es", "fr_CA" using just "fr", etc).

--
Alvaro Herrera http://www.PlanetPostgreSQL.org/
"When the proper man does nothing (wu-wei), his thought is felt ten thousand miles." (Lao Tse)
Alvaro Herrera <alvherre@commandprompt.com> writes:
> ... oh, I see there's a table in initdb.c
> Are we supposed to add entries to it, one for each country? I'm
> wondering if we should try to match the part before the _ using just the
> language, if the complete match fails. (i.e. match "es_CL" using just
> "es", "fr_CA" using just "fr", etc).

Actually, looking at the examples so far, I'm thinking we should just consider the string up to the first _, period.

An alternative is to try to match the full locale (es_ES) and then try the language (es) if that wasn't found. That would leave room to put country-by-country exceptions in, but for the moment we'd not have any.

regards, tom lane
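The two matching strategies described above can be sketched roughly as follows. This is only an illustration in Python, not the code in initdb.c: the table contents, the function names, and the way the encoding suffix is stripped are all assumptions made for the example.

```python
# Hypothetical language-to-configuration table. The real table lives in
# initdb.c; its contents are not reproduced here.
LANGUAGE_CONFIGS = {
    "es": "spanish",
    "fr": "french",
    "pt": "portuguese",
}

# Hypothetical country-by-country exceptions, keyed by full locale name.
# Left empty, matching "for the moment we'd not have any".
LOCALE_EXCEPTIONS = {}

# Fallback named in the initdb message quoted at the top of the thread.
DEFAULT_CONFIG = "simple"


def strip_encoding(locale_name):
    """Drop an encoding suffix such as '.UTF-8' (assumed naming convention)."""
    return locale_name.split(".", 1)[0]


def match_language_only(locale_name):
    """First proposal: consider only the string up to the first '_'."""
    language = strip_encoding(locale_name).split("_", 1)[0]
    return LANGUAGE_CONFIGS.get(language, DEFAULT_CONFIG)


def match_full_then_language(locale_name):
    """Alternative: try the full locale, then fall back to the language."""
    full = strip_encoding(locale_name)
    if full in LOCALE_EXCEPTIONS:
        return LOCALE_EXCEPTIONS[full]
    return match_language_only(full)
```

With an empty exception table the two functions return the same result for every input, so the alternative only starts to differ once a country-specific entry is added.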
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> ... oh, I see there's a table in initdb.c
>> Are we supposed to add entries to it, one for each country? I'm
>> wondering if we should try to match the part before the _ using just the
>> language, if the complete match fails. (i.e. match "es_CL" using just
>> "es", "fr_CA" using just "fr", etc).
>
> Actually, looking at the examples so far, I'm thinking we should just
> consider the string up to the first _, period.
>
> An alternative is to try to match the full locale (es_ES) and then try
> the language (es) if that wasn't found. That would leave room to put
> country-by-country exceptions in, but for the moment we'd not have any.

Can anyone point to a real world example where country by country would make sense? If we need to distinguish flavors of some languages, I would not be at all surprised if this was not by country anyway.

cheers

andrew
Andrew Dunstan <andrew@dunslane.net> writes:
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just
>> consider the string up to the first _, period.

> Can anyone point to a real world example where country by country would
> make sense?

For the current set of built-in dictionaries it seems pretty clear that country distinctions are useless. If we ever did need that distinction it would only be after adding dictionaries that aren't going to be in 8.3 ... so I'm leaning to keeping the code simple for the moment.

regards, tom lane
Andrew Dunstan wrote:
>
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just
>> consider the string up to the first _, period.

I studied the standards a bit to see if they mandated that the locale names must be in the form "language_COUNTRY", and couldn't find anything. Which makes me think it's mostly by (very well established) convention. I think trying to parse the _ should not be done on a first attempt.

>> An alternative is to try to match the full locale (es_ES) and then try
>> the language (es) if that wasn't found. That would leave room to put
>> country-by-country exceptions in, but for the moment we'd not have any.
>
> Can anyone point to a real world example where country by country would
> make sense? If we need to distinguish flavors of some languages, I would
> not be at all surprised if this was not by country anyway.

pt_BR versus pt_PT. I'm not sure if it makes a difference to a stemmer, but maybe to a thesaurus it does ...

--
Alvaro Herrera http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Andrew Dunstan wrote:
>> Can anyone point to a real world example where country by country would
>> make sense? If we need to distinguish flavors of some languages, I would
>> not be at all surprised if this was not by country anyway.

> pt_BR versus pt_PT. I'm not sure if it makes a difference to a stemmer,
> but maybe to a thesaurus it does ...

Right, but only when we have built-in dictionaries that separately address the two countries will there be any need to teach initdb about it. I think we should KISS for now.

regards, tom lane
Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > ... oh, I see there's a table in initdb.c
> > Are we supposed to add entries to it, one for each country? I'm
> > wondering if we should try to match the part before the _ using just the
> > language, if the complete match fails. (i.e. match "es_CL" using just
> > "es", "fr_CA" using just "fr", etc).
>
> Actually, looking at the examples so far, I'm thinking we should just
> consider the string up to the first _, period.

I found that there is an ISO spec for "cultural elements", ISO/IEC 15897, a working draft for which can be found at
http://www.open-std.org/jtc1/sc22/open/n3586.pdf

Chapter 13 talks about naming of locales. I think glibc is supposed to follow this standard.

--
Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Have we got consensus that initdb should just look at the first component of the locale name to choose a text search configuration (at least for 8.3)? If so, who's going to make the change? I can do it but don't want to duplicate effort if someone else was already on it.

regards, tom lane
Tom Lane wrote:
> Have we got consensus that initdb should just look at the first
> component of the locale name to choose a text search configuration
> (at least for 8.3)? If so, who's going to make the change?
> I can do it but don't want to duplicate effort if someone else
> was already on it.

Thanks, it works wonderfully for me now.

--
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
"Ni aun el genio muy grande llegaría muy lejos si tuviera que sacarlo todo de su propio interior" (Goethe)