Re: default_text_search_config and expression indexes - Mailing list pgsql-hackers
From | Mike Rylander |
---|---|
Subject | Re: default_text_search_config and expression indexes |
Msg-id | b918cf3d0708141013u7ff808fds8bcf11a58918f6d1@mail.gmail.com |
In response to | Re: default_text_search_config and expression indexes (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: default_text_search_config and expression indexes; Re: default_text_search_config and expression indexes; Re: default_text_search_config and expression indexes |
List | pgsql-hackers |
On 8/13/07, Bruce Momjian <bruce@momjian.us> wrote:
> Heikki Linnakangas wrote:
> > Bruce Momjian wrote:
> > > Heikki Linnakangas wrote:
> > >> Removing the default configuration setting altogether removes the 2nd
> > >> problem, but that's not good from a usability point of view. And it
> > >> doesn't solve the general issue, you can still do things like:
> > >> SELECT * FROM foo WHERE to_tsvector('confA', textcol) @@
> > >> to_tsquery('confB', 'query');
> > >
> > > True, but in that case you are specifically naming different
> > > configurations, so it is hopefully obvious you have a mismatch.
> >
> > There's many more subtle ways to do that. For example, filling a
> > tsvector column using a DEFAULT clause. But then you sometimes fill it
> > in the application instead, with a different configuration. Or if one of
> > the function calls is buried in another user defined function.
> >
> > I don't think explicitly naming the configuration gives enough protection.
>
> Oh, wow, OK, well in that case the text search API isn't ready and we
> will have to hold this for 8.4.
>

I've been watching this thread with a mixture of dread and hope, waiting to see where the developers' inclination will end up: whether a useful foot gun will be left available.

This is just my $0.02 as a fairly heavy user of the current tsearch2 code, but I sincerely hope you do not cripple the system by removing the ability to store tsvectors built using arbitrary configurations in a single column. Yes, it can lead to unexpected results if you do not know what you are doing, but if you have gone beyond building a single tsearch2 configuration, you already have to know what you are doing. What's more, IMO the default configuration mechanism feels very much like a CONSTRAINT, as Oleg suggests.
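Heikki's quoted example, and the subtler DEFAULT-clause variant he describes, both come down to building a tsvector and a tsquery with different configurations. A minimal sketch of the effect follows. It assumes PostgreSQL 8.3 or later (where text search is in core) with the built-in english and simple configurations. The sample sentence is illustrative only.

```sql
-- Assumes PostgreSQL 8.3+ with the built-in 'english' and 'simple' configurations.

-- Same configuration on both sides: 'running' becomes the lexeme 'run' in
-- both the tsvector and the tsquery, so this returns true.
SELECT to_tsvector('english', 'The cats were running')
       @@ to_tsquery('english', 'running') AS same_config;

-- Mismatched configurations: 'simple' does not stem, so the query looks for
-- 'running' while the vector only holds 'run'; this returns false.
SELECT to_tsvector('english', 'The cats were running')
       @@ to_tsquery('simple', 'running') AS mixed_config;

-- The subtler variant: the one-argument forms read default_text_search_config,
-- so identical calls (e.g. in a column DEFAULT versus application code) can
-- produce different lexemes depending on the setting in effect at the time.
SET default_text_search_config = 'pg_catalog.english';
SELECT to_tsvector('The cats were running') AS under_english;
SET default_text_search_config = 'pg_catalog.simple';
SELECT to_tsvector('The cats were running') AS under_simple;
RESET default_text_search_config;
```

Neither query raises an error. The mismatch only shows up as missing results, which is the hazard being debated.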
The point is one of awareness: anyone who has gone to the trouble of setting up multiple configurations, and has learned enough to do so correctly, necessarily understands the importance of the setting and can use it correctly (or skip it and name configurations explicitly). The default configuration lowers the bar to an acceptable level for beginners who have no need of multiple configurations. While I don't personally feel too strongly about having a default, I think it is both useful and helpful for new users; it was for me.

Now, so that this email isn't entirely complaining, and as a data point for the discussion, I'll explain why I do not want to see tsearch2 crippled in the way Heikki and Bruce suggested. My application (http://open-ils.org, which runs more than 80% of the public libraries in Georgia, USA; see http://gapines.org and http://georgialibraries.org/lib/pines.html) must be able to search, with one query, a corpus of bibliographic records in a mix of languages and potentially with mixed stop-word rules. I cannot know ahead of time what languages will be used in the corpus, and I cannot restrict any one query to one language.

To accomplish this, each record will be inspected inside an INSERT/UPDATE trigger to determine its language and type, and the trigger will use the matching configuration to build the tsvector. This will obviously result in a "mixed" tsvector column, but that's exactly what I need. I can filter on record language if the user happens to specify a query language (and thus a configuration), or simply rank the assumed preferred language (based on IP address, perhaps, or browser preference) higher, or do one of a hundred other things. But I won't be able to do any of that if tsvectors are required to have one and only one configuration per column.

Anyway, I felt I needed to provide some outside perspective on this, as a user, since the external viewpoint (my particular viewpoint, at least) seemed to be missing from the discussion.
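The trigger-based design described above might look roughly like the following PL/pgSQL sketch. Everything named here is hypothetical: the bib_record table, its columns, the bib_record_config() helper, and the MARC-style language codes ('eng', 'fre', 'ger') are illustrations, not Open-ILS's actual schema. Language detection is assumed to have already filled in lang. The sketch assumes PostgreSQL 8.3 or later with the built-in english, french, german, and simple configurations.

```sql
-- Hypothetical schema: a "mixed" tsvector column in which each row is
-- built with the configuration matching that row's detected language.
CREATE TABLE bib_record (
    id    serial PRIMARY KEY,
    lang  text,             -- detected language code (assumed MARC-style)
    body  text NOT NULL,    -- text to be indexed
    tsv   tsvector          -- filled in by the trigger below
);

-- A plain index on the stored column: no expression index is needed, so
-- the per-row configuration choice does not have to be IMMUTABLE.
CREATE INDEX bib_record_tsv_idx ON bib_record USING gin (tsv);

-- Map a language code ($1) to a text search configuration; unknown or
-- missing codes fall back to 'simple' (no stemming).
CREATE FUNCTION bib_record_config(text) RETURNS regconfig AS $$
    SELECT CASE $1
        WHEN 'eng' THEN 'pg_catalog.english'::regconfig
        WHEN 'fre' THEN 'pg_catalog.french'::regconfig
        WHEN 'ger' THEN 'pg_catalog.german'::regconfig
        ELSE 'pg_catalog.simple'::regconfig
    END;
$$ LANGUAGE sql STABLE;

-- Each row picks its own configuration, so the column ends up mixed.
CREATE FUNCTION bib_record_tsv_update() RETURNS trigger AS $$
BEGIN
    NEW.tsv := to_tsvector(bib_record_config(NEW.lang), NEW.body);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER bib_record_tsv
    BEFORE INSERT OR UPDATE ON bib_record
    FOR EACH ROW EXECUTE PROCEDURE bib_record_tsv_update();

-- Query language known: use its configuration and filter on record language.
SELECT id, body
FROM bib_record
WHERE lang = 'fre'
  AND tsv @@ plainto_tsquery(bib_record_config('fre'), 'chats');

-- Query language unknown: OR together the term parsed under several
-- configurations and boost an assumed preferred language ('eng' here).
SELECT r.id, r.lang,
       ts_rank(r.tsv, q.query)
         * CASE WHEN r.lang = 'eng' THEN 2.0::real ELSE 1.0::real END AS score
FROM bib_record AS r,
     (SELECT plainto_tsquery('english', 'cats')
          || plainto_tsquery('french', 'cats')
          || plainto_tsquery('simple', 'cats') AS query) AS q
WHERE r.tsv @@ q.query
ORDER BY score DESC;
```

Because each row carries its own configuration choice, the query side has to make one too. Filtering on lang narrows the search to a single configuration. OR-ing the same term parsed under several configurations searches the whole mixed column, and the multiplier is one crude way to favor an assumed preferred language.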
Thanks, folks, for all the work on this so far!

--miker