Re: Full Text Search 101? - Mailing list pgsql-general
From | Aurynn Shaw |
---|---|
Subject | Re: Full Text Search 101? |
Date | |
Msg-id | 4AFC7E4A.7080305@commandprompt.com Whole thread Raw |
In response to | Full Text Search 101? (Jonathan <jharahush@gmail.com>) |
List | pgsql-general |
Hi Jonathan; > > I posted to this forum once before and was able to receive help. > Thanks again! > > I'm trying to implement full text search capabilities. Basically, I > have a very simple "data catalog" type of website (http:// > gis.drcog.org/datacatalog), where the user can type in a word or words > to search for records that match the criteria. I have a table with a > few fields, but I want to be able to search/index two fields -- the > "name" of the dataset and the "description." (or more, if I can index > fields from other tables too, that would be great). I'd like to be > able to use a full text search to rank the results in terms of > relevance. You can set up an index per table, but I've found that having a single vector table with bridge tables to the data I want to search works well. This would be in the style of: create table vectors ( id serial primary key, vector tsvector not null ); create table interesting_data ( id serial primary key, textual text not null ); CREATE TABLE vector_to_interesting ( v_id int not null references vectors(id), i_id int not null references interesting_data(id) ); And then SELECT id.* FROM interesting_data itd, vectors v, vector_to_interesting itv WHERE v.id = itv.v_id AND itv.i_id = itd.id AND v.vector @@ to_tsquery('searchpattern'); Allowing for some procedures around that that return multiple row types, to the client software. > > What is the best way to handle this? I've read through the PostgreSQL > documentation and don't quite understand it although I'm trying to > understand and am 'playing around' with this on a development server. > Is the default text search configuration enough? I've tried setting my > own configuration but get errors about not finding .dict dictionary > files? I have a default install of PostgreSQL 8.4. > > Also, I've created a tsvector column and created indexes, but it > didn't seem to delete stop words from the indexes. Should the stop > words be indexed? The default "english" configuration in PostgreSQL should have done this. Building your own configuration tends towards being a bit more advanced, and "english" should suit most needs. > > Also, I don't quite understand how to create indexes but rank certain > words as being more important than others, for instance, maybe having > words that come from the "name" column carrying more importance than > words coming from the "description" column. For part of this, there's ts_rank(tsvector, tsquery), as well as the relative weighting system in the textsearch modules. Unfortunately, I don't have much experience with the relative weighting, but, ts_rank() should get you partly there. I also found an interesting page on relative weighting @ http://www.postgresonline.com/special_feature.php?sf_name=postgresql83tsearch_cheatsheet&outputformat=html that might be useful for you. > Finally, I'd like > "watersheds" to come up when someone searches for "water" so I don't > think I have this configured properly, because this record is not > returned. Well, there's two ways to achieve this. The first is to manually the stopwords, which is getting into the custom configuration and a bit on the more advanced side. The other is that in PG8.4, you can do to_tsquery('water:*') which will tell the search parser to do a partial match, which would return "watershed", in this instance. My testing has shown it to be a bit slower (30ms, vs 15ms for a non-partial search), but not egregiously slow. > > Is there a good tutorial or maybe documentation that is a bit easier > to understand? I know my database is far from complicated but I can't > seem to find a configuration that works well. When I try ranking my > results, most of the results end up with the same rank, so something > must be wrong? > > Can someone point me in the right direction? > > Thanks for the help. I appreciate it. > Hope this helps. :) Regards, -- Aurynn Shaw The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564 ext 103 PostgreSQL Replication, Consulting, Custom Development, 24x7 support ashaw@commandprompt.com
pgsql-general by date: