Re: Simplifying Text Search - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Simplifying Text Search |
Date | |
Msg-id | 1194936519.2644.261.camel@ebony.site |
In response to | Re: Simplifying Text Search (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Simplifying Text Search |
List | pgsql-hackers |
On Mon, 2007-11-12 at 23:03 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2007-11-12 at 11:56 -0500, Tom Lane wrote:
> > > Simon Riggs <simon@2ndquadrant.com> writes:
> > > > So we end up with a normal sounding function that is overloaded to
> > > > provide all of the various goodies.
> > >
> > > As best I can tell, @@ does exactly this already. This is just a
> > > different spelling of the same capability, and I don't actually
> > > find it better. Why is "text_search(x,y)" better than "x @@ y"?
> > > We don't recommend that people write "texteq(x,y)" instead of
> > > "x = y".
> >
> > Most people don't understand those differences. x = y means "make sure
> > they are the same" to most people. They don't see what you (and I) see:
> > function and operator interchangeability. So text_search() is better
> > than @@ and = is better than texteq(). Life ain't neat...
> >
> > Right now, Full Text Search SQL looks like complete gibberish and it
> > dissuades many people from using what is an awesome set of features. I
> > just want to add a little sugar to help people get started.
>
> I realized this when editing the documentation but not clearly. I
> noticed that:
>
>     http://momjian.us/main/writings/pgsql/sgml/textsearch-intro.html#TEXTSEARCH-MATCHING
>
>     tsvector @@ tsquery
>     tsquery @@ tsvector
>     text @@ tsquery
>     text @@ text
>
>     The first two of these we saw already. The form text @@ tsquery is
>     equivalent to to_tsvector(x) @@ y. The form text @@ text is equivalent
>     to to_tsvector(x) @@ plainto_tsquery(y).
>
> was quite odd, especially the "text @@ text" case, and in fact it makes
> casting almost required unless you can remember which one is a query and
> which is a vector (hint, the vector is first). What really adds to the
> confusion is that the operator is two _identical_ characters, meaning
> the operator is symmetric, and it behaves symmetrically if you cast one
> side, but as vector @@ query if you don't.

I'm thinking we can have an inlinable function

  contains(text, text) returns int

Return values are limited to just 0 or 1 or NULL, as with SQL/MM.
It's close to SQL/MM, but not exact.

  contains(sourceText, searchText)

is a macro for

  case to_tsvector(default_text_search_config, sourceText) @@
       to_tsquery(default_text_search_config, searchText)
  when true then 1
  when false then 0
  else null
  end

that allows us to write indexable queries like this

  WHERE contains(sourceText, searchText) > 0

where we must still have built the index on a constant config.

I haven't checked that this still works yet. Maybe not, in which case we
need something slightly more complex to make sure it's still indexable.
This is the difficult part.

So changes are:
- add SQL function
- simplify first 2 pages of docs using this function

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com
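
For readers who want to try the idea, here is a minimal sketch of the wrapper described above, written as a plain SQL function. It is an illustration under assumptions, not a committed or proposed patch: the function is hypothetical, and get_current_ts_config() is used here as one way to obtain the configuration named by default_text_search_config. Whether a predicate built on this wrapper can still use an index is exactly the open question raised in the message, and the sketch does not settle it.

```sql
-- Hypothetical sketch of the proposed contains(text, text) wrapper.
-- Assumption: get_current_ts_config() stands in for
-- "default_text_search_config" in the macro above.
CREATE FUNCTION contains(text, text)
RETURNS int
LANGUAGE sql
STABLE
AS $$
    -- $1 is the source text, $2 is the search text (to_tsquery syntax).
    SELECT CASE (to_tsvector(get_current_ts_config(), $1)
                 @@ to_tsquery(get_current_ts_config(), $2))
           WHEN true  THEN 1
           WHEN false THEN 0
           ELSE NULL
           END
$$;

-- Example call; the second argument must be valid to_tsquery input.
SELECT contains('a fat cat sat on a mat', 'cat & mat');
```

The function is marked STABLE rather than IMMUTABLE because its result depends on the current setting of default_text_search_config.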