Thread: TSearch and rankings
Hi,

Is there a way to use tsearch so that it returns documents that contain fewer than all of the required keywords? The idea is that a document with only 3 out of 4 terms is still returned, but with a lower ranking.

So far I haven't found a way to do this in the documentation. Is there something like a "maybe" operator? (i.e. 'foo&bar&~doh', meaning documents with foo and bar, and optionally doh, where those that contain doh are ranked higher.)

Cheers,
Bas.
Bas Scheffers wrote:
> Hi,
>
> Is there a way to use tsearch so that it returns documents that contain
> fewer than all of the required keywords? The idea is that a document with
> only 3 out of 4 terms is still returned, but with a lower ranking.
>
> So far I haven't found a way to do this in the documentation. Is there
> something like a "maybe" operator? (i.e. 'foo&bar&~doh', meaning documents
> with foo and bar, and optionally doh, where those that contain doh are
> ranked higher.)

(foo&bar)|(foo&bar&doh)

I think that's what you want.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
Teodor Sigaev said:
> (foo&bar)|(foo&bar&doh)
> I think that's what you want.

That simple, huh? It can become a bit complicated, writing an OR for all the different combinations, but a quick test I just ran did show a higher ranking for the documents that matched the larger query. That is quite usable in my application.

Do big queries have a significant impact on search performance? (This is important to me!)

Thanks,
Bas.
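[Editor's sketch] The "OR for all the different combinations" mentioned above can be generated rather than written by hand. The following is a minimal Python sketch, not something posted in the thread: the function name `maybe_query` and its arguments are hypothetical, and it only builds the query string in the syntax used above. Note that the number of variants doubles with each optional term.

```python
from itertools import combinations

def maybe_query(required, optional):
    """Build a query string in which `required` terms must all match and
    each subset of `optional` terms contributes one stricter OR-variant.

    Hypothetical helper: it only assembles the string, it does not run a search.
    """
    if not required:
        raise ValueError("at least one required term is needed")
    variants = []
    # Walk subsets of the optional terms from smallest to largest.
    for size in range(len(optional) + 1):
        for extra in combinations(optional, size):
            terms = list(required) + list(extra)
            variants.append("(" + "&".join(terms) + ")")
    return "|".join(variants)

print(maybe_query(["foo", "bar"], ["doh"]))
# prints: (foo&bar)|(foo&bar&doh)
```

With two optional terms the same call yields four variants, with three it yields eight, so in practice the list of optional terms would need to stay short.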
On Mon, 9 Feb 2004, Bas Scheffers wrote:
> Teodor Sigaev said:
> > (foo&bar)|(foo&bar&doh)
> > I think that's what you want.
>
> That simple, huh? It can become a bit complicated, writing an OR for all
> the different combinations, but a quick test I just ran did show a higher
> ranking for the documents that matched the larger query. That is quite
> usable in my application.
>
> Do big queries have a significant impact on search performance? (This is
> important to me!)

Sure :( In the degenerate case you end up with a query like (word1|word2|word3|..|wordN), which is equivalent to running N searches with a single-word query each, and that isn't efficient. Intrinsically, tsearch2 is much faster for long AND queries, which is the opposite of standard search engines based on inverted indexes.

> Thanks,
> Bas.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
Oleg Bartunov wrote:
> On Mon, 9 Feb 2004, Bas Scheffers wrote:
>> Teodor Sigaev said:
>>> (foo&bar)|(foo&bar&doh)
>>> I think that's what you want.
>>
>> That simple, huh? It can become a bit complicated, writing an OR for all
>> the different combinations, but a quick test I just ran did show a higher
>> ranking for the documents that matched the larger query. That is quite
>> usable in my application.
>>
>> Do big queries have a significant impact on search performance? (This is
>> important to me!)
>
> Sure :( In the degenerate case you end up with a query like
> (word1|word2|word3|..|wordN), which is equivalent to running N searches
> with a single-word query each, and that isn't efficient. Intrinsically,
> tsearch2 is much faster for long AND queries, which is the opposite of
> standard search engines based on inverted indexes.

Ugh. The performance of a complex query such as

(foo&bar)|(foo&bar&doh)|(foo&bar&doh&other)

will be equal to that of the simple query foo&bar, because the other variants are stricter than the simplest one. Performance is determined by the number of pages read (we assume that the CPU is much faster than the disks), and the more ANDed words a query has, the fewer pages are read.

--
Teodor Sigaev
E-mail: teodor@sigaev.ru
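[Editor's sketch] The claim that the longer variants are "stricter" than foo&bar can be checked on a toy example: every document that matches foo&bar&doh also matches foo&bar, so OR-ing the stricter variants onto foo&bar cannot add any documents to the result. The Python sketch below models documents as plain sets of terms; the sample data and the helper names `matches_all` and `search` are assumptions made for illustration, and the sketch says nothing about how tsearch2 reads index pages.

```python
# Toy corpus: document id -> set of terms (invented sample data).
docs = {
    1: {"foo", "bar"},
    2: {"foo", "bar", "doh"},
    3: {"foo", "doh"},
    4: {"foo", "bar", "doh", "other"},
}

def matches_all(doc_terms, words):
    """True if the document contains every word of one AND-variant."""
    return all(word in doc_terms for word in words)

def search(variants):
    """Return ids of documents matching at least one AND-variant (the OR)."""
    return {
        doc_id
        for doc_id, terms in docs.items()
        if any(matches_all(terms, variant) for variant in variants)
    }

simple = search([["foo", "bar"]])
chained = search([
    ["foo", "bar"],
    ["foo", "bar", "doh"],
    ["foo", "bar", "doh", "other"],
])

print(simple == chained)  # prints: True
```

In this model the chained query changes only what there is to rank, not which documents are found, which is consistent with the explanation above.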
Teodor Sigaev said:
> (foo&bar)|(foo&bar&doh)|(foo&bar&doh&other)
> will be equal to that of the simple query foo&bar, because the other
> variants are stricter

That sounds encouraging. My "documents" are actually quite small. That is because they are not documents, but just keywords for a user's profile (like age28, height187, countryuk, etc.), so my documents won't have much more than 20-25 terms to begin with. But you do get queries like '(age25|age26|...|age35)&(height180|...|height200)&countryuk'.

I tested this with a 10000-user database last night on my Athlon 850/384MB, and queries returned accurate results in <150ms (and this included a normal WHERE clause on the base table that I need as well).

So far I am impressed. I'll test with a 100K-user set later this week, using the "maybe" query style, and let you know my results.

Thanks again,
Bas.
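[Editor's sketch] Range queries over profile keywords, like the age and height example above, are tedious to type but easy to generate. This is a minimal Python sketch under the assumption that keywords are formed as a prefix followed by an integer (age28, height187) as described in the message; the function names `range_terms` and `profile_query` are hypothetical, and only the query string is built.

```python
def range_terms(prefix, low, high):
    """Return an OR group covering prefix<low> .. prefix<high>, inclusive."""
    if low > high:
        raise ValueError("low must not exceed high")
    return "(" + "|".join(f"{prefix}{n}" for n in range(low, high + 1)) + ")"

def profile_query(age, height, country):
    """AND together an age range, a height range and a single country keyword.

    `age` and `height` are (low, high) tuples; `country` is a code such as "uk".
    """
    return "&".join([
        range_terms("age", *age),
        range_terms("height", *height),
        f"country{country}",
    ])

print(profile_query((25, 27), (180, 181), "uk"))
# prints: (age25|age26|age27)&(height180|height181)&countryuk
```

Each range becomes one OR group and the groups are ANDed together, matching the shape of the query quoted above.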