Re: [GENERAL] Fragments in tsearch2 headline - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: [GENERAL] Fragments in tsearch2 headline |
Date | |
Msg-id | Pine.LNX.4.64.0807170327060.11363@sn.sai.msu.ru |
In response to | Re: [GENERAL] Fragments in tsearch2 headline (Sushant Sinha <sushant354@gmail.com>) |
Responses | Re: [GENERAL] Fragments in tsearch2 headline |
List | pgsql-hackers |
On Wed, 16 Jul 2008, Sushant Sinha wrote:

> I will add test queries and their results for the corner cases in a
> separate file. I guess the only thing I am confused about is what should
> be the behavior of headline generation when Query items have words of
> size less than ShortWord. I guess the answer is to ignore the ShortWord
> parameter, but let me know if the answer is any different.
>

ShortWord is about the headline text; it doesn't affect words in the query, so you can't discard them from the query.

> -Sushant.
>
> On Thu, 2008-07-17 at 02:53 +0400, Oleg Bartunov wrote:
>> Sushant,
>>
>> first, please provide simple test queries which demonstrate correct behavior
>> in the corner cases. This will help reviewers to test your patch and
>> help you to make sure your new version is ok. For example:
>>
>> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery);
>>  ts_headline
>> ------------------------------------------------------
>>  <b>1</b> 2 <b>3</b> 4 5 <b>1</b> 2 <b>3</b> <b>1</b>
>>
>> This select breaks your code:
>>
>> =# select ts_headline('1 2 3 4 5 1 2 3 1','1&3'::tsquery,'maxfragments=2');
>>  ts_headline
>> --------------
>>  ... 2 ...
>>
>> and so on ....
>>
>>
>> Oleg
>> On Tue, 15 Jul 2008, Sushant Sinha wrote:
>>
>>> Attached a new patch that:
>>>
>>> 1. fixes the previous bug
>>> 2. better handles the case when the cover size is greater than MaxWords.
>>> Basically it divides a cover greater than MaxWords into fragments of
>>> MaxWords, resizes each such fragment so that each end of the fragment
>>> contains a query word, and then evaluates the best fragments based on the
>>> number of query words in each fragment. In case of a tie it picks the
>>> smaller fragment. This allows more query words to be shown with multiple
>>> fragments in case a single cover is larger than MaxWords.
>>>
>>> The resizing of a fragment such that each end is a query word provides room
>>> for stretching both sides of the fragment.
>>> This (hopefully) better presents
>>> the context in which query words appear in the document. If a cover is
>>> smaller than MaxWords then the cover is treated as a fragment.
>>>
>>> Let me know if you have any more suggestions or if anything is not clear.
>>>
>>> I have not yet added the regression tests. The regression test suite seemed
>>> to be only ensuring that the function works. How many tests should I be
>>> adding? Is there any other place where I need to add different test cases
>>> for the function?
>>>
>>> -Sushant.
>>>
>>>
>>> Nice. But it would be good to resolve the following issues:
>>>> 1) The patch contains mistakes; I didn't investigate or read it carefully. Get
>>>> http://www.sai.msu.su/~megera/postgres/fts/apod.dump.gz and load it into a db.
>>>>
>>>> Queries
>>>> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
>>>> from apod where to_tsvector(body) @@ plainto_tsquery('black hole');
>>>>
>>>> and
>>>>
>>>> # select ts_headline(body, plainto_tsquery('black hole'), 'MaxFragments=1')
>>>> from apod;
>>>>
>>>> crash PostgreSQL :(
>>>>
>>>> 2) Please include documentation and regression tests in your patch.
>>>>
>>>>
>>>>> Another change that I was thinking of:
>>>>>
>>>>> Right now if cover size > max_words then I just cut the trailing words.
>>>>> Instead I was thinking that we should split the cover into more
>>>>> fragments such that each fragment contains a few query words. Then each
>>>>> fragment will not contain all query words but will show more occurrences
>>>>> of query words in the headline. I would like to know what your opinion
>>>>> on this is.
>>>>>
>>>>
>>>> Agreed.
>>>>
>>>>
>>>> --
>>>> Teodor Sigaev E-mail: teodor@sigaev.ru
>>>> WWW: http://www.sigaev.ru/
>>>>
>>>
>>
>> Regards,
>> Oleg
>> _____________________________________________________________
>> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
>> Sternberg Astronomical Institute, Moscow University, Russia
>> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
>> phone: +007(495)939-16-83, +007(495)939-23-83
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
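Sushant's description of how the patch handles a cover longer than MaxWords can be sketched roughly as follows: split the cover into MaxWords-sized chunks, shrink each chunk so both ends are query words, then rank the chunks by query-word count, with ties going to the shorter one. This is a hedged Python illustration, not the patch's C code. The function names (split_cover, score, best_fragments, stretch, render) are hypothetical, covers are assumed to arrive as inclusive word positions, and chunks with no query word are simply dropped. The alternating one-word-at-a-time stretch and the " ... " fragment delimiter are also assumptions, chosen only to mirror the discussion.

```python
from typing import List, Set, Tuple

# Hypothetical helpers: a sketch of the idea, not the patch's implementation.
Fragment = Tuple[int, int]  # inclusive (start, end) word positions


def split_cover(words: List[str], query: Set[str],
                start: int, end: int, max_words: int) -> List[Fragment]:
    """Turn one cover [start, end] into candidate fragments."""
    # A cover that already fits in max_words is used as a single fragment.
    if end - start + 1 <= max_words:
        return [(start, end)]
    fragments: List[Fragment] = []
    # Otherwise cut the cover into consecutive chunks of max_words words...
    for chunk_start in range(start, end + 1, max_words):
        s, e = chunk_start, min(chunk_start + max_words - 1, end)
        # ...and shrink each chunk so that both ends are query words.
        while s <= e and words[s] not in query:
            s += 1
        while e >= s and words[e] not in query:
            e -= 1
        # Assumption: chunks without any query word are dropped.
        if s <= e:
            fragments.append((s, e))
    return fragments


def score(words: List[str], query: Set[str], frag: Fragment) -> int:
    """Number of query-word occurrences inside the fragment."""
    s, e = frag
    return sum(1 for w in words[s:e + 1] if w in query)


def best_fragments(words: List[str], query: Set[str],
                   fragments: List[Fragment],
                   max_fragments: int) -> List[Fragment]:
    """Rank by query-word count (descending); ties go to the smaller span."""
    ranked = sorted(fragments,
                    key=lambda f: (-score(words, query, f), f[1] - f[0]))
    return ranked[:max_fragments]


def stretch(frag: Fragment, n_words: int, max_words: int) -> Fragment:
    """Assumed strategy: grow left, then right, one word at a time."""
    s, e = frag
    while e - s + 1 < max_words and (s > 0 or e < n_words - 1):
        if s > 0:
            s -= 1
        if e - s + 1 < max_words and e < n_words - 1:
            e += 1
    return s, e


def render(words: List[str], query: Set[str], fragments: List[Fragment],
           start_sel: str = "<b>", stop_sel: str = "</b>",
           delimiter: str = " ... ") -> str:
    """Highlight query words and join fragments in document order."""
    parts = []
    for s, e in sorted(fragments):
        parts.append(" ".join(start_sel + w + stop_sel if w in query else w
                              for w in words[s:e + 1]))
    return delimiter.join(parts)
```

For instance, take the thread's test string '1 2 3 4 5 1 2 3 1', the query words {1, 3}, and max_words = 4. Treat the whole nine-word string as one over-long span, purely for illustration, since a real cover for '1&3' would be much shorter. split_cover then yields the candidates (0, 2), (5, 7) and (8, 8). With two fragments allowed, the first two win and render as `<b>1</b> 2 <b>3</b> ... <b>1</b> 2 <b>3</b>`. This says nothing about what the actual patch or any PostgreSQL release outputs.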