Re: tsearch2 headline and postgresql.conf - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: tsearch2 headline and postgresql.conf |
Date | |
Msg-id | Pine.GSO.4.63.0601221110190.14417@ra.sai.msu.su |
In response to | tsearch2 headline and postgresql.conf (pgsql-performance@nullmx.com) |
Responses | Re: tsearch2 headline and postgresql.conf |
List | pgsql-performance |
You didn't provide us with any query with explain analyze.
Just to make sure you're fine.

Oleg

On Sun, 22 Jan 2006, pgsql-performance@nullmx.com wrote:

> Hi folks,
>
> I'm not sure if this is the right place for this but thought I'd ask. I'm
> relatively new to postgres having only used it on 3 projects and am just
> delving into the setup and admin for the second time.
>
> I decided to try tsearch2 for this project's search requirements but am
> having trouble attaining adequate performance. I think I've nailed it down
> to trouble with the headline() function in tsearch2.
> In short, there is a crawler that grabs HTML docs and places them in a
> database. The search is done using tsearch2 pretty much installed according
> to instructions. I have read a couple online guides suggested by this list
> for tuning the postgresql.conf file. I only made modest adjustments because
> I'm not working with top-end hardware and am still uncertain of the actual
> impact of the different parameters.
>
> I've been learning 'explain' and over the course of reading I have done
> enough query tweaking to discover the source of my headache seems to be
> headline().
>
> On a query of 429 documents, of which the avg size of the stripped down
> document as stored is 21KB, and the max is 518KB (an anomaly), tsearch2
> performs exceptionally well returning most queries in about 100ms.
>
> On the other hand, following the tsearch2 guide which suggests returning that
> first portion as a subquery and then generating the headline() from those
> results, I see the query increase to 4 seconds!
>
> This seems to be directly related to document size. If I filter out that
> 518KB doc along with some 100KB docs by returning "substring( stripped_text
> FROM 0 FOR 50000) AS stripped_text" I decrease the time to 1.4 seconds, but
> increase the risk of not getting a headline.
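For reference, the pattern described in the tsearch2 guide (match, rank and limit in a subquery, then call headline() only on the rows that survive) might look roughly like the sketch below. The table name docs, the columns id and idx_fts (assumed to be a tsvector column), the alias top_hits, and the search term are hypothetical placeholders; only stripped_text comes from the original post. Running it under EXPLAIN ANALYZE produces the kind of timing output asked for in the reply.

```sql
-- Sketch only: docs, id, idx_fts, top_hits and 'postgres' are assumed names.
EXPLAIN ANALYZE
SELECT id,
       headline(stripped_text, q) AS snippet,  -- evaluated only for the limited rows
       rank
FROM (
    SELECT id, stripped_text, q, rank(idx_fts, q) AS rank
    FROM docs, to_tsquery('postgres') AS q
    WHERE idx_fts @@ q      -- full-text match; can use a GiST index if one exists
    ORDER BY rank DESC
    LIMIT 10                -- caps how many documents headline() has to parse
) AS top_hits;
```

Since headline() has to re-parse the full document text, its cost would be expected to grow with document size, so the LIMIT in the subquery is what bounds the total work.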
>
> Seeing as how this problem is directly tied to document size, I'm wondering
> if there are any specific settings in postgresql.conf that may help, or is
> this just a fact of life for the headline() function? Or, does anyone know
> what the problem is and how to overcome it?
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83