Thread: tsearch 2 query
There are fewer than 20,000 records being searched here, but the query takes several minutes.
I know this may not be enough info, but would you suggest I optimize the query or put my attention towards other areas?

SELECT id, date, headline AS head, headline(body, q), rank(vectors, q), timestamp
FROM stories, to_tsquery('$query') AS q
WHERE vectors @@ q
ORDER BY $sort DESC OFFSET $offset LIMIT 20
"Matthew Terenzio" <mterenzio@gmail.com> writes:
> There are less than 20,000 records being searched here, but the query
> takes several minutes.
> I know this may not be enough info, but would one suggest I optimize
> the query or put my attention towards other areas.

What does EXPLAIN ANALYZE show for it?

			regards, tom lane
Thanks Tom. Sorry if that last post went over multiple times; I was getting a mailing failure (or so I thought).

Here is the EXPLAIN ANALYZE output. I really need to work on my skills at analyzing these:
Limit  (cost=105505.78..105505.83 rows=20 width=655) (actual time=74806.973..74807.037 rows=20 loops=1)
  ->  Sort  (cost=105505.78..105555.44 rows=19861 width=655) (actual time=74806.968..74806.989 rows=20 loops=1)
        Sort Key: stories."timestamp"
        ->  Nested Loop  (cost=0.00..90497.94 rows=19861 width=655) (actual time=720.251..74798.672 rows=680 loops=1)
              ->  Function Scan on q  (cost=0.00..12.50 rows=1000 width=32) (actual time=0.013..0.017 rows=1 loops=1)
              ->  Index Scan using description_index on stories  (cost=0.00..90.14 rows=20 width=623) (actual time=700.633..63243.713 rows=680 loops=1)
                    Index Cond: (stories.vectors @@ "outer".q)
Total runtime: 74847.177 ms
"Matthew Terenzio" <mterenzio@gmail.com> writes:
> here is EXPLAIN ANALYZE . I really need to work on my skills at
> analyzing these:
> Limit  (cost=105505.78..105505.83 rows=20 width=655) (actual time=74806.973..74807.037 rows=20 loops=1)
>   ->  Sort  (cost=105505.78..105555.44 rows=19861 width=655) (actual time=74806.968..74806.989 rows=20 loops=1)
>         Sort Key: stories."timestamp"
>         ->  Nested Loop  (cost=0.00..90497.94 rows=19861 width=655) (actual time=720.251..74798.672 rows=680 loops=1)
>               ->  Function Scan on q  (cost=0.00..12.50 rows=1000 width=32) (actual time=0.013..0.017 rows=1 loops=1)
>               ->  Index Scan using description_index on stories  (cost=0.00..90.14 rows=20 width=623) (actual time=700.633..63243.713 rows=680 loops=1)
>                     Index Cond: (stories.vectors @@ "outer".q)
> Total runtime: 74847.177 ms

Huh.  The plan looks fine --- I had thought maybe the optimizer was dropping the ball, but this seems to be more or less what you need.  The indexscan seems awfully slow though.

The only thought I have to offer is that you're apparently using quite an old version of Postgres --- the 1000-row estimate for a scalar function scan would only have happened in 8.0 or before.  Perhaps updating to something newer would help.  I'm not sure if there are any big performance improvements in GIST indexes per se, but in 8.2 or 8.3 you'd have the option to switch to a GIN index instead.  If this table is read-mostly then that'd be a win.

			regards, tom lane
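[Editor's note: switching to the GIN index Tom mentions (PostgreSQL 8.2+) would look roughly like the sketch below. The table and column names come from the thread; the new index name is made up for illustration.]

```sql
-- A sketch, assuming PostgreSQL 8.2 or later and the schema from the thread.
-- GIN indexes are faster to search than GiST but slower to update,
-- which is why Tom suggests them for a read-mostly table.
DROP INDEX description_index;                              -- the existing GiST index
CREATE INDEX stories_vectors_gin ON stories USING gin (vectors);
```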
OK, you calculate headline() 19861 times, while you need only 20.  Use a subselect and you will be surprised.

Oleg

On Thu, 2 Oct 2008, Matthew Terenzio wrote:
> Thanks Tom, Sorry if that last post went over multiple times. I was
> getting a mailing failure (or so I thought)
>
> here is EXPLAIN ANALYZE . I really need to work on my skills at
> analyzing these:
>
> Limit  (cost=105505.78..105505.83 rows=20 width=655) (actual time=74806.973..74807.037 rows=20 loops=1)
>   ->  Sort  (cost=105505.78..105555.44 rows=19861 width=655) (actual time=74806.968..74806.989 rows=20 loops=1)
>         Sort Key: stories."timestamp"
>         ->  Nested Loop  (cost=0.00..90497.94 rows=19861 width=655) (actual time=720.251..74798.672 rows=680 loops=1)
>               ->  Function Scan on q  (cost=0.00..12.50 rows=1000 width=32) (actual time=0.013..0.017 rows=1 loops=1)
>               ->  Index Scan using description_index on stories  (cost=0.00..90.14 rows=20 width=623) (actual time=700.633..63243.713 rows=680 loops=1)
>                     Index Cond: (stories.vectors @@ "outer".q)
>
> Total runtime: 74847.177 ms

Regards,
	Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
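[Editor's note: Oleg's point is that headline() runs on every matching row before LIMIT trims the result. A hedged sketch of the subselect he suggests, built from the query in the original post (placeholder variables `$query`, `$sort`, and `$offset` are kept as-is):]

```sql
-- A sketch, not a tested rewrite: find, order, and limit the matches
-- in an inner query first, then compute the expensive headline()
-- only for the 20 rows actually returned.
SELECT id, date, head, headline(body, q), rank, timestamp
FROM (
    SELECT id, date, headline AS head, body,
           rank(vectors, q) AS rank, timestamp, q
    FROM stories, to_tsquery('$query') AS q
    WHERE vectors @@ q
    ORDER BY $sort DESC
    OFFSET $offset LIMIT 20
) AS matches;
```

With the original form, headline() was evaluated for all ~680 matching rows (and the planner estimated 19861); here it runs exactly 20 times.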