Re: ts_rank seems very slow (140 ranked documents / second on my machine) - Mailing list pgsql-general
From | Oleg Bartunov |
---|---|
Subject | Re: ts_rank seems very slow (140 ranked documents / second on my machine) |
Date | |
Msg-id | Pine.LNX.4.64.1107131932410.17363@sn.sai.msu.ru Whole thread Raw |
In response to | Re: ts_rank seems very slow (140 ranked documents / second on my machine) (Nicolas Grilly <nicolas@gardentechno.com>) |
Responses |
Re: ts_rank seems very slow (140 ranked documents / second
on my machine)
|
List | pgsql-general |
I didn't notice, reading 40K tuples in random order takes a long time and this is a problem of any database. Can you measure time to read all documents found ? :( The only solution I see is to store enough information for ranking in index. Oleg On Wed, 13 Jul 2011, Nicolas Grilly wrote: > Hi Oleg and all, > > On Wed, Jul 13, 2011 at 08:16, Oleg Bartunov <oleg@sai.msu.su> wrote: >> there is problem with estimating of cost scanning gin index in < 9.1 >> versions, >> so you can set enable_seqscan=off; >> or try 9.1 which beta3 now. > > I re-ran my queries using enable seqscan=off. > > Now the first query, without ts_rank, uses the GIN index: > > set enable_seqscan=off; > explain analyze select id > from posts_1000000 > where to_tsquery('english', 'crare') @@ document_vector > limit 50; > > Limit (cost=42290.12..42306.31 rows=50 width=4) (actual > time=16.259..16.412 rows=50 loops=1) > -> Bitmap Heap Scan on posts_1000000 (cost=42290.12..57877.02 > rows=48152 width=4) (actual time=16.256..16.344 rows=50 loops=1) > Recheck Cond: ('''crare'''::tsquery @@ document_vector) > -> Bitmap Index Scan on index_posts_documents_1000000 > (cost=0.00..42278.08 rows=48152 width=0) (actual time=13.265..13.265 > rows=49951 loops=1) > Index Cond: ('''crare'''::tsquery @@ document_vector) > Total runtime: 16.484 ms > > But the second query, the one that uses ts_rank, is still very slow... > Any idea why? Is ts_rank efficient enough to find the best 50 matches > among 50 000 documents? > > set enable_seqscan=off; > explain analyze select id > from posts_1000000 > where to_tsquery('english', 'crare') @@ document_vector > order by ts_rank_cd(document_vector, to_tsquery('english', 'crare'), 32) desc > limit 50; > > Limit (cost=59596.98..59597.10 rows=50 width=22) (actual > time=296212.052..296212.257 rows=50 loops=1) > -> Sort (cost=59596.98..59717.36 rows=48152 width=22) (actual > time=296186.928..296187.007 rows=50 loops=1)" > Sort Key: (ts_rank_cd(document_vector, '''crare'''::tsquery, 32)) > Sort Method: top-N heapsort Memory: 27kB > -> Bitmap Heap Scan on posts_1000000 > (cost=42290.12..57997.40 rows=48152 width=22) (actual > time=70.861..296059.515 rows=49951 loops=1) > Recheck Cond: ('''crare'''::tsquery @@ document_vector) > -> Bitmap Index Scan on index_posts_documents_1000000 > (cost=0.00..42278.08 rows=48152 width=0) (actual time=24.922..24.922 > rows=49951 loops=1) > Index Cond: ('''crare'''::tsquery @@ document_vector) > Total runtime: 296220.493 ms > >>> By the way, does ts_rank is supposed to use a GIN index when it's >>> available? >> >> no, I see no benefit :) > > Ok. But what is the solution to improve ts_rank execution time? Am I > doing something wrong? > > Thanks for your help, > > Nicolas > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-general by date: