Re: Very bad FTS performance with the Polish config - Mailing list pgsql-hackers
From | Oleg Bartunov |
---|---|
Subject | Re: Very bad FTS performance with the Polish config |
Date | |
Msg-id | Pine.LNX.4.64.0911181208290.6801@sn.sai.msu.ru Whole thread Raw |
In response to | Very bad FTS performance with the Polish config (Wojciech Knapik <webmaster@wolniartysci.pl>) |
Responses |
Re: Very bad FTS performance with the Polish config
|
List | pgsql-hackers |
Wojciech, your polish_english, polish configurations uses ispell language and slow, while english configuration doesn't contains ispell. So, what's your complains ? Try add ispell dictionary to english configuration and see timings. Oleg On Wed, 18 Nov 2009, Wojciech Knapik wrote: > > Hello > > > This has been discussed in #postgresql and posted to -performance a > couple days ago, but no solution has been found. The discussion can be > found here: > http://archives.postgresql.org/pgsql-performance/2009-11/msg00162.php > > I just finished implementing a "search engine" for my site and found > ts_headline extremely slow when used with a Polish tsearch > configuration, while fast with English. All of it boils down to a simple > testcase, but first some background. > > I tested on 8.3.1 on G5/OSX 10.5.8 and Xeon/Gentoo AMD64-2008.0 > (2.6.21), then switched both installations to 8.3.8 (both packages > compiled from source, but provided by the distro - port/emerge). The > Polish dictionaries and config were created according to this article > (it's in Polish, but the code is self-explanatory): > > http://www.depesz.com/index.php/2008/04/22/polish-tsearch-in-83-polski-tsearch-w-postgresie-83/ > > Now for the testcase: > > text = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do > eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad > minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip > ex ea commodo consequat. Duis aute irure dolor in reprehenderit in > voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur > sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt > mollit anim id est laborum.' > > # explain analyze select ts_headline('polish', text, > plainto_tsquery('polish', 'foobar')); > QUERY PLAN > ------------------------------------------------------------------------------------ > Result (cost=0.00..0.01 rows=1 width=0) (actual time=6.407..6.470 > rows=1 loops=1) > Total runtime: 6.524 ms > (2 rows) > > # explain analyze select ts_headline('english', text, > plainto_tsquery('english', 'foobar')); > QUERY PLAN > ------------------------------------------------------------------------------------ > Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.861..0.895 > rows=1 loops=1) > Total runtime: 0.935 ms > (2 rows) > > # explain analyze select ts_headline('simple', text, > plainto_tsquery('simple', 'foobar')); > QUERY PLAN > ------------------------------------------------------------------------------------ > Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.627..0.660 > rows=1 loops=1) > Total runtime: 0.697 ms > (2 rows) > > # > > As you can see, the results differ by an order of magnitude between > Polish and English. While in this simple testcase it's a non-issue, in > the real world this translates into enormous overhead. > > One of the queries I ran testing my site's search function took > 1870ms. When I took that query and changed all ts_headline(foo) calls to > just foo, the time dropped below 100ms. That's the difference between > something completely unacceptable and something quite useful. > > I can post various details about the hardware, software and specific > queries, but the testcases speak for themselves. I'm sure you can easily > reproduce my results. > > I'm putting my code into production tomorrow, since I can't wait > anymore. Hints would be very much appreciated! > > > cheers, > Wojciech Knapik > > PS. This issue is not related to the loading time of dictionaries, or > calls to ts_headline for results that won't be displayed. A few other > details can be found here > http://pastie.textmate.org/private/hqnqfnsfsknjyjlffzmog along with > snippets of my conversations in #postgresql that lead to this testcase. > Big thanks to RhodiumToad for helping me with fts for the last couple > days ;] > > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
pgsql-hackers by date: