Re: Tsearch2 performance on big database - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: Tsearch2 performance on big database |
Date | |
Msg-id | Pine.GSO.4.62.0503221840480.5508@ra.sai.msu.su Whole thread Raw |
In response to | Re: Tsearch2 performance on big database (Mike Rylander <mrylander@gmail.com>) |
Responses |
Re: Tsearch2 performance on big database
|
List | pgsql-performance |
Mike, no comments before Rick post tsearch configs and increased buffers ! Union shouldn't be faster than (term1|term2). tsearch2 internals description might help you understanding tsearch2 limitations. See http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals Also, don't miss my notes: http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes Oleg On Tue, 22 Mar 2005, Mike Rylander wrote: > On Tue, 22 Mar 2005 14:25:19 +0100, Rick Jansen <rick@rockingstone.nl> wrote: >> >> ilab=# explain analyze select count(titel) from books where idxfti @@ >> to_tsquery('default', 'buckingham | palace'); >> QUERY PLAN >> ---------------------------------------------------------------------------------------------------------------------------------------- >> Aggregate (cost=35547.99..35547.99 rows=1 width=56) (actual >> time=125968.119..125968.120 rows=1 loops=1) >> -> Index Scan using idxfti_idx on books (cost=0.00..35525.81 >> rows=8869 width=56) (actual time=0.394..125958.245 rows=3080 loops=1) >> Index Cond: (idxfti @@ '\'buckingham\' | \'palac\''::tsquery) >> Total runtime: 125968.212 ms >> (4 rows) >> >> Time: 125969.264 ms >> ilab=# > > Ahh... I should have qualified my claim. I am creating a google-esqe > search interface and almost every query uses '&' as the term joiner. > 'AND' queries and one-term queries are orders of magnitude faster than > 'OR' queries, and fortunately are the expected default for most users. > (Think, "I typed in these words, therefore I want to match these > words"...) An interesting test may be to time multiple queries > independently, one for each search term, and see if the combined cost > is less than a single 'OR' search. If so, you could use UNION to join > the results. > > However, the example you originally gave ('terminology') should be > very fast. On a comparable query ("select count(value) from > metabib.full_rec where index_vector @@ to_tsquery('default','jane');") > I get 12ms. > > Oleg, do you see anything else on the surface here? > > Try: > > EXPLAIN ANALYZE > SELECT titel FROM books WHERE idxfti @@ > to_tsquery('default', 'buckingham') > UNION > SELECT titel FROM books WHERE idxfti @@ > to_tsquery('default', 'palace'); > > and see if using '&' instead of '|' where you can helps out. I > imagine you'd be surprised by the speed of: > > SELECT titel FROM books WHERE idxfti @@ > to_tsquery('default', 'buckingham&palace'); > > >> >> > As an example of what I think you *should* be seeing, I have a similar >> > box (4 procs, but that doesn't matter for one query) and I can search >> > a column with tens of millions of rows in around a second. >> > >> >> That sounds very promising, I'd love to get those results.. could you >> tell me what your settings are, howmuch memory you have and such? > > 16G of RAM on a dedicated machine. > > > shared_buffers = 15000 # min 16, at least max_connections*2, 8KB each > work_mem = 10240 # min 64, size in KB > maintenance_work_mem = 1000000 # min 1024, size in KB > # big m_w_m for loading data... > > random_page_cost = 2.5 # units are one sequential page fetch cost > # fast drives, and tons of RAM > > > Regards, Oleg _____________________________________________________________ Oleg Bartunov, sci.researcher, hostmaster of AstroNet, Sternberg Astronomical Institute, Moscow University (Russia) Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(095)939-16-83, +007(095)939-23-83
pgsql-performance by date: