Re: [GENERAL] Creation of tsearch2 index is very slow - Mailing list pgsql-performance
From | Oleg Bartunov |
---|---|
Subject | Re: [GENERAL] Creation of tsearch2 index is very slow |
Date | |
Msg-id | Pine.GSO.4.63.0601211808490.14417@ra.sai.msu.su |
In response to | Re: [GENERAL] Creation of tsearch2 index is very slow (Martijn van Oosterhout <kleptog@svana.org>) |
Responses | Re: [GENERAL] Creation of tsearch2 index is very slow |
List | pgsql-performance |
On Sat, 21 Jan 2006, Martijn van Oosterhout wrote:

> On Sat, Jan 21, 2006 at 04:29:13PM +0300, Oleg Bartunov wrote:
>> Martijn, you're right! We want not only to split the page into very
>> different parts, but also not to increase the number of set bits in
>> the resulting signatures, which are the union (OR'ed) of all signatures
>> in a part. We need not only fast index creation (thanks, Tom!),
>> but a better index. Some information is available here
>> http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals
>> There should be a more detailed document, but I don't remember where :)
>
> I see how it works, what I don't quite get is whether the "inverted
> index" you refer to is what we're working with here, or just what's in
> tsearchd?

Just tsearchd. We plan to implement an inverted index in PostgreSQL core
and then adapt tsearch2 to use it as an option for read-only archives.

>
>>> That's harder though (this algorithm does approximate it sort of)
>>> and I haven't come up with an algorithm yet
>>
>> Don't ask how hard we thought :)
>
> Well, looking at how other people are struggling with it, it's
> definitely a Hard Problem. One thing though, I don't think the picksplit
> algorithm as is really requires you to strictly have the longest
> distance, just something reasonably long. So I think the alternate
> algorithm I posted should produce equivalent results. No idea how to
> test it though...

You may try our development module 'gevel' to see how dense a signature is.
www=# \d v_pages
          Table "public.v_pages"
  Column   |       Type        | Modifiers
-----------+-------------------+-----------
 tid       | integer           | not null
 path      | character varying | not null
 body      | character varying |
 title     | character varying |
 di        | integer           |
 dlm       | integer           |
 de        | integer           |
 md5       | character(22)     |
 fts_index | tsvector          |
Indexes:
    "v_pages_pkey" PRIMARY KEY, btree (tid)
    "v_pages_path_key" UNIQUE, btree (path)
    "v_gist_key" gist (fts_index)

# select * from gist_print('v_gist_key') as t(level int, valid bool, a gtsvector) where level =1;
 level | valid |               a
-------+-------+--------------------------------
     1 | t     | 1698 true bits, 318 false bits
     1 | t     | 1699 true bits, 317 false bits
     1 | t     | 1701 true bits, 315 false bits
     1 | t     | 1500 true bits, 516 false bits
     1 | t     | 1517 true bits, 499 false bits
(5 rows)

Regards,
    Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
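
For readers following the thread, the gist_print figures in the message can be turned into densities with a few lines of Python. This is only a rough sketch and not part of tsearch2 or gevel: the helper names (`union_signature`, `density`, `parse_gist_print`) are hypothetical, and the signature length of 2016 bits is an assumption inferred from the output (1698 + 318 = 2016). It also illustrates why the union signature of a page can only get denser as more entries are OR'ed into it, which is the property the picksplit discussion is trying to keep under control.

```python
import re

# Assumption: signature length inferred from the gist_print output
# (1698 true bits + 318 false bits = 2016 bits).
SIGLEN_BITS = 2016


def union_signature(signatures):
    """OR together the signatures of all entries placed in one part of a split."""
    result = 0
    for sig in signatures:
        result |= sig
    return result


def density(signature, nbits=SIGLEN_BITS):
    """Fraction of bits set in a signature stored as a Python int."""
    return bin(signature).count("1") / nbits


def parse_gist_print(line):
    """Extract (true_bits, false_bits) from a line such as
    '1698 true bits, 318 false bits'."""
    m = re.search(r"(\d+) true bits, (\d+) false bits", line)
    if m is None:
        raise ValueError(f"unrecognised gist_print line: {line!r}")
    return int(m.group(1)), int(m.group(2))


if __name__ == "__main__":
    # The five level-1 rows quoted in the message.
    rows = [
        "1698 true bits, 318 false bits",
        "1699 true bits, 317 false bits",
        "1701 true bits, 315 false bits",
        "1500 true bits, 516 false bits",
        "1517 true bits, 499 false bits",
    ]
    for row in rows:
        true_bits, false_bits = parse_gist_print(row)
        print(f"{row}: density {true_bits / (true_bits + false_bits):.1%}")
```

With the rows quoted in the message, the level-1 signatures come out at roughly 74% to 84% of bits set. In a signature-based index, a denser union signature generally matches more queries by accident, so a split that keeps the two unions sparse should give a more selective index.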