Re: Full text index not being used - Mailing list pgsql-general
From | Alex Neth
---|---
Subject | Re: Full text index not being used
Msg-id | B230030F-2C30-4A6B-9E44-9A40A9FA5E84@liivid.com
In response to | Re: Full text index not being used (Oleg Bartunov &lt;oleg@sai.msu.su&gt;)
Responses | Re: Full text index not being used
List | pgsql-general
Based on suggestions from this list, I am trying to create a tsvector column and an index on it, since it is perhaps the recheck and rebuilding of all the vectors that is slowing things down. I don't understand why a recheck is necessary on a GIN index.

My update statement has been running for 36 hours now and has not finished. The statement is:

update source_listings set flv = to_tsvector('english', full_listing);

I know that it is still working because it occasionally prints out one of those long-word warnings. I have only 1.6M rows, and each entry in that column is a standard-size web page with just the text, maybe 3-5K. I certainly don't have meaningful long words. Perhaps the warnings appear because the HTML is not being handled well and I should strip the markup from the page first. Hopefully that doesn't mean I need to rebuild this column over the course of three days; I didn't expect it to take this long, so I thought I'd just try it out.

On Feb 2, 2009, at 1:49 AM, Oleg Bartunov wrote:

> Alex,
>
> what text are you indexing? I don't believe you have meaningful
> very long words (> 2047 characters).
>
> Do you really need a multicolumn index?
>
> I'd recommend separating the problem: create a column fts for
> to_tsvector('english', full_listing), create an index on it, and try a
> full-text query. The way you're doing it implies calling to_tsvector
> every time you search, which can be very costly.
>
> Oleg
>
> On Sun, 1 Feb 2009, Alex wrote:
>
>> So this seems to be because the result size is too big. I still don't
>> know why it is looping through every record and printing a warning,
>> but adding a LIMIT makes the queries complete in a reasonable time
>> (although not all that fast).
>>
>> However, I need to sort and also have many other facets that may or
>> may not be included in the query. Adding a sort makes it load every
>> record again and take forever.
>>
>> I tried to create an index including all of the fields I query on to
>> see if that would work, but I get an error that the index row is too
>> large:
>>
>> => create index master_index on source_listings(geo_lat, geo_lon,
>> price, bedrooms, region, city, listing_type, to_tsvector('english',
>> full_listing), post_time);
>> NOTICE: word is too long to be indexed
>> DETAIL: Words longer than 2047 characters are ignored.
>> NOTICE: word is too long to be indexed
>> DETAIL: Words longer than 2047 characters are ignored.
>> NOTICE: word is too long to be indexed
>> DETAIL: Words longer than 2047 characters are ignored.
>> NOTICE: word is too long to be indexed
>> DETAIL: Words longer than 2047 characters are ignored.
>> ERROR: index row requires 13356 bytes, maximum size is 8191
>>
>> Any ideas about how to resolve this?

> Regards,
> Oleg
> _____________________________________________________________
> Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
> Sternberg Astronomical Institute, Moscow University, Russia
> Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> phone: +007(495)939-16-83, +007(495)939-23-83
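[Editor's note: a minimal SQL sketch of the separate-column approach Oleg recommends, using the table and column names that appear in this thread (source_listings, full_listing, flv). The HTML-stripping regexp and the example query terms are illustrative assumptions, not from the thread; a real HTML parser would be more robust than a regexp.]

```sql
-- Add a dedicated tsvector column (names taken from the thread).
ALTER TABLE source_listings ADD COLUMN flv tsvector;

-- Populate it once, stripping HTML tags first so markup does not
-- produce overlong "words" (crude regexp; an assumption, not from
-- the thread).
UPDATE source_listings
   SET flv = to_tsvector('english',
                         regexp_replace(full_listing, '<[^>]*>', ' ', 'g'));

-- Index the precomputed vectors with GIN. Note this replaces the
-- failing multicolumn btree: a btree index row cannot exceed roughly
-- 8 kB, which is why the index including the tsvector errored out.
CREATE INDEX source_listings_flv_idx
    ON source_listings USING gin (flv);

-- Search against the stored column, so to_tsvector is not
-- re-evaluated for every row at query time.
SELECT *
  FROM source_listings
 WHERE flv @@ to_tsquery('english', 'apartment & downtown');
```

The other filter columns (geo_lat, price, bedrooms, etc.) can stay in their own ordinary btree indexes; the planner can combine them with the GIN index rather than requiring one oversized composite index.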