Thread: FTS GIN Index Question
Hi List I have a largish partitioned table, it has ~60 million records in each of 12 partitions. It appears that a Full Text Index could speed up some user queries a lot. A quick test with an additional tsvector column revealed that this would take up around 35 GB of space for this column and then maybe 5 more for the gin index on it. As this is a lot of space (~ 480 GB), I'm a bit tempted to use a gin index without the separate tsvector column. However, the doc says that this will be slower. Does anyone have an idea of how much slower we're talking here? The index defintion would be a concatenation of two setweights(), i.e.: ... using gin( (setweight(to_tsvector('config',coalesce(col1,'')), 'A') || setweight(to_tsvector('config',coalesce(col2,'')), 'B'))) Also, general recommendations regarding full text search configurations of that size are very welcome. Christian
Christian, On Wed, 29 Sep 2010, Christian Ramseyer wrote: > Hi List > > I have a largish partitioned table, it has ~60 million records in each of 12 > partitions. It appears that a Full Text Index could speed up some user > queries a lot. > > A quick test with an additional tsvector column revealed that this would take > up around 35 GB of space for this column and then maybe 5 more for the gin > index on it. As this is a lot of space (~ 480 GB), I'm a bit tempted to use a > gin index without the separate tsvector column. However, the doc says that > this will be slower. do you have problem with disk space ? Searching index is usually very fast operation, only small part of index readed. Did you checked time to read index ? > > Does anyone have an idea of how much slower we're talking here? The index > defintion would be a concatenation of two setweights(), i.e.: > > ... using gin( > (setweight(to_tsvector('config',coalesce(col1,'')), 'A') || > setweight(to_tsvector('config',coalesce(col2,'')), 'B'))) > > Also, general recommendations regarding full text search configurations of > that size are very welcome. Regards, Oleg _____________________________________________________________ Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru), Sternberg Astronomical Institute, Moscow University, Russia Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/ phone: +007(495)939-16-83, +007(495)939-23-83
On 09/29/2010 12:10 PM, Oleg Bartunov wrote: > Christian, > > On Wed, 29 Sep 2010, Christian Ramseyer wrote: > >> Hi List >> >> I have a largish partitioned table, it has ~60 million records in each >> of 12 partitions. It appears that a Full Text Index could speed up >> some user queries a lot. >> >> A quick test with an additional tsvector column revealed that this >> would take up around 35 GB of space for this column and then maybe 5 >> more for the gin index on it. As this is a lot of space (~ 480 GB), >> I'm a bit tempted to use a gin index without the separate tsvector >> column. However, the doc says that this will be slower. > > do you have problem with disk space ? Searching index is usually very fast > operation, only small part of index readed. Did you checked time to read > index ? > Hi Oleg, thanks for your reply, Well I could get the disk space, but it's on a corporate SAN so it's a bit tedious. I didn't compare directly so far since even creating both index versions for only a single partition would bring me close to the space limit and also take maybe 20 hours or so. So my idea was to ask here first if I'm looking at a "a few percent" or a "orders of magnitude" difference. But of course, if there isn't enough feeback for an informed decision I'll either try both or just go with the space-intensive variant to be on the safe side, user experience is certainly more important than disk usage. Christian