Thread: Can tsearch do some basic text mining
Hi, We have big blobs of text (average 10,000 characters) in a database, from which we would like to discover the most often repeated words or phrases. Can tsearch be used for this kind of pattern search? I suppose it's Text Mining 101 sort of stuff, nothing complex. TIA!
On Fri, 24 Aug 2007, Phoenix Kiula wrote:
> We have big blobs of text (average 10,000 characters) in a database,
> from which we would like to discover the most often repeated words or
> phrases. Can tsearch be used for this kind of pattern search? I
> suppose it's Text Mining 101 sort of stuff, nothing complex.

There is a stat() function; see http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes for more details. It's not fast, so it is better to save the results in a table.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
On 25/08/07, Oleg Bartunov <oleg@sai.msu.su> wrote:
> there is stat() function, see
> http://www.sai.msu.su/~megera/wiki/Tsearch_V2_Notes
> for more details.
> It's not fast, so better to save results in a table

Thanks. This seems to give words only. How about phrases? If words are so slow, I shudder to think how long phrase analysis would take -- if that is possible at all?
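
The thread does not say how to count phrases, so here is one common approach outside the database: count n-word sequences (n-grams). This is a minimal Python sketch of that idea, not something tsearch provides. The function names `ngram_stats` and `top_phrases` are hypothetical, and the tokenizer is a deliberately crude regex with no stemming or stop-word removal. The two counts per phrase are called `ndoc` (documents containing it) and `nentry` (total occurrences), on the assumption that this mirrors the per-word statistics stat() reports.

```python
import re
from collections import Counter


def ngram_stats(docs, n=2):
    """Return {phrase: (ndoc, nentry)} for every n-word phrase in docs.

    ndoc   = number of documents containing the phrase at least once
    nentry = total number of occurrences across all documents
    (hypothetical helper; naming assumed to mirror tsearch's stat() output)
    """
    ndoc = Counter()
    nentry = Counter()
    for text in docs:
        # Crude tokenizer: lowercase runs of letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
        nentry.update(grams)     # every occurrence counts
        ndoc.update(set(grams))  # each document counts once per phrase
    return {g: (ndoc[g], nentry[g]) for g in nentry}


def top_phrases(docs, n=2, limit=10):
    """Most frequent phrases, ordered by ndoc, then nentry, then phrase."""
    stats = ngram_stats(docs, n)
    ranked = sorted(stats.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))
    return ranked[:limit]


if __name__ == "__main__":
    sample = [
        "the quick brown fox",
        "the quick red fox and the quick cat",
    ]
    for phrase, (nd, ne) in top_phrases(sample, n=2, limit=3):
        print(phrase, nd, ne)
```

With n=1 this reduces to plain word counts. As with the advice about stat(), a full pass over 10,000-character documents is not cheap, so it probably makes sense to run it in batch and store the result in a table rather than compute it per query.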