Re: multi terabyte fulltext searching - Mailing list pgsql-general
From | Benjamin Arai |
---|---|
Subject | Re: multi terabyte fulltext searching |
Date | |
Msg-id | DA88D23F-A0AB-4154-9463-12D73FFA9066@araisoft.com |
In response to | Re: multi terabyte fulltext searching ("Joshua D. Drake" <jd@commandprompt.com>) |
Responses | Re: multi terabyte fulltext searching |
List | pgsql-general |
24.

Benjamin

On Mar 21, 2007, at 9:09 AM, Joshua D. Drake wrote:

> Benjamin Arai wrote:
>> True, but what happens when my database reaches 100 terabytes? Is 5
>> seconds ok? How about 10? My problem is that I do not believe the
>> performance loss I am experiencing as the data becomes large is
>> log(# of records). This worries me because I could be doing
>> something wrong, or I might be able to do something better.
>
> Well, a couple of things you could do, especially if you have the
> ability to throw hardware at it.
>
> How many spindles do you have?
>
> J
>
>> Benjamin
>>
>> On Mar 21, 2007, at 8:49 AM, Joshua D. Drake wrote:
>>
>>> Benjamin Arai wrote:
>>>> Hi,
>>>>
>>>> I have been struggling with getting fulltext searching working for
>>>> very large databases. I can fulltext index 10s of gigs without any
>>>> problem, but when I start getting to hundreds of gigs it becomes
>>>> slow. My current system is a quad core with 8GB of memory. I have
>>>> the resources to throw more hardware at it, but realistically it
>>>> is not cost effective to buy a system with 128GB of memory. Are
>>>> there any solutions that people have come up with for indexing
>>>> very large text databases?
>>>
>>> GiST indexes are very large.
>>>
>>>> Essentially I have several terabytes of text that I need to index.
>>>> Each record is about 5 paragraphs of text. I am currently using
>>>> TSearch2 (stemming, etc.) and getting sub-optimal results. Queries
>>>> take more than a second to execute.
>>>
>>> You are complaining about more than a second with a terabyte of
>>> text?
>>>
>>>> Has anybody implemented such a database using multiple systems or
>>>> some special add-on to TSearch2 to make things faster? I want to
>>>> do something like partitioning the data across multiple systems
>>>> and merging the ranked results at some master node. Is something
>>>> like this possible for PostgreSQL, or must it be a software
>>>> solution?
>>>>
>>>> Benjamin
>
> --
> === The PostgreSQL Company: Command Prompt, Inc. ===
> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
> Providing the most comprehensive PostgreSQL solutions since 1997
> http://www.commandprompt.com/
>
> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
> PostgreSQL Replication: http://www.commandprompt.com/products/
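For reference, a minimal sketch of the kind of index discussed above, written against the built-in text search that replaced contrib/tsearch2 in PostgreSQL 8.3 (the contrib module of the thread's era exposes the same @@ operator with slightly different function names). The documents table and its columns are hypothetical stand-ins for the poster's schema:

    -- Hypothetical table: one ~5-paragraph text record per row.
    CREATE TABLE documents (
        id   bigserial PRIMARY KEY,
        body text NOT NULL,
        tsv  tsvector
    );

    -- Precompute the tsvector once so searches never re-parse the text.
    UPDATE documents SET tsv = to_tsvector('english', body);

    -- For mostly-static archival text, a GIN index is typically smaller and
    -- faster to search than GiST, at the cost of slower inserts/updates.
    CREATE INDEX documents_tsv_idx ON documents USING gin (tsv);

    -- Ranked query: match via the index, then sort the matches by rank.
    SELECT id, ts_rank(tsv, q) AS rank
      FROM documents, to_tsquery('english', 'fulltext & search') AS q
     WHERE tsv @@ q
     ORDER BY rank DESC
     LIMIT 10;

The "partition the data and merge ranked results at a master node" idea, within a single server, amounts to querying each partition separately and re-sorting the union by rank; the same query shape applies across machines if the partitions are reached through something like dblink. Again a sketch, with hypothetical partition names:

    -- Each partition carries its own tsvector column and GIN index.
    SELECT id, ts_rank(tsv, to_tsquery('english', 'fulltext & search')) AS rank
      FROM documents_part1
     WHERE tsv @@ to_tsquery('english', 'fulltext & search')
    UNION ALL
    SELECT id, ts_rank(tsv, to_tsquery('english', 'fulltext & search'))
      FROM documents_part2
     WHERE tsv @@ to_tsquery('english', 'fulltext & search')
     ORDER BY rank DESC   -- the "master node" step: merge and re-rank
     LIMIT 10;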