Thread: gin index creation performance problems
I'm looking for a bit more guidance on gin index creation.

The process:
- vacuum analyze.
- start a transaction that:
  - drop the triggers to update a tsvector
  - drop the index on the tsvector
  - fill several tables
  - update the tsvector in a table with ~800K records
  - recreate the gin index
- commit

To have a rough idea of the data involved:
- 800K records
- tsvector formed from concatenation of 6 fields
- total length of concatenated fields ~200 chars *
- average N of lexemes in tsvector 10 *
[*] guessed

2x Xeon HT 3.2GHz, 4GB RAM, SCSI RAID5

Index creation takes more than 1h.

maintenance_work_mem is still untouched. What would be a good value
to start from?
Anything else to do to improve performance?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
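[For reference, the process above can be sketched in SQL roughly as follows. The table, column, trigger, and index names (products, tsv, tsv_update, idx_products_tsv) and the two concatenated fields shown are hypothetical, standing in for the poster's actual schema of six fields:]

```sql
BEGIN;

-- drop the trigger that keeps the tsvector current (names hypothetical)
DROP TRIGGER IF EXISTS tsv_update ON products;

-- drop the GIN index on the tsvector column
DROP INDEX IF EXISTS idx_products_tsv;

-- ... bulk-fill the other tables here ...

-- rebuild the tsvector from the concatenated fields (only two shown)
UPDATE products
   SET tsv = to_tsvector('english',
                         coalesce(title, '') || ' ' || coalesce(descr, ''));

-- recreate the GIN index in one bulk pass
CREATE INDEX idx_products_tsv ON products USING gin (tsv);

COMMIT;
```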
On Mon, 3 Nov 2008 16:45:35 +0100
Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

Forgot to add that top says postgresql is using 100% CPU and 15% memory.

> I'm looking for a bit more guidance on gin index creation.
>
> The process:
> - vacuum analyze.
> - start a transaction that:
>   - drop the triggers to update a tsvector
>   - drop the index on the tsvector
>   - fill several tables
>   - update the tsvector in a table with ~800K records
>   - recreate the gin index
> - commit
>
> To have a rough idea of the data involved:
> - 800K records
> - tsvector formed from concatenation of 6 fields
> - total length of concatenated fields ~200 chars *
> - average N of lexemes in tsvector 10 *
> [*] guessed
>
> 2x Xeon HT 3.2GHz, 4GB RAM, SCSI RAID5
>
> Index creation takes more than 1h.
>
> maintenance_work_mem is still untouched. What would be a good value
> to start from?
> Anything else to do to improve performance?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 3 Nov 2008, Ivan Sergio Borgonovo wrote:

> I'm looking for a bit more guidance on gin index creation.
>
> The process:
> - vacuum analyze.
> - start a transaction that:
>   - drop the triggers to update a tsvector
>   - drop the index on the tsvector
>   - fill several tables
>   - update the tsvector in a table with ~800K records
>   - recreate the gin index
> - commit
>
> To have a rough idea of the data involved:
> - 800K records
> - tsvector formed from concatenation of 6 fields
> - total length of concatenated fields ~200 chars *
> - average N of lexemes in tsvector 10 *
> [*] guessed
>
> 2x Xeon HT 3.2GHz, 4GB RAM, SCSI RAID5
>
> Index creation takes more than 1h.
>
> maintenance_work_mem is still untouched. What would be a good value
> to start from?
> Anything else to do to improve performance?

Why didn't you change maintenance_work_mem? You can change it online,
just before CREATE INDEX. Bulk GIN index creation uses it as a buffer,
so you can save a lot of IO. All this is written in the documentation,
and there are other parameters you should be concerned about.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
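[The "change it online just before CREATE INDEX" advice above looks like this in practice. SET affects only the current session, so nothing server-wide needs restarting; the 256MB value and the index/table names are illustrative:]

```sql
-- bump the per-session buffer just for this bulk index build
SET maintenance_work_mem = '256MB';

CREATE INDEX idx_products_tsv ON products USING gin (tsv);

-- restore the configured default for the rest of the session
RESET maintenance_work_mem;
```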
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> maintenance_work_mem is still untouched. What would be a good value
> to start from?

GIN index build time is *very* sensitive to maintenance_work_mem.
Try cranking it up to a couple hundred megabytes and see if that helps.

Also, if you're on 8.2, I think 8.3 might be faster.

			regards, tom lane
On Mon, 03 Nov 2008 11:04:45 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> > maintenance_work_mem is still untouched. What would be a good
> > value to start from?

> GIN index build time is *very* sensitive to maintenance_work_mem.
> Try cranking it up to a couple hundred megabytes and see if that
> helps.

200MB did improve the situation; I was waiting for a clue about a
reasonable value to start from. But the performance is far from
stable: it can take from a few minutes to more than 20min even with
this setting.

I can't understand if it is CPU bound or RAM bound. CPU load is
always near 100% while postgresql is using 15% of RAM. Still, I have
52K of swap used... But anyway, the performance is very erratic.

BTW, is maintenance_work_mem set per connection?

While the gin index looks appreciably faster for searches (actually
it is lightning fast), I'm considering reverting to gist, since even
with 200MB maintenance_work_mem it still looks like a pain to build
the index, especially considering the unpredictable time required to
build it.

What puzzles me is that while before increasing maintenance_work_mem
it was always terribly slow, now there is a huge variation in rebuild
time with 200MB.

Even vacuum full; can be pretty slow (>5min), still at 100% CPU use.

The index is dropped at the beginning and rebuilt at the end inside a
quite busy transaction. Could it be related?

> Also, if you're on 8.2, I think 8.3 might be faster.

8.3 etch backport.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
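[A footnote on the per-connection question above: yes, a plain SET applies only to the current session and lasts until RESET or disconnect; other connections keep their own value. A minimal sketch (the 512MB value is illustrative):]

```sql
SHOW maintenance_work_mem;           -- inspect this session's current value

SET maintenance_work_mem = '512MB';  -- other sessions are unaffected

SHOW maintenance_work_mem;           -- now reports 512MB in this session only

RESET maintenance_work_mem;          -- back to the configured default
```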