Thread: gin index creation performance problems
I'm looking for a bit more guidance on gin index creation.

The process:
- vacuum analyze.
- start a transaction that:
  - drop the triggers to update a tsvector
  - drop the index on the tsvector
  - fill several tables
  - update the tsvector in a table with ~800K records
  - recreate the gin index
- commit

To have a rough idea of the data involved:
- 800K records
- tsvector formed from concatenation of 6 fields
- total length of concatenated fields ~200 chars *
- average N of lexemes in tsvector 10 *
[*] guessed

2x Xeon HT 3.2GHz, 4GB RAM, SCSI RAID5

Index creation takes more than 1h.

maintenance_work_mem is still untouched. What would be a good value
to start from?
Anything else to do to improve performance?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
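[For reference, the process above can be sketched in SQL roughly as follows. The table, column, trigger, and index names (products, tsv, tsv_update, idx_products_tsv) and the two concatenated fields shown are hypothetical, standing in for the poster's actual schema of six fields:]

```sql
BEGIN;

-- drop the trigger that keeps the tsvector current (names hypothetical)
DROP TRIGGER IF EXISTS tsv_update ON products;

-- drop the GIN index on the tsvector column
DROP INDEX IF EXISTS idx_products_tsv;

-- ... bulk-fill the other tables here ...

-- rebuild the tsvector from the concatenated fields (only two shown)
UPDATE products
   SET tsv = to_tsvector('english',
                         coalesce(title, '') || ' ' || coalesce(descr, ''));

-- recreate the GIN index in one bulk pass
CREATE INDEX idx_products_tsv ON products USING gin (tsv);

COMMIT;
```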
On Mon, 3 Nov 2008 16:45:35 +0100
Ivan Sergio Borgonovo <mail@webthatworks.it> wrote:

Forgot to add that top says postgresql is using 100% CPU and 15% memory.

> I'm looking for a bit more guidance on gin index creation.
>
> The process:
> - vacuum analyze.
> - start a transaction that:
>   - drop the triggers to update a tsvector
>   - drop the index on the tsvector
>   - fill several tables
>   - update the tsvector in a table with ~800K records
>   - recreate the gin index
> - commit
>
> To have a rough idea of the data involved:
> - 800K records
> - tsvector formed from concatenation of 6 fields
> - total length of concatenated fields ~200 chars *
> - average N of lexemes in tsvector 10 *
> [*] guessed
>
> 2x Xeon HT 3.2GHz, 4GB RAM, SCSI RAID5
>
> Index creation takes more than 1h.
>
> maintenance_work_mem is still untouched. What would be a good value
> to start from?
> Anything else to do to improve performance?

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
On Mon, 3 Nov 2008, Ivan Sergio Borgonovo wrote:

> I'm looking for a bit more guidance on gin index creation.
>
> The process:
> - vacuum analyze.
> - start a transaction that:
>   - drop the triggers to update a tsvector
>   - drop the index on the tsvector
>   - fill several tables
>   - update the tsvector in a table with ~800K records
>   - recreate the gin index
> - commit
>
> To have a rough idea of the data involved:
> - 800K records
> - tsvector formed from concatenation of 6 fields
> - total length of concatenated fields ~200 chars *
> - average N of lexemes in tsvector 10 *
> [*] guessed
>
> 2x Xeon HT 3.2GHz, 4GB RAM, SCSI RAID5
>
> Index creation takes more than 1h.
>
> maintenance_work_mem is still untouched. What would be a good value
> to start from?
> Anything else to do to improve performance?

Why didn't you change maintenance_work_mem? You can change it online,
just before CREATE INDEX. Bulk GIN index creation uses it as a buffer,
so you can save a lot of IO. All this is written in the documentation,
and there are other parameters you should be concerned about.

Regards,
Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
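[The "change it online just before CREATE INDEX" advice above looks like this in practice. SET affects only the current session, so nothing server-wide needs restarting; the 256MB value and the index/table names are illustrative:]

```sql
-- bump the per-session buffer just for this bulk index build
SET maintenance_work_mem = '256MB';

CREATE INDEX idx_products_tsv ON products USING gin (tsv);

-- restore the configured default for the rest of the session
RESET maintenance_work_mem;
```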
Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> maintenance_work_mem is still untouched. What would be a good value
> to start from?

GIN index build time is *very* sensitive to maintenance_work_mem.
Try cranking it up to a couple hundred megabytes and see if that helps.

Also, if you're on 8.2, I think 8.3 might be faster.

			regards, tom lane
On Mon, 03 Nov 2008 11:04:45 -0500
Tom Lane <tgl@sss.pgh.pa.us> wrote:

> Ivan Sergio Borgonovo <mail@webthatworks.it> writes:
> > maintenance_work_mem is still untouched. What would be a good
> > value to start from?

> GIN index build time is *very* sensitive to maintenance_work_mem.
> Try cranking it up to a couple hundred megabytes and see if that
> helps.

200MB did improve the situation; I was waiting for a clue about a
reasonable value to start from. But the performance is far from
stable: it can take from a few minutes to more than 20min even with
this setting.

I can't understand if it is CPU bound or RAM bound. CPU load is
always near 100% while postgresql is using 15% of RAM. Still, I have
52K of swap used... But anyway, the performance is very erratic.

BTW, is maintenance_work_mem set per connection?

While the gin index looks appreciably faster for searches (actually
it is lightning fast), I'm considering reverting to gist, since even
with 200MB maintenance_work_mem it still looks like a pain to build
the index, especially considering the unpredictable time required to
build it.

What puzzles me is that while before increasing maintenance_work_mem
it was always terribly slow, now there is a huge variation in rebuild
time with 200MB.

Even vacuum full; can be pretty slow (>5min), still at 100% CPU use.

The index is dropped at the beginning and rebuilt at the end inside a
quite busy transaction. Could it be related?

> Also, if you're on 8.2, I think 8.3 might be faster.

8.3 etch backport.

thanks

--
Ivan Sergio Borgonovo
http://www.webthatworks.it
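[A footnote on the per-connection question above: yes, a plain SET applies only to the current session and lasts until RESET or disconnect; other connections keep their own value. A minimal sketch (the 512MB value is illustrative):]

```sql
SHOW maintenance_work_mem;           -- inspect this session's current value

SET maintenance_work_mem = '512MB';  -- other sessions are unaffected

SHOW maintenance_work_mem;           -- now reports 512MB in this session only

RESET maintenance_work_mem;          -- back to the configured default
```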