Thread: Database growing. Need autovacuum help.
Hi List,

We are running a couple of 8.3.1 servers and they are growing a lot. I have the standard autovacuum settings from the 8.3.1 installation and we are inserting about 2-3 million rows every night and cleaning out just as many every day.

The database size rose to 80GB but after a dump/restore it's only 16GB, which shows that there was nearly 65GB of bloat in the database.

#autovacuum_max_workers = 3
#autovacuum_naptime = 1min
#autovacuum_vacuum_threshold = 50
#autovacuum_analyze_threshold = 50
#autovacuum_vacuum_scale_factor = 0.2
#autovacuum_analyze_scale_factor = 0.1
#autovacuum_freeze_max_age = 200000000
#autovacuum_vacuum_cost_delay = 20
#autovacuum_vacuum_cost_limit = -1

And also:

max_fsm_pages = 300000
#max_fsm_relations = 1000

Any pointers would be greatly appreciated.

Cheers,
Henke
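A quick way to check whether autovacuum is keeping up with churn like this is to look at the dead-tuple counters in pg_stat_user_tables (the view and these columns exist in 8.3). This is only a sketch and is not tied to any particular schema:

  SELECT relname, n_live_tup, n_dead_tup,
         last_vacuum, last_autovacuum
    FROM pg_stat_user_tables
   ORDER BY n_dead_tup DESC
   LIMIT 10;

If n_dead_tup stays large on a table and last_autovacuum is old or NULL, autovacuum is not visiting that table often enough. A database-wide VACUUM VERBOSE also reports whether max_fsm_pages is large enough to track all the reusable space.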
In response to Henrik <henke@mac.se>:

> We are running a couple of 8.3.1 servers and they are growing a lot.
>
> I have the standard autovacuum settings from the 8.3.1 installation
> and we are inserting about 2-3 million rows every night and cleaning
> out just as many every day.

Is this a batch job? If so, autovac might not be your best friend here. There _are_ still some cases where autovac isn't the best choice. If you're doing a big batch job that deletes or updates a bunch of rows, you'll probably be better off making a manual vacuum the last step of that batch job. Remember that you can vacuum individual tables.

> The database size rose to 80GB but after a dump/restore it's only 16GB,
> which shows that there was nearly 65GB of bloat in the database.

Does it keep growing beyond 80G? While 65G may seem like a lot of bloat, it may be what your workload needs as working space. I mean, you _are_ talking about shifting around 2-3 million rows/day.

Crank up the logging. I believe autovac on 8.3 can be configured to log exactly what tables it operates on ... which should help you determine if it's not configured aggressively enough.

Some other things to do to improve your situation are to isolate exactly _what_ is bloating. Use pg_relation_size() to get a list of the sizes of all DB objects right after a dump/restore and again when the database is bloated, and compare to see what's bloating the most. Don't forget to check indexes as well. If rebuilding a single index nightly will take care of your bloat, that's not bad. Unusual, but it does happen under some workloads ... you might be able to adjust the index fill factor to improve things as well.

If it's just a single table that's bloating, a VACUUM FULL or CLUSTER of that table alone on a regular schedule might take care of things. If your data is of a FIFO nature, you could benefit from the old trick of having two tables and switching between them on a schedule in order to truncate the one with stale data in it.

Hope some of these ideas help.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
wmoran@collaborativefusion.com
Phone: 412-422-3463x4023
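As a sketch of the size comparison suggested above: run something like this right after a dump/restore, save the output, run it again once the database has grown, and diff the two. The schema filter and the LIMIT are only for readability:

  SELECT n.nspname, c.relname, c.relkind,
         pg_size_pretty(pg_relation_size(c.oid)) AS size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
   WHERE c.relkind IN ('r', 'i')
     AND n.nspname NOT IN ('pg_catalog', 'information_schema')
   ORDER BY pg_relation_size(c.oid) DESC
   LIMIT 20;

Including relkind 'i' keeps indexes in the listing, so index bloat shows up in the same comparison as table bloat.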
On 3 Jun 2008, at 15:23, Bill Moran wrote:

> In response to Henrik <henke@mac.se>:
>>
>> We are running a couple of 8.3.1 servers and they are growing a lot.
>>
>> I have the standard autovacuum settings from the 8.3.1 installation
>> and we are inserting about 2-3 million rows every night and cleaning
>> out just as many every day.
>
> Is this a batch job? If so, autovac might not be your best friend
> here. There _are_ still some cases where autovac isn't the best
> choice. If you're doing a big batch job that deletes or updates a
> bunch of rows, you'll probably be better off making a manual vacuum
> the last step of that batch job. Remember that you can vacuum
> individual tables.

Well, sort of. We have different jobs that usually run at night filling the database with document information. After that is done we have maintenance jobs that clean out old versions of those documents. Maybe autovacuum is not for us, at least on this table. I know that it is one specific table that has most of the bloat.

>> The database size rose to 80GB but after a dump/restore it's only 16GB,
>> which shows that there was nearly 65GB of bloat in the database.
>
> Does it keep growing beyond 80G? While 65G may seem like a lot of
> bloat, it may be what your workload needs as working space. I mean,
> you _are_ talking about shifting around 2-3 million rows/day.
>
> Crank up the logging. I believe autovac on 8.3 can be configured
> to log exactly what tables it operates on ... which should help you
> determine if it's not configured aggressively enough.

I will do that. But I already know which table is the bad boy in this case. :)

> If it's just a single table that's bloating, a VACUUM FULL or CLUSTER
> of that table alone on a regular schedule might take care of things.
> If your data is of a FIFO nature, you could benefit from the old trick
> of having two tables and switching between them on a schedule in order
> to truncate the one with stale data in it.

It is somewhat FIFO but I can't guarantee it... I will look at CLUSTER and see.

Maybe the design is flawed. :) To put it simply, we have a document storing system and the 3 major tables are tbl_folder, tbl_file and the many-to-many table tbl_file_folder. In tbl_file we only have unique documents. But a file can be stored in many folders and a folder can have many files, so we have tbl_file_folder with fk_file_id and fk_folder_id.

To be able to handle versions we always insert new folder rows even though nothing has changed, but it seemed like the best way to do it.

E.g.

First run:
tbl_file 500k new files.
tbl_folder 50k new rows.
tbl_file_folder 550k new rows.

Second run with no new files:
tbl_file unchanged.
tbl_folder 50k new rows.
tbl_file_folder 550k new rows.

The beauty of this is that it is very efficient to retrieve the exact file/folder structure at a given point in time, but the drawback is a lot of overhead in the database.

Maybe someone has some cool new idea about this. :)

Thanks Bill!

Cheers,
Henke
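A sketch of what the manual maintenance step discussed above could look like, assuming tbl_file_folder is the table that bloats and that its primary key index is named tbl_file_folder_pkey (the index name is an assumption):

  -- last step of the nightly cleanup job: reclaim dead rows and refresh stats
  VACUUM ANALYZE tbl_file_folder;

  -- on a less frequent schedule, rewrite the table to return space to the OS;
  -- CLUSTER takes an exclusive lock on the table while it runs
  CLUSTER tbl_file_folder USING tbl_file_folder_pkey;

Plain VACUUM only marks dead space as reusable within the table's files; CLUSTER (like VACUUM FULL) rewrites the table, which is what actually shrinks it on disk.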
On Tue, Jun 3, 2008 at 7:41 AM, Henrik <henke@mac.se> wrote:
>
> To be able to handle versions we always insert new folder rows even though
> nothing has changed, but it seemed like the best way to do it.
>
> E.g.
>
> First run:
> tbl_file 500k new files.
> tbl_folder 50k new rows.
> tbl_file_folder 550k new rows.
>
> Second run with no new files:
> tbl_file unchanged.
> tbl_folder 50k new rows.
> tbl_file_folder 550k new rows.

One useful trick is to include a WHERE clause that prevents the extra updates, i.e.:

update table set field=123 where field <> 123;
On 3 Jun 2008, at 16:06, Scott Marlowe wrote:

> On Tue, Jun 3, 2008 at 7:41 AM, Henrik <henke@mac.se> wrote:
>>
>> To be able to handle versions we always insert new folder rows even
>> though nothing has changed, but it seemed like the best way to do it.
>>
>> E.g.
>>
>> First run:
>> tbl_file 500k new files.
>> tbl_folder 50k new rows.
>> tbl_file_folder 550k new rows.
>>
>> Second run with no new files:
>> tbl_file unchanged.
>> tbl_folder 50k new rows.
>> tbl_file_folder 550k new rows.
>
> One useful trick is to include a WHERE clause that prevents the extra
> updates, i.e.:
>
> update table set field=123 where field <> 123;

Mmm, I'll keep that in mind. The problem is that on these tables I only do INSERTs and DELETEs. :) Maybe I can redesign it, but I can't find a good way... yet...

Thanks!