Re: Backup/dump of huge tables and performance - Mailing list pgsql-general
From | Philip Warner |
---|---|
Subject | Re: Backup/dump of huge tables and performance |
Date | |
Msg-id | 3.0.5.32.20000728164715.026dc640@mail.rhyme.com.au |
In response to | Backup/dump of huge tables and performance (brianb-pggeneral@edsamail.com) |
Responses | Re: Backup/dump of huge tables and performance |
List | pgsql-general |
At 02:26 28/07/00 GMT, brianb-pggeneral@edsamail.com wrote:

>1. pg_dump or COPY take up a lot of I/O resources. That's not surprising
>considering the size of the tables, but the impact on the overall
>production system's performance is not acceptable.

Firstly, I hope you are not using '--inserts'; it is very slow for both
backup and restore. Also, do you know whether pg_dump or the backend is the
I/O bottleneck?

Other than that, I'm not sure what can be done about it - the I/O has to be
done at some point. pg_dump could possibly (optionally) pause between
records, but that seems like a bad idea, especially with tens of millions of
records. Maybe someone who knows more about backend processing can say
whether a different kind of retrieval in the COPY command would help.

>2. I don't know how to estimate the size of the resulting dump files from
>the database files.

Not very easy, unless you have very similar data in each tuple... and in the
future pg_dump will support compression, so the size will be even harder to
estimate.

>I would very much prefer to have the backup files in little 10-20MB chunks,
>rather than one humongous dumpfile.

Maybe:

    pg_dump | split --bytes=10m

>I also want to be able to run the backup without shutting down the service
>that uses the database.

AFAIK, you don't need to shut it down - or are you referring to the
performance problems?

>As noted above, this particular table is no longer being used in
>transactions. I figure I could write a C program to declare a cursor and
>write it out in n-MB or m-thousand row chunks, with rest pauses in between.

You could always do a file-based backup of the database, restore it
somewhere else, and drop the tables you don't need. Not very elegant, I
realize. (A rough sketch of the cursor-based approach you describe is
appended after the signature.)

>Any better ideas? Would this be a good feature to incorporate into future
>versions of pg_dump?

I'm not sure what should be changed in pg_dump; delaying between records
seems like a bad idea since it does the dump in a single transaction, and
besides, sleeping while a transaction is open seems evil to me. Splitting
the output into multiple files can be handled by 'split', so the real issue
is where the I/O problem comes from and how to reduce it. If pg_dump is the
source of the I/O, then I can try to address it, but if the COPY command is
the problem, that needs to be fixed by someone else...

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
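For what it's worth, a rough sketch of the kind of C program described above
might look like the following, using libpq with a cursor and a pause between
chunks. The connection string, table name, chunk size, and pause length are
placeholders, and the caveat above about sleeping while a transaction is open
still applies:

    /*
     * Sketch only (not part of pg_dump): dump a large table in row chunks
     * through a cursor, sleeping between chunks to throttle I/O.
     * "mydb", "bigtable", CHUNK_ROWS and PAUSE_SECONDS are placeholders.
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    #define CHUNK_ROWS    10000   /* rows fetched per round trip */
    #define PAUSE_SECONDS 2       /* rest pause between chunks */

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=mydb");
        PGresult *res;
        char      fetch_cmd[64];

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* The cursor needs a transaction block for a consistent snapshot. */
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "DECLARE bigdump CURSOR FOR SELECT * FROM bigtable"));

        snprintf(fetch_cmd, sizeof(fetch_cmd), "FETCH %d FROM bigdump", CHUNK_ROWS);

        for (;;)
        {
            int ntuples, row, col, ncols;

            res = PQexec(conn, fetch_cmd);
            if (PQresultStatus(res) != PGRES_TUPLES_OK)
            {
                fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
                PQclear(res);
                break;
            }

            ntuples = PQntuples(res);
            if (ntuples == 0)          /* cursor exhausted */
            {
                PQclear(res);
                break;
            }

            /* Write tab-separated rows to stdout; redirect or split as needed. */
            ncols = PQnfields(res);
            for (row = 0; row < ntuples; row++)
                for (col = 0; col < ncols; col++)
                    printf("%s%c", PQgetvalue(res, row, col),
                           col < ncols - 1 ? '\t' : '\n');
            PQclear(res);

            sleep(PAUSE_SECONDS);      /* the "rest pause" between chunks */
        }

        PQclear(PQexec(conn, "END"));
        PQfinish(conn);
        return 0;
    }

The program's output could then be piped through 'split' as suggested above
to get the 10-20MB chunks.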