Re: Backup/dump of huge tables and performance - Mailing list pgsql-general
From | Philip Warner |
---|---|
Subject | Re: Backup/dump of huge tables and performance |
Date | |
Msg-id | 3.0.5.32.20000728164715.026dc640@mail.rhyme.com.au |
In response to | Backup/dump of huge tables and performance (brianb-pggeneral@edsamail.com) |
Responses | Re: Backup/dump of huge tables and performance |
List | pgsql-general |
At 02:26 28/07/00 GMT, brianb-pggeneral@edsamail.com wrote:

>1. pg_dump or COPY take up a lot of I/O resources. That's not surprising
>considering the size of the tables, but the impact on the overall
>production system's performance is not acceptable.

Firstly, I hope you are not using '--inserts'; it is very slow for both
backup and restore. Also, do you know whether pg_dump or the backend is the
I/O bottleneck?

Other than that, I'm not sure what can be done about it - the I/O has to be
done at some point. pg_dump could possibly (optionally) pause between
records, but that seems like a bad idea, especially with tens of millions of
records. Maybe someone who knows more about backend processing can say
whether a different kind of retrieval in the COPY command would help.

>2. I don't know how to estimate the size of the resulting dump files from
>the database files.

Not very easy, unless you have very similar data in each tuple... and in the
future pg_dump will support compression, so the size will be even harder to
estimate.

>I would very much prefer to have the backup files in little 10-20MB chunks,
>rather than one humongous dumpfile.

Maybe:

    pg_dump | split --bytes=10m

>I also want to be able to run the backup without shutting down the service
>that uses the database.

AFAIK, you don't need to shut it down - or are you referring to the
performance problems?

>As noted above, this particular table is no longer being used in
>transactions. I figure I could write a C program to declare a cursor and
>write it out in n-MB or m-thousand row chunks, with rest pauses in between.

You could always do a file-based backup of the database, restore it
somewhere else, and drop the tables you don't need. Not very elegant, I
realize. (A rough sketch of the cursor-based approach you describe is
appended after the signature.)

>Any better ideas? Would this be a good feature to incorporate into future
>versions of pg_dump?

I'm not sure what should be changed in pg_dump; delaying between records
seems like a bad idea since it does the dump in a single transaction, and
besides, sleeping while a transaction is open seems evil to me. Splitting
the output into multiple files can be handled by 'split', so the real issue
is where the I/O problem comes from and how to reduce it. If pg_dump is the
source of the I/O, then I can try to address it, but if the COPY command is
the problem, that needs to be fixed by someone else...

----------------------------------------------------------------
Philip Warner
Albatross Consulting Pty. Ltd. (A.C.N. 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371
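For what it's worth, a rough sketch of the kind of C program described above
might look like the following, using libpq with a cursor and a pause between
chunks. The connection string, table name, chunk size, and pause length are
placeholders, and the caveat above about sleeping while a transaction is open
still applies:

    /*
     * Sketch only (not part of pg_dump): dump a large table in row chunks
     * through a cursor, sleeping between chunks to throttle I/O.
     * "mydb", "bigtable", CHUNK_ROWS and PAUSE_SECONDS are placeholders.
     */
    #include <stdio.h>
    #include <unistd.h>
    #include <libpq-fe.h>

    #define CHUNK_ROWS    10000   /* rows fetched per round trip */
    #define PAUSE_SECONDS 2       /* rest pause between chunks */

    int main(void)
    {
        PGconn   *conn = PQconnectdb("dbname=mydb");
        PGresult *res;
        char      fetch_cmd[64];

        if (PQstatus(conn) != CONNECTION_OK)
        {
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
            return 1;
        }

        /* The cursor needs a transaction block for a consistent snapshot. */
        PQclear(PQexec(conn, "BEGIN"));
        PQclear(PQexec(conn, "DECLARE bigdump CURSOR FOR SELECT * FROM bigtable"));

        snprintf(fetch_cmd, sizeof(fetch_cmd), "FETCH %d FROM bigdump", CHUNK_ROWS);

        for (;;)
        {
            int ntuples, row, col, ncols;

            res = PQexec(conn, fetch_cmd);
            if (PQresultStatus(res) != PGRES_TUPLES_OK)
            {
                fprintf(stderr, "FETCH failed: %s", PQerrorMessage(conn));
                PQclear(res);
                break;
            }

            ntuples = PQntuples(res);
            if (ntuples == 0)          /* cursor exhausted */
            {
                PQclear(res);
                break;
            }

            /* Write tab-separated rows to stdout; redirect or split as needed. */
            ncols = PQnfields(res);
            for (row = 0; row < ntuples; row++)
                for (col = 0; col < ncols; col++)
                    printf("%s%c", PQgetvalue(res, row, col),
                           col < ncols - 1 ? '\t' : '\n');
            PQclear(res);

            sleep(PAUSE_SECONDS);      /* the "rest pause" between chunks */
        }

        PQclear(PQexec(conn, "END"));
        PQfinish(conn);
        return 0;
    }

The program's output could then be piped through 'split' as suggested above
to get the 10-20MB chunks.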