RE: Proposal: More flexible backup/restore via pg_dump - Mailing list pgsql-hackers
From: Peter Mount
Subject: RE: Proposal: More flexible backup/restore via pg_dump
Msg-id: 1B3D5E532D18D311861A00600865478CF1AF9B@exchange1.nt.maidstone.gov.uk
In response to: Proposal: More flexible backup/restore via pg_dump (Philip Warner <pjw@rhyme.com.au>)
List: pgsql-hackers
comments prefixed with PM...

--
Peter Mount
Enterprise Support
Maidstone Borough Council
Any views stated are my own, and not those of Maidstone Borough Council

-----Original Message-----
From: Philip Warner [mailto:pjw@rhyme.com.au]
Sent: Tuesday, June 27, 2000 10:07 AM
To: Giles Lean
Cc: Zeugswetter Andreas SB; pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Proposal: More flexible backup/restore via pg_dump

At 07:00 27/06/00 +1000, Giles Lean wrote:
>
>Are you also assuming that a backup fits in a single file,
>i.e. that anyone with >2GB of backup has some sort of large file
>support?

That's up to the format used to save the database; in the case of the
'custom' format, yes. But that is the size after compression. This is not
substantially different to pg_dump's behaviour, except that pg_dump can be
piped to a tape drive...

PM: So can most other Unix-based formats. On the intranet server here, I
pg_dump into /tmp, then include the dumps in a tar piped to the tape drive.

The objective of the API components is to (a) make it very easy to add new
metadata to dump (eg. tablespaces), and (b) make it easy to add new output
formats (eg. tar archives). Basically the metadata dumping side makes one
call to register the thing to be saved, passing an optional function
pointer to dump data (eg. table contents) - this *could* even be used to
implement dumping of BLOBs.

PM: The problem with blobs hasn't been with dumping them (I have some Java
code that does it into a compressed zip file), but restoring them - you
can't create a blob with a specific OID, so any references in existing
tables will break. I currently get round it by updating the tables after
the restore - but it's ugly and easy to break :(

The 'archiver' format provider must have some basic IO routines -
Read/WriteBuf and Read/WriteByte - and has a number of hook functions which
it can use to output the data. It needs to provide at least one function
that actually writes data somewhere. It also has to provide the associated
function to read the data.

PM: Having a set of APIs (either accessible directly in the backend,
and/or via some fastpath call) would be useful indeed.

>
>As someone else answered: no. You can't portably assume random access
>to tape blocks.

This is probably an issue. One of the motivations for this utility is to
allow partial restores (eg. table data for one table only), and arbitrarily
ordered restores. But I may have a solution:

PM: That would be useful. I don't know about CPIO, but tar stores a header
(its TOC entry) at the start of each file (so you can actually join two tar
files together and still read all the files). In this way, you could put
the table name as the "filename" in the header, so partial restores could
be done.

write the schema and TOC out at the start of the file/tape, then compressed
data with headers for each indicating which TOC item they correspond to.
This metadata can be loaded into /tmp, so fseek is possible. The actual
data restoration (assuming constraints are not defined [THIS IS A PROBLEM])
can be done by scanning the rest of the tape in its own order since RI will
not be an issue. I think I'm happy with this.

PM: How about IOCTLs? I know that ArcServe on both NT & Unixware can seek
through the tape, so there must be a way of doing it.

[snip]

>Using either tar or cpio format as defined for POSIX would allow a lot
>of us to understand your on-tape format with a very low burden on you
>for documentation.  (If you do go this route you might want to think
>about cpio format; it is less restrictive about filename length than
>tar.)

Tom Lane was also very favorably disposed to tar format. As I said above,
the archive interfaces should be pretty amenable to adding tar support -
it's just that I'd like to get a version working with the custom and
directory-based formats to ensure the flexibility is there. As I see it,
the 'backup to directory' format should be easy to use as a basis for the
'backup to tar' code.

PM: I don't see a problem there. The Java stuff I have uses the standard
java.util.zip package, which has a simple API for zip files. Tar or most
other formats could be implemented in a similar compatible fashion.

The problem I have with tar is that it does not support random access to
the associated data. For reordering large backups, or (ultimately) single
BLOB extraction, this is a performance problem.

PM: Tar can do this sequentially, which I've had to do many times over the
years - restoring just one file from a tape, sequential access is probably
the only way.
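PM: Roughly, that sequential scan looks something like the sketch below
(from memory and untested - the function name and field offsets are just
my reading of the POSIX ustar layout, so check them against the spec):

    /*
     * Rough sketch only: locate one member in a tar stream by scanning
     * the headers sequentially.  Assumes plain POSIX ustar blocking:
     * 512-byte header, member name at offset 0, size as an octal string
     * at offset 124, data padded out to whole blocks.  On a real tape
     * you would read-and-discard instead of fseek().
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define TARBLOCK 512

    static long
    tar_find_member(FILE *fp, const char *wanted)
    {
        char    hdr[TARBLOCK];
        char    szbuf[13];
        long    size;

        while (fread(hdr, 1, TARBLOCK, fp) == TARBLOCK)
        {
            if (hdr[0] == '\0')             /* zero block: end of archive */
                break;

            memcpy(szbuf, &hdr[124], 12);   /* size field, octal ASCII */
            szbuf[12] = '\0';
            size = strtol(szbuf, NULL, 8);

            if (strncmp(hdr, wanted, 100) == 0)
                return size;                /* 'size' bytes of data follow */

            /* skip this member's data, rounded up to whole blocks */
            fseek(fp, ((size + TARBLOCK - 1) / TARBLOCK) * TARBLOCK, SEEK_CUR);
        }
        return -1;                          /* not in the archive */
    }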
If you have a tar spec (or suitably licensed code), please mail it to me,
and I'll be able to make more informed comments.

PM: The tar spec should be around somewhere - just be careful, the obvious
source I was thinking of would be GPL'd, and we don't want to be polluted
:-)

[snip]

----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/
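PS: On the format-provider hooks above, I'd guess they end up as little
more than a struct of function pointers - the names below are invented,
just to show the shape I have in mind:

    /*
     * Guess at the shape of an archive format provider: a table of
     * function pointers the archiver calls to move bytes around.
     * Names are invented; only meant to illustrate the idea of
     * Read/WriteBuf and Read/WriteByte plus open/close and TOC hooks.
     */
    #include <stddef.h>

    typedef struct ArchiveFormat
    {
        int     (*OpenArchive)(void *AH, const char *path, int for_write);
        void    (*CloseArchive)(void *AH);

        /* basic IO the rest of the dumper is written against */
        int     (*WriteByte)(void *AH, int byte);
        int     (*ReadByte)(void *AH);
        size_t  (*WriteBuf)(void *AH, const void *buf, size_t len);
        size_t  (*ReadBuf)(void *AH, void *buf, size_t len);

        /* hooks for TOC entries and their associated data */
        void    (*WriteTocEntry)(void *AH, void *toc_entry);
        void    (*ReadTocEntry)(void *AH, void *toc_entry);
    } ArchiveFormat;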