Re: Proposal: More flexible backup/restore via pg_dump - Mailing list pgsql-hackers

From: Bruce Momjian
Subject: Re: Proposal: More flexible backup/restore via pg_dump
Date:
Msg-id: 200010110049.UAA18367@candle.pha.pa.us
In response to: Re: Proposal: More flexible backup/restore via pg_dump (Philip Warner <pjw@rhyme.com.au>)
Responses: Re: Proposal: More flexible backup/restore via pg_dump
List: pgsql-hackers
Can I ask for a status update?

> At 10:17 26/06/00 +0200, Zeugswetter Andreas SB wrote:
> >
> > A problem I see with an index at file end is that you will need to read
> > the file twice, and that may be very undesirable if e.g. the backup is
> > on tape or a compressed file.
>
> The proposal has actually come a fairly long way after extensive
> discussions with Tom Lane, and I have added the current plans at the end
> of this message. The TOC-at-end problem is an issue that I am trying to
> deal with; I am planning a 'custom' format that has the large parts (data
> dumps) compressed, to avoid the need of compressing the entire file. This
> means that you would not need to uncompress the entire file to get to the
> TOC, or to restore just the schema. It also allows good random access to
> defns and data. I'm also considering putting the dumped data at the end
> of the file, but this has issues when you want to restore table data
> before defining indexes, for example.
>
> I must admit that I've been working on the assumption that people using
> PostgreSQL don't have multi-GB (compressed) database dumps, so that (in
> theory) a restore can be loaded onto disk from tape before being used. I
> know this is pretty evil, but it will cover 95% of users. Those people
> with huge backups will have to suffer tapes that go backward and forward
> a bit. From the details below, you will see that this is unavoidable.
>
> Sanity check: does fseek work on tapes? If not, what is the correct way
> to read a particular block/byte from a file on a tape?
>
> -----------------------------------------------------------
> Updated Proposal
> ----------------
>
> For the sake of argument, call the new utilities pg_backup and
> pg_restore.
>
> pg_backup
> ---------
>
> Dump schema [and data] in OID order (to try to make restores sequential,
> for when tar/tape storage is used).
> Each dumped item has a TOC entry which includes the OID and description,
> and for those items for which we know some dependencies (functions for
> types & aggregates; types for tables; superclasses for classes; any
> more?), it will also dump the dependency OIDs.
>
> Each object (table defn, table data, function defn, type defn etc.) is
> dumped to a separate file/thing in the output file. The TOC entries go
> into a separate file/thing (probably only one file/thing for the whole
> TOC).
>
> The output scheme will be encapsulated, and in the initial version will
> be a custom format (since I can't see an API for tar files), and a
> dump-to-a-directory format. Future use of tar, DB, PostgreSQL or even a
> Makefile should not be excluded in the IO design. This last goal *may*
> not be achieved, but I don't see why it can't be at this stage. Hopefully
> someone with appropriate skills & motivation can do a tar archive 8-}.
>
> The result of a pg_backup should be a single file with metadata and
> optional data, along with whatever dependency and extra data is available
> to pg_backup, or provided by the DBA.
>
> pg_restore
> ----------
>
> Reads a backup file and dumps SQL suitable for sending to psql.
>
> Options will include:
>
> - No data (--no-data? -nd? -s?)
> - No metadata (--no-schema? -ns? -d?)
> - Specification of items to dump from an input file; this allows custom
>   ordering AND custom selection of multiple items. Basically, I will
>   allow the user to dump part of the TOC, edit it, and tell pg_restore
>   to use the edited partial TOC. (--item-list=<file>? -l=<file>?)
> - Dump TOC (--toc-only? -c?)
>
> [Wish list]
> - Data for a single table (--table=<name>? -t=<name>?)
> - Defn/data for a single OID (--oid=<oid>? -o=<oid>?)
> - User-defined dependencies. Allow the DB developer to specify once for
>   their DB what the dependencies are, then use that file as a guide to
>   the restore process.
>   (--deps=<file>? -D=<file>?)
>
> pg_restore will use the same custom IO routines to allow IO to
> tar/directory/custom files. In the first pass, I will do custom file IO.
>
> If a user selects to restore the entire metadata, then it will be dumped
> according to the default policy (OID order). If they select to specify
> the items from an input file, then the file ordering is used.
>
> -------
>
> Typical backup procedure:
>
> pg_backup mydb mydb.bkp
>
> or *maybe*
>
> pg_backup mydb > mydb.bkp
>
> BUT AFAIK, fseek does not work on STDOUT, and at the current time
> pg_backup will use fseek.
>
> Typical restore procedure:
>
> pg_restore mydb mydb.bkp | psql
>
> A user will be able to extract only the schema (-s), only the data (-d),
> a specific table (-t=name), or even edit the object order and selection
> via:
>
> pg_restore --dump-toc mydb.bkp > mytoc.txt
> vi mytoc.txt  {i.e. reorder TOC elements as per known dependency problems}
> pg_restore --item-list=mytoc.txt mydb.bkp | psql
>
> FWIW, I envisage the --dump-toc output to look like:
>
> ID; FUNCTION FRED(INT4)
> ID; TYPE MY_TYPE
> ID; TABLE MY_TABLE
> ID; DATA MY_TABLE
> ID; INDEX MY_TABLE_IX1
> ...etc.
>
> so editing and reordering the dump plan should not be too onerous.
>
> ----------------------------------------------------------------
> Philip Warner                    |     __---_____
> Albatross Consulting Pty. Ltd.   |----/       -  \
> (A.C.N. 008 659 498)             |          /(@)   ______---_
> Tel: (+61) 0500 83 82 81         |                 _________  \
> Fax: (+61) 0500 83 82 82         |     ___________ |
> Http://www.rhyme.com.au          |                /           \|
>                                  |    --________--
> PGP key available upon request,  |  /
> and from pgp5.ai.mit.edu:11371   |/

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026