Incremental backups, and backup history - Mailing list pgsql-general
From | Antonios Christofides |
---|---|
Subject | Incremental backups, and backup history |
Date | |
Msg-id | 20030619084228.GA20482@itia.ntua.gr Whole thread Raw |
Responses |
Re: Incremental backups, and backup history
Re: Incremental backups, and backup history Re: Incremental backups, and backup history Re: Incremental backups, and backup history Re: Incremental backups, and backup history |
List | pgsql-general |
Hi, I have two backup questions, not much related to each other; here they are. First: With PostgreSQL, I can't do incremental backups. pg_dump will dump the entire database. Thus, if I want to keep daily backups on tape, I'm in trouble because I'll have to do a full backup every day, which may need several hours and several tapes. One workaround I'm thinking is to not store BLOBs in the database, but store them in the filesystem and store the filenames in the database instead. This needs some additional work (for example, a garbage collector to periodically delete unreferenced files), but will move a large amount of space from the database into the filesystem, which is capable of incremental backups. Only BLOBs, that is; for some tables that will have several tens of millions of small rows, I can't think of any workaround. Is this filenames-instead-of-BLOBs for easier backup common practice? Any other ideas or comments? My second question is a general relational database backup question, not specifically related to pgsql. Sometimes a user accidentally deletes/corrupts a file, and discovers it three days later. After they come panicing to me, I can give them their file as it was three days ago, because of the backup tape rotation. Now suppose a user deletes ten employees from the database, and three days later they understand that this was a bad thing. Now what? I can restore the entire database and make it as it was three days ago, but I can't restore the particular deleted records in the current database, as the relations make the database behave as a single unit. A colleague suggested, instead of updating or deleting rows, to only insert rows with a timestamp; for example, instead of updating the employee row with id=55, you insert a new row with id=55 with the updated data, and change the primary key to (id, dateinserted). You then always select the row with max dateinserted. A garbage collector is also needed to periodically delete obsolete rows older than, say, six months. Improvements can be made (such as using dateobsoleted instead of dateinserted or moving the old rows to another table), but even in the simplest cases I think it will be extremely hard to implement such a system, again because of the relations. So, it is a matter of database design? Do I have to design the database so that it keeps the history of what happened? Thanks everyone for the answers.
pgsql-general by date: