Re: Proposals for PITR - Mailing list pgsql-hackers-pitr
From: Tom Lane
Subject: Re: Proposals for PITR
Date:
Msg-id: 9293.1076868784@sss.pgh.pa.us
In response to: Proposals for PITR ("Simon Riggs" <simon@2ndquadrant.com>)
Responses: Re: Proposals for PITR; Archival API
List: pgsql-hackers-pitr
"Simon Riggs" <simon@2ndquadrant.com> writes: > The general PITR recovery scenario requires: > A - Take a full database backup > B - Take regular log file backups > C - Restore a full backup of a database. Note that pg_dump has absolutely zero to do with this. The base for a PITR recovery scenario has to be a physical dump, not a logical dump; else WAL replay won't work. The conclusion that we came to in previous discussions was that the physical dump can be taken simply by doing "tar" (or your favorite alternative tool) on the $PGDATA directory tree --- you could optionally exclude the pg_xlog directory but must include all else. It is okay to do this while the database is live, so long as the series of log files you will use to recover starts from a checkpoint that occurred before you began the tar run. When you do this the contents of the tar archive will be an inconsistent snapshot of the database and therefore not directly useful. However, if you load the tar archive and then replay WAL from a prior checkpoint to any time after completion of the tar run, you will have brought the database back to consistency. Part A of the problem therefore only requires tar plus enough management software to keep track of the start and stop times of the tar run and correlate the start time to the WAL log series. It is worth pointing out though that you need to start up your Part B solution (archiving off WAL files as they come free) *before* you start taking your base dump. (When we get around to implementing tablespaces, things might get more complicated, since you'd need to remember to archive tablespace trees that might not live under $PGDATA. But we can ignore that refinement for now, I think.) > We should assume that the system onto which the restore takes place is > completely different from the system on which the backup was taken. It can't be too different, since you will need a binary-compatible server; you won't be able to move across machine architectures this way. > - Add entries to WAL files so it contains enough information for > recovery after a full backup > This is assumed to be complete. [I'm not sure where the role of pg_clog > is in all of this, so this may be a broken assumption.] Comments? I believe that the major problems (lack of recording of file creation/ deletion) were solved by the J.R. Nield patches I applied last week. There may be some minor issues left to fix but I can't think of any showstoppers. > - Write application to archive WAL files to tape, disk, or network > Probably need to do first part, but I'm arguing not to do the copy to > tape.. I'd like to somehow see this handled by a user-supplied program or script. What we mainly need is to define a good API that lets the archiver program understand which WAL segment files to archive when. > B - Backing up WAL log files > -Ordinarily, when old log segment files are no longer needed, they are > recycled (renamed to become the next segments in the numbered sequence). > This means that the data within them must be copied from there to > another location > AFTER postgres has closed that file > BEFORE it is renamed and recycled My inclination would be to change the backend code so that as soon as a WAL segment is completed, it is flagged as being ready to dump to tape (or wherever). Possibly the easiest way to do this is to rename the segment file somehow, perhaps "nnn" becomes "nnn.full". Then, after the archiver process has properly dumped the file, reflag it as being dumped (perhaps rename to "nnn.done"). 
> B - Backing up WAL log files
> -Ordinarily, when old log segment files are no longer needed, they are
> recycled (renamed to become the next segments in the numbered sequence).
> This means that the data within them must be copied from there to
> another location
> AFTER postgres has closed that file
> BEFORE it is renamed and recycled

My inclination would be to change the backend code so that as soon as a WAL segment is completed, it is flagged as being ready to dump to tape (or wherever). Possibly the easiest way to do this is to rename the segment file somehow, perhaps "nnn" becomes "nnn.full". Then, after the archiver process has properly dumped the file, reflag it as being dumped (perhaps rename to "nnn.done").

Obviously there are any number of ways we could do this flagging, and depending on an OS rename facility might not be the best. A segment then can be recycled when it is both (a) older than the latest checkpoint and (b) flagged as dumped. Note that this approach allows dumping of a file to start before the first time at which it could be recycled. In the event of a crash and restart, WAL replay has to be able to find the flagged segments, so the flagging mechanism can't be one that would make this impossible.

> Think about
> -what will happen if postgres tries to reuse file while we are still
> copying.

This is a non-problem; segments that haven't been recycled can't become reuse targets.

> -what will happen if copy fails?

This is the archiver's problem to deal with. It only gets to be a serious problem when you run out of disk space for WAL segment files, so in most scenarios there is time for manual intervention to fix any such problem and restart the archiver.

> -Manual indicates that current WAL format is bulky and would require
> some compressed format to be implemented. Initially, I suggest ignoring
> this and simply relying of OS or hardware/tape compression methods.

This is definitely something we could leave for later.

> With full OS file backup, if the database is shutdown correctly, then we
> will need a way to tell the database "you think you're up to date, but
> you're not - I've added some more WAL files into the directories, so
> roll forward on those now please".

I do not think this is an issue either, because my vision of this does not include tar backups of shutdown databases. What will be backed up is a live database, therefore the postmaster will definitely know that it needs to perform WAL replay. What we will need is hooks to make sure that the full set of required log files is available. It's entirely possible that that set of log files exceeds available disk space, so it needs to be possible to run WAL replay incrementally, loading and then replaying additional log segments after deleting old ones. Possibly we could do this with some postmaster command-line switches. J. R. Nield's patch embodied an "interactive recovery" backend mode, which I didn't like in detail but the general idea is not necessarily wrong.
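To make the bookkeeping side of that concrete, a rough and purely illustrative Python sketch of the staging loop might look like the following. The wait_until_replayed() call is a stand-in for whatever replay-control hook we end up designing (postmaster switches, an interactive-recovery mode, or something else), and the directory names and batch size are invented:

    import os
    import shutil

    def wait_until_replayed(segment_names):
        # Placeholder: block until the server reports that these segments
        # have been replayed; the actual mechanism is still to be designed.
        raise NotImplementedError

    def incremental_restore(archive_dir, pg_xlog_dir, batch_size=8):
        # WAL segment file names are zero-padded hex, so sorted order is replay order.
        segments = sorted(os.listdir(archive_dir))
        for start in range(0, len(segments), batch_size):
            batch = segments[start:start + batch_size]
            for name in batch:
                # Stage the next batch of segments where the server expects them.
                shutil.copy(os.path.join(archive_dir, name),
                            os.path.join(pg_xlog_dir, name))
            # Let the server roll forward over the staged segments.
            wait_until_replayed(batch)
            for name in batch:
                # Reclaim the space before staging the next batch.
                os.remove(os.path.join(pg_xlog_dir, name))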
			regards, tom lane