Re: Issues Outstanding for Point In Time Recovery (PITR) - Mailing list pgsql-hackers
From | Richard Tucker |
---|---|
Subject | Re: Issues Outstanding for Point In Time Recovery (PITR) |
Date | |
Msg-id | EKEKLEKKLDAEEKDBDMMAEEOBCCAA.richt@multera.com Whole thread Raw |
In response to | Re: Issues Outstanding for Point In Time Recovery (PITR) ("J. R. Nield" <jrnield@usol.com>) |
Responses |
Re: Issues Outstanding for Point In Time Recovery (PITR)
Re: Issues Outstanding for Point In Time Recovery (PITR) |
List | pgsql-hackers |
We also have implemented a roll forward recovery mechanism. We modified a 7.2.1 version of Postgres. The mechanism is designed to provide a means of recoverying from the loss or corruption of media. It provides for duplicating wal_files so that if a wal_file is lost roll forward recovery can recover the database using the duplicated wal_files. Hooks were also added so that the roll forward recovery mechanism can be used to implement a hot standby database. Along with the roll forward recovery mechanism we have also implemented an online database backup utility which is synchronized with the recovery log so that the backup can be a starting point for a roll forward recovery session. Roll forward recovery is enabled for a database cluster by specifying one of two postmaster configuration parameters. wal_file_reuse = false This parameter tells the wal system to not reuse wal files. This option is intended for sites wishing to implement a hot standby database where wal files will be periodically copied to another machine where they will be rolled forward into the standby database. wal_file_duplicate = <directory_path> This parameter tells the system to mirror the files in the pg_xlog directory to the specified directory. This allows for the recovery of a database where a wal file has been damaged or lost. It also allows for a variant of a hot standby database where the duplicate directory is the pg_xlog directory of the standby database. Since both of these options cause wal files to accumulate indefinately the dba needs a means of purging wal files when they are no longer needed. So an sql command, "ALTER SYSTEM PURGE WAL_FILES <wal_file_name>", has also been implemented. This command deletes all wal files up to and including the specified <wal_file_name> as long as those wal files are not needed to recover the database in the event of a system crash. To find out the status of the wal files a function has been implemented to return a wal file name. The function is: Wal_file( <request_tye>) Request_type := [ current | last | checkpoint | oldest] Wal_file ('current') returns the name of the log file currently being written to. Wal_file('last') returns the name of the last log file filled. Wal_file('checkpoint') returns the name of the file containing the current redo position. The current redo position is the position in the recovery log where crash recovery would start if the system were to crash now. All logs prior to this one will not be needed to recover the database cluster and could be safely removed. Wal_file('oldest') returns the oldest xlog file found in the pg_xlog directory. To actually perform a roll forward you use the postmaster configuration parameter "roll_forward=yes". This parameter tells the startup process to perform crash recovery even though the state of the database as found in the pg_control file indicates a normal shutdown. This is necessary since the starting point of roll forward session could be the restore of a database cluster that was shutdown in order to back it up. Furthermore this parameter tells the startup process not to write out a checkpoint record at the end of the roll forward session. This allows for the database cluster to receive subsequent wal files and to have those rolled forward as well. When starting the postmaster with the roll_forward=yes option, it shuts down the database as soon as the startup process completes. So the idea is to restore a backup, copy all of your saved/duplicated wal files into the pg_xlog directory of the restored database and start the postmaster with the roll_forward option. For point in time recovery there is also a roll_forward_until = <time> which rolls forward through the wal files until the first transaction commit note that is greater than or equal to the specified time. The pg_copy utility performs an on line copy of a database cluster. Its syntax is: pg_copy <backup_directory> [-h host] [-p port] ... This makes a copy of the database where backup_directory is what you would set PGDATA to in order start a postmaster against the backup copy. The database can be being updated while the copy occurs. If you start a postmaster against this copy it will appear to the startup process as a database that crashed at the instant the pg_copy operation completed. Futhermore the pg_copy utility automatically removes any wal files not needed to recover the database from either pg_xlog directory or the wal_file_duplicate directory. So a DBA to protect the database from media loss just needs to set the wal_file_duplicate paramater and periodically pg_copy the database. The BIG THING we have not done is address the issue that add/drop tables and indexes do not propagate through the roll forward recovery mechanism properly. -regards Richard Tucker
pgsql-hackers by date: