Backups using non-consistent snapshots (e.g. COW reflinks) - Mailing list pgsql-admin
From: Wilson, Chris
Subject: Backups using non-consistent snapshots (e.g. COW reflinks)
Msg-id: AS8PR09MB52534CD9C738ECB3ADC5DBBE9390A@AS8PR09MB5253.eurprd09.prod.outlook.com
Responses: Re: Backups using non-consistent snapshots (e.g. COW reflinks)
List: pgsql-admin
Dear fellow PG admins,
Every day we take a basebackup of our main database, which is about 1.5 TB. We store the backup on the same volume, so that it’s easily available if we need to do an emergency recovery (e.g. we can flip the directories and do a point-in-time restore), and we also tar, compress and encrypt it for archival (and then restore that backup to a testing environment to make sure that our backups actually work).
The database volume is on a networked block device (similar to iSCSI) which deduplicates the data. The I/O from taking this basebackup and rewriting the local backup directory is quite heavy on network, storage device CPU and disks, and our IT have asked us to reduce it. While we could do a basebackup in tar format and stream it through compression and encryption, we’d lose the ability to restore it quickly. Snapshots on the block device are an option, but they’re controlled by IT, and we don’t have direct access to them or the ability to automatically create or mount new snapshots.
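(For reference, the streaming approach I mean is roughly the pipeline below; gpg and the archive path are just placeholders for whatever encryption and destination we would actually use, and it assumes a single tablespace so the tar output can go to stdout.)

    # stream a tar-format basebackup through compression and encryption
    # without ever writing an uncompressed copy to disk
    pg_basebackup -D - -Ft -X fetch \
      | gzip \
      | gpg --encrypt --recipient backups@example.com \
      > /archive/pg_backup_$(date +%F).tar.gz.gpg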
Another option, since the database volume uses XFS, is taking a reflink copy (COW) onto the same filesystem, which is nearly instantaneous (rm -rf $backup && cp -a --reflink $PGDATA $backup) and avoids actually copying any data. This is, however, not a consistent snapshot. The manual says:
“the physical backup doesn't have to be an instantaneous snapshot of the database state — if it is made over some period of time, then replaying the WAL for that period will fix any internal inconsistencies.”
However it’s not clear to me exactly what the requirements for this are – especially how Postgres knows where to start replaying the WAL from. (That may not be the only risk of taking backups this way, but it’s the only one that I can think of apart from bugs in the filesystem and cp. I am assuming that all required WAL segments will be available as we keep them in the archive for much longer than the checkpoint interval).
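As a sanity check on that assumption, I can at least confirm that the WAL segment holding the latest checkpoint's REDO point is still in the archive, with something like the following (the archive path is a placeholder):

    # does our WAL archive still contain the segment with the latest
    # checkpoint's REDO point? (/wal_archive is a placeholder path)
    redo_wal=$(pg_controldata "$PGDATA" | awk '/REDO WAL file/ {print $NF}')
    ls -l "/wal_archive/$redo_wal"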
The manual also says:
“After a checkpoint has been made and the WAL flushed, the checkpoint's position is saved in the file pg_control. Therefore, at the start of recovery, the server first reads pg_control and then the checkpoint record; then it performs the REDO operation by scanning forward from the WAL location indicated in the checkpoint record.” (emphasis mine)
So if I save a copy of pg_control first (before the backup starts) and then place it inside the backup directory at the end, that might be enough to ensure that recovery covers the entire period while the backup was running. But what exactly is this “checkpoint record” – is it in the WAL? And if so, what is the location that it indicates? Is the record written at the end of the checkpoint, and does it point back to the WAL location at the beginning of the checkpoint?
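For what it's worth, pg_controldata already reports two separate positions, which I suspect is exactly the distinction I'm asking about (the values below are made up for illustration):

    $ pg_controldata "$PGDATA" | grep -i 'checkpoint.*location'
    Latest checkpoint location:           2E9/1A003C10
    Latest checkpoint's REDO location:    2E9/19FFFFB8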
Or is it enough to run pg_start_backup before taking the copy? Will this prevent any additional checkpoints from being taken (once the function returns) until pg_stop_backup is called? Are there any other risks that I’m missing?
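For concreteness, the sequence I have in mind is roughly the following (untested, and assuming the exclusive form of pg_start_backup is acceptable here):

    # start an exclusive backup: forces a checkpoint and writes backup_label into $PGDATA
    psql -c "SELECT pg_start_backup('reflink-copy');"

    # near-instantaneous COW copy of the data directory (backup_label is copied with it)
    rm -rf "$backup" && cp -a --reflink "$PGDATA" "$backup"

    # finish the backup; with archiving on, this waits for the required WAL to be archived
    psql -c "SELECT pg_stop_backup();"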
Thanks in advance for your help and advice. I will try to submit updates to the manual with what I learn.
Thanks, Chris.