Thread: pg_basebackup and pg_receivewal timing, missing WAL files
Hello all,
I'm attempting to create a long term, stand alone backup archive script for a pretty active and large database (3TB) which will store backups going back a few months.
Using the combination of pg_basebackup in TAR format, with the `Xn` flag, and running pg_receivewal in the background, I'm ending up with a backup that is missing quite a few WAL files.
Trying to figure out where my understanding of the process is incorrect, and what I need to do to correct this.
Backup process:
* Start pg_receivewal to archive location* Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn --checkpoint=fast | ...`
* This backup is taken from a standby node
* Once pg_basebackup finishes, send SIGINT to pg_receivewal
Recovery Process:
* Untar basebackup to data directory* Copy archived WAL files to a separate directory on the same drive* settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'recovery_target = immediate
Once I start the DB, I immediately get
cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No such file or directory
WAL FILES:
backup_label file:0000001200003CF80000003F << First WAL file the DB looks for0000001200003CF700000075 << The next WAL file---------------------------------------- << Missing WAL files from pg_receivewal archive...0000001200003CF8000000C8 << First WAL file in pg_receivewal archive
START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)CHECKPOINT LOCATION: 3CF8/3F1139C0BACKUP METHOD: streamedBACKUP FROM: standbySTART TIME: 2022-06-08 12:51:52 EDTLABEL: pg_basebackup base backupSTART TIMELINE: 18
I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the basebackup starts to recover from, but how come these are not in my pg_receivewal archive, if I'm starting it before starting the pg_basebackup process? How can I ensure they are part of this archive?
Since this is a recent backup, I have a separate WAL archive from where I can just recover those using a different restore_command, and once that missing gap of WAL files is recovered from this archive, stopping the server
and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'` and recovering the rest of the WAL files from the pg_receivewal archive works fine. The DB recovers and accepts connections shortly after.
The whole point of this is to have a several months old, stand alone backup, I need to have all the WAL files available in that original WAL archive created with pg_receivewal.
Appreciate the help, thank you in advance!
Greetings, * Tim (timfosho@gmail.com) wrote: > I'm attempting to create a long term, stand alone backup archive script for > a pretty active and large database (3TB) which will store backups going > back a few months. > Using the combination of pg_basebackup in TAR format, with the `Xn` flag, > and running pg_receivewal in the background, I'm ending up with a backup > that is missing quite a few WAL files. > > Trying to figure out where my understanding of the process is incorrect, > and what I need to do to correct this. > > *Backup *process: > > * Start pg_receivewal to archive location > * Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn > --checkpoint=fast | ...` What's wrong with using -Xs instead..? Basically does exactly what you seem to be trying to do here. > * This backup is taken from a *standby node* Where is the pg_receivewal run? Against the primary or the replica? > * Once pg_basebackup finishes, send SIGINT to pg_receivewal > > *Recovery* Process: > > * Untar basebackup to data directory > * Copy archived WAL files to a separate directory on the same drive > * settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p' > recovery_target = immediate > > > Once I start the DB, I immediately get > > cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No > such file or directory > > > > *WAL FILES: * > > 0000001200003CF80000003F << First WAL file the DB looks for > 0000001200003CF700000075 << The next WAL file > ---------------------------------------- << Missing WAL files from > pg_receivewal archive > ... > 0000001200003CF8000000C8 << First WAL file in pg_receivewal archive > > *backup_label *file: > > START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075) > CHECKPOINT LOCATION: 3CF8/3F1139C0 > BACKUP METHOD: streamed > BACKUP FROM: standby > START TIME: 2022-06-08 12:51:52 EDT > LABEL: pg_basebackup base backup > START TIMELINE: 18 > > > I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the > basebackup starts to recover from, but how come these are not in my > pg_receivewal archive, if I'm starting it before starting the pg_basebackup > process? How can I ensure they are part of this archive? Well, using -Xs should do that. > Since this is a recent backup, I have a separate WAL archive from where I > can just recover those using a different restore_command, and once that > missing gap of WAL files is recovered from this archive, stopping the > server > and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f > %p'` and recovering the rest of the WAL files from the pg_receivewal > archive works fine. The DB recovers and accepts connections shortly after. > > The whole point of this is to have a several months old, stand alone > backup, I need to have all the WAL files available in that original WAL > archive created with pg_receivewal. The use-case generally makes sense, in pgbackrest we have --archive-copy for more-or-less the same kind of thing, though you can now just use a separate newly created repo that you back up the standalone backup to (when multiple repos are configured, archive-push will archive to all of them) and then archive that copy. The latter is what we'd generally recommend these days, but with archive-copy and a full backup, you can just grab the full backup directory from inside the repo and it'll have everything. Thanks, Stephen
Attachment
We currently use pgbackrest as our primary backup tool actually, and it works great! The issue I ran into with trying to set it up that way with pgbackrest, is when configuring the second repo, pgbackrest will have to archive all WAL files to it, going back to the oldest backup. And we have this (admittedly arbitrary) legal policy requiring a single instance of a 6 month old backup for auditing purposes. So pgbackrest has to keep WALs that far back, the main repo only needs 4 weeks of WALs for PITR, its a bit overkill archiving WALs twice and also storing them an extra 5 months. So I opted to write a custom backup script.
Though I was not aware of the `--archive-copy` option, I wonder if I can just turn that option on for the main repo, and then extract the particular backup I need to another location. We use Azure Blobs for backups.
Unfortunately `-Xs` cannot be used with pg_basebackup in TAR format, I'm using the AzCopy utility, by piping pg_basebackup into it, to stream the basebackup directly into Azure Blob storage, without storing it anywhere first (Which pgbackrest does natively as well!). So there is no other way without using the TAR format option.
Nevertheless, it seems like I need the contents of the pg_wal directory at the start of pg_basebackup, since pg_receivewal only streams WAL generated after it starts, the restored DB is missing all the files contained within, I've adjusted my script to copy them as well and testing currently.
On Thu, Jun 9, 2022 at 2:36 PM Stephen Frost <sfrost@snowman.net> wrote:
Greetings,
* Tim (timfosho@gmail.com) wrote:
> I'm attempting to create a long term, stand alone backup archive script for
> a pretty active and large database (3TB) which will store backups going
> back a few months.
> Using the combination of pg_basebackup in TAR format, with the `Xn` flag,
> and running pg_receivewal in the background, I'm ending up with a backup
> that is missing quite a few WAL files.
>
> Trying to figure out where my understanding of the process is incorrect,
> and what I need to do to correct this.
>
> *Backup *process:
>
> * Start pg_receivewal to archive location
> * Then run: `pg_basebackup -h {host} -p {port} -D - -Ft -Xn
> --checkpoint=fast | ...`
What's wrong with using -Xs instead..? Basically does exactly what you
seem to be trying to do here.
> * This backup is taken from a *standby node*
Where is the pg_receivewal run? Against the primary or the replica?
> * Once pg_basebackup finishes, send SIGINT to pg_receivewal
>
> *Recovery* Process:
>
> * Untar basebackup to data directory
> * Copy archived WAL files to a separate directory on the same drive
> * settings: restore_command = 'cp /var/lib/pgsql/wal_archive/%f %p'
> recovery_target = immediate
>
>
> Once I start the DB, I immediately get
>
> cp: cannot stat '/var/lib/pgsql/wal_archive/0000001200003CF80000003F': No
> such file or directory
>
>
>
> *WAL FILES: *
>
> 0000001200003CF80000003F << First WAL file the DB looks for
> 0000001200003CF700000075 << The next WAL file
> ---------------------------------------- << Missing WAL files from
> pg_receivewal archive
> ...
> 0000001200003CF8000000C8 << First WAL file in pg_receivewal archive
>
> *backup_label *file:
>
> START WAL LOCATION: 3CF7/75974400 (file 0000001200003CF700000075)
> CHECKPOINT LOCATION: 3CF8/3F1139C0
> BACKUP METHOD: streamed
> BACKUP FROM: standby
> START TIME: 2022-06-08 12:51:52 EDT
> LABEL: pg_basebackup base backup
> START TIMELINE: 18
>
>
> I can see that START WAL LOCATION & CHECKPOINT LOCATION are where the
> basebackup starts to recover from, but how come these are not in my
> pg_receivewal archive, if I'm starting it before starting the pg_basebackup
> process? How can I ensure they are part of this archive?
Well, using -Xs should do that.
> Since this is a recent backup, I have a separate WAL archive from where I
> can just recover those using a different restore_command, and once that
> missing gap of WAL files is recovered from this archive, stopping the
> server
> and switching back to `restore_command = 'cp /var/lib/pgsql/wal_archive/%f
> %p'` and recovering the rest of the WAL files from the pg_receivewal
> archive works fine. The DB recovers and accepts connections shortly after.
>
> The whole point of this is to have a several months old, stand alone
> backup, I need to have all the WAL files available in that original WAL
> archive created with pg_receivewal.
The use-case generally makes sense, in pgbackrest we have --archive-copy
for more-or-less the same kind of thing, though you can now just use a
separate newly created repo that you back up the standalone backup to
(when multiple repos are configured, archive-push will archive to all of
them) and then archive that copy. The latter is what we'd generally
recommend these days, but with archive-copy and a full backup, you can
just grab the full backup directory from inside the repo and it'll have
everything.
Thanks,
Stephen
Greetings, * Tim (timfosho@gmail.com) wrote: > We currently use pgbackrest as our primary backup tool actually, and it > works great! The issue I ran into with trying to set it up that way with > pgbackrest, is when configuring the second repo, pgbackrest will have to > archive all WAL files to it, going back to the oldest backup. And we have Eh? Why is that? I think there's some confusion here regarding how pgbackrest multi-repo works. We won't archive back to the 'oldest backup' when you add a second repo, we'll just archive *new* WAL to it after it's been configured. > this (admittedly arbitrary) legal policy requiring a single instance of a 6 > month old backup for auditing purposes. So pgbackrest has to keep WALs that > far back, the main repo only needs 4 weeks of WALs for PITR, its a bit > overkill archiving WALs twice and also storing them an extra 5 months. So I > opted to write a custom backup script. You just need to handle the modification of the pgbackrest config to add/remove the second repo (and run the backup to the second repo when it's enabled) when you want a new backup of it. > Though I was not aware of the `--archive-copy` option, I wonder if I can > just turn that option on for the main repo, and then extract the particular > backup I need to another location. We use Azure Blobs for backups. Probably this would also work, yes. > Unfortunately `-Xs` cannot be used with pg_basebackup in TAR format, I'm > using the AzCopy utility, by piping pg_basebackup into it, to stream the > basebackup directly into Azure Blob storage, without storing it anywhere > first (Which pgbackrest does natively as well!). So there is no other way > without using the TAR format option. ... yikes, that sounds horribly rickety. > Nevertheless, it seems like I need the contents of the pg_wal directory at > the start of pg_basebackup, since pg_receivewal only streams WAL generated > after it starts, the restored DB is missing all the files contained > within, I've adjusted my script to copy them as well and testing currently. I have to say that I strongly recommend NOT trying to go down this direction, as it seems terribly risky, not to mention that the WAL files in pg_wal are still being written to... Thanks, Stephen
Attachment
On 6/9/22 5:24 PM, Stephen Frost wrote: > > * Tim (timfosho@gmail.com) wrote: >> We currently use pgbackrest as our primary backup tool actually, and it >> works great! The issue I ran into with trying to set it up that way with >> pgbackrest, is when configuring the second repo, pgbackrest will have to >> archive all WAL files to it, going back to the oldest backup. And we have > > Eh? Why is that? I think there's some confusion here regarding how > pgbackrest multi-repo works. We won't archive back to the 'oldest > backup' when you add a second repo, we'll just archive *new* WAL to it > after it's been configured. By default pgbackrest will keep all WAL since the start of the oldest backup in a repo. Each repo can have its own retention settings so this might be a *different* oldest WAL per repo. However, you can use repo-retention-archive and repo-retention-archive-type to change this behavior, see https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive. Basically, if you need to keep a number of full backups for compliance you can set repo-retention-archive-type=full and repo-retention-archive-type=1 in that repo and WAL in between the backups will only be kept after the last backup. WAL required to make backups consistent is always kept until the backup expires. We've had requests for repo-retention-archive-type=0, i.e. only archive WAL during a backup, but we have not had time to implement that. Some users have implemented a work-around based on checking the backup lock. There is an issue that describes the method but I was not able to find it after a brief search. This is safer than it sounds because pgbackrest will check that all WAL required to make the backup consistent reached the archive. So, if the wrapper is broken you'll get errors. Regards, -- -David david@pgmasters.net
Ugh, typos. On 6/9/22 6:34 PM, David Steele wrote: > On 6/9/22 5:24 PM, Stephen Frost wrote: > > Basically, if you need to keep a number of full backups for compliance > you can set repo-retention-archive-type=full and > repo-retention-archive-type=1 in that repo and WAL in between the Here I meant repo-retention-archive-type=full and repo-retention-archive=1. > > We've had requests for repo-retention-archive-type=0, i.e. only archive And here repo-retention-archive=0. Regards, -- -David david@pgmasters.net
However, you can use repo-retention-archive and
repo-retention-archive-type to change this behavior, see
https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive.
Basically, if you need to keep a number of full backups for compliance
you can set repo-retention-archive-type=full and
repo-retention-archive-type=1 in that repo and WAL in between the
backups will only be kept after the last backup. WAL required to make
backups consistent is always kept until the backup expires.
So as I understand it, it would keep WALs from the most current backup up to the most current WALs.
In my case, the only purpose of this 2nd repo is to keep monthly backups that are 2-6 months old (and occasionally test restores) for legal purposes.
The most recent backup in this repo would still be 2 months old, so it would be keeping an unnecessary 2 months of WALs.
We've had requests for repo-retention-archive-type=0, i.e. only archive
WAL during a backup, but we have not had time to implement that. Some
users have implemented a work-around based on checking the backup lock.
There is an issue that describes the method but I was not able to find
it after a brief search. This is safer than it sounds because pgbackrest
will check that all WAL required to make the backup consistent reached
the archive. So, if the wrapper is broken you'll get errors.
I'm guessing you're referring to: optional wal archiving · Issue #900 · pgbackrest/pgbackrest (github.com),
This seems a bit tricky to get right (and will take some work to convince my team that its legit), also doesn't look immediately possible to only omit WAL archiving on one repo out of multiple.
Definitely a +1 here for implementing that feature (repo-retention-archive=0). pgbackrest's ability to stream into Azure storage & simultaneously compress
using parallel CPUs is absolutely killer. It's also limiting us from using it in our non production databases, as
we run those in Docker containers, and don't want any WAL archiving. Being able to quickly configure and store 1 off backups
for an often changing environment would be great.
Stephen Frost wrote:
You just need to handle the modification of the pgbackrest config to
add/remove the second repo (and run the backup to the second repo when
it's enabled) when you want a new backup of it.
I will have to try this method of dynamically tweaking the configuration, did not consider that.
Thank you for all your help everyone!
On Thu, Jun 9, 2022 at 6:37 PM David Steele <david@pgmasters.net> wrote:
Ugh, typos.
On 6/9/22 6:34 PM, David Steele wrote:
> On 6/9/22 5:24 PM, Stephen Frost wrote:
>
> Basically, if you need to keep a number of full backups for compliance
> you can set repo-retention-archive-type=full and
> repo-retention-archive-type=1 in that repo and WAL in between the
Here I meant repo-retention-archive-type=full and repo-retention-archive=1.
>
> We've had requests for repo-retention-archive-type=0, i.e. only archive
And here repo-retention-archive=0.
Regards,
--
-David
david@pgmasters.net
Greetings, * Tim (timfosho@gmail.com) wrote: > > However, you can use repo-retention-archive and > > repo-retention-archive-type to change this behavior, see > > > > https://pgbackrest.org/configuration.html#section-repository/option-repo-retention-archive > > . > > Basically, if you need to keep a number of full backups for compliance > > you can set repo-retention-archive-type=full and > > repo-retention-archive-type=1 in that repo and WAL in between the > > backups will only be kept after the last backup. WAL required to make > > backups consistent is always kept until the backup expires. > > So as I understand it, it would keep WALs from the most current backup up > to the most current WALs. > In my case, the only purpose of this 2nd repo is to keep monthly backups > that are 2-6 months old (and occasionally test restores) > for legal purposes. > The most recent backup in this repo would still be 2 months old, so it > would be keeping an unnecessary 2 months of WALs. You don't need to have the second repo be enabled all the time, just enable it when you do the next full backup into the new repo, and disable it afterwards, so you don't continue to push WAL to it. > We've had requests for repo-retention-archive-type=0, i.e. only archive > > WAL during a backup, but we have not had time to implement that. Some > > users have implemented a work-around based on checking the backup lock. > > There is an issue that describes the method but I was not able to find > > it after a brief search. This is safer than it sounds because pgbackrest > > will check that all WAL required to make the backup consistent reached > > the archive. So, if the wrapper is broken you'll get errors. > > I'm guessing you're referring to: optional wal archiving · Issue #900 · > pgbackrest/pgbackrest (github.com) > <https://github.com/pgbackrest/pgbackrest/issues/900>, This was from before we had multi-repo support and I'd think simply enabling/disabling that second repo would be the simpler approach. I see that it sounds like you're going to take a look at trying that approach- would be great to get your feedback on how hard it was and how well it did or didn't work out. Thanks, Stephen