Thread: Unable to recover due to missing wal_file
Hi guys,
I'm currently using PostgreSQL 9.2.
One of my backup servers went down, and I had to re-sync the whole DB.
I used pg_basebackup and, of course, WAL archiving at the same time.
Steps:
1 - Confirm the wal_files are being copied into the new server.
2 - Delete /var/lib/pgsql/9.2/data/*
3 - Double-check the wal_files are OK on the server
4 - Start pg_basebackup
5 - pg_basebackup took 10 days, as our DB is 2.0TB.
6 - Start postgres with restore_command set in recovery.conf
7 - Wait for the wal_files to be processed
I got an error here: 00000002000013B40000001A not found
How is that possible? I double-checked!! :(
I can't find that file anywhere.. and now I'll have to re-do all the work =(
Are there any tips you guys could give me in this situation?
Should I check the backup_label file when the pg_basebackup starts, and check if the wal_file inside the backup_label file exists?
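For reference, that check can be done with something like the rough sketch below; the data directory and archive paths are assumptions:

    # Pull the starting WAL segment name out of backup_label and confirm it
    # exists in the archive (adjust paths to your setup):
    seg=$(sed -n 's/^START WAL LOCATION:.*(file \([0-9A-F]*\)).*/\1/p' \
          /var/lib/pgsql/9.2/data/backup_label)
    ls -l /path/to/wal_archive/"$seg" || echo "missing from archive: $seg"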
Thanks
Patrick
One more question:
Could I use pg_basebackup (or another tool like RSYNC) and re-sync the data folder with only the missing data (incrementally, for example), so I wouldn't need to copy 2TB again?
On 07/03/2016 02:48 PM, Patrick B wrote:
> Hi guys,
>
> I'm currently using PostgreSQL 9.2.
>
> One of my backup servers went down, and I had to re-sync the whole DB.
> I used pg_basebackup and, of course, WAL archiving at the same time.
>
> Steps:
>
> 1 - Confirm the wal_files are being copied into the new server.
> 2 - Delete /var/lib/pgsql/9.2/data/*
> 3 - Double-check the wal_files are OK on the server
> 4 - Start pg_basebackup
> 5 - pg_basebackup took 10 days, as our DB is 2.0TB.
> 6 - Start postgres with restore_command set in recovery.conf
> 7 - Wait for the wal_files to be processed
>
> I got an error here: 00000002000013B40000001A not found
> How is that possible? I double-checked!! :(

Did you read this?:

https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

-X method
--xlog-method=method

Includes the required transaction log files (WAL files) in the backup. This will include all transaction logs generated during the backup. If this option is specified, it is possible to start a postmaster directly in the extracted directory without the need to consult the log archive, thus making this a completely standalone backup.

The following methods for collecting the transaction logs are supported:

f
fetch
The transaction log files are collected at the end of the backup. Therefore, it is necessary for the wal_keep_segments parameter to be set high enough that the log is not removed before the end of the backup. If the log has been rotated when it's time to transfer it, the backup will fail and be unusable.

s
stream
Stream the transaction log while the backup is created. This will open a second connection to the server and start streaming the transaction log in parallel while running the backup. Therefore, it will use up two connections configured by the max_wal_senders parameter. As long as the client can keep up with the transaction log received, using this mode requires no extra transaction logs to be saved on the master.

> I can't find that file anywhere.. and now I'll have to re-do all the work =(
>
> Are there any tips you guys could give me in this situation?
>
> Should I check the backup_label file when the pg_basebackup starts, and
> check if the wal_file inside the backup_label file exists?
>
> Thanks
> Patrick

--
Adrian Klaver
adrian.klaver@aklaver.com
On 07/03/2016 03:17 PM, Patrick B wrote:
> One more question:
>
> Could I use pg_basebackup (or another tool like RSYNC) and re-sync the
> data folder with only the missing data (incrementally, for example), so I
> wouldn't need to copy 2TB again?

That assumes the needed WAL files are still on the original server, which seemed not to be the case per your previous post.

--
Adrian Klaver
adrian.klaver@aklaver.com
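An incremental re-sync along those lines is sometimes done with rsync plus the low-level backup functions, roughly as sketched below. This is a hypothetical sketch only: the hostname master01, the user and the paths are assumptions, and as pointed out above it still needs every WAL segment generated from the start of the copy onwards, so it would not have avoided the missing-segment problem here.

    # Hypothetical incremental re-sync of a stale data directory,
    # bracketed by pg_start_backup/pg_stop_backup on the master.
    psql -h master01 -U postgres -d postgres -c "SELECT pg_start_backup('resync', true);"
    rsync -av --delete --exclude=pg_xlog/ --exclude=postmaster.pid \
          postgres@master01:/var/lib/pgsql/9.2/data/ /var/lib/pgsql/9.2/data/
    psql -h master01 -U postgres -d postgres -c "SELECT pg_stop_backup();"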
Yes, I read that!
However, I store the wal_files manually on three different servers. I've double-checked the files! I had over 500GB of wal_files when the pg_basebackup finished, and still, it wasn't enough.
I'll re-do the steps but now using the STREAM option.
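One way to confirm whether a specific segment really exists on each archive host is a loop like the one below; the hostnames and the archive path are assumptions:

    # Check each archive server for the segment the recovery complained about:
    for h in archive1 archive2 archive3; do
        ssh "$h" ls -l /path/to/wal_archive/00000002000013B40000001A \
            || echo "$h: segment missing"
    done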
On 07/03/2016 05:23 PM, Patrick B wrote:
> Yes, I read that!
>
> However, I store the wal_files manually on three different servers.
> I've double-checked the files! I had over 500GB of wal_files when the
> pg_basebackup finished, and still, it wasn't enough.

Yes, but did you have the 16MB that are 00000002000013B40000001A?

> I'll re-do the steps but now using the STREAM option.

--
Adrian Klaver
adrian.klaver@aklaver.com
I don't have it now!
But I didn't know that postgres would need that file! If I had known, I'd have checked just after pg_basebackup started...
On 07/03/2016 05:36 PM, Patrick B wrote:
> I don't have it now!
>
> But I didn't know that postgres would need that file! If I had known, I'd
> have checked just after pg_basebackup started...

Not sure that would have mattered for the reasons below.

You might want to take a look at the below:

https://www.postgresql.org/docs/9.5/static/wal.html

In particular:

https://www.postgresql.org/docs/9.5/static/wal-intro.html

Short version: WAL files are essential to restoring, and on the originating server they are recycled, subject to the configuration parameters explained here:

https://www.postgresql.org/docs/9.5/static/wal-configuration.html

So the wal archiving you have set up is not storing everything, it removes older files over time?

--
Adrian Klaver
adrian.klaver@aklaver.com
Not sure that would have mattered for the reasons below.
You might want to take a look at the below:
https://www.postgresql.org/docs/9.5/static/wal.html
In particular:
https://www.postgresql.org/docs/9.5/static/wal-intro.html
Short version: WAL files are essential to restoring, and on the originating server they are recycled, subject to the configuration parameters explained here:
https://www.postgresql.org/docs/9.5/static/wal-configuration.html
So the wal archiving you have set up is not storing everything, it removes older files over time?
Yes... it removes them... I've changed it to store them for up to 72h.
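A 72h retention policy like that might be implemented with something as simple as the hypothetical job below; the archive path is an assumption. As noted later in the thread, any such cleanup deletes segments that a still-running base backup needs if the backup takes longer than the retention window.

    # Hypothetical 72-hour retention job for the WAL archive
    # (4320 minutes = 72 hours):
    find /path/to/wal_archive -type f -mmin +4320 -delete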
I'll restart the process now, using stream instead of fetch.
Let's see.... =\
pg_basebackup --pgdata=- --format=tar --label=bkp_server --progress --host=localhost --port=5432 --username=replicator --xlog-method=stream
Is that right? Once it's finished, do I just need to set restore_command in recovery.conf and restart postgres?
Cheers
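A minimal recovery.conf sketch for that last step, assuming the archived segments are readable at the path shown; the path and the standby settings are assumptions and depend on whether the node should keep following a primary:

    # Minimal recovery.conf sketch (paths/settings are assumptions):
    restore_command = 'cp /path/to/wal_archive/%f "%p"'
    standby_mode = 'on'            # keep applying WAL / stay a standby
    # primary_conninfo = '...'     # only if it should also stream from the primary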
slave_new: server that needs a new copy of the DB
slave01: streaming replication slave
My steps are:
1. ssh slave_new
2. Stop postgres
3. rm -rf /var/lib/pgsql/9.2/data/*
4. ssh postgres@slave01 'pg_basebackup --pgdata=- --format=tar --label=slave_new --progress --host=localhost --port=5432 --username=replicator --xlog-method=stream' | tar -x --no-same-owner
5. Once step 4 is done, copy the original postgresql.conf, pg_hba.conf and recovery.conf into /var/lib/pgsql/9.2/data
6. Set restore_command in recovery.conf.
7. Start Postgres
I should be able to see the DB recovering itself from the wal_files via the logs
Is that right?
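Recovery progress does show up in the server log as each archived segment is replayed. One way to watch for it, assuming the default pg_log location under the data directory (the log file location depends on log_directory/log_filename):

    # Watch the server log for archived segments being replayed:
    tail -f /var/lib/pgsql/9.2/data/pg_log/postgresql-*.log | grep 'restored log file'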
On 07/03/2016 06:21 PM, Patrick B wrote:
>
> Not sure that would have mattered for the reasons below.
>
> You might want to take a look at the below:
>
> https://www.postgresql.org/docs/9.5/static/wal.html
>
> In particular:
>
> https://www.postgresql.org/docs/9.5/static/wal-intro.html
>
> Short version: WAL files are essential to restoring, and on the
> originating server they are recycled, subject to the configuration
> parameters explained here:
>
> https://www.postgresql.org/docs/9.5/static/wal-configuration.html
>
> So the wal archiving you have set up is not storing everything, it
> removes older files over time?
>
> Yes... it removes them... I've changed it to store them for up to 72h.

You say it took 10 days to run the pg_basebackup, so I am not sure keeping the last 72 hrs is going to help.

> I'll restart the process now, using stream instead of fetch.

Remember:

https://www.postgresql.org/docs/9.5/static/app-pgbasebackup.html

--xlog-method=method

"This will include all transaction logs generated during the backup."

Given that your WAL archiving for a period of less than 72 hrs generated over 500GB of files, do you have the space to store 10 days' worth?

> Let's see.... =\

--
Adrian Klaver
adrian.klaver@aklaver.com
On 04/07/16 at 01:06, Adrian Klaver wrote:
> On 07/03/2016 06:21 PM, Patrick B wrote:
>>
>> Not sure that would have mattered for the reasons below.
>>
>> You might want to take a look at the below:
>>
>> https://www.postgresql.org/docs/9.5/static/wal.html
>>
>> In particular:
>>
>> https://www.postgresql.org/docs/9.5/static/wal-intro.html
>>
>> Short version: WAL files are essential to restoring, and on the
>> originating server they are recycled, subject to the configuration
>> parameters explained here:
>>
>> https://www.postgresql.org/docs/9.5/static/wal-configuration.html
>>
>> So the wal archiving you have set up is not storing everything, it
>> removes older files over time?
>>
>> Yes... it removes them... I've changed it to store them for up to 72h.
>
> You say it took 10 days to run the pg_basebackup, so I am not sure keeping
> the last 72 hrs is going to help.

Not only that: if you add another 500GB of WALs to transfer, that might mean another 2 to 3 days to finish the transfer (and that's if it's only 500GB of WALs).

BTW, 2TB in 10 days means an avg speed of 2.3MB/s. I guess this must be some standby in a DR site without dedicated network bandwidth.

Wouldn't it be faster to clone the disk locally, detach it and send it over with a courier? ;)

Regards,

--
Martín Marqués                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services