Re: [GENERAL] 8.1.4 - problem with PITR - .backup.done / - Mailing list pgsql-hackers
From | Rafael Martinez |
---|---|
Subject | Re: [GENERAL] 8.1.4 - problem with PITR - .backup.done / |
Date | |
Msg-id | 1149022892.980.60.camel@linux.site |
In response to | Re: [GENERAL] 8.1.4 - problem with PITR - .backup.done / backup.ready version of the same file at the same time. (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-hackers |
On Tue, 2006-05-30 at 15:38 -0400, Tom Lane wrote:
[.......]
>
> My thought is that the stat()s on the .done file failed for some obscure
> reason, perhaps insufficient kernel resources, even though the file was
> actually there.
>
> If you have postmaster log output for the interval in which this
> happened, it would be interesting to look for occurrences of this
> warning message from pgarch_archiveDone:
>
>     if (rename(rlogready, rlogdone) < 0)
>         ereport(WARNING,
>                 (errcode_for_file_access(),
>                  errmsg("could not rename file \"%s\" to \"%s\": %m",
>                         rlogready, rlogdone)));
>
> If you find any then we might need a different theory ...
>

I do not find any warning message "could not rename file ...".

These are the relevant entries in the log file:

--------------------------------------------------------
[2006-05-29 17:31:55.212 CEST] 12022 LOG: archived transaction log file "00000001000000080000000F"

**** PITR_basebackup script started around 17:32 ****

[2006-05-29 17:40:27.735 CEST] 12022 LOG: archived transaction log file "000000010000000800000010"
[2006-05-29 17:49:32.075 CEST] 12022 LOG: archived transaction log file "000000010000000800000011"
[2006-05-29 17:59:40.575 CEST] 12022 LOG: archived transaction log file "000000010000000800000012"
[2006-05-29 18:08:27.229 CEST] 12022 LOG: archived transaction log file "000000010000000800000013"
[2006-05-29 18:11:36.434 CEST] 12022 LOG: archived transaction log file "000000010000000800000010.0006D5E8.backup"
[2006-05-29 18:11:36.467 CEST] 12022 LOG: archive command "archive_wal.sh -P pg_xlog/000000010000000800000010.0006D5E8.backup -F 000000010000000800000010.0006D5E8.backup" failed: return code 256
[2006-05-29 18:11:37.479 CEST] 12022 LOG: archive command "archive_wal.sh -P pg_xlog/000000010000000800000010.0006D5E8.backup -F 000000010000000800000010.0006D5E8.backup" failed: return code 256
[2006-05-29 18:11:38.492 CEST] 12022 LOG: archive command "archive_wal.sh -P pg_xlog/000000010000000800000010.0006D5E8.backup -F 000000010000000800000010.0006D5E8.backup" failed: return code 256
[2006-05-29 18:11:38.492 CEST] 12022 WARNING: transaction log file "000000010000000800000010.0006D5E8.backup" could not be archived: too many failures

**** PITR_basebackup script finished 18:12:16 ****

...............................
**** Same error several times until we deleted the .backup.ready file at 18:15 ****

[2006-05-29 18:19:14.546 CEST] 12022 LOG: archived transaction log file "000000010000000800000014"
[2006-05-29 18:30:10.939 CEST] 12022 LOG: archived transaction log file "000000010000000800000015"
...............................
--------------------------------------------------------

Our PITR_basebackup script does this:

* Checks if the backup label file exists
* Starts the backup process with pg_start_backup()
* Creates an LVM2 snapshot of the data partition
* Mounts the snapshot partition
* Creates a tar.bz2 file of the data
* Unmounts the snapshot partition
* Removes the snapshot LV
* Backs up the last WAL file not yet archived
* Stops the backup process with pg_stop_backup()
* Waits for the *.backup file to appear under the archivedir
* Removes old archived WAL files
* Removes the old tar.bz2 data file

Creating the tar.bz2 file and deleting old WAL files can take some time. The total running time of the PITR_basebackup script was 2412 sec.

If we get the same problem again, I will try to get more information from the system. As I said in my last e-mail, this has been a one-time problem.

regards,
-- 
Rafael Martinez, <r.m.guerrero@usit.uio.no>
Center for Information Technology Services
University of Oslo, Norway

PGP Public Key: http://folk.uio.no/rafael/
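
For readers following the script outline, the step that waits for the *.backup file to appear under the archivedir could look roughly like the sketch below. It is not taken from the PITR_basebackup script, which is not shown in this thread; the function name, the timeout, and the poll interval are all hypothetical, and the only thing borrowed from the message is the idea of polling the archive directory for a file ending in ".backup".

```python
import glob
import os
import time


def wait_for_backup_file(archive_dir, timeout=600.0, poll_interval=1.0):
    """Poll archive_dir until a *.backup file exists.

    Hypothetical helper: returns the path of the lexicographically last
    matching file, or None if nothing appears before the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        # Backup history files end in ".backup", as in the log excerpt.
        matches = sorted(glob.glob(os.path.join(archive_dir, "*.backup")))
        if matches:
            return matches[-1]
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll_interval)
```

A bounded wait like this lets the calling script report a failure when the archive command keeps failing for the .backup file, instead of blocking indefinitely.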