Re: pg_combinebackup does not detect missing files - Mailing list pgsql-hackers
From | David Steele |
---|---|
Subject | Re: pg_combinebackup does not detect missing files |
Date | |
Msg-id | dc10b9d7-b484-489f-b2bc-070c425151dc@pgmasters.net |
In response to | Re: pg_combinebackup does not detect missing files (Robert Haas <robertmhaas@gmail.com>) |
Responses | Re: pg_combinebackup does not detect missing files |
List | pgsql-hackers |
On 4/19/24 00:50, Robert Haas wrote:
> On Wed, Apr 17, 2024 at 7:09 PM David Steele <david@pgmasters.net> wrote:
>
>> Fair enough. I accept that your reasoning is not random, but I'm still
>> not very satisfied that the user needs to run a separate and rather
>> expensive process to do the verification when pg_combinebackup already
>> has the necessary information at hand. My guess is that most users will
>> elect to skip verification.
>
> I think you're probably right that a lot of people will skip it; I'm
> just less convinced than you are that it's a bad thing. It's not a
> *great* thing if people skip it, but restore time is actually just
> about the worst time to find out that you have a problem with your
> backups. I think users would be better served by verifying stored
> backups periodically when they *don't* need to restore them.

Agreed, running verify regularly is a good idea, but in my experience most users are only willing to run verify once they suspect (or know) there is an issue. It's a pretty expensive process depending on how many backups you have and where they are stored.

> Also, saying that we have all of the information that we need to do the
> verification is only partially true:
>
> - we do have to parse the manifest anyway, but we don't have to
> compute checksums anyway, and I think that cost can be significant
> even for CRC-32C and much more significant for any of the SHA variants
>
> - we don't need to read all of the files in all of the backups. if
> there's a newer full, the corresponding file in older backups, whether
> full or incremental, need not be read
>
> - incremental files other than the most recent only need to be read to
> the extent that we need their data; if some of the same blocks have
> been changed again, we can economize
>
> How much you save because of these effects is pretty variable.
> Best case, you have a 2-backup chain with no manifest checksums, and all
> verification will have to do that you wouldn't otherwise need to do is
> walk each older directory tree in toto and cross-check which files
> exist against the manifest. That's probably cheap enough that nobody
> would be too fussed. Worst case, you have a 10-backup (or whatever)
> chain with SHA512 checksums and, say, a 50% turnover rate. In that
> case, I think having verification happen automatically could be a
> pretty major hit, both in terms of I/O and CPU. If your database is
> 1TB, it's ~5.5TB of read I/O (because one 1TB full backup and 9 0.5TB
> incrementals) instead of ~1TB of read I/O, plus the checksumming.
>
> Now, obviously you can still feel that it's totally worth it, or that
> someone in that situation shouldn't even be using incremental backups,
> and it's a value judgement, so fair enough. But my guess is that the
> efforts that this implementation makes to minimize the amount of I/O
> required for a restore are going to be important for a lot of people.

Sure -- pg_combinebackup would only need to verify the data that it uses. I'm not suggesting that it should do an exhaustive verify of every single backup in the chain, though I can see how it sounded that way, since with pg_verifybackup that would be pretty much your only choice.

The beauty of doing verification in pg_combinebackup is that it can do a lot less work than running pg_verifybackup against every backup and still get a valid result. All we care about is that the output is correct -- if there is corruption in an unused part of an earlier backup, pg_combinebackup doesn't need to care about that.

As far as I can see, pg_combinebackup already checks most of the boxes. The only thing I know of that it can't do is detect missing files, and that doesn't seem like too big a thing to handle.

Regards,
-David
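The missing-file check discussed in this thread (walking a backup directory and cross-checking which files exist against the manifest) can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not how pg_combinebackup or pg_verifybackup actually implement it: it assumes the JSON backup_manifest layout with a top-level "Files" array whose entries carry a "Path" key, it skips entries that use "Encoded-Path", and it ignores only the backup_manifest file itself. All function names are hypothetical.

```python
import json
import os
import tempfile


def manifest_paths(manifest_file):
    """Return the set of relative paths listed in a backup manifest.

    Assumption: PostgreSQL-style JSON manifest with a top-level "Files"
    array of objects carrying "Path". Entries using "Encoded-Path"
    (hex-encoded non-UTF-8 names) are skipped in this sketch.
    """
    with open(manifest_file, encoding="utf-8") as f:
        manifest = json.load(f)
    return {entry["Path"] for entry in manifest["Files"] if "Path" in entry}


def files_on_disk(backup_dir, ignore=("backup_manifest",)):
    """Return every regular file under backup_dir as a '/'-separated relative path."""
    found = set()
    for root, _dirs, files in os.walk(backup_dir):
        for name in files:
            rel = os.path.relpath(os.path.join(root, name), backup_dir)
            rel = rel.replace(os.sep, "/")
            if rel not in ignore:  # simplified ignore list (assumption)
                found.add(rel)
    return found


def find_missing_and_extra(backup_dir):
    """Compare the manifest against the directory tree.

    Returns (missing, extra): files listed in the manifest but absent on
    disk, and files on disk that the manifest does not mention.
    """
    expected = manifest_paths(os.path.join(backup_dir, "backup_manifest"))
    present = files_on_disk(backup_dir)
    return sorted(expected - present), sorted(present - expected)


def demo():
    """Build a tiny fake backup with one missing and one stray file, then check it."""
    with tempfile.TemporaryDirectory() as d:
        os.makedirs(os.path.join(d, "base", "1"))
        for rel in ("PG_VERSION", "base/1/1259", "base/1/9999"):
            with open(os.path.join(d, rel), "w") as f:
                f.write("x")
        manifest = {"Files": [{"Path": "PG_VERSION"},
                              {"Path": "base/1/1259"},
                              {"Path": "base/1/2619"}]}
        with open(os.path.join(d, "backup_manifest"), "w") as f:
            json.dump(manifest, f)
        return find_missing_and_extra(d)


if __name__ == "__main__":
    missing, extra = demo()
    print("missing:", missing)
    print("extra:", extra)
```

The sketch only checks existence, not sizes or checksums, which corresponds to the cheap part of verification described above; checksum validation would add the extra read I/O and CPU cost that the thread is weighing.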