Re: where should I stick that backup? - Mailing list pgsql-hackers
From: Stephen Frost <sfrost@snowman.net>
Subject: Re: where should I stick that backup?
Msg-id: 20200406182307.GC13712@tamriel.snowman.net
In response to: Re: where should I stick that backup? (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: where should I stick that backup?
List: pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Mon, Apr 6, 2020 at 10:45 AM Stephen Frost <sfrost@snowman.net> wrote:
> > For my 2c, at least, introducing more shell commands into critical
> > parts of the system is absolutely the wrong direction to go in.
> > archive_command continues to be a mess that we refuse to clean up or
> > even properly document, and the project would be much better off
> > trying to eliminate it rather than adding new ways for users to end
> > up with bad or invalid backups.
> >
> > Further, having a generic shell script approach like this would
> > result in things like "well, we don't need to actually add support
> > for X, Y or Z, because we have this wonderful generic shell script
> > thing and you can write your own, and therefore we won't accept
> > patches which do add those capabilities because then we'd have to
> > actually maintain that support."
> >
> > In short, -1 from me.
>
> I'm not sure that there's any point in responding to this because I
> believe that the wording of this email suggests that you've made up
> your mind that it's bad and that position is not subject to change no
> matter what anyone else may say. However, I'm going to try to reply
> anyway, on the theory that (1) I might be wrong and (2) even if I'm
> right, it might influence the opinions of others who have not spoken
> yet, and whose opinions may be less settled.

Chances certainly aren't good that you'll convince me that putting more
absolutely critical-to-get-perfect shell scripts into the backup path
is a good idea.

> First of all, while I agree that archive_command has some problems, I
> don't think that means that every case where we use a shell command
> for anything is a hopeless mess. The only problem I really see in this
> case is that if you route to a local file via an intermediate program
> you wouldn't get an fsync() any more. But we could probably figure out
> some clever things to work around that problem, if that's the issue.
> If there's some other problem, what is it?

We certainly haven't solved the issues with archive_command (at least,
not in core), so this "well, maybe we could fix all the issues" claim
really doesn't hold any water. Having commands like this just punts on
the whole problem and says "here, user, you deal with it." *Maybe* if
we *also* wrote dedicated tools to be used with these commands (as has
been proposed multiple times for archive_command, but hasn't actually
happened, at least not in core), we could build something that would
work reasonably well- but that isn't what seemed to be suggested here,
and if we're going to write all that code anyway, a shell interface
hardly seems like the best one to go with.

There's also been something of an expectation that if we're going to
provide an interface then we should have an example of something that
uses it- but when it comes to archive_command, the example we came up
with was terrible, and yet it's still in our documentation and is
commonly used, much to the disservice of our users.
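For anyone following along at home, the example in question, still in
our documentation today, is essentially this:

    archive_command = 'test ! -f /mnt/server/archivedir/%f && cp %f /mnt/server/archivedir/%f'

cp returns success as soon as the kernel has accepted the writes-
nothing here fsync's the archived segment or the directory it landed
in, and nothing verifies that what's on the archive disk matches what
was sent, so a crash at the wrong moment can leave a missing or
truncated WAL file that the server nonetheless believes was safely
archived.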
Sure, we can point to our users and say "well, that's not how you
should actually use that feature, you should do all this other stuff
in that command"- punt on this, push it back on our users, and tell
them that they're using the interface we provide wrong- but the only
folks who could possibly like that answer are ourselves. Our users
aren't happy with it, because they're left with a broken backup that
they can't restore from when they need it. That your initial email had
more-or-less the exact same kind of "example" certainly doesn't
inspire confidence that this would end up being used sensibly by our
users.

Yes, fsync() is part of the issue, but it's not the only one- retry
logic, and making sure the results are correct, is pretty darn
important too, especially with things like s3. Even dedicated tools
have issues in this area: I just saw a report about wal-g failing to
archive a WAL file properly because an error resulted in a 0-byte WAL
file being stored; wal-g did properly retry, but then it saw the file
was there, figured "all is well", and returned success even though the
file in s3 was 0 bytes. I don't doubt that David could point out a few
other issues- he routinely does whenever I chat with him about various
ideas I've got.

So, instead of talking about 'bzip2 > %f.bz2' and then writing into
our documentation that that's how this feature can be used, what about
proposing something that would actually work reliably with this
interface? Something that properly fsync's everything, has good retry
logic for when failures happen, is able to actually detect when a
failure happened, and documents how to restore from a backup taken
this way- and it'd probably be good to show how pg_verifybackup could
be used to make sure the backup is actually correct and valid too.
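To be concrete about the retry piece alone, any command we'd be willing
to document would need to do something like the following. This is only
a sketch- "objstore" here is a hypothetical stand-in for whatever
client actually does the upload, not a real tool:

    #!/bin/sh
    # Sketch only: "objstore" is a hypothetical client, not a real tool.
    src="$1"; dst="$2"

    local_sum=$(sha256sum "$src" | cut -d' ' -f1) || exit 1

    tries=0
    until objstore put "$src" "$dst"; do
        tries=$((tries + 1))
        [ "$tries" -lt 5 ] || exit 1      # give up after five attempts
        sleep $((tries * tries))          # back off between retries
    done

    # Never assume a retried upload left the right bytes behind- this
    # is exactly the check that would have caught that 0-byte WAL file:
    remote_sum=$(objstore checksum "$dst") || exit 1
    [ "$local_sum" = "$remote_sum" ] || exit 1

And that still says nothing about fsync for anything written locally,
or about how you get the file back out again at restore time- every
user of a shell-command interface gets to rediscover each of those the
hard way.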
> Second, PostgreSQL is not realistically going to link pg_basebackup
> against every compression, encryption, and remote storage library out
> there. One, yeah, we don't want to maintain that. Two, we don't want
> PostgreSQL to have build-time dependencies on a dozen or more
> libraries that people might want to use for stuff like this. We might
> well want to incorporate support for a few of the more popular things
> in this area, but people will always want support for newer things
> than what existing server releases feature, and for more of them.

We don't need to link to 'every compression, encryption and remote
storage library out there'. In some cases, yes, it makes sense to use
an existing library (OpenSSL, zlib, lz4), but in many other cases it
makes more sense to build support directly into the system (s3, gcs,
probably others) because a good library doesn't exist. It'd also be
good to build a nicely extensible system which people can add to, to
support other storage or compression options, but I don't think that's
reasonable to do with a shell-script based interface- maybe with shared
libraries, as Magnus suggests elsewhere, but even there I have some
doubts.

> Third, I am getting pretty tired of being told every time I try to do
> something that is related in any way to backup that it's wrong. If
> your experience with pgbackrest motivated you to propose ways of
> improving backup and restore functionality in the community, that
> would be great. But in my experience so far, it seems to mostly
> involve making a lot of negative comments that make it hard to get
> anything done. I would appreciate it if you would adopt a more
> constructive tone.

pgbackrest is how we're working to improve backup and restore
functionality in the community, and we've come a long way and gone
through a great deal of fire getting there. I appreciate that it's not
in core, and I'd love to discuss how we can change that, but it's
absolutely a part of the PG community and ecosystem- with changes
routinely being made in core which improve the in-core tools as well
as pgbackrest, by authors who contribute back.

As far as my tone goes, I'm afraid it simply comes from having dealt
with and discussed many of these, well, shortcuts to improving backup
and recovery. Did David and I discuss using s3cmd? Of course. Did we
research various s3 libraries? http libraries? SSL libraries?
compression libraries? Absolutely- which is why we ended up using
OpenSSL (PG links to it already, so if you're happy enough with PG's
SSL then you'll probably accept pgbackrest using the same one- and
yes, we've talked about supporting others as PG moves in that
direction too), and zlib (same reasons), and we've now added lz4
(after researching it and deciding it was pretty reasonable to
include). But when it came to dealing with s3, we wrote our own HTTP
and s3 code- none of the existing libraries were a great answer, and
trying to make it work with s3cmd was, well, about like saying you
should just use CSV files and forget about this whole database thing.
We're very likely to write our own code for gcs too, but we already
have the HTTP code, so it's not actually all that heavy a lift.

I'm not against trying to improve the situation in core, and I've even
talked about and tried to give feedback on what would make the most
sense for that to look like, but I feel like every time I do there's a
bunch of push-back- that I want it to look like pgbackrest, or that
I'm being negative about things that don't look like pgbackrest. Guess
what? Yes, I do think it should look like pgbackrest, but that's not
because I have some not-invented-here-syndrome issue; it's because
we've been through this, learned a great deal, and taken what we've
learned and worked to build the best tool we can, much the way the PG
community works to build the best database we can. Yes, we were able
to argue and make it clear that a manifest really did make sense, and
even that it should be in json format, and then argue that checking
WAL is a pretty important part of verifying any backup, but each and
every one of these ends up being a long and drawn-out argument, and
it's draining. The thing is, this stuff isn't new to us.

Thanks,

Stephen