Re: where should I stick that backup? - Mailing list pgsql-hackers
From | Stephen Frost |
---|---|
Subject | Re: where should I stick that backup? |
Date | |
Msg-id | 20200413002743.GQ13712@tamriel.snowman.net |
In response to | Re: where should I stick that backup? (David Steele <david@pgmasters.net>) |
Responses | Re: where should I stick that backup? |
List | pgsql-hackers |
Greetings,

* David Steele (david@pgmasters.net) wrote:
> On 4/12/20 6:37 PM, Andres Freund wrote:
> >On 2020-04-12 17:57:05 -0400, David Steele wrote:
> >>On 4/12/20 3:17 PM, Andres Freund wrote:
> >>>There's various ways we could address the issue for how the subcommand
> >>>can access the file data. The most flexible probably would be to rely on
> >>>exchanging file descriptors between basebackup and the subprocess (these
> >>>days all supported platforms have that, I think). Alternatively we
> >>>could invoke the subcommand before really starting the backup, and ask
> >>>how many files it'd like to receive in parallel, and restart the
> >>>subcommand with that number of file descriptors open.
> >>
> >>We don't exchange FDs. Each local is responsible for getting the data from
> >>PostgreSQL or the repo based on knowing the data source and a path. For
> >>pg_basebackup, however, I'd imagine each local would want a replication
> >>connection with the ability to request specific files that were passed to it
> >>by the main process.
> >
> >I don't like this much. It'll push more complexity into each of the
> >"targets" and we can't easily share that complexity. And also, needing
> >to request individual files will add a lot of back/forth, and thus
> >latency issues. The server would always have to pre-send a list of
> >files, we'd have to deal with those files vanishing, etc.
>
> Sure, unless we had a standard interface to "get a file from the PostgreSQL
> cluster", which is what pgBackRest has via the storage interface.

There are a couple of other pieces here that I think bear mentioning. The
first is that pgBackRest has an actual 'restore' command, and that works
with the filters and works with the storage drivers, so what you're looking
at when it comes to these interfaces isn't just "put a file" but it's also
"get a file". That's actually quite important to have when you start
thinking about these more complicated methods of doing backups.

That then leads into the fact that, with a manifest, you can do things like
exclude 0-byte files from going through any of this processing or from
being stored (which costs actual money too, with certain cloud storage
options...), or avoid separately storing *small* files, which we tend to
have lots of in PG and which also end up costing more; you end up 'losing'
money because you've got lots of 8K files around.

We haven't fully optimized for it in pgBackRest yet, but avoiding having
lots of little files (again, because there are real $$ costs involved) is
something we actively think about, and it's made possible when you've got a
'restore' command. Having a manifest where a given file might actually be a
reference to a *part* of a file (i.e.: pgbackrest_smallfiles, offset: 8192,
length: 16384) could result in savings when using cloud storage; a rough
sketch of what such an entry might look like follows below. These are the
kinds of things we're thinking about today.

Maybe there's some way you could implement something like that using shell
commands as an API, but it sure looks like it'd be pretty hard from here.
Even just managing to get users to use the right shell commands for backup,
and then the right ones for restore, seems awfully daunting.
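To illustrate the small-file idea (and to be clear, this is a purely
hypothetical sketch, not pgBackRest's actual manifest format), a manifest
entry that refers to a slice of a bundle file might look something like:

#include <stdint.h>

/*
 * Hypothetical manifest entry mapping one cluster file to a slice of a
 * shared "bundle" object in the repo, so restore can issue one ranged
 * read instead of fetching a separate 8K object per small file.
 */
typedef struct ManifestFileRef
{
    const char *name;       /* path of the file within the cluster */
    const char *bundle;     /* repo object holding the bytes */
    uint64_t    offset;     /* where this file's bytes start in the bundle */
    uint64_t    length;     /* how many bytes belong to this file */
} ManifestFileRef;

/*
 * e.g. { "base/1/112", "pgbackrest_smallfiles", 8192, 16384 } means:
 * fetch bytes 8192..24575 of the bundle object to reconstruct the file.
 * (The path and bundle name are illustrative.)
 */

The point being that restore has to be able to resolve a file to a
(bundle, offset, length) triple and issue a ranged read against the repo,
which is exactly the kind of thing that's hard to see working well through
a shell-command API.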
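(One aside on the descriptor-exchange idea up-thread: on the Unix side
that's SCM_RIGHTS over a Unix-domain socket. A minimal, illustrative
sketch, assuming the parent and the subcommand share a socketpair(); none
of these names are an existing PostgreSQL API:

#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Pass one open file descriptor to the peer process over "sock". */
static int
send_fd(int sock, int fd)
{
    struct msghdr msg = {0};
    struct iovec iov;
    struct cmsghdr *cmsg;
    char dummy = 'F';       /* at least one byte of real data must go along */
    char buf[CMSG_SPACE(sizeof(int))];

    iov.iov_base = &dummy;
    iov.iov_len = 1;

    memset(buf, 0, sizeof(buf));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = buf;
    msg.msg_controllen = sizeof(buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

The receiver does the matching recvmsg() and pulls the descriptor out of
the control message. Workable, but it doesn't change the larger point.)

Anyway, back to the main point: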
I get that I'm probably going to get flak for playing up the 'worst case',
but the reality is that far too many people don't fully test their restore
processes, and trying to figure out the right shell commands to pass into
some 'restore' command, or even just how to pull all of the data back down
from $cloudstorage to perform a restore, when everything is down and your
boss is breathing down your neck to get it all back online as fast as
possible, isn't how I want this project to be remembered.

David and I are constantly talking about how to make the restore process as
smooth and as fast as possible, because that's where the rubber really
meets the road: you've got to make that part easy and fast because that's
the high-pressure situation. Taking backups is rarely where the real
pressure is; sure, take it today, take it tomorrow, let it run for a few
hours, it's all fine. But when you need something restored, you'd best make
that as simple and as fast as absolutely possible, because that's the time
when your entire business is potentially offline and waiting for you to get
everything back up.

Thanks,

Stephen