Re: refactoring basebackup.c - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: refactoring basebackup.c |
Date | |
Msg-id | CA+TgmoYgVN=-Yoh71r3P9N7eKysd7_9b9s+1QFfFcs3w7Z-tig@mail.gmail.com Whole thread Raw |
In response to | Re: refactoring basebackup.c (Mark Dilger <mark.dilger@enterprisedb.com>) |
Responses |
Re: refactoring basebackup.c
Re: refactoring basebackup.c Re: refactoring basebackup.c Re: refactoring basebackup.c |
List | pgsql-hackers |
On Wed, Oct 21, 2020 at 12:14 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > v2-0001 through v2-0009 still apply cleanly, but v2-0010 no longer applies. It seems to be conflicting with Heikki's workfrom August. Could you rebase please? Here at last is a new version. I've dropped the "bbarchiver" patch for now, added a new patch that I'll talk about below, and revised the others. I'm pretty happy with the code now, so I guess the main things that I'd like feedback on are (1) whether design changes seem to be needed and (2) the UI. Once we have that stuff hammered out, I'll work on adding documentation, which is missing at present. The interesting patches in terms of functionality are 0006 and 0007; the rest is preparatory refactoring. 0006 adds a concept of base backup "targets," which means that it lets you send the base backup to someplace other than the client. You specify the target using a new "-t" option to pg_basebackup. By way of example, 0006 adds a "blackhole" target which throws the backup away instead of sending it anywhere, and also a "server" target which stores the backup to the server filesystem in lieu of streaming it to the client. So you can say something like "pg_basebackup -Xnone -Ft -t server:/backup/2021-07-08" and, provided that you're superuser, the server will try to drop the backup there. At present, you can't use -Fp or -Xfetch or -Xstream with a backup target, because that functionality is implemented on the client side. I think that's an acceptable restriction. Eventually I imagine we will want to have targets like "aws" or "s3" or maybe some kind of plug-in system for new targets. I haven't designed anything like that yet, but I think it's probably not all that hard to generalize what I've got. 0007 adds server-side compression; currently, it only supports server-side compression using gzip, but I hope that it won't be hard to generalize that to support LZ4 as well, and Andres told me he thinks we should aim to support zstd since that library has built-in parallel compression which is very appealing in this context. So you say something like "pg_basebackup -Ft --server-compression=gzip -D /backup/2021-07-08" or, if you want that compressed backup stored on the server and compressed as hard as possible, you could say "pg_basebackup -Xnone -Ft --server-compression=gzip9 -t server:/backup/2021-07-08". Unfortunately, here again there are a number of features that are implemented on the client side, and they don't work in combination with this. -Fp could be made to work by teaching the client to decompress; I just haven't written the code to do that. It's probably not very useful in general, but maybe there's a use case if you're really tight on network bandwidth. Making -R work looks outright useless, because the client would have to get the whole compressed tarfile from the server and then uncompress it, edit the tar file, and recompress. That seems like a thing no one can possibly want. Also, if you say pg_basebackup -Ft -D- >whatever.tar, the server injects the backup manifest into the tarfile, which if you used --server-compression would require decompressing and recompressing the whole thing, so it doesn't seem worth supporting. It's more likely to be a footgun than to help anybody. This option can be used with -Xstream or -Xfetch, but it doesn't compress pg_wal.tar, because that's generated on the client side. The thing I'm really unhappy with here is the -F option to pg_basebackup, which presently allows only p for plain or t for tar. For purposes of these patches, I've essentially treated this as if -Fp means "I want the tar files the server sends to be extracted" and "-Ft" as if it means "I'm happy with them the way they are." Under that interpretation, it's fine for --server-compression to cause e.g. base.tar.gz to be written, because that's what the server sent. But it's not really a "tar" output format; it's a "tar.gz" output format. However, it doesn't seem to make any sense to define -Fz to mean "i want tar.gz output" because -Z or -z already produces tar.gz output when used with -Ft, and also because it would be redundant to make people specify both -Fz and --server-compression. Similarly, when you use --target, the output format is arguably, well, nothing. I mean, some tar files got stored to the target, but you don't have them, but again it seems redundant to have people specify --target and then also have to change the argument to -F. Hindsight being 20-20, I think we would have been better off not having a -Ft or -Fp option at all, and having an --extract option that says you want to extract what the server sends you, but it's probably too late to make that change now. Or maybe it isn't, and we should just break command-line argument compatibility for v15. I don't know. Opinions appreciated, especially if they are nuanced. If you're curious about what the other patches in the series do, here's a very fast recap; see commit messages for more. 0001 revises the grammar for some replication commands to use an extensible-options syntax. 0002 is a trivial refactoring of basebackup.c. 0003 and 0004 refactor the server's basebackup.c and the client's pg_basebackup.c, respectively, by introducing abstractions called bbsink and bbstreamer. 0005 introduces a new COPY sub-protocol for taking base backups. I think it's worth mentioning that I believe that this refactoring is quite powerful and could let us do a bunch of other things that this patch set doesn't attempt. For instance, since this makes it pretty easy to implement server-side compression, it could probably also pretty easily be made to do server-side encryption, if you're brave enough to want to have a discussion on pgsql-hackers about how to design an encryption feature. Thanks to my colleague Tushar Ahuja for helping test some of this code. -- Robert Haas EDB: http://www.enterprisedb.com
Attachment
- v3-0001-Flexible-options-for-BASE_BACKUP-and-CREATE_REPLI.patch
- v3-0006-Support-base-backup-targets.patch
- v3-0005-Modify-pg_basebackup-to-use-a-new-COPY-subprotoco.patch
- v3-0007-WIP-Server-side-gzip-compression.patch
- v3-0004-Introduce-bbstreamer-abstraction-to-modularize-pg.patch
- v3-0002-Refactor-basebackup.c-s-_tarWriteDir-function.patch
- v3-0003-Introduce-bbsink-abstraction-to-modularize-base-b.patch
pgsql-hackers by date: