Re: backup manifests - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: backup manifests |
Date | |
Msg-id | CA+TgmoY1EpREir7QvAHBzyj370-+bmfFV2F_X3ZDytpxf=iezw@mail.gmail.com |
In response to | Re: backup manifests (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: backup manifests |
List | pgsql-hackers |
On Fri, Jan 3, 2020 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Well, I don't know how to make you happy here.
>
> I suppose I should admit that, first off, I don't feel you're required
> to make me happy, and I don't think it's necessary to make me happy to
> get this feature into PG.

Fair enough. That is gracious of you, but I would like to try to make you happy if it is possible to do so.

> Since you expressed that interest though, I'll go out on a limb and say
> that what would make me *really* happy would be to think about where the
> project should be taking pg_basebackup, what we should be working on
> *today* to address the concerns we hear about from our users, and to
> consider the best way to implement solutions to what they're actively
> asking for a core backup solution to be providing. I get that maybe
> that isn't how the world works and that sometimes we have people who
> write our paychecks wanting us to work on something else, and yes, I'm
> sure there are some users who are asking for this specific thing but I
> certainly don't think it's a common ask of pg_basebackup or what users
> feel is missing from the backup options we offer in core; we had users
> on this list specifically saying they *wouldn't* use this feature
> (referring to the differential backup stuff, of course), in fact,
> because of the things which are missing, which is pretty darn rare.

Well, I mean, what you seem to be suggesting here is that somebody is driving me with a stick to do something that I don't really like but have to do because otherwise I won't be able to make rent, but that's actually not the case. I genuinely believe that this is a good design, and it's driven by me, not some shadowy conglomerate of EnterpriseDB executives who are out to make PostgreSQL suck. If I'm wrong and the design sucks, that's again not the fault of shadowy EnterpriseDB executives; it's my fault.
Incidentally, my boss is not very shadowy anyhow; he's a super-nice guy, and a major reason why I work here. :-)

I don't think the issue here is that I haven't thought about what users want, but that not everybody wants the same thing, and it seems like the people with whom I interact want somewhat different things than those with whom you interact. EnterpriseDB has an existing tool that does parallel and block-level incremental backup, and I started out with the goal of providing those same capabilities in core. They are quite popular with EnterpriseDB customers, and I'd like to make them more widely available and, as far as I can, improve on them.

From our previous discussion and from a (brief) look at pgbackrest, I gather that the interests of your customers are somewhat different. Apparently, block-level incremental backup isn't quite as important to your customers, perhaps because you've already got file-level incremental backup, but various other things like encryption and backup verification are extremely important, and you've got a set of ideas about what would be valuable in the future which I'm sure is based on real input from your customers. I hope you pursue those ideas, and I hope you do it in core rather than in a separate piece of software, but that's up to you. Meanwhile, I think that if I have somewhat different ideas about what I'd like to pursue, that ought to be just fine. And I don't think it is unreasonable to hope that you'll acknowledge my goals as legitimate even if you have different ones.

I want to point out that my idea about how to do all of this has shifted by a considerable amount based on the input that you and David have provided. My original design didn't involve a backup manifest, but now it does. That turned out to be necessary, but it was also something you suggested, and something where I asked and took advice on what ought to go into it.
Likewise, you suggested that the process of taking the backup should involve giving the client more control rather than trying to do everything on the server side, and that is now the design which I plan to pursue. You suggested that because it would be more advantageous for out-of-core backup tools, such as pgbackrest, and I acknowledge that as a benefit and I think we're headed in that direction.

I am not doing a single thing which, to my knowledge, blocks anything that you might want to do with pg_basebackup in the future. I have accepted as much of your input as I believe that I can without killing the project off completely. To go further, I'd have to either accept years of delay or abandon my priorities entirely and pursue yours.

> That's what would make *me* happy. Even some comments about how to
> *get* there while also working towards these features would be likely
> to make me happy. Instead, I feel like we're being told that we need
> this feature badly in v13 and we're going to cut bait and do whatever
> is necessary to get us there.

This seems like a really unfair accusation given how much work I've put into trying to satisfy you and David. If this patch, the parallel full backup patch, and the incremental backup patch were all to get committed to v13, an outcome which seems pretty unlikely to me at this point, then you would have a very significant number of things that you have requested in the course of the various discussions, and AFAICS the only thing you'd have that you don't want is the need to parse the manifest file using while (<>) { @a = split /\t/, $_ } rather than $a = parse_json(join '', <>). You would, for example, have the ability to request an individual file from the server rather than a complete tarball. Maybe the command that requests a file would lack an encryption option, something which IIUC you would like to have, but that certainly does not leave you worse off.
It is easier to add an encryption option to a command which you already have than it is to invent a whole new command -- or really several whole new commands, since such a command is not really usable unless you also have facilities to start and stop a backup through the replication protocol.

All that being said, I continue to maintain that insisting on JSON is not a reasonable request. It is not easy to parse JSON, or a subset of JSON. The amount of code required to write even a stripped-down JSON parser is far more than the amount required to split a file on tabs, and the existing code we have for the backend cannot be easily (or even with moderate effort) adapted to work in the frontend. On the other hand, the code that pgbackrest would need to parse the manifest file format I've proposed could have easily been written in less time than you've spent arguing about it. Heck, if it helps, I'll offer to write that patch myself (I could be a pgbackrest contributor!).

I don't want this effort to suck because something gets rushed through too quickly, but I also don't want it to get derailed because of what I view as a relatively minor detail. It is not always right to take the easier road, but it is also not always wrong. I have no illusions that what is being proposed here is perfect, but lots of features started out imperfect and got better over time -- RLS and parallel query come to mind, among others -- and we often learn from the experience of shipping something which parts of the feature are most in need of improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company