Re: backup manifests - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: backup manifests
Date	January 7, 2020 18:05:33
Msg-id	CA+TgmoY1EpREir7QvAHBzyj370-+bmfFV2F_X3ZDytpxf=iezw@mail.gmail.com
In response to	Re: backup manifests (Stephen Frost <sfrost@snowman.net>)
Responses	Re: backup manifests
List	pgsql-hackers
On Fri, Jan 3, 2020 at 2:35 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Well, I don't know how to make you happy here.
>
> I suppose I should admit that, first off, I don't feel you're required
> to make me happy, and I don't think it's necessary to make me happy to
> get this feature into PG.

Fair enough. That is gracious of you, but I would like to try to make
you happy if it is possible to do so.

> Since you expressed that interest though, I'll go out on a limb and say
> that what would make me *really* happy would be to think about where the
> project should be taking pg_basebackup, what we should be working on
> *today* to address the concerns we hear about from our users, and to
> consider the best way to implement solutions to what they're actively
> asking for a core backup solution to be providing.  I get that maybe
> that isn't how the world works and that sometimes we have people who
> write our paychecks wanting us to work on something else, and yes, I'm
> sure there are some users who are asking for this specific thing but I
> certainly don't think it's a common ask of pg_basebackup or what users
> feel is missing from the backup options we offer in core; we had users
> on this list specifically saying they *wouldn't* use this feature
> (referring to the differential backup stuff, of course), in fact,
> because of the things which are missing, which is pretty darn rare.

Well, I mean, what you seem to be suggesting here is that somebody is
driving me with a stick to do something that I don't really like but
have to do because otherwise I won't be able to make rent, but that's
actually not the case. I genuinely believe that this is a good design,
and it's driven by me, not some shadowy conglomerate of EnterpriseDB
executives who are out to make PostgreSQL suck. If I'm wrong and the
design sucks, that's again not the fault of shadowy EnterpriseDB
executives; it's my fault. Incidentally, my boss is not very shadowy
anyhow; he's a super-nice guy, and a major reason why I work here. :-)

I don't think the issue here is that I haven't thought about what
users want, but that not everybody wants the same thing, and it
seems like the people with whom I interact want somewhat different
things than those with whom you interact. EnterpriseDB has an existing
tool that does parallel and block-level incremental backup, and I
started out with the goal of providing those same capabilities in
core. They are quite popular with EnterpriseDB customers, and I'd like
to make them more widely available and, as far as I can, improve on
them. From our previous discussion and from a (brief) look at
pgbackrest, I gather that the interests of your customers are somewhat
different. Apparently, block-level incremental backup isn't quite as
important to your customers, perhaps because you've already got
file-level incremental backup, but various other things like
encryption and backup verification are extremely important, and you've
got a set of ideas about what would be valuable in the future which
I'm sure is based on real input from your customers. I hope you pursue
those ideas, and I hope you do it in core rather than in a separate
piece of software, but that's up to you. Meanwhile, I think that if I
have somewhat different ideas about what I'd like to pursue, that
ought to be just fine. And I don't think it is unreasonable to hope
that you'll acknowledge my goals as legitimate even if you have
different ones.

I want to point out that my idea about how to do all of this has
shifted by a considerable amount based on the input that you and David
have provided. My original design didn't involve a backup manifest,
but now it does. That turned out to be necessary, but it was also
something you suggested, and something where I asked and took advice
on what ought to go into it. Likewise, you suggested that the process
of taking the backup should involve giving the client more control
rather than trying to do everything on the server side, and that is
now the design which I plan to pursue. You suggested that because it
would be more advantageous for out-of-core backup tools, such as
pgbackrest, and I acknowledge that as a benefit and I think we're
headed in that direction. I am not doing a single thing which, to my
knowledge, blocks anything that you might want to do with
pg_basebackup in the future. I have accepted as much of your input as
I believe that I can without killing the project off completely. To go
further, I'd have to either accept years of delay or abandon my
priorities entirely and pursue yours.

> That's what would make *me* happy.  Even some comments about how to
> *get* there while also working towards these features would be likely
> to make me happy.  Instead, I feel like we're being told that we need
> this feature badly in v13 and we're going to cut bait and do whatever
> is necessary to get us there.

This seems like a really unfair accusation given how much work I've
put into trying to satisfy you and David. If this patch, the parallel
full backup patch, and the incremental backup patch were all to get
committed to v13, an outcome which seems pretty unlikely to me at this
point, then you would have a very significant number of things that
you have requested in the course of the various discussions, and
AFAICS the only thing you'd have that you don't want is the need to
parse the manifest file using while (<>) { @a = split /\t/, $_ } rather
than $a = parse_json(join '', <>). You would, for example, have the
ability to request an individual file from the server rather than a
complete tarball. Maybe the command that requests a file would lack an
encryption option, something which IIUC you would like to have, but
that certainly does not leave you worse off. It is easier to add an
encryption option to a command which you already have than it is to
invent a whole new command -- or really several whole new commands,
since such a command is not really usable unless you also have
facilities to start and stop a backup through the replication
protocol.
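
To make the size of that difference concrete, here is a minimal sketch of
the two parsing styles, written in Python rather than Perl. The column
layout (file name, size, checksum) and the sample paths are assumptions
made up for illustration; they are not the manifest format proposed in the
patch.

```python
import io
import json


def parse_tab_manifest(stream):
    # Split every non-empty line on tab characters, one row per line.
    # This mirrors the Perl loop: while (<>) { @a = split /\t/, $_ }
    rows = []
    for line in stream:
        line = line.rstrip("\n")
        if not line:
            continue
        rows.append(line.split("\t"))
    return rows


def parse_json_manifest(stream):
    # Read the whole input and hand it to a JSON parser.
    # This mirrors the Perl call: $a = parse_json(join '', <>)
    return json.loads(stream.read())


# Hypothetical manifest content: file name, size, checksum (assumed layout).
sample_tab = "base/1/1234\t8192\tabc123\nglobal/pg_control\t8192\tdef456\n"
rows = parse_tab_manifest(io.StringIO(sample_tab))

# The same hypothetical data expressed as JSON.
sample_json = json.dumps(rows)
same_rows = parse_json_manifest(io.StringIO(sample_json))
```

In a language with a JSON library at hand both functions are short; the
point under discussion is what each costs in frontend C code, where the
tab-splitting loop needs no library and the JSON path needs a parser.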

All that being said, I continue to maintain that insisting on JSON is
not a reasonable request. It is not easy to parse JSON, or a subset of
JSON. The amount of code required to write even a stripped-down JSON
parser is far more than the amount required to split a file on tabs,
and the existing code we have for the backend cannot be easily (or
even with moderate effort) adapted to work in the frontend. On the
other hand, the code that pgbackrest would need to parse the manifest
file format I've proposed could have easily been written in less time
than you've spent arguing about it. Heck, if it helps, I'll offer to
write that patch myself (I could be a pgbackrest contributor!). I
don't want this effort to suck because something gets rushed through
too quickly, but I also don't want it to get derailed because of what
I view as a relatively minor detail. It is not always right to take
the easier road, but it is also not always wrong. I have no illusions
that what is being proposed here is perfect, but lots of features
started out imperfect and got better over time -- RLS and parallel
query come to mind, among others -- and we often learn from the
experience of shipping something which parts of the feature are most
in need of improvement.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company