Re: refactoring basebackup.c - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: refactoring basebackup.c |
Date | |
Msg-id | CA+TgmobWAD+b+-rA9zdwhaOxS0quK9iJjD4eq2N8qzUsPYD+CA@mail.gmail.com Whole thread Raw |
In response to | Re: refactoring basebackup.c (Mark Dilger <mark.dilger@enterprisedb.com>) |
Responses |
Re: refactoring basebackup.c
|
List | pgsql-hackers |
On Mon, Jul 19, 2021 at 2:51 PM Mark Dilger <mark.dilger@enterprisedb.com> wrote: > The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence ofnot sufficiently abstracting out the handling of the tar format. If the bbsink and bbstreamer abstractions fully encapsulateda set of parsing callbacks, then pg_basebackup wouldn't contain things like: > > streamer = bbstreamer_tar_parser_new(streamer); > > but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever. Itjust seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by knowingtoo much about the format. It might even be a useful exercise to write cpio support into this patch set rather thanwaiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over. Well, I had a patch in an earlier patch set that tried to get knowledge of tar out of basebackup.c, but it couldn't use the bbsink abstraction; it needed a whole separate abstraction layer which I had called bbarchiver with a different API. So I dropped it, for fear of being told, not without some justification, that I was just changing things for the sake of changing them, and also because having exactly one implementation of some interface is really not great. I do conceptually like the idea of making the whole thing flexible enough to generate cpio or zip archives, because like you I think that having tar-specific stuff all over the place is grotty, but I have a feeling there's little market demand for having pg_basebackup produce cpio, pax, zip, iso, etc. archives. On the other hand, server-side compression and server-side backup seem like functionality with real utility. Still, if you or others want to vote for resurrecting bbarchiver on the grounds that general code cleanup is worthwhile for its own sake, I'm OK with that, too. I don't really understand what your problem is with how the patch set leaves pg_basebackup. On the server side, because I dropped the bbarchiver stuff, basebackup.c still ends up knowing a bunch of stuff about tar. pg_basebackup.c, however, really doesn't know anything much about tar any more. It knows that if it's getting a tar file and needs to parse a tar file then it had better call the tar parsing code, but that seems difficult to avoid. What we can avoid, and I think the patch set does, is pg_basebackup.c having any real knowledge of what the tar parser is doing under the hood. Thanks also for the detailed comments. I'll try to the right number of verbs in each sentence in the next version of the patch. I will also look into the issues mentioned by Dilip and Tushar. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: