Thread: pgsql: Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
Introduce 'bbstreamer' abstraction to modularize pg_basebackup. pg_basebackup knows how to do quite a few things with a backup that it gets from the server, like just write out the files, or compress them first, or even parse the tar format and inject a modified postgresql.auto.conf file into the archive generated by the server. Unforatunely, this makes pg_basebackup.c a very large source file, and also somewhat difficult to enhance, because for example the knowledge that the server is sending us a 'tar' file rather than some other sort of archive is spread all over the place rather than centralized. In an effort to improve this situation, this commit invents a new 'bbstreamer' abstraction. Each archive received from the server is fed to a bbstreamer which may choose to dispose of it or pass it along to some other bbstreamer. Chunks may also be "labelled" according to whether they are part of the payload data of a file in the archive or part of the archive metadata. So, for example, if we want to take a tar file, modify the postgresql.auto.conf file it contains, and the gzip the result and write it out, we can use a bbstreamer_tar_parser to parse the tar file received from the server, a bbstreamer_recovery_injector to modify the contents of postgresql.auto.conf, a bbstreamer_tar_archiver to replace the tar headers for the file modified in the previous step with newly-built ones that are correct for the modified file, and a bbstreamer_gzip_writer to gzip and write the resulting data. Only the objects with "tar" in the name know anything about the tar archive format, and in theory we could re-archive using some other format rather than "tar" if somebody wanted to write the code. These chances do add a substantial amount of code, but I think the result is a lot more maintainable and extensible. pg_basebackup.c itself shrinks by roughly a third, with a lot of the complexity previously contained there moving into the newly-added files. Patch by me. The larger patch series of which this is a part has been reviewed and tested at various times by Andres Freund, Sumanta Mukherjee, Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger, Sergei Kornilov, and Jeevan Ladhe. Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com Branch ------ master Details ------- https://git.postgresql.org/pg/commitdiff/23a1c6578c87fca0e361c4f5f9a07df5ae1f9858 Modified Files -------------- src/bin/pg_basebackup/Makefile | 12 +- src/bin/pg_basebackup/bbstreamer.h | 217 +++++++ src/bin/pg_basebackup/bbstreamer_file.c | 579 +++++++++++++++++++ src/bin/pg_basebackup/bbstreamer_inject.c | 250 ++++++++ src/bin/pg_basebackup/bbstreamer_tar.c | 444 +++++++++++++++ src/bin/pg_basebackup/pg_basebackup.c | 912 ++++++------------------------ 6 files changed, 1687 insertions(+), 727 deletions(-)
Re: pgsql: Introduce 'bbstreamer' abstraction to modularize pg_basebackup.
From
Peter Geoghegan
Date:
On Fri, Nov 5, 2021 at 7:30 AM Robert Haas <rhaas@postgresql.org> wrote: > Introduce 'bbstreamer' abstraction to modularize pg_basebackup. I'm seeing a pretty straightforward looking compiler warning following this commit: bbstreamer_file.c:564:24: warning: unused variable ‘mystreamer’ [-Wunused-variable] 564 | bbstreamer_extractor *mystreamer = (bbstreamer_extractor *) streamer; | ^~~~~~~~~~ Thanks -- Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes: > On Fri, Nov 5, 2021 at 7:30 AM Robert Haas <rhaas@postgresql.org> wrote: >> Introduce 'bbstreamer' abstraction to modularize pg_basebackup. > I'm seeing a pretty straightforward looking compiler warning following > this commit: > bbstreamer_file.c:564:24: warning: unused variable ‘mystreamer’ And apparently, that file isn't getting built at all on the MSVC side: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2021-11-06%2003%3A52%3A19 regards, tom lane
On 11/6/21 15:30, Tom Lane wrote: > Peter Geoghegan <pg@bowt.ie> writes: >> On Fri, Nov 5, 2021 at 7:30 AM Robert Haas <rhaas@postgresql.org> wrote: >>> Introduce 'bbstreamer' abstraction to modularize pg_basebackup. > >> I'm seeing a pretty straightforward looking compiler warning following >> this commit: >> bbstreamer_file.c:564:24: warning: unused variable ‘mystreamer’ > > And apparently, that file isn't getting built at all on the MSVC side: > > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird&dt=2021-11-06%2003%3A52%3A19 > I got annoyed by the warning so I marked the variable as assert-only. Haven't done anything about the MSVC issue. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Tomas Vondra <tomas.vondra@enterprisedb.com> writes: > I got annoyed by the warning so I marked the variable as assert-only. Thanks! > Haven't done anything about the MSVC issue. I pushed a quick-hack fix for it. Given the ongoing meson work, I'm not inclined to put in a lot of effort there (like teaching the MSVC stuff about BBOBJS). regards, tom lane