Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers
From | Amul Sul |
---|---|
Subject | Re: pg_waldump: support decoding of WAL inside tarfile |
Date | |
Msg-id | CAAJ_b95PZp+XnATM=rhV2MEHRa4mL9HS8_JAMh=n-COHUP_uEg@mail.gmail.com Whole thread Raw |
In response to | Re: pg_waldump: support decoding of WAL inside tarfile (Jakub Wartak <jakub.wartak@enterprisedb.com>) |
List | pgsql-hackers |
On Mon, Sep 8, 2025 at 7:07 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote: > > On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote: > > > [..patch] > > Hi Amul! > Thanks for your review. I'm replying to a few of your comments now, but for the rest, I need to think about them. I'm kind of in agreement with some of them for the fix, but I won't be able to spend time on that next week due to official travel. I'll try to get back as soon as possible after that. > a. Why should it be necessary to provide startLSN (-s) ? Couldn't > it autodetect the first WAL (tar file) inside and just use that with > some info message? > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar > pg_waldump: error: no start WAL location given > There are two reasons. First, existing pg_waldump --path=some_directory would result in the same error. Second, it would force us to re-read the archive twice just to locate the first WAL segment, which is inefficient. > c. It doesnt work when using SEGSTART, but it's there: > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar > 000000010000000000000059 > pg_waldump: error: could not open file "000000010000000000000059": > Not a directory > $ tar tf /tmp/base/pg_wal.tar | head -1 > 000000010000000000000059 > I don't believe this is the correct use case. The WAL files are inside a tar archive, and the requirement is to use a starting LSN and a timeline (if not the default). > d. I've later noticed that follow-up patches seem to use the > -s switch and there it seems to work OK. The above SEGSTART issue was > not detected, probably because tests need to be extended cover of > segment name rather than just --start LSN (see test_pg_waldump): > $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats > -s 0/59000358 > pg_waldump: first record is after 0/59000358, at 0/590003E8, > skipping over 144 bytes > WAL statistics between 0/590003E8 and 0/61000000: > [..] > Hope previous reasoning makes sense to you. > e. Code around`if (walpath == NULL && directory != NULL)` needs > some comments. > I think this is an existing one. > f. Code around `if (fname != NULL && is_tar_file(fname, > &compression))` , so if fname is WAL segment here > (00000001000000000000005A) and we do check again if that has been > tar-ed (is_tar_file())? Why? > Again, how? > g. Just a question: the commit message says `Note that this patch > requires that the WAL files within the archive be in sequential order; > an error will be reported otherwise`. I'm wondering if such > occurrences are known to be happening in the wild? Or is it just an > assumption that if someone would modify the tar somehow? (either way > we could just add a reason why we need to handle such a case if we > know -- is manual alternation the only source of such state?). For the > record, I've tested crafting custom archives with out of sequence WAL > archives and the code seems to work (it was done using: tar --append > -f pg_wal.tar --format=ustar ..) > This is an almost nonexistent occurrence. While pg_basebackup archives WAL files in sequential order, we don't have an explicit code to enforce that order within it. Furthermore, since we can't control how external tools might handle the files, this extra precaution is necessary. > Another open question I have is this: shouldn't backup_manifest come > with CRC checksum for the archived WALs? Or does that guarantee that > backup_manifest WAL-Ranges are present in pg_wal.tar is good enough > because individual WAL files are CRC-protected itself? > I don't know, I have to check pg_verifybackup. Regards, Amul
pgsql-hackers by date: