Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers
From | Jakub Wartak |
---|---|
Subject | Re: pg_waldump: support decoding of WAL inside tarfile |
Date | |
Msg-id | CAKZiRmzzmaQupeHoASWVDwuoizTttDcBB8mjB7KNVhB6F6x2-Q@mail.gmail.com Whole thread Raw |
In response to | Re: pg_waldump: support decoding of WAL inside tarfile (Amul Sul <sulamul@gmail.com>) |
List | pgsql-hackers |
On Tue, Aug 26, 2025 at 1:53 PM Amul Sul <sulamul@gmail.com> wrote: > [..patch] Hi Amul! 0001: LGTM, maybe I would just slightly enhance the commit message ("This is in preparation for adding a second source file to this directory.") -- maye bit a bit more verbose or use a message from 0002? 0002: LGTM 0003: LGTM Tested here (after partial patch apply, and test suite did work fine). 0004: a. Why should it be necessary to provide startLSN (-s) ? Couldn't it autodetect the first WAL (tar file) inside and just use that with some info message? $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar pg_waldump: error: no start WAL location given b. Why would it like to open "blah" dir if I wanted that "blah" segment from the archive? Shouldn't it tell that it was looking in the archive and couldn find it inside? $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar blah pg_waldump: error: could not open file "blah": Not a directory c. It doesnt work when using SEGSTART, but it's there: $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar 000000010000000000000059 pg_waldump: error: could not open file "000000010000000000000059": Not a directory $ tar tf /tmp/base/pg_wal.tar | head -1 000000010000000000000059 d. I've later noticed that follow-up patches seem to use the -s switch and there it seems to work OK. The above SEGSTART issue was not detected, probably because tests need to be extended cover of segment name rather than just --start LSN (see test_pg_waldump): $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats -s 0/59000358 pg_waldump: first record is after 0/59000358, at 0/590003E8, skipping over 144 bytes WAL statistics between 0/590003E8 and 0/61000000: [..] e. Code around`if (walpath == NULL && directory != NULL)` needs some comments. f. Code around `if (fname != NULL && is_tar_file(fname, &compression))` , so if fname is WAL segment here (00000001000000000000005A) and we do check again if that has been tar-ed (is_tar_file())? Why? g. Just a question: the commit message says `Note that this patch requires that the WAL files within the archive be in sequential order; an error will be reported otherwise`. I'm wondering if such occurrences are known to be happening in the wild? Or is it just an assumption that if someone would modify the tar somehow? (either way we could just add a reason why we need to handle such a case if we know -- is manual alternation the only source of such state?). For the record, I've tested crafting custom archives with out of sequence WAL archives and the code seems to work (it was done using: tar --append -f pg_wal.tar --format=ustar ..) h. Anyway, in case of typo/wrong LSN, 0004 emits wrong error message I think: $ /usr/pgsql19/bin/pg_waldump --path=/tmp/base/pg_wal.tar --stats -s 0/50000358 pg_waldump: error: WAL files are not archived in sequential order pg_waldump: detail: Expecting segment number 80 but found 89. it's just that the 50000358 LSN above is below the minimal LSN present in the WAL segments (first segment is 000000010000000000000059 there, i've just intentionally provided a bad value 50.. as a typo and it causes the wrong message). Now it might not be an issue as with 0005 patch the same test behaves OK (`pg_waldump: error: could not find a valid record after 0/50000358`). It is just relevant if this would be committed not all at once. i. If I give wrong --timeline=999 to pg_waldump it fails with misleading error message: could not read WAL data from "pg_wal.tar" archive: read -1 of 8192 0005: a. I'm wondering if we shouldn't log (to stderr?) some kind of notification message (just once) that non-sequential WAL files were discovered and that pg_waldump is starting to write to $somewhere as it may be causing bigger I/O than anticipated when running the command. This can easily help when troubleshooting why it is not fast, and also having set TMPDIR to usually /tmp can be slow or too small. b. IMHO member_prepare_tmp_write() / get_tmp_wal_file_path() with TMPDIR can be prone to symlink attack. Consider setting TMPDIR=/tmp . We are writing to e.g. /tmp/<WALsegment>.waldump.tmp in 0004 , but that path is completely guessable. If an attacker prepares some symlinks and links those to some other places, I think the code will happily open and overwrite the contents of the rogue symlink. I think using mkstemp(3)/tmpfile(3) would be a safer choice if TMPDIR needs to be in play. Consider that pg_waldump can be run as root (there's no mechanism preventing it from being used that way). c. IMHO that unlink() might be not guaranteed to always remove files, as in case of any trouble and exit() , those files might be left over. I think we need some atexit() handlers. This can be triggered with combo of options of nonsequential files in tar + wrong LSN given: $ tar tf pg_wal.tar 00000001000000000000005A 00000001000000000000005B 00000001000000000000005C [..] 000000010000000000000060 000000010000000000000059 <-- out of order, appended last $ ls -lh 0* ls: cannot access '0*': No such file or directory $ /usr/pgsql19/bin/pg_waldump --path=/tmp/ble/pg_wal.tar --stats -s 0/10000358 #wrong LSN pg_waldump: error: could not find a valid record after 0/10000358 $ ls -lh 0* -rw------- 1 postgres postgres 16M Sep 8 14:44 000000010000000000000059.waldump.tmp -rw------- 1 postgres postgres 16M Sep 8 14:44 00000001000000000000005A.waldump.tmp [..] 0006: LGTM 0007: a. Commit message says `Future patches to pg_waldump will enable it to decode WAL directly` , but those pg_waldump are earlier patches, right? b. pg_verifybackup should print some info with --progress that it is spawning pg_waldump (pg_verifybackup --progress mode does not display anything related to verifing WALs, but it could) c. I'm wondering, but pg_waldump seems to be not complaining if --end=LSN is made into such a future that it doesn't exist. E.g. If the latest WAL segment is 60 (with end LSN 0/60A77A59), but I run pg_waldump `--end=0/7000000` , it will return code 0 and nothing on stderr. So how sure are we that the necessary WAL segments (as per backup_manifest) are actually inside the tar? It's supposed to be verified, but it isn't for this use case? Same happens if craft special tar and remove just one WAL segment from pg_wal.tar (simulate missing WAL segment), but ask the pg_verifybackup/pg_waldump to verify it to exact last LSN sequence, e.g.: $ /usr/pgsql19/bin/pg_waldump --quiet --path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028 --end=0/60A77A58 && echo OK # but it is not OK OK $ /usr/pgsql19/bin/pg_waldump --stats --path=/tmp/missing/pg_wal.tar --timeline=1 --start=0/59000028 --end=0/60A77A58 WAL statistics between 0/59000028 and 0/5CFFFFD0: # <-- 0/5C LSN maximum detected [..] Notice it has read till 0/5C (but I've asked till 0/60), because I've removed 0D: $ tar tf /tmp/missing/pg_wal.tar| grep ^0 000000010000000000000059 00000001000000000000005A 00000001000000000000005B 00000001000000000000005C 00000001000000000000005E <-- missing 5D Yet it reported no errors. 0008: LGTM Another open question I have is this: shouldn't backup_manifest come with CRC checksum for the archived WALs? Or does that guarantee that backup_manifest WAL-Ranges are present in pg_wal.tar is good enough because individual WAL files are CRC-protected itself? -J.
pgsql-hackers by date: