Re: Making pg_rewind faster - Mailing list pgsql-hackers

From Srinath Reddy Sadipiralla
Subject Re: Making pg_rewind faster
Date
Msg-id CAFC+b6oSeW=hy6h8JxqJWVeEE9jc9A+cCiKsGT9RfzSYH7rF9Q@mail.gmail.com
Whole thread Raw
In response to Re: Making pg_rewind faster  (John H <johnhyvr@gmail.com>)
List pgsql-hackers
Hi John , I have reviewed your latest patch(v8-0001).

On Thu, Oct 9, 2025 at 11:27 PM John H <johnhyvr@gmail.com> wrote:
> ...
> integrated. Let's rename the isRelDataFile() to getFileContentType()
> and put all the logic in that function, including appropriate tests of
> the path name. Second, in decide_file_action(), I think we should
> check that entry->target_size == entry->source_size and return
> FILE_ACTION_COPY if not. This case really shouldn't happen, but it's
> very cheap to check and might offer a bit of protection in some weird
> scenario.

Updated the patch to reflect that.

in getFileContentType as you moved the logic of validating
whether they are xlog files or not using IsXLogFileName ,
i think it's better to validate it the kind of similar way as its
done for relfiles by parsing the path using sscanf this will
help to get the tli,logno,segno of the xlog file and using
XLogFromFileName which gives us logSegNo ,so that we
can rebuild the xlog file path again using these values with
XLogFilePath , then validate this new path with the given path
,this helps to catch invalid xlog files like pg_wal/00000001FFFFFFFFFFFFFF10.

maybe like this
+ if (strncmp("pg_wal/", path, 7) == 0)
+ {
+ const char *filename = path + 7;
+ XLogFromFileName(filename,&tli,logSegNo,WalSegSz);
+ XLogFilePath(check_path,tli,*logSegNo,WalSegSz);
+ if (strcmp(check_path, path) == 0)
+ return FILE_CONTENT_TYPE_WAL;
+ else
+ return FILE_CONTENT_TYPE_OTHER;
+ }
 

> I am not a huge fan of the way that the patch stuffs last_common_segno
> into a global variable that is then accessed by
> ...

I forgot why I bothered with a global instead since it was straightforward
to actually pass in the argument. Updated.

instead of passing last_common_segno down the call stack directly,
we can have a struct maybe called "rewindState" which will have the common
information related to both clusters involved in rewind ,so that in future
if we want to pass more such information down we won't be increasing
the number of arguments that need to be passed down to all functions in stack.
 

> It does not appear to me that the test case tests what it purports to
> test, and I don't think that what it purports to test is the right
> thing anyway. It claims that it is testing that a certain WAL file
> (which it erroneously calls a "WAL file entry") is not copied to the
> target, but I see no logic to actually check this.
> ...

That's a fair point. The test does:

1. Write some data -> WAL 1
2. SELECT pg_switch_wal() -> pg_current_wal_lsn() is at WAL 2
3. CHECKPOINT; -> Makes WAL 2 the last common checkpoint

Then it checks that "WAL 1 not copied over" was logged which isn't
the best since if it was logged elsewhere then it would still pass.

we can improve the TAP test file header comment as 
# Test the situation where we skip copying the same WAL files from source to target
# except if WAL file size differs.

let me put it this way
1) WALs are written to 000000010000000000000002 in primary.
2) then SELECT pg_switch_wal() now the current WAL is 000000010000000000000003
3) CHECKPOINT; this is the last common checkpoint and is written in
000000010000000000000003.
4) until this point we have the same WAL files in primary and standby.
5) promote standby ,this leads to timeline switch and lets say the divergence point
is at segment boundary , so the first WAL segment in the new timeline is 000000020000000000000004.
6) when we stop standby then primary , in primary we get checkpoint shutdown which
is greater than the divergence point ,means there is divergence so pg_rewind is needed.
7) so the last common WAL segment was 000000010000000000000003.

I have mentioned below about how this updated TAP test is testing whether
common WAL file is being copied or not and about one more test we need to cover
in this TAP test.
 

> ...
> first. If that's correct, I don't quite know what a test case here can
> usefully verify, since the only difference would be performance, but
> maybe I'm misunderstanding something?

I updated the test to run stat and get the modification time of the common
WAL segment before and after pg_rewind and verify it is the same.

In filemap.c in decide_wal_file_action, if you comment out
> return FILE_ACTION_NONE;

the test fails.

This test makes sense to me, by checking whether
last common segment(000000010000000000000003 in our case)
been copied or not using the stat->mtime after the pg_rewind, if its not
copied then stat->mtime will be the same ,else it will differ with the
one we noted before rewind, but we also need to test the case
where entry->target_size != entry->source_size and we do copy,
like if stat->mtime is changed (which means file has been copied)
and target_stat->size != soucrce_stat->size here no error expected.


--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/

pgsql-hackers by date:

Previous
From: Bertrand Drouvot
Date:
Subject: Preserve index stats during ALTER TABLE ... TYPE ...
Next
From: Daniel Gustafsson
Date:
Subject: Re: [PATCH] ecpg: check return value of replace_variables()