Re: Making pg_rewind faster - Mailing list pgsql-hackers

From Japin Li
Subject Re: Making pg_rewind faster
Date
Msg-id ME0P300MB044589E4E6A073D3859980CCB643A@ME0P300MB0445.AUSP300.PROD.OUTLOOK.COM
In response to Making pg_rewind faster  (vignesh ravichandran <admin@viggy28.dev>)
Responses Re: Making pg_rewind faster
List pgsql-hackers
On Wed, 02 Jul 2025 at 11:21, John H <johnhyvr@gmail.com> wrote:
> Hi,
>
> Thanks for the quick review.
>
> On Tue, Jul 1, 2025 at 8:16 PM wenhui qiu <qiuwenhuifx@gmail.com> wrote:
>> > Perhaps decide_wal_file_action() could be defined in filemap.c.
>>
>
> That's a good point. I updated the patch to reflect that.
>

Thanks for updating the patch.

>> >  While this is unrelated to WAL logging, it could also contribute to faster
>> > pg_rewind operations.  Should we consider ignoring log files under PGDATA
>> > (e.g., those in the default log/ directory)?
>> Agreed. The log directory usually takes up a lot of space as well, and the number of log files is quite large.
>>
>
> Should we handle this use case? I do agree that for the more common
> use case of pg_rewind, which is synchronizing an old writer to the new
> leader after failover, avoiding syncing the logging directory is
> useful.
> At the same time, pg_rewind is intended to make the same copy of the
> source cluster as efficiently as possible which would include "all"
> directories if they exist in $PGDATA. By default logging_collector is
> off as well. I'd also think you would want to avoid putting the logs
> in $PGDATA to have smaller backups if you are using tools like
> pg_basebackup.
>

Splitting the logs from $PGDATA is definitely better. The question is whether
it's worth implementing this directly in core or if a prominent note in the
documentation would suffice.

>> On Wed, Jul 2, 2025 at 10:21 AM Japin Li <japinli@hotmail.com> wrote:
>>>
>>> Hi, John
>>>
>>> Thanks for updating the patch.
>>>
>>> 1.
>>> +/* Determine the type of file content (relation, WAL, or other) */
>>> +static file_content_type_t
>>> +getFileType(const char *path)
>>>
>>> Considering the existence of file_type_t, would getFileContentType() be a
>>> suitable function for handling file content types?
>
> Do you mean naming getFileType to getFileContentType?
>

Exactly!  It's confusing that getFileType() returns file_content_type_t
instead of file_type_t.
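
To make the naming point concrete, here is a rough standalone sketch, not
taken from the patch.  The file_content_type_t enum values, the helper
looks_like_wal_segment(), and the path heuristics are all assumptions made up
for illustration; the real patch may classify files quite differently.  The
only point is that a function returning file_content_type_t reads more
naturally as getFileContentType(), leaving "file type" for file_type_t.

```c
#include <assert.h>
#include <string.h>

/* Kind of directory entry; shown only for contrast with the enum below. */
typedef enum
{
    FILE_TYPE_UNDEFINED = 0,
    FILE_TYPE_REGULAR,
    FILE_TYPE_DIRECTORY,
    FILE_TYPE_SYMLINK
} file_type_t;

/* Hypothetical values: what a file contains, not what kind of entry it is. */
typedef enum
{
    FILE_CONTENT_TYPE_OTHER = 0,
    FILE_CONTENT_TYPE_RELATION,
    FILE_CONTENT_TYPE_WAL
} file_content_type_t;

/* Hypothetical helper: 24 uppercase hex digits, like a WAL segment name. */
static int
looks_like_wal_segment(const char *name)
{
    return strlen(name) == 24 &&
        strspn(name, "0123456789ABCDEF") == 24;
}

/* The name now matches the return type. */
file_content_type_t
getFileContentType(const char *path)
{
    const char *base;

    assert(path != NULL);

    /* Last path component, or the whole path if there is no slash. */
    base = strrchr(path, '/');
    base = (base != NULL) ? base + 1 : path;

    if (strncmp(path, "pg_wal/", 7) == 0 &&
        looks_like_wal_segment(path + 7))
        return FILE_CONTENT_TYPE_WAL;

    /* Simplified assumption: relation files have names starting with a digit. */
    if ((strncmp(path, "base/", 5) == 0 ||
         strncmp(path, "global/", 7) == 0 ||
         strncmp(path, "pg_tblspc/", 10) == 0) &&
        base[0] >= '0' && base[0] <= '9')
        return FILE_CONTENT_TYPE_RELATION;

    return FILE_CONTENT_TYPE_OTHER;
}
```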

For the v5 patch:

1.
We could simply use the global WalSegSz variable within decide_file_action(),
eliminating the need to pass wal_segsz_bytes as an argument.

2.
For last_common_segno, we could implement it similarly to WalSegSz, avoiding a
signature change for decide_file_actions() and decide_file_action().  I'm not
insisting on this approach, however.

--
Regards,
Japin Li


