Thread: Don't keep closed WAL segment in page cache after replay

Don't keep closed WAL segment in page cache after replay

From: Anthonin Bonnefoy

Hi,

I've been looking at page cache usage because some of our replicas were
under memory pressure (no inactive pages available), which led to WAL
replay lag since the recovery process had to read from disk. One thing
I noticed is that the most recent WAL files stay in the page cache even
after they have been replayed.

This can be checked with vmtouch:
vmtouch  pg_wal/*
           Files: 141
     Directories: 2
  Resident Pages: 290816/290816  1G/1G  100%
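
Where vmtouch is not installed, roughly the same residency check can be
done with mmap() and mincore(), which is the mechanism vmtouch itself
relies on. The following is only a minimal sketch, assuming Linux;
resident_pages() is a hypothetical helper name and is not part of
PostgreSQL or vmtouch.

```c
#define _DEFAULT_SOURCE         /* for mincore() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper: count how many pages of the file at "path" are
 * resident in the page cache. The file's total page count is stored in
 * *total_pages. Returns the resident count, or -1 on error.
 */
static long
resident_pages(const char *path, long *total_pages)
{
    long        page_size = sysconf(_SC_PAGESIZE);
    long        resident = 0;
    struct stat st;
    int         fd;

    assert(path != NULL && total_pages != NULL);
    *total_pages = 0;

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0)
    {
        close(fd);
        return -1;
    }
    if (st.st_size == 0)
    {
        close(fd);
        return 0;               /* empty file: nothing can be cached */
    }

    size_t      len = (size_t) st.st_size;
    size_t      npages = (len + (size_t) page_size - 1) / (size_t) page_size;

    /* Mapping the file does not read it, so it does not change residency. */
    void       *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

    close(fd);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);

    if (vec != NULL && mincore(map, len, vec) == 0)
    {
        /* The low bit of each vector entry is set if the page is resident. */
        for (size_t i = 0; i < npages; i++)
            resident += vec[i] & 1;
        *total_pages = (long) npages;
    }
    else
        resident = -1;

    free(vec);
    munmap(map, len);
    return resident;
}
```

Calling it on each file in pg_wal and summing the results should give
numbers comparable to the "Resident Pages" line above.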

And page-types shows a replayed WAL file in the active LRU:
page-types  -Cl -f 000000010000001B00000076
page-count       MB  long-symbolic-flags
      4096       16  referenced,uptodate,lru,active

From my understanding, once replayed on a replica, WAL segment files
won't be re-read. So keeping them in the page cache seems like an
unnecessary strain on memory (all the more so since they appear to be
in the active LRU).

This patch issues posix_fadvise() with POSIX_FADV_DONTNEED before
closing a WAL segment, immediately releasing its cached pages. With
this, the page cache usage of pg_wal stays under wal_segment_size:

vmtouch  pg_wal/*
           Files: 88
     Directories: 2
  Resident Pages: 3220/262144  12M/1G  1.23%
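
To make the idea concrete, here is a minimal standalone sketch of
"advise, then close". It is not the code from the patch:
close_segment_dropping_cache() is a hypothetical name, and the sketch
assumes a platform that provides posix_fadvise() (for example Linux).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, not PostgreSQL code: hint to the kernel that the
 * cached pages of this file are no longer needed, then close it.
 * Returns 0 on success, or an errno-style error number on failure.
 */
static int
close_segment_dropping_cache(int fd)
{
    int         advise_rc;

    assert(fd >= 0);

    /*
     * Offset 0 with length 0 covers the whole file. posix_fadvise()
     * returns the error number directly instead of setting errno.
     * On Linux, pages that are still dirty are not discarded.
     */
    advise_rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);

    if (close(fd) != 0)
        return errno;

    /* The advice is only a hint, so a caller may choose to ignore this. */
    return advise_rc;
}
```

The advice is issued before close() because it needs a valid file
descriptor. A failed hint is reported but does not prevent the close.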

Regards,
Anthonin

Re: Don't keep closed WAL segment in page cache after replay

From: Fujii Masao

On 2025/07/02 19:10, Anthonin Bonnefoy wrote:
> Hi,
> 
> I've been looking at page cache usage as some of our replicas were
> under memory pressure (no inactive pages available) which led to WAL
> replay lag as the recovery process had to read from disk. One thing
> I've noticed was that the last WAL files are in the pagecache even
> after having been replayed.
> 
> This can be checked with vmtouch:
> vmtouch  pg_wal/*
>             Files: 141
>       Directories: 2
>    Resident Pages: 290816/290816  1G/1G  100%
> 
> And page-types shows a replayed WAL file in the active LRU:
> page-types  -Cl -f 000000010000001B00000076
> page-count       MB  long-symbolic-flags
>        4096       16  referenced,uptodate,lru,active
> 
> From my understanding, once replayed on a replica, WAL segment files
> won't be re-read. So keeping it in the pagecache seems like an
> unnecessary strain on the memory (more so that they appear to be in
> the active LRU).

WAL files that have already been replayed can still be read again
for WAL archiving (if archive_mode = always) or for replication
(if the standby is acting as a streaming replication sender or
a logical replication publisher). No?


> This patch adds a POSIX_FADV_DONTNEED before closing a WAL segment,
> immediately releasing cached pages.

Maybe we should do this only on a standby where WAL archiving
isn't working and it isn't acting as a sender or publisher.
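
As a rough illustration of that condition, it could be written as a
single predicate. This is only a sketch: the struct, its fields and the
function name are hypothetical and do not correspond to existing
PostgreSQL variables or APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of who else may still read WAL on this standby. */
typedef struct StandbyWalReaders
{
    bool        archiving_active;   /* e.g. archive_mode = always */
    int         active_senders;     /* streaming senders or logical
                                     * publishers reading WAL */
} StandbyWalReaders;

/*
 * Return true only when no other reader is expected to need a replayed
 * segment again, so dropping its cached pages should be harmless.
 */
static bool
can_drop_replayed_segment(const StandbyWalReaders *readers)
{
    assert(readers != NULL);
    return !readers->archiving_active && readers->active_senders == 0;
}
```

With such a check, the POSIX_FADV_DONTNEED call would be skipped
whenever it returns false.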

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation

Re: Don't keep closed WAL segment in page cache after replay

From: Fujii Masao

On 2025/07/02 22:24, Fujii Masao wrote:
> 
> 
> On 2025/07/02 19:10, Anthonin Bonnefoy wrote:
>> Hi,
>>
>> I've been looking at page cache usage as some of our replicas were
>> under memory pressure (no inactive pages available) which led to WAL
>> replay lag as the recovery process had to read from disk. One thing
>> I've noticed was that the last WAL files are in the pagecache even
>> after having been replayed.
>>
>> This can be checked with vmtouch:
>> vmtouch  pg_wal/*
>>             Files: 141
>>       Directories: 2
>>    Resident Pages: 290816/290816  1G/1G  100%
>>
>> And page-types shows a replayed WAL file in the active LRU:
>> page-types  -Cl -f 000000010000001B00000076
>> page-count       MB  long-symbolic-flags
>>        4096       16  referenced,uptodate,lru,active
>>
>> From my understanding, once replayed on a replica, WAL segment files
>> won't be re-read. So keeping it in the pagecache seems like an
>> unnecessary strain on the memory (more so that they appear to be in
>> the active LRU).
> 
> WAL files that have already been replayed can still be read again
> for WAL archiving (if archive_mode = always) or for replication
> (if the standby is acting as a streaming replication sender or
> a logical replication publisher). No?

Also, the WAL summarizer might read those WAL files as well.

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation