Thread: Don't keep closed WAL segment in page cache after replay
Hi,

I've been looking at page cache usage, as some of our replicas were
under memory pressure (no inactive pages available), which led to WAL
replay lag because the recovery process had to read from disk. One
thing I've noticed is that the last WAL files remain in the page cache
even after having been replayed.

This can be checked with vmtouch:
vmtouch pg_wal/*
Files: 141
Directories: 2
Resident Pages: 290816/290816 1G/1G 100%

And page-types shows a replayed WAL file in the active LRU:
page-types -Cl -f 000000010000001B00000076
page-count MB long-symbolic-flags
4096 16 referenced,uptodate,lru,active

From my understanding, once replayed on a replica, WAL segment files
won't be re-read. So keeping them in the page cache seems like an
unnecessary strain on memory, all the more so since they appear to be
in the active LRU.

This patch adds a POSIX_FADV_DONTNEED before closing a WAL segment,
immediately releasing cached pages. With this, the page cache usage of
pg_wal stays under the wal_segment_size:
vmtouch pg_wal/*
Files: 88
Directories: 2
Resident Pages: 3220/262144 12M/1G 1.23%

Regards,
Anthonin
On 2025/07/02 19:10, Anthonin Bonnefoy wrote:
> Hi,
>
> I've been looking at page cache usage as some of our replicas were
> under memory pressure (no inactive pages available) which led to WAL
> replay lag as the recovery process had to read from disk. One thing
> I've noticed was that the last WAL files are in the pagecache even
> after having been replayed.
>
> This can be checked with vmtouch:
> vmtouch pg_wal/*
> Files: 141
> Directories: 2
> Resident Pages: 290816/290816 1G/1G 100%
>
> And page-types shows a replayed WAL file in the active LRU:
> page-types -Cl -f 000000010000001B00000076
> page-count MB long-symbolic-flags
> 4096 16 referenced,uptodate,lru,active
>
> From my understanding, once replayed on a replica, WAL segment files
> won't be re-read. So keeping it in the pagecache seems like an
> unnecessary strain on the memory (more so that they appear to be in
> the active LRU).

WAL files that have already been replayed can still be read again
for WAL archiving (if archive_mode = always) or for replication
(if the standby is acting as a streaming replication sender or
a logical replication publisher). No?

> This patch adds a POSIX_FADV_DONTNEED before closing a WAL segment,
> immediately releasing cached pages.

Maybe we should do this only on a standby where WAL archiving isn't
enabled and that isn't acting as a sender or publisher.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation
On 2025/07/02 22:24, Fujii Masao wrote:
>
> On 2025/07/02 19:10, Anthonin Bonnefoy wrote:
>> Hi,
>>
>> I've been looking at page cache usage as some of our replicas were
>> under memory pressure (no inactive pages available) which led to WAL
>> replay lag as the recovery process had to read from disk. One thing
>> I've noticed was that the last WAL files are in the pagecache even
>> after having been replayed.
>>
>> This can be checked with vmtouch:
>> vmtouch pg_wal/*
>> Files: 141
>> Directories: 2
>> Resident Pages: 290816/290816 1G/1G 100%
>>
>> And page-types shows a replayed WAL file in the active LRU:
>> page-types -Cl -f 000000010000001B00000076
>> page-count MB long-symbolic-flags
>> 4096 16 referenced,uptodate,lru,active
>>
>> From my understanding, once replayed on a replica, WAL segment files
>> won't be re-read. So keeping it in the pagecache seems like an
>> unnecessary strain on the memory (more so that they appear to be in
>> the active LRU).
>
> WAL files that have already been replayed can still be read again
> for WAL archiving (if archive_mode = always) or for replication
> (if the standby is acting as a streaming replication sender or
> a logical replication publisher). No?

Also, the WAL summarizer might read those WAL files.

Regards,

--
Fujii Masao
NTT DATA Japan Corporation