Thread: WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas
WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas
From
Dharin Shah
Date:
Hello All,
I'm experiencing significant read query blocking on Aurora PostgreSQL read replicas during VACUUM relation truncation, particularly with TOAST tables. This affects a high-traffic service (~3000 req/sec) and causes application downtime.
Problem Summary:
WAL replay of relation truncation operations on read replicas triggers buffer invalidation that requires AccessExclusive locks, blocking concurrent read queries for extended periods.
Environment Details:
- Aurora PostgreSQL (read replica setup)
- Workload: Async writes to primary, read-only queries on replica
- TOAST table with ~4KB average compressed column size
- maintenance_work_mem: 2087MB
Observed Behavior:
[23541]: select gzipped_dto from <table> dl1_0 where (dl1_0.entity_id,dl1_0.language_code) in (($1,$2))
2025-06-28 11:57:34 UTC: process 23574 still waiting for AccessShareLock on relation 20655 after 1000.035 ms
The blocking coincides with substantial TOAST table truncation:
2025-06-28 11:57:39 UTC::@:[8399]:LOG: automatic vacuum of table "delivery.pg_toast.pg_toast_20652": index scans: 1
pages: 212964 removed, 434375 remain, 78055 scanned (12.06% of total)
tuples: 198922 removed, 2015440 remain, 866 are dead but not yet removable
removable cutoff: 1066590201, which was 783 XIDs old when operation ended
frozen: 3 pages from table (0.00% of total) had 19 tuples frozen
index scan needed: 39600 pages from table (6.12% of total) had 199413 dead item identifiers removed
index "pg_toast_20652_index": pages: 16131 in total, 35 newly deleted, 7574 currently deleted, 7539 reusable
I/O timings: read: 173469.911 ms, write: 0.000 ms
avg read rate: 9.198 MB/s, avg write rate: 0.000 MB/s
buffer usage: 220870 hits, 213040 misses, 0 dirtied
WAL usage: 0 records, 0 full page images, 0 bytes
system usage: CPU: user: 2.97 s, system: 1.86 s, elapsed: 180.95 s
Analysis:
The vacuum reclaimed 212,964 pages (33% of the relation), indicating legitimate space reclamation. With maintenance_work_mem set to 2087MB, memory constraints aren't limiting the vacuum process. However, WAL replay of the truncation
operation on the read replica requires invalidating these pages from shared_buffers, which conflicts with ongoing read queries.
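In case it helps anyone reproduce this, here is a rough sketch of how I watch the blocking from the replica while the truncation replays (on a standby the blocking PID reported is typically the startup/recovery process holding the AccessExclusive lock):
-- List sessions waiting on a lock and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       wait_event,
       now() - query_start AS waiting_for,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY waiting_for DESC;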
Questions for Discussion:
1. Batch Buffer Invalidation: Could buffer invalidation during WAL replay be batched or deferred to reduce lock contention duration?
2. Replica-Specific Truncation Policy: Should read replicas have different truncation thresholds (REL_TRUNCATE_MINIMUM/REL_TRUNCATE_FRACTION) to balance space reclamation against query availability?
3. Cloud-Native Considerations: In cloud environments like Aurora with separate storage layers, is immediate buffer invalidation during truncation replay necessary, or could this be optimized?
4. Lock Duration Optimization: The current truncation process holds AccessExclusive locks during the entire invalidation. Could this be shortened through incremental processing?
Potential Approaches:
- Implement configurable truncation behavior for standby servers
- Add batching/throttling to buffer invalidation during WAL replay
- Provide a way to defer truncation replay during high read activity periods
This issue particularly affects TOAST tables, since their chunked storage pattern creates more opportunities for dead space at the end of the relation, but the core problem applies to any significant relation truncation on read replicas.
Has anyone else encountered this issue? Are there existing configuration options or patches that address WAL replay buffer invalidation conflicts?
Thanks for any insights.
Thanks,
Dharin
Re: WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas
From
Álvaro Herrera
Date:
On 2025-Jul-08, Dharin Shah wrote:
> *Problem Summary:*
> WAL replay of relation truncation operations on read replicas triggers
> buffer invalidation that requires AccessExclusive locks, blocking
> concurrent read queries for extended periods.
Hmm, sounds like disabling truncation of the TOAST relation by vacuum
could help. We have configuration options for that -- one is per table
and was added in Postgres 12, changed with
ALTER TABLE ... SET (vacuum_truncate=off);
I think you can also do
ALTER TABLE ... SET (toast.vacuum_truncate=off);
to disable it for the TOAST table.
Postgres 18 added a global parameter of the same name which you can
change in postgresql.conf, and from the commit message it sounds like it
was added to cope with scenarios precisely like yours. But if for you
it's always the same TOAST table (or a small number of them) then I
would think it'd be better to change the per-table param for those.
(Also, this won't require that you upgrade to Postgres 18 just yet,
which sounds particularly helpful in case Aurora doesn't offer that
version.)
Here is the commit message for the change in 18; see the "Discussion"
link for more info:
commit 0164a0f9ee12e0eff9e4c661358a272ecd65c2d4
Author: Nathan Bossart <nathan@postgresql.org>
AuthorDate: Thu Mar 20 10:16:50 2025 -0500
CommitDate: Thu Mar 20 10:16:50 2025 -0500
Add vacuum_truncate configuration parameter.
This new parameter works just like the storage parameter of the
same name: if set to true (which is the default), autovacuum and
VACUUM attempt to truncate any empty pages at the end of the table.
It is primarily intended to help users avoid locking issues on hot
standbys. The setting can be overridden with the storage parameter
or VACUUM's TRUNCATE option.
Since there's presently no way to determine whether a Boolean
storage parameter is explicitly set or has just picked up the
default value, this commit also introduces an isset_offset member
to relopt_parse_elt.
Suggested-by: Will Storey <will@summercat.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
In the beginning there was UNIX, and UNIX spoke and said: "Hello world\n".
It did not say "Hello New Jersey\n", nor "Hello USA\n".
Re: WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas
From
Dharin Shah
Date:
Thanks Alvaro,
I read the thread and disabled vacuum truncation on the table that owns the large TOAST table, which mitigated the issue. I'm not sure what happens to the empty pages now; I assume they will be reused for new inserts.
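For the archives, this is the shape of what I applied (my_table stands in for the real table name):
-- Disable end-of-table truncation for the TOAST table only
ALTER TABLE my_table SET (toast.vacuum_truncate = off);
-- Or for the main relation as well
ALTER TABLE my_table SET (vacuum_truncate = off);
-- One-off equivalent for a manual vacuum
VACUUM (TRUNCATE false) my_table;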
I would still like to explore whether the truncation process itself could be improved, starting with why we need this somewhat arbitrary threshold for deciding when to scan the full buffer pool:
https://github.com/postgres/postgres/blob/e03c95287764158941d317972a332565729b6af2/src/backend/storage/buffer/bufmgr.c#L91
/*
* This is the size (in the number of blocks) above which we scan the
* entire buffer pool to remove the buffers for all the pages of relation
* being dropped. For the relations with size below this threshold, we find
* the buffers by doing lookups in BufMapping table.
*/
#define BUF_DROP_FULL_SCAN_THRESHOLD (uint64) (NBuffers / 32)
This can cause significant issues as shared_buffers is scaled up, which is very often the case with Aurora.
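To put rough numbers on it, the threshold can be estimated from a running instance (a sketch; it relies on pg_settings reporting shared_buffers in 8kB pages, which is what NBuffers counts):
-- BUF_DROP_FULL_SCAN_THRESHOLD = NBuffers / 32
SELECT current_setting('shared_buffers') AS shared_buffers,
       setting::bigint AS nbuffers,
       setting::bigint / 32 AS threshold_blocks,
       pg_size_pretty(setting::bigint / 32 * 8192) AS threshold_size
FROM pg_settings
WHERE name = 'shared_buffers';
With 256GB of shared_buffers, for example, that works out to roughly 1,048,576 buffers (~8GB), and the cost of the full-scan path grows linearly with shared_buffers, all while the AccessExclusive lock is held during replay.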
Thanks,
Dharin
On Tue, Jul 8, 2025 at 4:05 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Jul-08, Dharin Shah wrote:
> *Problem Summary:*
> WAL replay of relation truncation operations on read replicas triggers
> buffer invalidation that requires AccessExclusive locks, blocking
> concurrent read queries for extended periods.
Hmm, sounds like disabling truncate of the TOAST relation by vacuum
could help. We have configuration options for that -- one is per table
and was added in Postgres 12, changed with
ALTER TABLE ... SET (vacuum_truncate=off);
I think you can also do
ALTER TABLE ... SET (toast.vacuum_truncate=off);
to disable it for the TOAST table.
Postgres 18 added a global parameter of the same name which you can
change in postgresql.conf, and from the commit message it sound like it
was added to cope with scenarios precisely like yours. But if for you
it's always the same toast table (or a small number of them) then I
would think it'd be better to change the per-table param for those.
(Also, this won't require that you upgrade to Postgres 18 just yet,
which sounds particularly helpful in case Aurora doesn't offer that version.)
Here it's the commit message for the change in 18, see the "Discussion"
link for more info:
commit 0164a0f9ee12e0eff9e4c661358a272ecd65c2d4
Author: Nathan Bossart <nathan@postgresql.org>
AuthorDate: Thu Mar 20 10:16:50 2025 -0500
CommitDate: Thu Mar 20 10:16:50 2025 -0500
Add vacuum_truncate configuration parameter.
This new parameter works just like the storage parameter of the
same name: if set to true (which is the default), autovacuum and
VACUUM attempt to truncate any empty pages at the end of the table.
It is primarily intended to help users avoid locking issues on hot
standbys. The setting can be overridden with the storage parameter
or VACUUM's TRUNCATE option.
Since there's presently no way to determine whether a Boolean
storage parameter is explicitly set or has just picked up the
default value, this commit also introduces an isset_offset member
to relopt_parse_elt.
Suggested-by: Will Storey <will@summercat.com>
Author: Nathan Bossart <nathandbossart@gmail.com>
Co-authored-by: Gurjeet Singh <gurjeet@singh.im>
Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
Reviewed-by: Robert Treat <rob@xzilla.net>
Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null
--
Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
In the beginning there was UNIX, and UNIX spoke and said: "Hello world\n".
It did not say "Hello New Jersey\n", nor "Hello USA\n".