Re: WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas - Mailing list pgsql-performance

From Dharin Shah
Subject Re: WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas
Date
Msg-id CAOj6k6dnMDNUrsLbOsZFequ1UJ8N95Nq9hVRBu-Bj0rrbSD0ug@mail.gmail.com
Whole thread Raw
In response to Re: WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas  (Álvaro Herrera <alvherre@kurilemu.de>)
List pgsql-performance
Thanks Alvaro,

I read the thread and actually disabled truncate on my table with a large toast table which mitigated the issue. Unsure what happens with the empty pages now, I guess they would be reused for new inserts.
I would like to see if there are better opportunities to improve this truncation process. Perhaps identify why we need this arbitrary threshold to determine a full buffer scan?

/*
* This is the size (in the number of blocks) above which we scan the
* entire buffer pool to remove the buffers for all the pages of relation
* being dropped. For the relations with size below this threshold, we find
* the buffers by doing lookups in BufMapping table.
*/
#define BUF_DROP_FULL_SCAN_THRESHOLD (uint64) (NBuffers / 32)

https://github.com/postgres/postgres/blob/e03c95287764158941d317972a332565729b6af2/src/backend/storage/buffer/bufmgr.c#L91

As this can cause significant issues as we scale memory for shared buffers. (Very often the case with Aurora)

Thanks,
Dharin

On Tue, Jul 8, 2025 at 4:05 PM Álvaro Herrera <alvherre@kurilemu.de> wrote:
On 2025-Jul-08, Dharin Shah wrote:

> *Problem Summary:*
>   WAL replay of relation truncation operations on read replicas triggers
> buffer invalidation that requires AccessExclusive locks, blocking
> concurrent read queries for extended periods.

Hmm, sounds like disabling truncate of the TOAST relation by vacuum
could help.  We have configuration options for that -- one is per table
and was added in Postgres 12, changed with
  ALTER TABLE ... SET (vacuum_truncate=off);
I think you can also do
  ALTER TABLE ... SET (toast.vacuum_truncate=off);
to disable it for the TOAST table.

Postgres 18 added a global parameter of the same name which you can
change in postgresql.conf, and from the commit message it sound like it
was added to cope with scenarios precisely like yours.  But if for you
it's always the same toast table (or a small number of them) then I
would think it'd be better to change the per-table param for those.
(Also, this won't require that you upgrade to Postgres 18 just yet,
which sounds particularly helpful in case Aurora doesn't offer that version.)

Here it's the commit message for the change in 18, see the "Discussion"
link for more info:

commit 0164a0f9ee12e0eff9e4c661358a272ecd65c2d4
Author:     Nathan Bossart <nathan@postgresql.org> []
AuthorDate: Thu Mar 20 10:16:50 2025 -0500
CommitDate: Thu Mar 20 10:16:50 2025 -0500

    Add vacuum_truncate configuration parameter.

    This new parameter works just like the storage parameter of the
    same name: if set to true (which is the default), autovacuum and
    VACUUM attempt to truncate any empty pages at the end of the table.
    It is primarily intended to help users avoid locking issues on hot
    standbys.  The setting can be overridden with the storage parameter
    or VACUUM's TRUNCATE option.

    Since there's presently no way to determine whether a Boolean
    storage parameter is explicitly set or has just picked up the
    default value, this commit also introduces an isset_offset member
    to relopt_parse_elt.

    Suggested-by: Will Storey <will@summercat.com>
    Author: Nathan Bossart <nathandbossart@gmail.com>
    Co-authored-by: Gurjeet Singh <gurjeet@singh.im>
    Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at>
    Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com>
    Reviewed-by: Robert Treat <rob@xzilla.net>
    Discussion: https://postgr.es/m/Z2DE4lDX4tHqNGZt%40dev.null


--
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Al principio era UNIX, y UNIX habló y dijo: "Hello world\n".
No dijo "Hello New Jersey\n", ni "Hello USA\n".

pgsql-performance by date:

Previous
From: Álvaro Herrera
Date:
Subject: Re: WAL Replay Buffer Invalidation Conflicts During Page Truncation on Read Replicas