Re: Batch TIDs lookup in ambulkdelete - Mailing list pgsql-hackers

From John Naylor
Subject Re: Batch TIDs lookup in ambulkdelete
Date
Msg-id CANWCAZbiJcwSBCczLfbfiPe1mET+V9PjTZv5VvUBwarLvx1Hfg@mail.gmail.com
Whole thread Raw
In response to Batch TIDs lookup in ambulkdelete  (Masahiko Sawada <sawada.mshk@gmail.com>)
List pgsql-hackers
On Thu, May 1, 2025 at 5:36 AM Masahiko Sawada <sawada.mshk@gmail.com> wrote:

>                   HEAD         PATCHED    DIFF
> case-1:     3,021 ms      2.818 ms    93.29%
> case-2:     5, 697 ms     5.545 ms    97.34%
> case-3:     2,833 ms      2.790 ms    98.48%
> case-4:     2,564 ms      2.279 ms    88.86%
> case-5:     4,657 ms      4.706 ms   101.04%

1 and 4 look significant -- do the other cases have reproducible
differences or is it just noise?

> Here are the summary of each attached patch:
>
> 0001: Introduce TIdStoreIsMemberMulti() where we can do IsMember check
> for multiple TIDs in one function call. If the given TIDs are sorted
> (at least in block number), we can save radix tree lookup for the same
> page entry.

My only comment is that TidStoreIsMember() is now unused in core (or
maybe just the tests?). It seems like we could just change the API for
it rather than introduce a new function?

> 0003: Use batch TIDs lookup in btree index bulk-deletion.
>
> In patch 0003, we implement batch TID lookups for both each
> deduplicated index tuple and remaining all regular index tuples, which
> seems to be the most straightforward approach.

Seems like a good approach. btvacuumpage() needs to sort if there is a
mix of posting tuples and regular index tuples. Was that covered by
any of the tests above?

> While further
> optimizations are possible, such as performing batch TID lookups for
> all index tuples on a single page, these could introduce additional
> overhead from sorting and re-sorting TIDs. Moreover, considering that
> radix tree lookups are relatively inexpensive, the benefits of sorting
> TIDs and using TidStoreIsMemberMulti() might be minimal. Nevertheless,
> these potential optimizations warrant further evaluation to determine
> their actual impact on performance.

My guess is that always sorting by TID and than back by index tuple
offset is too much overhead to be worth it, but I'm not sure.

--
John Naylor
Amazon Web Services



pgsql-hackers by date:

Previous
From: Alexander Lakhin
Date:
Subject: Re: Non-reproducible AIO failure
Next
From: Amit Kapila
Date:
Subject: Re: Conflict detection for update_deleted in logical replication