Re: Make TID Scans recalculate the TIDs less often - Mailing list pgsql-hackers

From Chao Li
Subject Re: Make TID Scans recalculate the TIDs less often
Date
Msg-id CAEoWx2nVdTGPQDOW20S9NYLqj5_SwfgP2Op=Eqh2q7jm7iLo=g@mail.gmail.com
Whole thread Raw
In response to Re: Make TID Scans recalculate the TIDs less often  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers

On Sep 17, 2025, at 18:31, David Rowley <dgrowleyml@gmail.com> wrote:

On Wed, 17 Sept 2025 at 22:13, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
Occasionally (when dealing with corruption) I do stuff like

begin;
update public.tablename set description = description where ctid in (select ('('||b.blkno::text||','||(x::text)||')')::tid from generate_series(1,300) x, blocks b);

in some forms they are actually joins. Also, pageinspecting things out is always a join (CTAS a copy of table rows that have particular infomask bits). But, fortunately, it's not that frequent case. It's always "plumbing", not a "regular database usage".

Thanks for sharing that one.  If that UPDATE did do a Nested Loop join
with a TID Scan on the inner side, the optimisation I have in the
patch *wouldn't* be applied as a parameter is changing that genuinely
does need the TidList to be recalculated over again.

David


I tried to trace this patch again today.

With David’s example with “million" and “empty" tables, TidScan.tidparamids is NULL, so that in ExecRescanTidScan(), bms_overlap() will return false, and TidList is not free-ed.

Same thing for Andrey’s example.

Based on David’s example, I build this SQL:

```
# Insert tuples to empty first
select sum(c) from million m left join lateral (select a c from empty e where e.ctid = m.ctid) on 1=1;
```

With this SQL, outer scan passes a parameter to inter TidScan, so that “tidparamids” is not NULL now. Then I noticed one thing: now it needs to re-evaluate TidList. However, the new TidList always has the same length of the old TidList, so that we don’t need to free the old TidList, instead, we can actually reuse memory of the old TidList, which could be a small optimization.

For that, I created 0002. But TBH, with 0002, and with psql timing on, I don’t see obvious improvement wrt execution time.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/




Attachment

pgsql-hackers by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: [Patch] add new parameter to pg_replication_origin_session_setup
Next
From: Anthonin Bonnefoy
Date:
Subject: Re: [PATCH] jit: fix build with LLVM-21