
Re: Adding skip scan (including MDAM style range skip scan) to nbtree - Mailing list pgsql-hackers

From	BharatDB
Subject	Re: Adding skip scan (including MDAM style range skip scan) to nbtree
Date	September 8 06:54:46
Msg-id	CAAh00ETuqwZnJXKEAmW80sYxPez-Cc2p_ZzHx_O__RaZgq=SCg@mail.gmail.com
In response to	Adding skip scan (including MDAM style range skip scan) to nbtree (Peter Geoghegan <pg@bowt.ie>)
List	pgsql-hackers


Dear Team,

With reference to the ongoing conversation (message ID c562dc2a-6e36-46f3-a5ea-cd42eebd7118), I am writing to express my interest in helping to fix the regression reported in the thread "Adding skip scan (including MDAM style range skip scan) to nbtree".

I have been following this discussion on the regression related to commit 92fe23d93aa (skip scan in nbtree), and I ran some tests on my side to understand it better.

Observations :

I reproduced Tomas’s pgbench test with a simple workload on a single-column index:
```
SELECT count(*) FROM pgbench_accounts WHERE bid = 0;
```
Throughput with the skip-scan build was consistently ~40–50% lower compared to pre-patch builds.
After setting MALLOC_TOP_PAD_ to 64MB, the performance gap disappeared almost entirely, which indicates that the issue is allocator overhead from frequent malloc/free calls rather than the skip-scan logic itself.
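
For anyone repeating this, here is a minimal sketch of how the setting can be applied. glibc reads MALLOC_TOP_PAD_ as a byte count, so 64MB is converted to bytes first. The helper names (mb_to_bytes, restart_with_top_pad) and the log file name are my own illustrations, not something taken from Tomas's scripts.

```shell
# Convert a size given in MB to bytes (glibc expects MALLOC_TOP_PAD_ in bytes).
mb_to_bytes() {
    echo $(( $1 * 1024 * 1024 ))
}

# Hypothetical helper: restart the server with MALLOC_TOP_PAD_ set for the
# postmaster and, by inheritance, for all backends.
#   $1 = pad size in MB, $2 = data directory (an assumption; adjust as needed)
restart_with_top_pad() {
    MALLOC_TOP_PAD_="$(mb_to_bytes "$1")" pg_ctl -D "$2" -l pg.log restart
}

# Show the value that would be used for a 64MB pad.
echo "MALLOC_TOP_PAD_=$(mb_to_bytes 64)"
```

Calling `restart_with_top_pad 64 data` would then restart a cluster in `data` with the padding applied. The variable has to be set in the environment of the server process, not of the pgbench client.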

Reproduction steps :

Here is the exact setup I used (very close to Tomas’s):


```
# init database
pg_ctl -D data init
pg_ctl -D data -l pg.log start
createdb test

# load pgbench data (scale 1); pgbench -i drops and recreates the
# pgbench_* tables, so the index has to be created after the load
pgbench -i -s 1 test

# create the single-column index on bid
psql test -c 'CREATE INDEX ON pgbench_accounts(bid);'

# custom query file (select.sql)
echo "SELECT count(*) FROM pgbench_accounts WHERE bid = 0;" > select.sql

# run benchmarks: both query modes, with 1, 4, and 32 clients
for m in simple prepared; do
  for c in 1 4 32; do
    pgbench -n -f select.sql -M "$m" -T 10 -c "$c" -j "$c" test | grep tps
  done
done
```

When running the above, the skip-scan build consistently showed ~50% lower tps compared to pre-patch, unless MALLOC_TOP_PAD_ was increased.
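
To put a number on the gap between two builds, the tps figures can be compared with a small helper like the sketch below. The function names (extract_tps, tps_drop_pct) are hypothetical, the sample line only imitates the general shape of a pgbench "tps = ..." line, and the numbers are made up for illustration; they are not measurements from my runs.

```shell
# Print the first tps value found in pgbench output read from stdin.
extract_tps() {
    awk '/^tps = / { print $3; exit }'
}

# Print by how many percent the patched tps is lower than the baseline tps.
#   $1 = baseline tps, $2 = patched tps
tps_drop_pct() {
    awk -v base="$1" -v patched="$2" \
        'BEGIN { printf "%.1f\n", (base - patched) / base * 100 }'
}

# Illustrative values only (not measured results).
sample='tps = 1000.000000 (without initial connection time)'
base_tps=$(printf '%s\n' "$sample" | extract_tps)
tps_drop_pct "$base_tps" 500
```

In practice the pgbench output of each build would be piped through `extract_tps`, and the two values passed to `tps_drop_pct`.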

Thoughts on causes :

The increase in IndexAmRoutine size seems to push the cache structures past glibc’s small-heap thresholds, forcing more system allocations.
As Tomas noted, this is fragile: even if we drop the unused options support proc, future extensions to the struct could trigger the same issue again.

Suggestions / possible directions :

Short term (PG18) :
- If we want a low-risk change, removing the unused options support function may be acceptable, but I agree it feels like a temporary band-aid.
- Alternatively, shipping PG18 as-is with a release note warning about allocator sensitivity might be the safest option.
Longer term (PG19) :
- Explore static allocation of IndexAmRoutine instead of per-AM dynamic allocation. This should eliminate repeated malloc churn.
- Add a micro-benchmark or regression test that stresses catalog cache growth and malloc behavior (similar to pgbench with many partitions), so allocator-driven regressions are detected earlier.
- Consider documenting allocator tuning (MALLOC_TOP_PAD_) as a workaround until the structural fix lands.

Closing :

I don’t have a final patch proposal at this stage, but I would like to help test any candidate fixes or prototypes. If there’s interest, I can also contribute a self-contained benchmark script for regression testing.

Regards,
Athiyaman
