Home > mailing lists

Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers

From	Tomas Vondra
Subject	Re: Should we update the random_page_cost default value?
Date	October 8 13:12:10
Msg-id	e47dadd6-d283-42ff-b537-72e3aab7d349@vondra.me Whole thread Raw
In response to	Re: Should we update the random_page_cost default value? (Tomas Vondra <tomas@vondra.me>)
List	pgsql-hackers

Tree view

On 10/8/25 02:04, Tomas Vondra wrote:
> 
> 
> On 10/7/25 23:08, Peter Geoghegan wrote:
>> On Tue, Oct 7, 2025 at 3:46 PM Andres Freund <andres@anarazel.de> wrote:
>>> I think this discrepancy is largely due to the fact that Tomas' is testing
>>> with a cold cache (he has numbers for both), whereas most production workloads
>>> have very high cache hit ratios.
>>
>> Any test case that fails to ensure that all relevant indexes at least
>> have all of their internal B-Tree pages in shared_buffers is extremely
>> unrealistic. That only requires that we cache only a fraction of 1% of
>> all index pages, which is something that production workloads manage
>> to do approximately all the time.
>>
>> I wonder how much the "cold" numbers would change if Tomas made just
>> that one tweak (prewarming only the internal index pages). I don't
>> think that there's a convenient way of running that experiment right
>> now -- but it would be relatively easy to invent one.
>>
>> I'm not claiming that this extra step would make the "cold" numbers
>> generally representative. Just that it might be enough on its own to
>> get wildly better results, which would put the existing "cold" numbers
>> in context.
>>
> 
> Why would you expect that?
> 
> The index size is about 5% of the table size, so why would the internal
> index pages make any meaningful difference beyond that?
> 
> Also, it's true the test starts from "cold" cache, but is still uses
> shared buffers - and I'd expect the internal pages to be very hot,
> compared to the heap. Heap pages are read ~21x, but very far apart. So
> the internal index pages are likely cached even in the cold runs.
> 
> I can do some runs after prewarming the (whole) index, just to see if it
> makes any difference.
> 

I tried measuring this. I pre-warmed the internal index pages using
pageinspect and pg_prewarm like this:

1) use bt_multi_page_stats to select non-leaf pages
2) use pg_prewarm to load these pages

The query looks like this:

  SELECT 'SELECT pg_prewarm(''idx'', ''buffer'', ''main'', ' || blkno ||
', ' || blkno || ');' AS sql
  FROM (
    -- list of non-leaf index pages
    SELECT blkno FROM bt_multi_page_stats('idx', 1, (select relpages - 1
from pg_class where relname = 'idx')) where type != 'l'
  )

The generated SQL is written to a SQL script, and then executed after
restarting the instance / dropping caches.

As I expected, it made absolutely no difference. These are the results
for the NVMe RAID0:

                 seqscan (s)      index scan (s)  random_page_cost
 -----------------------------------------------------------------
  prewarming: no         24               25462             49.3
  prewarming: yes        24               25690             49.7

No surprise here. It's a little bit slower, but that's well within a
run-to-run variability.

There's only ~5000 non-leaf index pages, It'd be very surprising if it
made any difference on a table with 23809524 pages (and when the
indexscan does 500M page accesses).


regards

-- 
Tomas Vondra

pgsql-hackers by date:

From: David Rowley
Date: 08 October, 13:09:30
Subject: VACUUM (PARALLEL) option processing not using DefElem the way it was intended

From: vignesh C
Date: 08 October, 13:19:57
Subject: Re: Logical Replication of sequences

Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers

Previous

Next