Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers
| From | Tomas Vondra |
|---|---|
| Subject | Re: Should we update the random_page_cost default value? |
| Date | |
| Msg-id | e47dadd6-d283-42ff-b537-72e3aab7d349@vondra.me |
| In response to | Re: Should we update the random_page_cost default value? (Tomas Vondra <tomas@vondra.me>) |
| List | pgsql-hackers |
On 10/8/25 02:04, Tomas Vondra wrote:
>
>
> On 10/7/25 23:08, Peter Geoghegan wrote:
>> On Tue, Oct 7, 2025 at 3:46 PM Andres Freund <andres@anarazel.de> wrote:
>>> I think this discrepancy is largely due to the fact that Tomas' is testing
>>> with a cold cache (he has numbers for both), whereas most production workloads
>>> have very high cache hit ratios.
>>
>> Any test case that fails to ensure that all relevant indexes at least
>> have all of their internal B-Tree pages in shared_buffers is extremely
>> unrealistic. That only requires that we cache only a fraction of 1% of
>> all index pages, which is something that production workloads manage
>> to do approximately all the time.
>>
>> I wonder how much the "cold" numbers would change if Tomas made just
>> that one tweak (prewarming only the internal index pages). I don't
>> think that there's a convenient way of running that experiment right
>> now -- but it would be relatively easy to invent one.
>>
>> I'm not claiming that this extra step would make the "cold" numbers
>> generally representative. Just that it might be enough on its own to
>> get wildly better results, which would put the existing "cold" numbers
>> in context.
>>
>
> Why would you expect that?
>
> The index size is about 5% of the table size, so why would the internal
> index pages make any meaningful difference beyond that?
>
> Also, it's true the test starts from a "cold" cache, but it still uses
> shared buffers - and I'd expect the internal pages to be very hot,
> compared to the heap. Heap pages are read ~21x, but very far apart. So
> the internal index pages are likely cached even in the cold runs.
>
> I can do some runs after prewarming the (whole) index, just to see if it
> makes any difference.
>

I tried measuring this. I pre-warmed the internal index pages using
pageinspect and pg_prewarm like this:

1) use bt_multi_page_stats to select non-leaf pages

2) use pg_prewarm to load these pages

The query looks like this:

  SELECT
    'SELECT pg_prewarm(''idx'', ''buffer'', ''main'', '
      || blkno || ', ' || blkno || ');' AS sql
  FROM (
    -- list of non-leaf index pages
    SELECT blkno
      FROM bt_multi_page_stats('idx', 1,
             (SELECT relpages - 1 FROM pg_class WHERE relname = 'idx'))
     WHERE type != 'l'
  )

The generated SQL is written to a SQL script, and then executed after
restarting the instance / dropping caches.

As I expected, it made absolutely no difference. These are the results
for the NVMe RAID0:

                     seqscan (s)   index scan (s)   random_page_cost
   -----------------------------------------------------------------
   prewarming: no             24            25462               49.3
   prewarming: yes            24            25690               49.7

No surprise here. It's a little bit slower, but that's well within
run-to-run variability. There are only ~5000 non-leaf index pages, so
it'd be very surprising if it made any difference on a table with
23809524 pages (and when the index scan does 500M page accesses).


regards

--
Tomas Vondra
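For anyone reproducing the prewarming step, here is a minimal sketch (not from
the post itself) that runs the generated statements directly via psql's \gexec
instead of writing them to a script first. It assumes the pageinspect and
pg_prewarm extensions are installed and uses the example index name 'idx' from
the query above:

  -- Sketch only: generate one pg_prewarm() call per non-leaf index page
  -- and have psql execute each generated statement in place via \gexec.
  -- 'idx' is the example index name from the post; adjust as needed.
  SELECT 'SELECT pg_prewarm(''idx'', ''buffer'', ''main'', '
         || blkno || ', ' || blkno || ')' AS sql
    FROM bt_multi_page_stats('idx', 1,
           (SELECT relpages - 1 FROM pg_class WHERE relname = 'idx'))
   WHERE type != 'l'
  \gexec

The per-statement approach (one block per pg_prewarm call) mirrors the original
query; the only change is letting psql execute the output instead of saving it.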