Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers
From:           Tomas Vondra
Subject:        Re: Should we update the random_page_cost default value?
Date:
Msg-id:         46307eb5-f691-4251-b71e-f33bcf5e5375@vondra.me
In response to: Re: Should we update the random_page_cost default value? (Robert Haas <robertmhaas@gmail.com>)
Responses:      Re: Should we update the random_page_cost default value?
List:           pgsql-hackers
On 10/8/25 17:20, Robert Haas wrote:
> On Tue, Oct 7, 2025 at 3:15 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
>> I've been doing this sort of thing for clients a long time, and I
>> always test both directions when I come across a query that should
>> be faster. For real-world queries, 99% of them have no change or
>> improve with a lowered rpc, and 99% get worse via a raised rpc. So
>> color me unconvinced.
>
> Color me equally unconvinced. Tomas seems quite convinced that we
> ought to be raising random_page_cost rather than lowering it, and
> that absolutely does not correspond to my experience in any way.
> It's not that I've actually tested raising the random page cost on
> very many systems, mind you. It's that my experience with real-world
> systems is that we tend to pick sequential scans when we should use
> an index, not the other way around. And obviously that problem will
> never be fixed by raising random_page_cost, since that will tend to
> favor sequential scans even more.
>
> Tomas's test involves scanning big tables that don't fit in RAM, and
> I agree that if that's most of what you do, you might benefit from a
> higher random_page_cost. However, even users who have some tables
> that are a lot bigger than RAM also tend to have frequently-accessed
> tables that are much smaller. For example, if you join a fact table
> to a bunch of dimension tables, the fact table may not fit in RAM,
> but the dimension tables probably do, and we're using the same
> random_page_cost for all of those tables. Moreover, we're least
> likely to make a wrong decision about what to do about the big
> table. It will often be the case that the query will be phrased such
> that a sequential scan on the big table is unavoidable, and if it is
> possible to avoid a sequential scan, we're going to want to do so
> almost regardless of random_page_cost, because the number of page
> accesses we can save will tend to be large. On a smaller table, it's
> more likely that both a full table scan and an index-based approach
> will be competitive, so there the value of random_page_cost will
> matter more to the final outcome. So, in this scenario, it's more
> important that the random_page_cost is close to accurate for the
> smaller tables rather than for the larger ones, and those are the
> ones that are most likely to benefit from caching.

I don't think there's all that much disagreement, actually. This is a
pretty good illustration that we're using random_page_cost to account
for things other than "I/O cost" (like the expected cache hit ratios),
because we simply don't have a better knob for that.
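FWIW it's easy to watch the knob tip the planner one way or the other.
A minimal sketch, assuming a hypothetical table "t" (the exact
crossover point will vary with hardware, stats and selectivity):

  -- any indexed table large enough that the seqscan/indexscan choice
  -- is not a foregone conclusion will do; rows are inserted in random
  -- order to keep the physical correlation on "id" low
  CREATE TABLE t (id int PRIMARY KEY, payload text);
  INSERT INTO t SELECT i, md5(i::text)
    FROM generate_series(1, 1000000) i ORDER BY random();
  ANALYZE t;

  SET random_page_cost = 4.0;   -- the default
  EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 150000;

  SET random_page_cost = 1.1;   -- common advice for SSDs / cached data
  EXPLAIN SELECT * FROM t WHERE id BETWEEN 1 AND 150000;

  -- the knob can at least be scoped, e.g. to a (hypothetical)
  -- tablespace on fast storage:
  -- ALTER TABLESPACE fast_ssd SET (random_page_cost = 1.1);

Somewhere between those two settings the plan typically flips from a
seq scan (or bitmap heap scan) to an index-based one, and which of the
two is actually faster depends mostly on how much of the table is
cached - which is exactly the part the knob wasn't meant to model.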
> One of the planner changes that I think would be worth exploring is
> to have the system try to estimate the percentage of a given table
> that is likely to be in cache, and that could be configured via a
> reloption or estimated based on the size of the table (or maybe even
> the frequency of access, though that is fraught, since it can change
> precipitously on short time scales and is thus not great to use for
> planning). If we estimated that small tables are likely to be cached
> and bigger ones are likely to be less-cached or, if very big,
> completely uncached, then it would probably be right to raise
> random_page_cost as well. But without that kind of a change, the
> correct value of random_page_cost is the one that takes into account
> both the possibility of caching and the possibility of nothing being
> cached, and this test is rotated all the way toward one end of that
> spectrum.

Isn't this somewhat what effective_cache_size was meant to do? That
obviously does not know about what fraction of individual tables is
cached, but it does impose a size limit.

I think in the past we mostly assumed we can't track cache size per
table, because we have no visibility into the page cache. But maybe
direct I/O would change this?
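For shared buffers we do have per-table visibility already, via the
pg_buffercache extension. A rough sketch, adapted from the query in
its docs (this only sees the PostgreSQL buffer cache, not the kernel
page cache - with direct I/O it would be the whole picture):

  CREATE EXTENSION IF NOT EXISTS pg_buffercache;

  -- per-relation residency in shared buffers
  SELECT c.relname,
         pg_size_pretty(count(*) * current_setting('block_size')::bigint)
           AS buffered,
         round(100.0 * count(*) * current_setting('block_size')::bigint
               / pg_relation_size(c.oid), 1) AS pct_cached
    FROM pg_buffercache b
    JOIN pg_class c ON b.relfilenode = pg_relation_filenode(c.oid)
   WHERE b.reldatabase IN (0, (SELECT oid FROM pg_database
                               WHERE datname = current_database()))
     AND pg_relation_size(c.oid) > 0
   GROUP BY c.relname, c.oid
   ORDER BY count(*) DESC
   LIMIT 10;

That could in principle feed a per-table cache-fraction estimate, but
it says nothing about what the kernel has cached on top of that.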
regards

--
Tomas Vondra