Re: Should we update the random_page_cost default value? - Mailing list pgsql-hackers
| From | Robert Haas |
|---|---|
| Subject | Re: Should we update the random_page_cost default value? |
| Date | |
| Msg-id | CA+TgmoYD7BHAHt7_qBKonAb6CpEdKhksN5E2tvMgbbU4qvmgdw@mail.gmail.com |
| In response to | Re: Should we update the random_page_cost default value? (Greg Sabino Mullane <htamfids@gmail.com>) |
| Responses | Re: Should we update the random_page_cost default value? |
| List | pgsql-hackers |
On Tue, Oct 7, 2025 at 3:15 PM Greg Sabino Mullane <htamfids@gmail.com> wrote:
> I've been doing this sort of thing for clients a long time, and I always
> test both directions when I come across a query that should be faster.
> For real-world queries, 99% of them have no change or improve with a
> lowered rpc, and 99% get worse via a raised rpc. So color me unconvinced.

Color me equally unconvinced. Tomas seems quite convinced that we ought to be raising random_page_cost rather than lowering it, and that absolutely does not correspond to my experience in any way. It's not that I've actually tested raising the random page cost on very many systems, mind you. It's that my experience with real-world systems is that we tend to pick sequential scans when we should use an index, not the other way around. And obviously that problem will never be fixed by raising random_page_cost, since that will tend to favor sequential scans even more.

Tomas's test involves scanning big tables that don't fit in RAM, and I agree that if that's most of what you do, you might benefit from a higher random_page_cost. However, even users who have some tables that are a lot bigger than RAM also tend to have frequently-accessed tables that are much smaller. For example, if you join a fact table to a bunch of dimension tables, the fact table may not fit in RAM, but the dimension tables probably do, and we're using the same random_page_cost for all of those tables. Moreover, we're least likely to make a wrong decision about what to do with the big table. It will often be the case that the query is phrased such that a sequential scan on the big table is unavoidable, and if it is possible to avoid a sequential scan, we're going to want to do so almost regardless of random_page_cost, because the number of page accesses we can save will tend to be large. On a smaller table, it's more likely that both a full table scan and an index-based approach will be competitive, so there the value of random_page_cost will matter more to the final outcome. So, in this scenario, it's more important that random_page_cost is close to accurate for the smaller tables than for the larger ones, and those are the ones that are most likely to benefit from caching.

One of the planner changes that I think would be worth exploring is to have the system try to estimate the percentage of a given table that is likely to be in cache. That could be configured via a reloption or estimated based on the size of the table (or maybe even the frequency of access, though that is fraught, since it can change precipitously on short time scales and is thus not great to use for planning). If we estimated that small tables are likely to be cached and bigger ones are likely to be less cached or, if very big, completely uncached, then it would probably be right to raise random_page_cost as well. But without that kind of a change, the correct value of random_page_cost is the one that takes into account both the possibility of caching and the possibility of nothing being cached, and this test is rotated all the way toward one end of that spectrum.

--
Robert Haas
EDB: http://www.enterprisedb.com
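To make that last point concrete, here is a minimal sketch (not part of the original message) of how a cache-aware "effective" random_page_cost could be reasoned about, assuming a simple linear blend between the cost of a cached and an uncached random page read. The cost figures, the blend, and the function name are illustrative assumptions, not PostgreSQL's actual cost model:

```python
# Illustrative sketch only: blend the cost of a random page read by how
# often we expect a cache hit. The numbers are made up for demonstration;
# PostgreSQL's planner does not derive random_page_cost this way.

def effective_random_page_cost(cache_fraction: float,
                               cached_cost: float = 1.0,
                               uncached_cost: float = 40.0) -> float:
    """Linear blend of cached vs. uncached random-page cost.

    cache_fraction: assumed fraction of the table's pages already in cache.
    cached_cost:    assumed cost of reading a page that is in cache
                    (roughly comparable to seq_page_cost).
    uncached_cost:  assumed cost of a truly random read from storage.
    """
    return cache_fraction * cached_cost + (1.0 - cache_fraction) * uncached_cost

# A small dimension table that is almost entirely cached looks cheap to probe
# by index, while a huge fact table that is mostly uncached looks expensive --
# yet a single random_page_cost setting has to stand in for both today.
for name, fraction in [("small dimension table", 0.99),
                       ("large fact table", 0.05)]:
    print(f"{name}: effective random_page_cost ~ "
          f"{effective_random_page_cost(fraction):.1f}")
```

Under these made-up numbers, the mostly-cached table wants a value near 1 while the mostly-uncached table wants something more than an order of magnitude higher, which is exactly the tension a single default has to split.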