Hi,
On 2025-10-08 13:23:33 -0400, Robert Haas wrote:
> On Wed, Oct 8, 2025 at 12:24 PM Tomas Vondra <tomas@vondra.me> wrote:
> > Isn't this somewhat what effective_cache_size was meant to do? That
> > obviously does not know about what fraction of individual tables is
> > cached, but it does impose a size limit.
>
> Not really, because effective_cache_size only models the fact that
> when you iterate the same index scan within the execution of a single
> query, it will probably hit some pages more than once.
That's indeed today's use, but I wonder whether we ought to expand that. One
of the annoying things about *_page_cost effectively needing to be set "too
low" to handle caching effects is that it completely breaks down for larger
relations. Which has unwelcome effects like making a larger-than-memory
sequential scan seem like a reasonable plan.
It's a generally reasonable assumption that a scan processing a smaller amount
of data than effective_cache_size is more likely to be cached than a scan
processing much more data than effective_cache_size. In the latter case,
assuming an accurate effective_cache_size, we *know* that a good portion of
the data cannot be cached.
Which leads me to wonder if we ought to interpolate between a "cheaper" access
cost for data << effective_cache_size and the "more real" access costs as the
amount of data being scanned approaches (or exceeds) effective_cache_size.
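To illustrate the shape of the idea, a rough standalone sketch -- the names
here (interpolate_page_cost, cached_page_cost, disk_page_cost, ...) are made
up for illustration, none of this is actual costsize.c code:

#include <stdio.h>

static double
interpolate_page_cost(double pages_to_scan,      /* pages the scan will touch */
                      double effective_cache_pages,
                      double cached_page_cost,   /* cost assuming a cache hit */
                      double disk_page_cost)     /* cost assuming real I/O */
{
    double frac;

    if (pages_to_scan <= 0 || effective_cache_pages <= 0)
        return disk_page_cost;

    /* fraction of effective_cache_size the scan would need, clamped to 1 */
    frac = pages_to_scan / effective_cache_pages;
    if (frac > 1.0)
        frac = 1.0;

    /*
     * Scans much smaller than effective_cache_size get close to the cached
     * cost; scans at or above it get (nearly) the full disk cost.
     */
    return cached_page_cost + frac * (disk_page_cost - cached_page_cost);
}

int
main(void)
{
    double ecs = 524288;    /* effective_cache_size of 4GB, in 8kB pages */

    printf("small scan:  %.3f\n", interpolate_page_cost(1000, ecs, 0.1, 4.0));
    printf("half cache:  %.3f\n", interpolate_page_cost(262144, ecs, 0.1, 4.0));
    printf("huge scan:   %.3f\n", interpolate_page_cost(2097152, ecs, 0.1, 4.0));
    return 0;
}

Whether the transition should be linear, or get steeper once the scanned data
approaches effective_cache_size, is of course up for debate.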
Greetings,
Andres Freund