On 10/8/25 19:23, Robert Haas wrote:
> On Wed, Oct 8, 2025 at 12:24 PM Tomas Vondra <tomas@vondra.me> wrote:
>> I don't think there's all that much disagreement, actually. This is a
>> pretty good illustration that we're using random_page_cost to account
>> for things other than "I/O cost" (like the expected cache hit ratios),
>> because we simply don't have a better knob for that.
>
> I agree with that conclusion.
>
>> Isn't this somewhat what effective_cache_size was meant to do? That
>> obviously does not know about what fraction of individual tables is
>> cached, but it does impose a size limit.
>
> Not really, because effective_cache_size only models the fact that
> when you iterate the same index scan within the execution of a single
> query, it will probably hit some pages more than once. It doesn't have
> any idea that anything other than an index scan might hit the same
> pages more than once, and it doesn't have any idea that a query might
> find data in cache as a result of previous queries. Also, when it
> thinks the same page is accessed more than once, the cost of
> subsequent accesses is 0.
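
[For anyone following along: the mechanism Robert describes is the
Mackert–Lohman approximation used by index_pages_fetched() in
costsize.c. A rough sketch of the formula, ignoring the rounding and
clamping the real code does -- T is the table size in pages, Ns the
number of tuples fetched, and b is effective_cache_size in pages:]

```python
def index_pages_fetched(tuples_fetched, pages, cache_pages):
    """Rough sketch of the Mackert-Lohman approximation of how many
    distinct page fetches a repeated index scan costs, in the spirit of
    PostgreSQL's index_pages_fetched() (costsize.c). Not the exact code."""
    T = max(float(pages), 1.0)         # pages in the table
    Ns = float(tuples_fetched)         # N * s in the paper's notation
    b = max(float(cache_pages), 1.0)   # effective_cache_size, in pages

    if T <= b:
        # Whole table fits in the assumed cache: repeat visits are free,
        # so fetches saturate at T.
        return min(2.0 * T * Ns / (2.0 * T + Ns), T)

    # Table larger than cache: beyond the "lim" threshold, each extra
    # tuple fetched has a (T - b) / T chance of missing the cache.
    lim = 2.0 * T * b / (2.0 * T - b)
    if Ns <= lim:
        return 2.0 * T * Ns / (2.0 * T + Ns)
    return b + (Ns - lim) * (T - b) / T
```

[Note how a 1000-page table with effective_cache_size covering it
yields well under 1000 fetches for 1000 tuples -- that's the only kind
of reuse the knob models, exactly as Robert says.]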
>
> I could be wrong, but I kind of doubt that there is any future in
> trying to generalize effective_cache_size. It's an extremely
> special-purpose mechanism, and what we need here is more of a general
> approach that can cut across the whole planner -- or alternatively we
> can decide that things are fine and that having rpc/spc implicitly
> model caching behavior is good enough.
>
>> I think in the past we mostly assumed we can't track cache size per
>> table, because we have no visibility into page cache. But maybe direct
>> I/O would change this?
>
> I think it's probably going to work out really poorly to try to use
> cache contents for planning. The plan may easily last much longer than
> the cache contents.
>
Why wouldn't that trigger invalidations / replanning just like other
types of stats? I imagine we'd regularly collect stats about what's
cached, etc. and we'd invalidate stale plans just like after ANALYZE.
Just a random idea, though. We'd need some sort of summary anyway; it's
not plausible that each backend would collect this info on its own.
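To make the earlier point concrete: today random_page_cost silently
folds an assumed hit ratio into one number, whereas a per-table cached
fraction (however we'd collect it) could be blended in explicitly.
A toy sketch -- the names and the 0.1 "cached page" cost are made up,
not anything the planner actually does:

```python
def effective_page_cost(cached_fraction,
                        random_page_cost=4.0,
                        cached_page_cost=0.1):
    """Hypothetical per-table page cost: pages found in cache cost
    almost nothing, misses pay the full random_page_cost."""
    return (cached_fraction * cached_page_cost
            + (1.0 - cached_fraction) * random_page_cost)
```

With stats like that, a table that's 90% cached would get a much
cheaper per-page cost than a cold one, instead of everyone sharing
one global random_page_cost.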
regards
--
Tomas Vondra