Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers
From:           Tomas Vondra
Subject:        Re: Protect syscache from bloating with negative cache entries
Msg-id:         9ebe0ac4-b59e-6397-0586-4e7125de7d5b@2ndquadrant.com
In response to: RE: Protect syscache from bloating with negative cache entries
                ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
List:           pgsql-hackers
On 2/12/19 1:49 AM, Tsunakawa, Takayuki wrote:
> From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> I'm not sure what you mean by "necessary" and "unnecessary" here. What
>> matters is how often an entry is accessed - if it's accessed often, it
>> makes sense to keep it in the cache. Otherwise evict it. Entries not
>> accessed for 5 minutes are clearly not accessed very often, so getting
>> rid of them will not hurt the cache hit ratio very much.
>
> Right, "necessary" and "unnecessary" were imprecise, and what matters
> is how frequently the entries are accessed. What made me say
> "unnecessary" is the pg_statistic entry left by CREATE/DROP TEMP TABLE,
> which is never accessed again.
>

OK, understood.

>> So I agree with Robert that a time-based approach should work well
>> here. It does not have the issues of setting an exact syscache size
>> limit, it's kinda self-adaptive, etc.
>>
>> In a way, this is exactly what the 5 minute rule [1] says about
>> caching.
>>
>> [1] http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf
>
> Then, can we just set syscache_prune_min_age to 5min? Otherwise, how
> can users set the expiration period?
>

I believe so.

>>> The idea of expiration applies to the case where we want possibly
>>> stale entries to vanish and load newer data upon the next access.
>>> For example, the TTL (time-to-live) of Memcached, Redis, DNS, ARP.
>>> Is the catcache based on the same idea as them? No.
>>>
>>
>> I'm not sure what this has to do with those other databases.
>
> I meant that time-based eviction is not very good, because it could
> cause less frequently accessed entries to vanish even when memory is
> not short. Time-based eviction reminds me of Memcached, Redis, DNS,
> etc., which evict long-lived entries to avoid stale data, not to free
> space for other entries. I think size-based eviction is sufficient,
> like shared_buffers, the OS page cache, CPU caches, disk caches, etc.
>

Right.
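To make the time-based idea concrete, here is a minimal sketch of what a syscache_prune_min_age-style sweep boils down to (hypothetical names and a flat array instead of the real catcache hash table; this is not the actual patch):

```c
#include <assert.h>

/* Hypothetical cache entry: just the bits an age-based sweep cares about. */
typedef struct CacheEnt
{
    long    last_access;    /* timestamp of last lookup, in seconds */
    int     live;           /* 1 while the entry is in the cache */
} CacheEnt;

/*
 * Evict every live entry not touched within min_age seconds, returning
 * the number of evicted entries.  In the real catcache this would remove
 * the entry from its hash bucket and free it.
 */
static int
prune_by_age(CacheEnt *ents, int n, long now, long min_age)
{
    int     evicted = 0;

    for (int i = 0; i < n; i++)
    {
        if (ents[i].live && now - ents[i].last_access > min_age)
        {
            ents[i].live = 0;
            evicted++;
        }
    }
    return evicted;
}
```

The attraction is that a stale pg_statistic entry left behind by a dropped temp table is gone after one sweep, while anything accessed within the window survives regardless of how large the cache has grown.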
But the logic behind the time-based approach is that evicting such
entries should not cause any issues, exactly because they are accessed
infrequently. It might incur some latency when we need them for the
first time after the eviction, but IMHO that's acceptable (although I
see Andres did not like that).

FWIW we might even evict entries after some time passes since inserting
them into the cache - that's what memcached et al. do, IIRC. The logic
is that frequently accessed entries will get immediately loaded back
(thus keeping the cache hit ratio high). But there are reasons why the
other databases do that - like not having any cache invalidation
(unlike us).

That being said, having a "minimal size" threshold before starting with
the time-based eviction may be a good idea.

>> I'm certainly worried about the performance aspect of it. The syscache
>> is in plenty of hot paths, so adding overhead may have significant
>> impact. But that depends on how complex the eviction criteria will be.
>
> The LRU chain manipulation, dlist_move_head() in
> SearchCatCacheInternal(), may certainly incur some overhead. If it has
> a visible impact, then we can do the manipulation only when the user
> sets an upper limit on the cache size.
>

I think the benchmarks done so far suggest the extra overhead is within
noise. So unless we manage to make it much more expensive, we should be
OK, I think.

>> And then there may be cases conflicting with the criteria, i.e.
>> running into just-evicted entries much more often. This is the issue
>> with the initially proposed hard limits on cache sizes, where it'd be
>> trivial to under-size it just a little bit.
>
> In that case, the user can just enlarge the catcache.
>

IMHO the main issues with this are:

(a) It's not quite clear how to determine the appropriate limit. I can
probably apply a bit of perf+gdb, but I doubt that's very nice.

(b) It's not adaptive, so systems that grow over time (e.g.
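The dlist_move_head() cost mentioned above is a constant-time pointer shuffle on an intrusive doubly-linked list. A rough stand-in (my own minimal list, not PostgreSQL's actual lib/ilist.h) shows what would run on every cache hit when an LRU chain is maintained:

```c
#include <assert.h>

/* Minimal circular intrusive list with a sentinel head. */
typedef struct node
{
    struct node *prev;
    struct node *next;
} node;

static void
list_init(node *head)
{
    head->prev = head->next = head;
}

static void
list_push_head(node *head, node *n)
{
    n->prev = head;
    n->next = head->next;
    head->next->prev = n;
    head->next = n;
}

/*
 * The per-hit work under discussion: unlink plus relink, a handful of
 * pointer writes - O(1), but on the hot path of every cache lookup.
 */
static void
list_move_head(node *head, node *n)
{
    n->prev->next = n->next;    /* unlink from current position */
    n->next->prev = n->prev;
    list_push_head(head, n);    /* relink at head (most recently used) */
}
```

Whether six or so pointer writes per lookup show up in profiles is exactly what the benchmarks were meant to answer.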
by adding schemas and other objects) will keep hitting the limit over
and over.

>> Not sure which mail you're referring to - this seems to be the first
>> e-mail from you in this thread (per our archives).
>
> Sorry, MauMau is me, Takayuki Tsunakawa.
>

Ah, of course!

>
>> I personally don't find an explicit limit on cache size very
>> attractive, because it's rather low-level and difficult to tune, and
>> very easy to get wrong (at which point you fall off a cliff). All the
>> information is in backend private memory, so how would you even
>> identify that the syscache is the thing you need to tune, or how
>> would you determine the correct size?
>
> Just like other caches, we can present a view that shows the hits,
> misses, and the hit ratio of the entire catcaches. If the hit ratio is
> low, the user can enlarge the catcache size. That's what Oracle and
> MySQL do, as I referred to in this thread. The tuning parameter is the
> size. That's all.

How will that work, considering the caches are in private backend
memory? And each backend may have quite different characteristics, even
if they are connected to the same database?

> Besides, the v13 patch has as many as 4 parameters:
> cache_memory_target, cache_prune_min_age, cache_entry_limit,
> cache_entry_limit_prune_ratio. I don't think I can give the user good
> intuitive advice on how to tune these.
>

Isn't that more an argument for not having 4 parameters?

>
>>> https://en.wikipedia.org/wiki/Cache_(computing)
>>>
>>> "To be cost-effective and to enable efficient use of data, caches
>>> must be relatively small."
>>>
>>
>> Relatively small compared to what? It's also a question of how
>> expensive cache misses are.
>
> I guess the author meant that the cache is "relatively small" compared
> to the underlying storage: CPU cache is smaller than DRAM, DRAM is
> smaller than SSD/HDD. In our case, we have to pay more attention to
> limiting the catcache memory consumption, especially because the
> entries are duplicated in multiple backend processes.
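For the proposed statistics view, the arithmetic itself is trivial - the hard part is that the counters live in each backend's private memory. A sketch of just the computation (hypothetical names, not an actual PostgreSQL view):

```c
#include <assert.h>

/*
 * Hypothetical per-backend counters; the open question is how to expose
 * these from private backend memory, not how to compute the ratio.
 */
typedef struct CacheStats
{
    long    hits;
    long    misses;
} CacheStats;

/* Hit ratio in whole percent, rounded down; 0 when there were no lookups. */
static int
hit_ratio_pct(const CacheStats *s)
{
    long    total = s->hits + s->misses;

    return (total == 0) ? 0 : (int) (s->hits * 100 / total);
}
```

And since each backend has its own catcache, two backends connected to the same database can legitimately report very different ratios, which is the tuning difficulty being raised.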
>

I don't think so. IMHO the focus there is on "cost-effective", i.e.
caches are generally more expensive than the storage, so to make them
worth it you need to make them much smaller than the main storage.
That's pretty much what the 5 minute rule is about, I think.

But I don't see how this applies to the problem at hand, because the
system is already split into storage + cache (represented by RAM). The
challenge is how to use the RAM to cache various pieces of data to get
the best behavior. The problem is, we don't have a unified cache, but
multiple smaller ones (shared buffers, page cache, syscache) competing
for the same resource. Of course, having multiple (different) copies of
the syscache makes it even more difficult.

(Does this make sense, or am I just babbling nonsense?)

>
>> I don't know, but that does not seem very attractive. Each memory
>> context has some overhead, and it does not solve the issue of never
>> releasing memory to the OS. So we'd still have to rebuild the contexts
>> at some point, I'm afraid.
>
> I think there is little additional overhead on each catcache access
> -- the processing overhead is the same as when using aset, and the
> memory overhead is as much as several dozen (the number of catcaches)
> MemoryContext structures.

Hmmm. That doesn't seem particularly terrible, I guess.

> The slab context (slab.c) returns empty blocks to the OS, unlike the
> allocation context (aset.c).

Slab can do that, but it requires a certain allocation pattern, and I
very much doubt the syscache has it. It'd be trivial to end up with one
active entry on each block (which means slab can't release it).

BTW doesn't the syscache store the full on-disk tuple? That doesn't
seem like a fixed-length entry, which is a requirement for slab. No?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services