Re: Protect syscache from bloating with negative cache entries - Mailing list pgsql-hackers
From:           Tomas Vondra
Subject:        Re: Protect syscache from bloating with negative cache entries
Msg-id:         9ebe0ac4-b59e-6397-0586-4e7125de7d5b@2ndquadrant.com
In response to: RE: Protect syscache from bloating with negative cache entries
                ("Tsunakawa, Takayuki" <tsunakawa.takay@jp.fujitsu.com>)
List:           pgsql-hackers
On 2/12/19 1:49 AM, Tsunakawa, Takayuki wrote:
> From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> I'm not sure what you mean by "necessary" and "unnecessary" here. What
>> matters is how often an entry is accessed - if it's accessed often, it
>> makes sense to keep it in the cache. Otherwise evict it. Entries not
>> accessed for 5 minutes are clearly not accessed very often, so getting
>> rid of them will not hurt the cache hit ratio very much.
>
> Right, "necessary" and "unnecessary" were imprecise, and what matters
> is how frequently the entries are accessed. What made me say
> "unnecessary" is the pg_statistic entry left by CREATE/DROP TEMP TABLE,
> which is never accessed again.
>

OK, understood.

>> So I agree with Robert that a time-based approach should work well
>> here. It does not have the issues of setting an exact syscache size
>> limit, it's kinda self-adaptive, etc.
>>
>> In a way, this is exactly what the 5 minute rule [1] says about
>> caching.
>>
>> [1] http://www.hpl.hp.com/techreports/tandem/TR-86.1.pdf
>
> Then, can we just set syscache_prune_min_age to 5min? Otherwise, how
> can users set the expiration period?
>

I believe so.

>>> The idea of expiration applies to the case where we want possibly
>>> stale entries to vanish and load newer data upon the next access.
>>> For example, the TTL (time-to-live) of Memcached, Redis, DNS, ARP.
>>> Is the catcache based on the same idea as them? No.
>>>
>>
>> I'm not sure what this has to do with those other databases.
>
> I meant that time-based eviction is not very good, because it could
> cause less frequently accessed entries to vanish even when memory is
> not short. Time-based eviction reminds me of Memcached, Redis, DNS,
> etc., which evict long-lived entries to avoid stale data, not to free
> space for other entries. I think size-based eviction is sufficient,
> like shared_buffers, the OS page cache, CPU caches, disk caches, etc.
>

Right.
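To make the time-based idea concrete, here is a minimal sketch of what a syscache_prune_min_age-style sweep boils down to (hypothetical names and a flat array instead of the real catcache hash table; this is not the actual patch):

```c
#include <assert.h>

/* Hypothetical cache entry: just the bits an age-based sweep cares about. */
typedef struct CacheEnt
{
    long    last_access;    /* timestamp of last lookup, in seconds */
    int     live;           /* 1 while the entry is in the cache */
} CacheEnt;

/*
 * Evict every live entry not touched within min_age seconds, returning
 * the number of evicted entries.  In the real catcache this would remove
 * the entry from its hash bucket and free it.
 */
static int
prune_by_age(CacheEnt *ents, int n, long now, long min_age)
{
    int     evicted = 0;

    for (int i = 0; i < n; i++)
    {
        if (ents[i].live && now - ents[i].last_access > min_age)
        {
            ents[i].live = 0;
            evicted++;
        }
    }
    return evicted;
}
```

The attraction is that a stale pg_statistic entry left behind by a dropped temp table is gone after one sweep, while anything accessed within the window survives regardless of how large the cache has grown.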
But the logic behind the time-based approach is that evicting such
entries should not cause any issues, exactly because they are accessed
infrequently. It might incur some latency when we need them for the
first time after the eviction, but IMHO that's acceptable (although I
see Andres did not like that).

FWIW we might even evict entries after some time passes since inserting
them into the cache - that's what memcached et al. do, IIRC. The logic
is that frequently accessed entries will get immediately loaded back
(thus keeping the cache hit ratio high). But there are reasons why the
other databases do that - like not having any cache invalidation
(unlike us).

That being said, having a "minimal size" threshold before starting with
the time-based eviction may be a good idea.

>> I'm certainly worried about the performance aspect of it. The syscache
>> is in plenty of hot paths, so adding overhead may have significant
>> impact. But that depends on how complex the eviction criteria will be.
>
> The LRU chain manipulation, dlist_move_head() in
> SearchCatCacheInternal(), may certainly incur some overhead. If it has
> a visible impact, then we can do the manipulation only when the user
> sets an upper limit on the cache size.
>

I think the benchmarks done so far suggest the extra overhead is within
noise. So unless we manage to make it much more expensive, we should be
OK, I think.

>> And then there may be cases conflicting with the criteria, i.e.
>> running into just-evicted entries much more often. This is the issue
>> with the initially proposed hard limits on cache sizes, where it'd be
>> trivial to under-size it just a little bit.
>
> In that case, the user can just enlarge the catcache.
>

IMHO the main issues with this are:

(a) It's not quite clear how to determine the appropriate limit. I can
probably apply a bit of perf+gdb, but I doubt that's very nice.

(b) It's not adaptive, so systems that grow over time (e.g.
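The dlist_move_head() cost mentioned above is a constant-time pointer shuffle on an intrusive doubly-linked list. A rough stand-in (my own minimal list, not PostgreSQL's actual lib/ilist.h) shows what would run on every cache hit when an LRU chain is maintained:

```c
#include <assert.h>

/* Minimal circular intrusive list with a sentinel head. */
typedef struct node
{
    struct node *prev;
    struct node *next;
} node;

static void
list_init(node *head)
{
    head->prev = head->next = head;
}

static void
list_push_head(node *head, node *n)
{
    n->prev = head;
    n->next = head->next;
    head->next->prev = n;
    head->next = n;
}

/*
 * The per-hit work under discussion: unlink plus relink, a handful of
 * pointer writes - O(1), but on the hot path of every cache lookup.
 */
static void
list_move_head(node *head, node *n)
{
    n->prev->next = n->next;    /* unlink from current position */
    n->next->prev = n->prev;
    list_push_head(head, n);    /* relink at head (most recently used) */
}
```

Whether six or so pointer writes per lookup show up in profiles is exactly what the benchmarks were meant to answer.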
by adding schemas and other objects) will keep hitting the limit over
and over.

>> Not sure which mail you're referring to - this seems to be the first
>> e-mail from you in this thread (per our archives).
>
> Sorry, MauMau is me, Takayuki Tsunakawa.
>

Ah, of course!

>
>> I personally don't find an explicit limit on cache size very
>> attractive, because it's rather low-level and difficult to tune, and
>> very easy to get wrong (at which point you fall off a cliff). All the
>> information is in backend private memory, so how would you even
>> identify that the syscache is the thing you need to tune, or how
>> would you determine the correct size?
>
> Just like other caches, we can present a view that shows the hits,
> misses, and the hit ratio of the entire catcaches. If the hit ratio is
> low, the user can enlarge the catcache size. That's what Oracle and
> MySQL do, as I referred to in this thread. The tuning parameter is the
> size. That's all.

How will that work, considering the caches are in private backend
memory? And each backend may have quite different characteristics, even
if they are connected to the same database?

> Besides, the v13 patch has as many as 4 parameters:
> cache_memory_target, cache_prune_min_age, cache_entry_limit,
> cache_entry_limit_prune_ratio. I don't think I can give the user good
> intuitive advice on how to tune these.
>

Isn't that more an argument for not having 4 parameters?

>
>>> https://en.wikipedia.org/wiki/Cache_(computing)
>>>
>>> "To be cost-effective and to enable efficient use of data, caches
>>> must be relatively small."
>>>
>>
>> Relatively small compared to what? It's also a question of how
>> expensive cache misses are.
>
> I guess the author meant that the cache is "relatively small" compared
> to the underlying storage: CPU cache is smaller than DRAM, DRAM is
> smaller than SSD/HDD. In our case, we have to pay more attention to
> limiting the catcache memory consumption, especially because the
> entries are duplicated in multiple backend processes.
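For the proposed statistics view, the arithmetic itself is trivial - the hard part is that the counters live in each backend's private memory. A sketch of just the computation (hypothetical names, not an actual PostgreSQL view):

```c
#include <assert.h>

/*
 * Hypothetical per-backend counters; the open question is how to expose
 * these from private backend memory, not how to compute the ratio.
 */
typedef struct CacheStats
{
    long    hits;
    long    misses;
} CacheStats;

/* Hit ratio in whole percent, rounded down; 0 when there were no lookups. */
static int
hit_ratio_pct(const CacheStats *s)
{
    long    total = s->hits + s->misses;

    return (total == 0) ? 0 : (int) (s->hits * 100 / total);
}
```

And since each backend has its own catcache, two backends connected to the same database can legitimately report very different ratios, which is the tuning difficulty being raised.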
>

I don't think so. IMHO the focus there is on "cost-effective", i.e.
caches are generally more expensive than the storage, so to make them
worth it you need to make them much smaller than the main storage.
That's pretty much what the 5 minute rule is about, I think.

But I don't see how this applies to the problem at hand, because the
system is already split into storage + cache (represented by RAM). The
challenge is how to use the RAM to cache various pieces of data to get
the best behavior. The problem is, we don't have a unified cache, but
multiple smaller ones (shared buffers, page cache, syscache) competing
for the same resource. Of course, having multiple (different) copies of
the syscache makes it even more difficult.

(Does this make sense, or am I just babbling nonsense?)

>
>> I don't know, but that does not seem very attractive. Each memory
>> context has some overhead, and it does not solve the issue of never
>> releasing memory to the OS. So we'd still have to rebuild the contexts
>> at some point, I'm afraid.
>
> I think there is little additional overhead on each catcache access
> -- the processing overhead is the same as when using aset, and the
> memory overhead is as much as several dozen (the number of catcaches)
> MemoryContext structures.

Hmmm. That doesn't seem particularly terrible, I guess.

> The slab context (slab.c) returns empty blocks to the OS, unlike the
> allocation context (aset.c).

Slab can do that, but it requires a certain allocation pattern, and I
very much doubt the syscache has it. It'd be trivial to end up with one
active entry on each block (which means slab can't release it).

BTW doesn't the syscache store the full on-disk tuple? That doesn't
seem like a fixed-length entry, which is a requirement for slab. No?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services