Reducing overhead for repeat de-TOASTing - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Reducing overhead for repeat de-TOASTing |
Date | |
Msg-id | 6861.1213644926@sss.pgh.pa.us Whole thread Raw |
Responses |
Re: Reducing overhead for repeat de-TOASTing
Re: Reducing overhead for repeat de-TOASTing Re: Reducing overhead for repeat de-TOASTing Re: Reducing overhead for repeat de-TOASTing |
List | pgsql-hackers |
Recent discussions with the PostGIS hackers led me to think about ways to reduce overhead when the same TOAST value is repeatedly detoasted. In the example shown here http://archives.postgresql.org/pgsql-hackers/2008-06/msg00384.php 90% of the runtime is being consumed by repeated detoastings of a single datum. That is certainly an outlier case, but we've heard complaints of TOAST being slow before. The solution I'm about to propose should fix this, and as a bonus it will reduce the problem of memory leakage when a detoasted value doesn't get pfreed. What I am imagining is that the tuple toaster will maintain a cache of recently-detoasted values, and that pg_detoast_datum() returns a pointer to a cache entry rather than a freshly palloc'd value. The cache lookup key is the toast table OID plus value OID. Now pg_detoast_datum() has no idea where the pointer it returns will go, so the big problem with this scheme is that it's hard to tell how long a cache entry needs to live. But we can resolve that by ruling that the required lifetime is the same as the value would have had if it'd really been palloc'd --- IOW, until the memory context that was current at the time gets deleted or reset. This can be implemented by allowing the toaster cache to hook into the memory context delete/reset calls. (We have no such capability now, but there have been some previous cases where it would've come in handy, so I think this would be a useful extension anyhow. It should be an actual hook that can be used by anyone, not something where mcxt.c is specifically passing control to tuptoaster.) Once a cache entry is no longer required anywhere, it can be removed, and will be if needed to shrink the cache to some target size (work_mem perhaps, or is it worth inventing a separate GUC for this?). But we can hang onto the value longer if it's not taking up needed space. This is important since the use pattern exhibited in the PostGIS problem involves repeat detoastings that are separated by MemoryContextResets --- if we free the cache entry as soon as possible, we'll not gain anything. The other cache management problem that would have to be solved is to invalidate entries when the underlying TOAST table or individual TOAST value is deleted. This is exactly like the problem for tuples in the syscaches, and can be solved via sinval messaging. (The reason we need to worry about this is to guard against the possibility that the same identifier is re-used for a new TOAST value; otherwise we could just let dead values age out of the cache.) A change of this sort has the potential to break a lot of code, but I think the only thing we'd really have to do is turn PG_FREE_IF_COPY() into a no-op. In general, functions are not supposed to scribble on pass-by-reference input datums, and since most functions don't actually distinguish whether their inputs got detoasted (except perhaps by using PG_FREE_IF_COPY()), that means that they won't be scribbling on the toast cache entry either. It would be possible to extend this concept to caching toast slice fetches (pg_detoast_datum_slice() calls) if we add the offset/length arguments to the cache lookup key. I'm not certain if that's worth the trouble though --- any feelings about that? Also, that case is much more likely to have callers that think they can scribble on the result, since they "know" it must be a palloc'd value. One unsolved problem is that this scheme doesn't provide any way to cache the result of decompressing an inline-compressed datum, because those have no unique ID that could be used for a lookup key. This puts a bit of a damper on the idea of making PG_FREE_IF_COPY() a no-op --- if it is, then detoasting a datum of that type would indeed cause a memory leak. The best idea I have at the moment is to continue to have inline decompression produce a palloc'd value, leave PG_FREE_IF_COPY() as-is, and arrange for pfree() on toast cache entries to be a no-op. (Which we can do by setting up a special memory context methods pointer for them.) It'd be cool if there were a way to cache decompression though. Ideas? Comments, better ideas? Anyone think this is too much trouble to take for the problem? regards, tom lane
pgsql-hackers by date: