Re: Proposal for fixing intra-query memory leaks - Mailing list pgsql-hackers

From	Bruce Momjian
Subject	Re: Proposal for fixing intra-query memory leaks
Date	June 12, 2000 20:32:56
Msg-id	200006130015.UAA12057@candle.pha.pa.us
Responses	Re: Proposal for fixing intra-query memory leaks
List	pgsql-hackers
FYI, Tom, is this still relevant?

> This issue seems to have been on the back burner for a while,
> but I think we need to put it on the front burner again for 7.1.
> Here is a think-piece I just did.  I'd appreciate comments,
> particularly about possible interactions with TOAST --- Jan,
> did you have any particular plan in mind for freeing datums created
> by de-TOASTing?
> 
>             regards, tom lane
> 
> 
> Proposal for memory allocation fixes            29-Apr-2000
> ------------------------------------
> 
> We know that Postgres has serious problems with memory leakage during
> large queries that process a lot of pass-by-reference data.  There is
> no provision for recycling memory until end of query.  This needs to be
> fixed, even more so with the advent of TOAST which will allow very
> large chunks of data to be passed around in the system.  Furthermore,
> 7.1 is an ideal time for fixing it since TOAST and the function-manager
> interface changes will require visiting a lot of the same code that needs
> to be cleaned up.  So, here is a proposal.
> 
> 
> Background
> ----------
> 
> We already do most of our memory allocation in "memory contexts", which
> are usually AllocSets as implemented by backend/utils/mmgr/aset.c.
> (Is there any value in allowing for other memory context types?  We could
> save some cycles by getting rid of a level of indirection here.)  What
> we need to do is create more contexts and define proper rules about when
> they can be freed.
> 
> The basic operations on a memory context are:
> 
> * create a context
> 
> * delete a context (including freeing all the memory allocated therein)
> 
> * reset a context (free all memory allocated in the context, but not the
>   context object itself)
> 
> Given a context, one can allocate a chunk of memory within it, free a
> previously allocated chunk, or realloc a previously allocated chunk larger
> or smaller.  (These operations correspond directly to standard C's
> malloc(), free(), and realloc() routines.)  At all times there is a
> "current" context denoted by the CurrentMemoryContext global variable.
> The backend macros palloc(), pfree(), prealloc() implicitly allocate space
> in that context.  The MemoryContextSwitchTo() operation selects a new
> current context (and returns the previous context, so that the caller can
> restore the previous context before exiting).
> 
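Here is a minimal C sketch of the basic operations listed above: create, allocate, reset and delete, plus a "current" context and a switch-to call that returns the previous one. The names MemCtx, ctx_create, ctx_alloc, ctx_reset, ctx_delete, toy_palloc, ctx_switch_to and current_ctx are hypothetical stand-ins. They are not PostgreSQL's actual mmgr API. Per-chunk free and realloc are omitted to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical toy memory context, not PostgreSQL's aset.c: each chunk
 * carries a small header linking it into its owning context, so the
 * whole context can be freed in one sweep. */
typedef union ChunkHdr {
    union ChunkHdr *next;   /* next chunk in the same context */
    max_align_t align_;     /* keeps the payload after the header aligned */
} ChunkHdr;

typedef struct MemCtx {
    ChunkHdr *chunks;       /* all live chunks, newest first */
    size_t nchunks;
} MemCtx;

/* Stand-in for the CurrentMemoryContext global. */
static MemCtx *current_ctx = NULL;

static MemCtx *ctx_create(void) {
    MemCtx *c = calloc(1, sizeof(MemCtx));
    if (c == NULL)
        abort();
    return c;
}

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *h = malloc(sizeof(ChunkHdr) + size);
    if (h == NULL)
        abort();
    h->next = c->chunks;
    c->chunks = h;
    c->nchunks++;
    return h + 1;           /* payload starts right after the header */
}

/* Reset: free every chunk but keep the context object itself. */
static void ctx_reset(MemCtx *c) {
    while (c->chunks != NULL) {
        ChunkHdr *next = c->chunks->next;
        free(c->chunks);
        c->chunks = next;
    }
    c->nchunks = 0;
}

/* Delete: free every chunk and then the context object. */
static void ctx_delete(MemCtx *c) {
    ctx_reset(c);
    free(c);
}

/* palloc()-like allocation in whichever context is current. */
static void *toy_palloc(size_t size) {
    return ctx_alloc(current_ctx, size);
}

/* MemoryContextSwitchTo()-like: install c, return the previous context. */
static MemCtx *ctx_switch_to(MemCtx *c) {
    MemCtx *old = current_ctx;
    current_ctx = c;
    return old;
}
```

Reset and delete are each a single walk over the context's own chunk list. That is why freeing a whole context is both cheaper and more reliable than tracking and freeing each chunk individually.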
> Note: there is no really good reason for pfree() to be tied to the current
> memory context; it ought to be possible to pfree() a chunk of memory no
> matter which context it was allocated from.  Currently we cannot do that
> because of the possibility that there is more than one kind of memory
> context.  If they were all AllocSets then the problem goes away, which is
> one reason I'd like to eliminate the provision for other kinds of
> contexts.
> 
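One way to make a pfree()-style call work regardless of the current context is to store the owning context in each chunk's header. This assumes every context shares one chunk layout, which is the all-AllocSet situation described above. The sketch is hypothetical: MemCtx, ctx_alloc and any_pfree are illustrative names. The doubly linked chunk list is just one design choice, made so that unlinking takes O(1).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct MemCtx MemCtx;

/* Hypothetical chunk header: records the owning context so a free
 * routine never needs to consult the current context. */
typedef union ChunkHdr {
    struct {
        MemCtx *owner;                /* context this chunk belongs to */
        union ChunkHdr *prev, *next;  /* doubly linked for O(1) unlink */
    } h;
    max_align_t align_;               /* keeps the payload aligned */
} ChunkHdr;

struct MemCtx {
    ChunkHdr *chunks;
    size_t nchunks;
};

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *ch = malloc(sizeof(ChunkHdr) + size);
    if (ch == NULL)
        abort();
    ch->h.owner = c;
    ch->h.prev = NULL;
    ch->h.next = c->chunks;
    if (c->chunks != NULL)
        c->chunks->h.prev = ch;
    c->chunks = ch;
    c->nchunks++;
    return ch + 1;
}

/* Free a chunk from whichever context owns it. */
static void any_pfree(void *p) {
    ChunkHdr *ch = (ChunkHdr *) p - 1;
    MemCtx *c = ch->h.owner;
    if (ch->h.prev != NULL)
        ch->h.prev->h.next = ch->h.next;
    else
        c->chunks = ch->h.next;
    if (ch->h.next != NULL)
        ch->h.next->h.prev = ch->h.prev;
    c->nchunks--;
    free(ch);
}
```

If different context types laid out their chunks differently, any_pfree could not safely locate the header. That is the difficulty the note above describes.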
> The main advantage of memory contexts over plain use of malloc/free is
> that the entire contents of a memory context can be freed easily, without
> having to request freeing of each individual chunk within it.  This is
> both faster and more reliable than per-chunk bookkeeping.  We already use
> this fact to clean up at transaction end: by resetting all the active
> contexts, we reclaim all memory.  What we need are additional contexts
> that can be reset or deleted at strategic times within a query, such as
> after each tuple.
> 
> 
> Additions to the memory-context mechanism
> -----------------------------------------
> 
> If we are going to have more contexts, we need more mechanism for keeping
> track of them; else we risk leaking whole contexts under error conditions.
> We can do this as follows:
> 
> 1. There will be two kinds of contexts, "permanent" and "temporary".
> Permanent contexts are never reset or deleted except by explicit caller
> command (in practice, they probably won't ever be, period).  There will
> not be very many of these --- perhaps only the existing TopMemoryContext
> and CacheMemoryContext.  We should avoid having very much code run with
> CurrentMemoryContext pointing at a permanent context, since any forgotten
> palloc() represents a permanent memory leak.
> 
> 2. Temporary contexts are remembered by the context manager and are
> guaranteed to be deleted at transaction end.  (If we ever have nested
> transactions, we'd probably want to tie each temporary context to a
> particular transaction, but for now that's not necessary.)  Most activity
> will happen in temporary contexts.
> 
> 3. When a context is created, an existing context can be specified as its
> parent; thus a tree of contexts is created.  Resetting or deleting any
> particular context resets or deletes all its direct and indirect children
> as well.  This feature allows us to manage a lot of contexts without fear
> that some will be leaked; we just have to make sure everything descends
> from one context that we remember to zap at transaction end.
> 
> In practice, point #2 doesn't require any special support in the context
> manager as long as it supports point #3.  We simply start a new context
> for each transaction and delete it at transaction end.  All temporary
> contexts created within the transaction must be direct or indirect
> children of this "transaction top context".
> 
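The following sketch shows points #2 and #3 working together. It assumes contexts are kept in a parent/first-child/next-sibling tree. Resetting a context resets its descendants, and deleting it deletes them, so deleting one per-transaction top context reclaims everything below it. All identifiers (MemCtx, ctx_create, ctx_alloc, ctx_reset, ctx_delete, live_contexts, live_chunks) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical toy context tree (names are illustrative only). */
typedef union ChunkHdr {
    union ChunkHdr *next;
    max_align_t align_;
} ChunkHdr;

typedef struct MemCtx {
    struct MemCtx *parent;
    struct MemCtx *first_child;
    struct MemCtx *next_sibling;
    ChunkHdr *chunks;
} MemCtx;

/* Counters so the demo can check that nothing leaked. */
static int live_contexts, live_chunks;

static MemCtx *ctx_create(MemCtx *parent) {
    MemCtx *c = calloc(1, sizeof(MemCtx));
    if (c == NULL)
        abort();
    c->parent = parent;
    if (parent != NULL) {            /* push onto the parent's child list */
        c->next_sibling = parent->first_child;
        parent->first_child = c;
    }
    live_contexts++;
    return c;
}

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *h = malloc(sizeof(ChunkHdr) + size);
    if (h == NULL)
        abort();
    h->next = c->chunks;
    c->chunks = h;
    live_chunks++;
    return h + 1;
}

static void free_chunks(MemCtx *c) {
    while (c->chunks != NULL) {
        ChunkHdr *next = c->chunks->next;
        free(c->chunks);
        c->chunks = next;
        live_chunks--;
    }
}

/* Reset: free this context's chunks and reset every descendant;
 * all context objects stay alive. */
static void ctx_reset(MemCtx *c) {
    for (MemCtx *k = c->first_child; k != NULL; k = k->next_sibling)
        ctx_reset(k);
    free_chunks(c);
}

/* Delete: delete every descendant, then unlink and free this context. */
static void ctx_delete(MemCtx *c) {
    while (c->first_child != NULL)
        ctx_delete(c->first_child);  /* each call unlinks that child */
    free_chunks(c);
    if (c->parent != NULL) {
        MemCtx **link = &c->parent->first_child;
        while (*link != c)
            link = &(*link)->next_sibling;
        *link = c->next_sibling;
    }
    free(c);
    live_contexts--;
}
```

Under this scheme, end-of-transaction cleanup amounts to deleting the top context and immediately creating a fresh one, as the proposal suggests for TransactionTopMemoryContext.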
> Note: it would probably be possible to adapt the existing "portal" memory
> management mechanism to do what we need.  I am instead proposing setting
> up a totally new mechanism, because the portal code strikes me as
> extremely crufty and unwieldy.  It may be that we can eventually remove
> portals entirely, or perhaps reimplement them with this mechanism
> underneath.
> 
> 
> Top-level (permanent) memory contexts
> -------------------------------------
> 
> We currently have TopMemoryContext and CacheMemoryContext as permanent
> memory contexts.  The existing usages of these are probably OK, although
> it might be a good idea to examine usages of TopMemoryContext to see if
> they should go somewhere else.
> 
> It might also be a good idea to set up a permanent ErrorMemoryContext that
> elog() can switch into for processing an error; this would ensure that
> there is at least ~8K of memory available for error processing, even if
> we've run out otherwise.  (ErrorMemoryContext could be reset, but not
> deleted, after each successful error recovery.)
> 
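The reserved error context could be as simple as a bump allocator over a fixed region. This sketch assumes a plain 8 KB static buffer is an acceptable stand-in for the "~8K" mentioned above. ReserveCtx, reserve_alloc, reserve_reset and ERROR_RESERVE_SIZE are hypothetical names. The sketch only illustrates why a reset, rather than a delete, is enough after each recovery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reserve context: a region set aside at startup so error
 * processing can still allocate after ordinary memory runs out. */
#define ERROR_RESERVE_SIZE 8192

typedef struct ReserveCtx {
    _Alignas(max_align_t) unsigned char buf[ERROR_RESERVE_SIZE];
    size_t used;                 /* bytes handed out so far */
} ReserveCtx;

static ReserveCtx error_ctx;

/* Bump-pointer allocation; returns NULL once the reserve is exhausted. */
static void *reserve_alloc(ReserveCtx *c, size_t size) {
    const size_t a = _Alignof(max_align_t);
    size_t start = (c->used + a - 1) / a * a;   /* round up to alignment */
    if (start > sizeof c->buf || size > sizeof c->buf - start)
        return NULL;
    c->used = start + size;
    return c->buf + start;
}

/* Reset: one assignment reclaims everything; the buffer itself stays. */
static void reserve_reset(ReserveCtx *c) {
    c->used = 0;
}
```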
> We will also create a global variable TransactionTopMemoryContext, which
> is valid at all times.  Memory recovery at end of transaction is done by
> deleting and immediately recreating this context.  All transaction-local
> contexts are created as children of TransactionTopMemoryContext, so that
> they go away at transaction end too.  (If we implement nested
> transactions, it could be that TransactionTopMemoryContext will itself be
> a child of some outer transaction's top context, but that's beyond the
> scope of this proposal.)
> 
> 
> Transaction-local memory contexts
> ---------------------------------
> 
> Relatively little stuff should get allocated directly in
> TransactionTopMemoryContext; the bulk of the action should happen in
> sub-contexts.  I propose the following:
> 
> QueryTopMemoryContext: this child of TransactionTopMemoryContext is
> created at the start of each query cycle and deleted upon successful
> completion.  (On error, of course, it goes away because it is a child of
> TransactionTopMemoryContext.)  The query input buffer is allocated in this
> context, as well as anything else that should live just till end of query.
> 
> ParsePlanMemoryContext: this child of QueryTopMemoryContext is working
> space for the parse/rewrite/plan/optimize pipeline.  After completion
> of planning, the final query plan is copied via copyObject() back into
> QueryTopMemoryContext, and then the ParsePlanMemoryContext can be deleted.
> This allows us to recycle the (perhaps large) amount of memory used by
> planning before actual query execution starts.
> 
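Here is a toy version of the ParsePlanMemoryContext pattern. The plan is built in a scratch context, only the final tree is deep-copied into the longer-lived query context, and then the scratch context is deleted. PlanNode, copy_plan and plan_query are hypothetical. copy_plan merely plays the role that copyObject() has in the proposal, and the 100 throwaway allocations stand in for discarded planner work.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal context: a list of malloc'd chunks. */
typedef union ChunkHdr {
    union ChunkHdr *next;
    max_align_t align_;
} ChunkHdr;

typedef struct MemCtx {
    ChunkHdr *chunks;
    size_t nchunks;
} MemCtx;

static MemCtx *ctx_create(void) {
    MemCtx *c = calloc(1, sizeof(MemCtx));
    if (c == NULL)
        abort();
    return c;
}

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *h = malloc(sizeof(ChunkHdr) + size);
    if (h == NULL)
        abort();
    h->next = c->chunks;
    c->chunks = h;
    c->nchunks++;
    return h + 1;
}

static void ctx_delete(MemCtx *c) {
    while (c->chunks != NULL) {
        ChunkHdr *next = c->chunks->next;
        free(c->chunks);
        c->chunks = next;
    }
    free(c);
}

/* Toy plan node: an operator name and a single input. */
typedef struct PlanNode {
    char op[16];
    struct PlanNode *input;
} PlanNode;

/* Deep copy into dst, in the spirit of copyObject(). */
static PlanNode *copy_plan(MemCtx *dst, const PlanNode *src) {
    if (src == NULL)
        return NULL;
    PlanNode *n = ctx_alloc(dst, sizeof *n);
    strcpy(n->op, src->op);
    n->input = copy_plan(dst, src->input);
    return n;
}

/* Plan in a scratch context, keep only the final plan in query_top. */
static PlanNode *plan_query(MemCtx *query_top) {
    MemCtx *scratch = ctx_create();      /* plays ParsePlanMemoryContext */
    PlanNode *scan = ctx_alloc(scratch, sizeof *scan);
    strcpy(scan->op, "SeqScan");
    scan->input = NULL;
    for (int i = 0; i < 100; i++)
        (void) ctx_alloc(scratch, 256);  /* stand-in for discarded work */
    PlanNode *sort = ctx_alloc(scratch, sizeof *sort);
    strcpy(sort->op, "Sort");
    sort->input = scan;
    PlanNode *final_plan = copy_plan(query_top, sort);
    ctx_delete(scratch);                 /* recycle all planning memory */
    return final_plan;
}
```

After plan_query returns, query_top holds just the two copied nodes, however much scratch memory planning consumed.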
> Execution per-run memory contexts: at startup, the executor will create a
> child of QueryTopMemoryContext to hold data that should live until
> ExecEndPlan; an example is the plan-node-local execution state.  Some plan
> node types may want to create shorter-lived contexts that are children of
> their parent's per-run context.  For example, a subplan node would create
> its own "per run" context so that memory could be freed at completion of
> each invocation of the subplan.
> 
> Execution per-tuple memory contexts: each per-run context will have a
> child context that the executor will reset (not delete) each time through
> the node's per-tuple loop.  This per-tuple context will be the active
> CurrentMemoryContext most of the time during execution.
> 
> By resetting the per-tuple context, we will be able to free memory after
> each tuple is processed, rather than only after the whole plan is
> processed.  This should solve our memory leakage problems pretty well;
> yet we do not need to add very much new bookkeeping logic to do it.
> In particular, we do *not* need to try to keep track of individual values
> palloc'd during expression evaluation.
> 
> Note we assume that resetting a context is a cheap operation.  This is
> true already, and we can make it even more true with a little bit of
> tuning in aset.c.
> 
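A small simulation shows the effect of a per-tuple context. It assumes that each row's expression evaluation allocates one scratch buffer and that nothing frees it individually. With a reset at the top of each iteration, the context never holds more than one chunk. Without the reset, the count grows with the number of rows. MemCtx, eval_row and run_scan are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical minimal context: a list of malloc'd chunks. */
typedef union ChunkHdr {
    union ChunkHdr *next;
    max_align_t align_;
} ChunkHdr;

typedef struct MemCtx {
    ChunkHdr *chunks;
    size_t nchunks;
} MemCtx;

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *h = malloc(sizeof(ChunkHdr) + size);
    if (h == NULL)
        abort();
    h->next = c->chunks;
    c->chunks = h;
    c->nchunks++;
    return h + 1;
}

static void ctx_reset(MemCtx *c) {
    while (c->chunks != NULL) {
        ChunkHdr *next = c->chunks->next;
        free(c->chunks);
        c->chunks = next;
    }
    c->nchunks = 0;
}

/* Per-row evaluation that allocates scratch space and never frees it. */
static long eval_row(MemCtx *per_tuple, int row) {
    char *buf = ctx_alloc(per_tuple, 64);
    snprintf(buf, 64, "%d", row);
    return strtol(buf, NULL, 10);
}

/* Scan nrows rows; returns the peak number of chunks held at once. */
static size_t run_scan(int nrows, int reset_each_tuple, long *sum) {
    MemCtx per_tuple = {NULL, 0};
    size_t peak = 0;
    *sum = 0;
    for (int row = 0; row < nrows; row++) {
        if (reset_each_tuple)
            ctx_reset(&per_tuple);    /* reclaim the previous row's scratch */
        *sum += eval_row(&per_tuple, row);
        if (per_tuple.nchunks > peak)
            peak = per_tuple.nchunks;
    }
    ctx_reset(&per_tuple);            /* end of run frees whatever is left */
    return peak;
}
```

Both runs compute the same result. Only the amount of memory held at any one time differs, and no individual value had to be tracked.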
> 
> Coding rules required
> ---------------------
> 
> Functions that return pass-by-reference values will be required always
> to palloc the returned space in the caller's memory context (i.e., the
> context that was CurrentMemoryContext at the time of call).  It is not
> OK to pass back an input pointer, even if we are returning an input value
> verbatim, because we do not know the lifespan of the context the input
> pointer points to.  An example showing why this is necessary is provided
> by aggregate-function execution.  The aggregate function executor must
> retain state values returned by state-transition functions from one tuple
> to the next.  Yet it does not want to keep them till end of run; that
> would be a memory leak.  The solution nodeAgg.c will use is to have two
> per-tuple memory contexts that are used alternately.  At each tuple,
> an old state value existing in one context is passed to the state
> transition function, which will return its result in the other context
> (since that'll be where CurrentMemoryContext points).  Then the first
> context is reset and used as the target for the next cycle.  This solution
> works as long as the transition function always returns a newly palloc'd
> datum, and never simply returns a pointer to its input data.
> 
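The following sketches the two-context scheme described for nodeAgg.c. It assumes the transition state is a small pass-by-reference struct holding a running sum and count. AvgState, avg_trans, run_agg and current_ctx are hypothetical. What matters is the order of operations: switch to the other context, call the transition function, then reset the context that held the old state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical minimal context: a list of malloc'd chunks. */
typedef union ChunkHdr {
    union ChunkHdr *next;
    max_align_t align_;
} ChunkHdr;

typedef struct MemCtx {
    ChunkHdr *chunks;
    size_t nchunks;
} MemCtx;

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *h = malloc(sizeof(ChunkHdr) + size);
    if (h == NULL)
        abort();
    h->next = c->chunks;
    c->chunks = h;
    c->nchunks++;
    return h + 1;
}

static void ctx_reset(MemCtx *c) {
    while (c->chunks != NULL) {
        ChunkHdr *next = c->chunks->next;
        free(c->chunks);
        c->chunks = next;
    }
    c->nchunks = 0;
}

/* Stand-in for CurrentMemoryContext. */
static MemCtx *current_ctx = NULL;

/* Hypothetical pass-by-reference transition state for an average. */
typedef struct AvgState {
    long sum;
    long count;
} AvgState;

/* Obeys the coding rule: the result is always freshly allocated in the
 * caller's current context, never a pointer to the input. */
static AvgState *avg_trans(const AvgState *old, long value) {
    AvgState *s = ctx_alloc(current_ctx, sizeof *s);
    s->sum = (old != NULL ? old->sum : 0) + value;
    s->count = (old != NULL ? old->count : 0) + 1;
    return s;
}

/* Runs the aggregate; *peak receives the most chunks live after a step. */
static AvgState run_agg(const long *vals, int n, size_t *peak) {
    MemCtx ctx[2] = {{NULL, 0}, {NULL, 0}};
    int cur = 0;                      /* index of the context holding state */
    AvgState *state = NULL;
    *peak = 0;
    for (int i = 0; i < n; i++) {
        current_ctx = &ctx[1 - cur];  /* new state goes to the other context */
        state = avg_trans(state, vals[i]);
        ctx_reset(&ctx[cur]);         /* old state is no longer referenced */
        cur = 1 - cur;
        size_t live = ctx[0].nchunks + ctx[1].nchunks;
        if (live > *peak)
            *peak = live;
    }
    AvgState result = {0, 0};
    if (state != NULL)
        result = *state;              /* copy out before the contexts go */
    ctx_reset(&ctx[0]);
    ctx_reset(&ctx[1]);
    current_ctx = NULL;
    return result;
}
```

Suppose avg_trans instead returned its input pointer rather than a fresh allocation. The ctx_reset call would then free the very state the loop is about to keep, which is why the coding rule forbids passing back input pointers.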
> Thus, a function must use the passed-in CurrentMemoryContext for
> allocating its result data, and can use it for any temporary storage it
> needs as well.  pfree'ing such temporary data before return is possible
> but not essential.
> 
> Executor routines that switch the active CurrentMemoryContext may need
> to copy data into their caller's current memory context before returning.
> I think there will be relatively little need for that, if we use a
> convention of resetting the per-tuple context at the *start* of an
> execution cycle rather than at its end.  With that rule, an execution
> node can return a tuple that is palloc'd in its per-tuple context, and
> the tuple will remain good until the node is called for another tuple
> or told to end execution.  This is pretty much the same state of affairs
> that exists now, since a scan node can return a direct pointer to a tuple
> in a disk buffer that is only guaranteed to remain good that long.
> 
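Here is a sketch of the reset-at-start convention. It assumes a hypothetical ScanNode whose scan_next routine (also hypothetical) formats each row into the node's own per-tuple context. Because the reset happens on entry rather than on exit, the caller can keep using the returned row until it asks for the next one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal context: a list of malloc'd chunks. */
typedef union ChunkHdr {
    union ChunkHdr *next;
    max_align_t align_;
} ChunkHdr;

typedef struct MemCtx {
    ChunkHdr *chunks;
    size_t nchunks;
} MemCtx;

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *h = malloc(sizeof(ChunkHdr) + size);
    if (h == NULL)
        abort();
    h->next = c->chunks;
    c->chunks = h;
    c->nchunks++;
    return h + 1;
}

static void ctx_reset(MemCtx *c) {
    while (c->chunks != NULL) {
        ChunkHdr *next = c->chunks->next;
        free(c->chunks);
        c->chunks = next;
    }
    c->nchunks = 0;
}

/* Hypothetical scan node with its own per-tuple context. */
typedef struct ScanNode {
    MemCtx per_tuple;
    int next_row;
    int nrows;
} ScanNode;

/* Resets on entry, so the row returned last time stays valid until the
 * caller comes back for another one. */
static const char *scan_next(ScanNode *node) {
    ctx_reset(&node->per_tuple);
    if (node->next_row >= node->nrows)
        return NULL;
    char *tup = ctx_alloc(&node->per_tuple, 32);
    snprintf(tup, 32, "row %d", node->next_row++);
    return tup;
}
```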
> A more common reason for copying data will be to transfer a result from
> per-tuple context to per-run context; for example, a Unique node will
> save the last distinct tuple value in its per-run context, requiring a
> copy step.  (Actually, Unique could use the same trick with two per-tuple
> contexts as described above for Agg, but there will probably be other
> cases where doing an extra copy step is the right thing.)
> 
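The following sketches the Unique copy step. All names are hypothetical (UniqueNode, unique_accept, ctx_free). The per-tuple context is reset for every input row. A value is copied into the per-run context only when it is new and distinct, and the previously saved copy is then freed so per-run memory stays constant. ctx_free uses a linear search, which is adequate for a sketch but not what a real allocator would do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical minimal context: a list of malloc'd chunks. */
typedef union ChunkHdr {
    union ChunkHdr *next;
    max_align_t align_;
} ChunkHdr;

typedef struct MemCtx {
    ChunkHdr *chunks;
    size_t nchunks;
} MemCtx;

static void *ctx_alloc(MemCtx *c, size_t size) {
    ChunkHdr *h = malloc(sizeof(ChunkHdr) + size);
    if (h == NULL)
        abort();
    h->next = c->chunks;
    c->chunks = h;
    c->nchunks++;
    return h + 1;
}

static void ctx_reset(MemCtx *c) {
    while (c->chunks != NULL) {
        ChunkHdr *next = c->chunks->next;
        free(c->chunks);
        c->chunks = next;
    }
    c->nchunks = 0;
}

/* Free one chunk by unlinking it from c's list (linear search). */
static void ctx_free(MemCtx *c, void *p) {
    ChunkHdr *target = (ChunkHdr *) p - 1;
    for (ChunkHdr **link = &c->chunks; *link != NULL; link = &(*link)->next) {
        if (*link == target) {
            *link = target->next;
            free(target);
            c->nchunks--;
            return;
        }
    }
    abort();                  /* p did not belong to this context */
}

typedef struct UniqueNode {
    MemCtx per_run;           /* holds the saved last-distinct value */
    MemCtx per_tuple;         /* scratch for the current input row */
    char *last;               /* lives in per_run */
} UniqueNode;

/* Returns 1 if input differs from the previous distinct value. */
static int unique_accept(UniqueNode *u, const char *input) {
    ctx_reset(&u->per_tuple);
    size_t len = strlen(input) + 1;
    char *key = ctx_alloc(&u->per_tuple, len);   /* per-tuple working copy */
    memcpy(key, input, len);
    if (u->last != NULL && strcmp(u->last, key) == 0)
        return 0;
    char *saved = ctx_alloc(&u->per_run, len);   /* the copy step */
    memcpy(saved, key, len);
    if (u->last != NULL)
        ctx_free(&u->per_run, u->last);          /* drop the old copy */
    u->last = saved;
    return 1;
}
```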
> 
> Other notes
> -----------
> 
> It might be that the executor per-run contexts described above should
> be tied directly to executor "EState" nodes, that is, one context per
> EState.  I'm not really clear on the lifespan of EStates or the situations
> where we have just one or more than one, so I'm not sure.  Comments?
> 
> With so many contexts running around, I think it will be almost essential
> to allow pfree() to work on chunks belonging to contexts other than the
> current one.  If we don't get rid of the notion of multiple allocation
> context types then some other work will have to be expended to make this
> possible.  Also, should we allow prealloc() to work on a chunk not
> belonging to the current context?  I'm less excited about allowing that,
> but it may prove useful.
> 


-- 
  Bruce Momjian                        |  http://www.op.net/~candle
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026