Re: Using ProcSignal to get memory context stats from a running backend - Mailing list pgsql-hackers
From | Craig Ringer |
---|---|
Subject | Re: Using ProcSignal to get memory context stats from a running backend |
Date | |
Msg-id | CAMsr+YGh+sso5N6Q+FmYHLWC=BPCzA+5GbhYZSGruj2d0c7Vvg@mail.gmail.com Whole thread Raw |
In response to | Re: Using ProcSignal to get memory context stats from a runningbackend (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Using ProcSignal to get memory context stats from a running backend
|
List | pgsql-hackers |
On 21 December 2017 at 15:24, Andres Freund <andres@anarazel.de> wrote:
Hi,
On 2017-12-21 15:13:13 +0800, Craig Ringer wrote:
> There tons of callers to enlargeStringInfo, so a 'noerror' parameter would
> be viable.
Not sure what you mean with that sentence?
Mangled in editing and sent prematurely, disregard.
There are NOT tons of callers to enlargeStringInfo, so adding a parameter that allowed it to return a failure result rather than ERROR on OOM seems to be a reasonable option. But it relies on repalloc(), which will ERROR on OOM. AFAICS we don't have "no error" variants of the memory allocation routines and I'm not eager to add them. Especially since we can't trust that we're not on misconfigured Linux where the OOM killer will go wild instead of giving us a nice NULL result.
So I guess that means we should probably just do individual elog(...)s and live with the ugliness of scraping the resulting mess out of the logs. After all, a log destination that could possibly copy and modify the string being logged a couple of times it's not a good idea to try to drop the whole thing into the logs in one blob. And we can't trust things like syslog with large messages.
I'll follow up with a revised patch that uses individual elog()s soon.
BTW, I also pgindented it in my tree, so it'll have formatting fixed up. pgindent's handling of
static void printwrapper_stringinfo(void *extra, const char *fmt,...) pg_attribute_printf(2, 3);
upsets me a little, since if I break the line it mangles it into
static void
printwrapper_stringinfo(void *extra, const char *fmt,...)
pg_attribute_printf(2, 3);
so it'll break line-length rules a little, but otherwise conform.
> But I'm not convinced it's worth it personally. If we OOM in response to a
> ProcSignal request for memory context output, we're having pretty bad luck.
> The output is 8k in my test. But even if it were a couple of hundred kb,
> happening to hit OOM just then isn't great luck on modern systems with many
> gigabytes of RAM.
I've seen plenty memory dumps in the dozens to hundreds of
megabytes. And imo such cases are more likely to invite use of this
facility.
OK, that's more concerning then. Also impressively huge.
As written it's using MemoryContextStatsDetail(TopMemoryContext, 100) so it shouldn't really be *getting* multimegabyte dumps, but I'm not thrilled by hardcoding that. It doesn't merit a GUC though, so I'm not sure what to do about it and figured I'd make it a "later" problem.
> If that *does* happen, repalloc(...) will call
> MemoryContextStats(TopMemoryContext) before returning NULL. So we'll get
> our memory context dump anyway, albeit to stderr.
That would still abort the query that might otherwise continue to work,
so that seems no excuse.
Eh. It'd probably die soon enough anyway. But yeah, I agree it's not safe to hope we can sling around up to a couple of hundred MB under presumed memory pressure.
pgsql-hackers by date: