Thread: Re: pgsql: Add function to log the memory contexts of specified backend pro
On Tue, Apr 6, 2021 at 12:45 AM Fujii Masao <fujii@postgresql.org> wrote: > Add function to log the memory contexts of specified backend process. Hi, I think this might need a recursion guard. I tried this: diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index dc4c600922d..b219a934034 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -3532,7 +3532,7 @@ ProcessInterrupts(void) if (ParallelMessagePending) ProcessParallelMessages(); - if (LogMemoryContextPending) + if (true) ProcessLogMemoryContextInterrupt(); if (PublishMemoryContextPending) diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h index 72f5655fb34..867fd7b0ad5 100644 --- a/src/include/miscadmin.h +++ b/src/include/miscadmin.h @@ -112,7 +112,7 @@ extern void ProcessInterrupts(void); /* Test whether an interrupt is pending */ #ifndef WIN32 #define INTERRUPTS_PENDING_CONDITION() \ - (unlikely(InterruptPending)) + (unlikely(InterruptPending) || true) #else #define INTERRUPTS_PENDING_CONDITION() \ (unlikely(UNBLOCKED_SIGNAL_QUEUE()) ? pgwin32_dispatch_queued_signals() : 0, \ That immediately caused infinite recursion, ending in a core dump: frame #13: 0x0000000104607b00 postgres`errfinish(filename=<unavailable>, lineno=<unavailable>, funcname=<unavailable>) at elog.c:543:2 [opt] frame #14: 0x0000000104637078 postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt] frame #15: 0x00000001044a901c postgres`ProcessInterrupts at postgres.c:3536:3 [opt] frame #16: 0x0000000104607b54 postgres`errfinish(filename=<unavailable>, lineno=<unavailable>, funcname=<unavailable>) at elog.c:608:2 [opt] [artificial] frame #17: 0x0000000104637078 postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt] frame #18: 0x00000001044a901c postgres`ProcessInterrupts at postgres.c:3536:3 [opt] <repeat until we have 174241 frames on the stack, then dump core> It might be unlikely that a process can be signalled fast enough to actually fail in this way, but I'm not sure it's impossible, and I think we should be defending against it. The most trivial recursion guard would be HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around ProcessLogMemoryContextInterrupt(), but I think that's probably not quite good enough because it would make the backend impervious to pg_terminate_backend() while it's dumping memory contexts, and that could be a long time if the write blocks. -- Robert Haas EDB: http://www.enterprisedb.com
On 2025/05/01 2:15, Robert Haas wrote: > On Tue, Apr 6, 2021 at 12:45 AM Fujii Masao <fujii@postgresql.org> wrote: >> Add function to log the memory contexts of specified backend process. > > Hi, > > I think this might need a recursion guard. I tried this: > > diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c > index dc4c600922d..b219a934034 100644 > --- a/src/backend/tcop/postgres.c > +++ b/src/backend/tcop/postgres.c > @@ -3532,7 +3532,7 @@ ProcessInterrupts(void) > if (ParallelMessagePending) > ProcessParallelMessages(); > > - if (LogMemoryContextPending) > + if (true) > ProcessLogMemoryContextInterrupt(); > > if (PublishMemoryContextPending) > diff --git a/src/include/miscadmin.h b/src/include/miscadmin.h > index 72f5655fb34..867fd7b0ad5 100644 > --- a/src/include/miscadmin.h > +++ b/src/include/miscadmin.h > @@ -112,7 +112,7 @@ extern void ProcessInterrupts(void); > /* Test whether an interrupt is pending */ > #ifndef WIN32 > #define INTERRUPTS_PENDING_CONDITION() \ > - (unlikely(InterruptPending)) > + (unlikely(InterruptPending) || true) > #else > #define INTERRUPTS_PENDING_CONDITION() \ > (unlikely(UNBLOCKED_SIGNAL_QUEUE()) ? > pgwin32_dispatch_queued_signals() : 0, \ > > That immediately caused infinite recursion, ending in a core dump: > > frame #13: 0x0000000104607b00 > postgres`errfinish(filename=<unavailable>, lineno=<unavailable>, > funcname=<unavailable>) at elog.c:543:2 [opt] > frame #14: 0x0000000104637078 > postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt] > frame #15: 0x00000001044a901c postgres`ProcessInterrupts at > postgres.c:3536:3 [opt] > frame #16: 0x0000000104607b54 > postgres`errfinish(filename=<unavailable>, lineno=<unavailable>, > funcname=<unavailable>) at elog.c:608:2 [opt] [artificial] > frame #17: 0x0000000104637078 > postgres`ProcessLogMemoryContextInterrupt at mcxt.c:1392:2 [opt] > frame #18: 0x00000001044a901c postgres`ProcessInterrupts at > postgres.c:3536:3 [opt] > <repeat until we have 174241 frames on the stack, then dump core> > > It might be unlikely that a process can be signalled fast enough to > actually fail in this way, but I'm not sure it's impossible, and I > think we should be defending against it. The most trivial recursion > guard would be HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around > ProcessLogMemoryContextInterrupt(), but I think that's probably not > quite good enough because it would make the backend impervious to > pg_terminate_backend() while it's dumping memory contexts, and that > could be a long time if the write blocks. Just idea, what do you think about adding a flag to indicate whether ProcessLogMemoryContextInterrupt() is currently running? Then, when a backend receives a signal and ProcessLogMemoryContextInterrupt() is invoked, it can simply return immediately if the flag is already set like this: ------------------------------ @ -1383,8 +1383,14 @@ HandleGetMemoryContextInterrupt(void) void ProcessLogMemoryContextInterrupt(void) { + static bool loggingMemoryContext = false; + LogMemoryContextPending = false; + if (loggingMemoryContext) + return; + loggingMemoryContext = true; + /* * Use LOG_SERVER_ONLY to prevent this message from being sent to the * connected client. @@ -1406,6 +1412,8 @@ ProcessLogMemoryContextInterrupt(void) * details about individual siblings beyond 100 will not be large. */ MemoryContextStatsDetail(TopMemoryContext, 100, 100, false); + + loggingMemoryContext = false; } ------------------------------ This way, we can safely ignore overlapping pg_log_backend_memory_contexts() requests while the function is already running. Thoughts? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On Thu, May 1, 2025 at 3:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: > Just idea, what do you think about adding a flag to indicate whether > ProcessLogMemoryContextInterrupt() is currently running? Then, > when a backend receives a signal and ProcessLogMemoryContextInterrupt() > is invoked, it can simply return immediately if the flag is already set > like this: I think that something like this could work, but you would need more than this. Otherwise, if the function errors out, the flag would remain permanently set. -- Robert Haas EDB: http://www.enterprisedb.com
On 2025/05/01 21:42, Robert Haas wrote: > On Thu, May 1, 2025 at 3:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >> Just idea, what do you think about adding a flag to indicate whether >> ProcessLogMemoryContextInterrupt() is currently running? Then, >> when a backend receives a signal and ProcessLogMemoryContextInterrupt() >> is invoked, it can simply return immediately if the flag is already set >> like this: > > I think that something like this could work, but you would need more > than this. Otherwise, if the function errors out, the flag would > remain permanently set. Yes, we need to either use PG_TRY()/PG_FINALLY() or handle the flag as a global variable and reset it in the error handling path. I think using PG_TRY()/PG_FINALLY() is the simpler option. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
On 2025/05/02 2:27, Fujii Masao wrote: > > > On 2025/05/01 21:42, Robert Haas wrote: >> On Thu, May 1, 2025 at 3:53 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote: >>> Just idea, what do you think about adding a flag to indicate whether >>> ProcessLogMemoryContextInterrupt() is currently running? Then, >>> when a backend receives a signal and ProcessLogMemoryContextInterrupt() >>> is invoked, it can simply return immediately if the flag is already set >>> like this: >> >> I think that something like this could work, but you would need more >> than this. Otherwise, if the function errors out, the flag would >> remain permanently set. > > Yes, we need to either use PG_TRY()/PG_FINALLY() or handle the flag as > a global variable and reset it in the error handling path. I think using > PG_TRY()/PG_FINALLY() is the simpler option. I've implemented the patch in that way. Patch attached. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Attachment
On 2025-05-02 09:02, Fujii Masao wrote: > On 2025/05/02 2:27, Fujii Masao wrote: >> >> >> On 2025/05/01 21:42, Robert Haas wrote: >>> On Thu, May 1, 2025 at 3:53 AM Fujii Masao >>> <masao.fujii@oss.nttdata.com> wrote: >>>> Just idea, what do you think about adding a flag to indicate whether >>>> ProcessLogMemoryContextInterrupt() is currently running? Then, >>>> when a backend receives a signal and >>>> ProcessLogMemoryContextInterrupt() >>>> is invoked, it can simply return immediately if the flag is already >>>> set >>>> like this: >>> >>> I think that something like this could work, but you would need more >>> than this. Otherwise, if the function errors out, the flag would >>> remain permanently set. >> >> Yes, we need to either use PG_TRY()/PG_FINALLY() or handle the flag as >> a global variable and reset it in the error handling path. I think >> using >> PG_TRY()/PG_FINALLY() is the simpler option. > > I've implemented the patch in that way. Patch attached. Thank you for the patch! I confirmed that with this patch applied, the process no longer crashes even after applying the change Robert used to trigger the crash. a small nitpick: + * requested repeatedly and rapidly, potentially leading to infinite It looks like there are two spaces between "requested" and "repeatedly". -- Regards, -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.
On 2025/05/02 14:58, torikoshia wrote: > I confirmed that with this patch applied, the process no longer crashes even after applying the change Robert used to triggerthe crash. > > a small nitpick: > > + * requested repeatedly and rapidly, potentially leading to infinite > > It looks like there are two spaces between "requested" and "repeatedly". Thanks for the review and testing! I've fixed the issue you pointed out. Updated patch attached. Since git cherry-pick didn't work cleanly for v16 and earlier, I've also prepared a separate patch for those versions. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION