Thread: Is RecoveryConflictInterrupt() entirely safe in a signal handler?
Hi,

Unlike most "procsignal" handler routines, RecoveryConflictInterrupt() doesn't just set a sig_atomic_t flag and poke the latch. Is the extra stuff it does safe? For example, is this call stack OK (to pick one that jumps out, but not the only one)?

procsignal_sigusr1_handler
-> RecoveryConflictInterrupt
-> HoldingBufferPinThatDelaysRecovery
-> GetPrivateRefCount
-> GetPrivateRefCountEntry
-> hash_search(...hash table that might be in the middle of an update...)

(I noticed this incidentally while trying to follow along with the nearby thread on 031_recovery_conflict.pl, but the question of why we really need this is of interest to me for a back-burner project I have: trying to remove all use of signals except for latches, and then remove the signal emulation for Windows. It may turn out to be a pipe dream, but this stuff is one of the subproblems.)
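For contrast, the "set a flag and poke the latch" pattern that most procsignal handler routines follow looks roughly like this (a simplified illustration only, not the actual procsignal.c source; the flag name is invented):

    /* Handler side: only async-signal-safe work happens here. */
    static volatile sig_atomic_t somethingPending = false;

    static void
    HandleSomethingInterrupt(void)
    {
        somethingPending = true;
        InterruptPending = true;
        SetLatch(MyLatch);
    }

    /*
     * Everything heavier -- hash table lookups, lock acquisition, ereport() --
     * is deferred to ProcessInterrupts(), reached via CHECK_FOR_INTERRUPTS()
     * outside the signal handler.
     */

The question above is whether the extra work RecoveryConflictInterrupt() does beyond this pattern is actually safe.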
Thomas Munro <thomas.munro@gmail.com> writes:
> Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
> stuff it does safe? For example, is this call stack OK (to pick one
> that jumps out, but not the only one)?
>
> procsignal_sigusr1_handler
> -> RecoveryConflictInterrupt
> -> HoldingBufferPinThatDelaysRecovery
> -> GetPrivateRefCount
> -> GetPrivateRefCountEntry
> -> hash_search(...hash table that might be in the middle of an update...)

Ugh. That one was safe before somebody decided we needed a hash table for buffer refcounts, but it's surely not safe now.

Which, of course, demonstrates the folly of allowing signal handlers to call much of anything; but especially doing so without clearly marking the called functions as needing to be signal safe.

			regards, tom lane
Hi,

On 2022-04-09 17:00:41 -0400, Tom Lane wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt()
> > doesn't just set a sig_atomic_t flag and poke the latch. Is the extra
> > stuff it does safe? For example, is this call stack OK (to pick one
> > that jumps out, but not the only one)?
> >
> > procsignal_sigusr1_handler
> > -> RecoveryConflictInterrupt
> > -> HoldingBufferPinThatDelaysRecovery
> > -> GetPrivateRefCount
> > -> GetPrivateRefCountEntry
> > -> hash_search(...hash table that might be in the middle of an update...)
>
> Ugh. That one was safe before somebody decided we needed a hash table
> for buffer refcounts, but it's surely not safe now.

Mea culpa. This is 4b4b680c3d6d - from 2014.

> Which, of course, demonstrates the folly of allowing signal handlers to call
> much of anything; but especially doing so without clearly marking the called
> functions as needing to be signal safe.

Yea. Particularly when just going through bufmgr and updating places that look at pin counts, it's not immediately obvious that HoldingBufferPinThatDelaysRecovery() runs in a signal handler. Partially because RecoveryConflictInterrupt() - which is mentioned in the comment above HoldingBufferPinThatDelaysRecovery() - sounds a lot like it's called from ProcessInterrupts(), which doesn't run in a signal handler...

RecoveryConflictInterrupt() calls a lot of functions, some of which quite plausibly could be changed to not be signal safe, even if they currently are.

Is there really a reason for RecoveryConflictInterrupt() to run in a signal handler? Given that we only react to conflicts in ProcessInterrupts(), it's not immediately obvious that we need to do anything in RecoveryConflictInterrupt() but set some flags. There are probably some minor efficiency gains, but that seems unconvincing.

The comments really need a rewrite - it sounds like RecoveryConflictInterrupt() will error out itself:

	/*
	 * If we can abort just the current subtransaction then we are
	 * OK to throw an ERROR to resolve the conflict. Otherwise
	 * drop through to the FATAL case.
	 *
	 * XXX other times that we can throw just an ERROR *may* be
	 * PROCSIG_RECOVERY_CONFLICT_LOCK if no locks are held in
	 * parent transactions
	 *
	 * PROCSIG_RECOVERY_CONFLICT_SNAPSHOT if no snapshots are held
	 * by parent transactions and the transaction is not
	 * transaction-snapshot mode
	 *
	 * PROCSIG_RECOVERY_CONFLICT_TABLESPACE if no temp files or
	 * cursors open in parent transactions
	 */

It's technically not *wrong*, because it's setting up state that then leads to ERROR / FATAL being thrown, but ...

Greetings,

Andres Freund
Hi, On 2022-04-09 14:39:16 -0700, Andres Freund wrote: > On 2022-04-09 17:00:41 -0400, Tom Lane wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt() > > > doesn't just set a sig_atomic_t flag and poke the latch. Is the extra > > > stuff it does safe? For example, is this call stack OK (to pick one > > > that jumps out, but not the only one)? > > > > > procsignal_sigusr1_handler > > > -> RecoveryConflictInterrupt > > > -> HoldingBufferPinThatDelaysRecovery > > > -> GetPrivateRefCount > > > -> GetPrivateRefCountEntry > > > -> hash_search(...hash table that might be in the middle of an update...) > > > > Ugh. That one was safe before somebody decided we needed a hash table > > for buffer refcounts, but it's surely not safe now. > > Mea culpa. This is 4b4b680c3d6d - from 2014. Whoa. There's way worse: StandbyTimeoutHandler() calls SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends(), which acquires lwlocks etc. Which very plausibly is the cause for the issue I'm investigating in https://www.postgresql.org/message-id/20220409220054.fqn5arvbeesmxdg5%40alap3.anarazel.de Greetings, Andres Freund
On Sun, Apr 10, 2022 at 11:00 AM Andres Freund <andres@anarazel.de> wrote: > On 2022-04-09 14:39:16 -0700, Andres Freund wrote: > > On 2022-04-09 17:00:41 -0400, Tom Lane wrote: > > > Thomas Munro <thomas.munro@gmail.com> writes: > > > > Unlike most "procsignal" handler routines, RecoveryConflictInterrupt() > > > > doesn't just set a sig_atomic_t flag and poke the latch. Is the extra > > > > stuff it does safe? For example, is this call stack OK (to pick one > > > > that jumps out, but not the only one)? > > > > > > > procsignal_sigusr1_handler > > > > -> RecoveryConflictInterrupt > > > > -> HoldingBufferPinThatDelaysRecovery > > > > -> GetPrivateRefCount > > > > -> GetPrivateRefCountEntry > > > > -> hash_search(...hash table that might be in the middle of an update...) > > > > > > Ugh. That one was safe before somebody decided we needed a hash table > > > for buffer refcounts, but it's surely not safe now. > > > > Mea culpa. This is 4b4b680c3d6d - from 2014. > > Whoa. There's way worse: StandbyTimeoutHandler() calls > SendRecoveryConflictWithBufferPin(), which calls CancelDBBackends(), which > acquires lwlocks etc. > > Which very plausibly is the cause for the issue I'm investigating in > https://www.postgresql.org/message-id/20220409220054.fqn5arvbeesmxdg5%40alap3.anarazel.de Huh. I wouldn't have started a separate thread for this if I'd realised I was getting close to the cause of the CI failure... I thought this was an incidental observation. Anyway, I made a first attempt at fixing this SIGUSR1 problem (I think Andres is looking at the SIGALRM problem in the other thread). Instead of bothering to create N different XXXPending variables for the different conflict "reasons", I used an array. Other than that, it's much like existing examples. The existing use of the global variable RecoveryConflictReason seems a little woolly. Doesn't it get clobbered every time a signal arrives, even if we determine that there is no conflict? Not sure why that's OK, but anyway, this patch always sets it together with RecoveryConflictPending = true.
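The handler side of the shape described above would be roughly the following (an illustrative sketch, not the attached patch; it assumes one sig_atomic_t flag per conflict reason, set from the SIGUSR1 handler):

    static volatile sig_atomic_t RecoveryConflictInterruptPending[NUM_PROCSIGNALS];

    /* Async-signal-safe part, reached from procsignal_sigusr1_handler(). */
    void
    HandleRecoveryConflictInterrupt(ProcSignalReason reason)
    {
        RecoveryConflictInterruptPending[reason] = true;
        InterruptPending = true;
        SetLatch(MyLatch);
    }

    /* ... and in procsignal_sigusr1_handler(), one call per reason, e.g.: */
    if (CheckProcSignal(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN))
        HandleRecoveryConflictInterrupt(PROCSIG_RECOVERY_CONFLICT_BUFFERPIN);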
Hi,

On 2022-04-12 10:33:28 +1200, Thomas Munro wrote:
> Huh. I wouldn't have started a separate thread for this if I'd
> realised I was getting close to the cause of the CI failure...

:)

> Instead of bothering to create N different XXXPending variables for
> the different conflict "reasons", I used an array. Other than that,
> it's much like existing examples.

It kind of bothers me that each pending conflict has its own external function call. It doesn't actually cost anything, because it's quite unlikely that there's more than one pending conflict. Besides being aesthetically displeasing, it also leads to an unnecessarily large amount of code needed, because the calls to RecoveryConflictInterrupt() can't be merged... But that's perhaps best fixed separately.

What might actually make more sense is to just have a bitmask or something?

> The existing use of the global variable RecoveryConflictReason seems a
> little woolly. Doesn't it get clobbered every time a signal arrives,
> even if we determine that there is no conflict? Not sure why that's
> OK, but anyway, this patch always sets it together with
> RecoveryConflictPending = true.

Yea. It's probably ok, kind of, because there shouldn't be multiple outstanding conflicts with very few exceptions (deadlock and buffer pin). And it doesn't matter that much which of those gets handled. And we'll retry again. But brrr.

> +/*
> + * Check one recovery conflict reason. This is called when the corresponding
> + * RecoveryConflictInterruptPending flags is set. If we decide that a conflict
> + * exists, then RecoveryConflictReason and RecoveryConflictPending will be set,
> + * to be handled later in the same invocation of ProcessInterrupts().
> + */
> +static void
> +ProcessRecoveryConflictInterrupt(ProcSignalReason reason)
> +{
> 	/*
> 	 * Don't joggle the elbow of proc_exit
> 	 */
> 	if (!proc_exit_inprogress)
> 	{
> -		RecoveryConflictReason = reason;
> 		switch (reason)
> 		{
> 			case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
> @@ -3084,9 +3094,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
> 				if (IsAbortedTransactionBlockState())
> 					return;
>
> +				RecoveryConflictReason = reason;
> 				RecoveryConflictPending = true;
> 				QueryCancelPending = true;
> -				InterruptPending = true;
> 				break;
> 			}
>
> @@ -3094,9 +3104,9 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
> 				/* FALLTHROUGH */
>
> 			case PROCSIG_RECOVERY_CONFLICT_DATABASE:
> +				RecoveryConflictReason = reason;
> 				RecoveryConflictPending = true;
> 				ProcDiePending = true;
> -				InterruptPending = true;
> 				break;
>
> 			default:
> @@ -3115,15 +3125,6 @@ RecoveryConflictInterrupt(ProcSignalReason reason)
> 		if (reason == PROCSIG_RECOVERY_CONFLICT_DATABASE)
> 			RecoveryConflictRetryable = false;
> 	}

It's pretty weird that we set up all this state here, only to check a short while later in ProcessInterrupts() whether it has been set.

Seems like it'd make more sense to throw the error in ProcessRecoveryConflictInterrupt(), now that it's not in a signal handler anymore?

> 	/*
> @@ -3147,6 +3148,22 @@ ProcessInterrupts(void)
> 		return;
> 	InterruptPending = false;
>
> +	/*
> +	 * Have we been asked to check for a recovery conflict? Processing these
> +	 * interrupts may result in RecoveryConflictPending and related variables
> +	 * being set, to be handled further down.
> +	 */
> +	for (int i = PROCSIG_RECOVERY_CONFLICT_FIRST;
> +		 i <= PROCSIG_RECOVERY_CONFLICT_LAST;
> +		 ++i)
> +	{
> +		if (RecoveryConflictInterruptPending[i])
> +		{
> +			RecoveryConflictInterruptPending[i] = false;
> +			ProcessRecoveryConflictInterrupt(i);
> +		}
> +	}

Hm. This seems like it shouldn't be in ProcessInterrupts(). How about calling a wrapper doing all this if RecoveryConflictPending?

Greetings,

Andres Freund
On Tue, Apr 12, 2022 at 10:50 AM Andres Freund <andres@anarazel.de> wrote: > On 2022-04-12 10:33:28 +1200, Thomas Munro wrote: > > Instead of bothering to create N different XXXPending variables for > > the different conflict "reasons", I used an array. Other than that, > > it's much like existing examples. > > It kind of bothers me that each pending conflict has its own external function > call. It doesn't actually cost anything, because it's quite unlikely that > there's more than one pending conflict. Besides aesthetically displeasing, > it also leads to an unnecessarily large amount of code needed, because the > calls to RecoveryConflictInterrupt() can't be merged... Ok, in this version there's two levels of flag: RecoveryConflictPending, so we do nothing if that's not set, and then the loop over RecoveryConflictPendingReasons is moved into ProcessRecoveryConflictInterrupts(). Better? > What might actually make more sense is to just have a bitmask or something? Yeah, in fact I'm exploring something like that in later bigger redesign work[1] that gets rid of signal handlers. Here I'm looking for something simple and potentially back-patchable and I don't want to have to think about async signal safety of bit-level manipulations. > It's pretty weird that we have all this stuff that we then just check a short > while later in ProcessInterrupts() whether they've been set. > > Seems like it'd make more sense to throw the error in > ProcessRecoveryConflictInterrupt(), now that it's not in a a signal handler > anymore? Yeah. The thing that was putting me off doing that (and caused me to get kinda stuck in the valley of indecision for a while here, sorry about that) aside from trying to keep the diff small, was the need to replicate this self-loathing code in a second place: if (QueryCancelPending && QueryCancelHoldoffCount != 0) { /* * Re-arm InterruptPending so that we process the cancel request as * soon as we're done reading the message. (XXX this is seriously * ugly: it complicates INTERRUPTS_CAN_BE_PROCESSED(), and it means we * can't use that macro directly as the initial test in this function, * meaning that this code also creates opportunities for other bugs to * appear.) */ But I have now tried doing that anyway, and I hope the simplification in other ways makes it worth it. Thoughts, objections? > > /* > > @@ -3147,6 +3148,22 @@ ProcessInterrupts(void) > > return; > > InterruptPending = false; > > > > + /* > > + * Have we been asked to check for a recovery conflict? Processing these > > + * interrupts may result in RecoveryConflictPending and related variables > > + * being set, to be handled further down. > > + */ > > + for (int i = PROCSIG_RECOVERY_CONFLICT_FIRST; > > + i <= PROCSIG_RECOVERY_CONFLICT_LAST; > > + ++i) > > + { > > + if (RecoveryConflictInterruptPending[i]) > > + { > > + RecoveryConflictInterruptPending[i] = false; > > + ProcessRecoveryConflictInterrupt(i); > > + } > > + } > > Hm. This seems like it shouldn't be in ProcessInterrupts(). How about checking > calling a wrapper doing all this if RecoveryConflictPending? I moved the loop into ProcessRecoveryConflictInterrupt() and added an "s" to the latter's name. It already had the right indentation level to contain a loop, once I realised that the test of proc_exit_inprogress must be redundant. Better? [1] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2B3MkS21yK4jL4cgZywdnnGKiBg0jatoV6kzaniBmcqbQ%40mail.gmail.com
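The consuming side of the two-level scheme described here then looks roughly like this (illustrative only; the real patch differs in detail):

    static void
    ProcessRecoveryConflictInterrupts(void)
    {
        /* Fast path: the signal handler hasn't flagged anything. */
        if (!RecoveryConflictPending)
            return;
        RecoveryConflictPending = false;    /* may be set again by the handler */

        for (int reason = PROCSIG_RECOVERY_CONFLICT_FIRST;
             reason <= PROCSIG_RECOVERY_CONFLICT_LAST;
             reason++)
        {
            if (RecoveryConflictPendingReasons[reason])
            {
                RecoveryConflictPendingReasons[reason] = false;
                ProcessRecoveryConflictInterrupt(reason);
            }
        }
    }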
Hi,

It'd be cool to commit and backpatch this - I'd like to re-enable the conflict tests in the backbranches, and I don't think we want to do so with this issue in place.

On 2022-05-10 16:39:11 +1200, Thomas Munro wrote:
> On Tue, Apr 12, 2022 at 10:50 AM Andres Freund <andres@anarazel.de> wrote:
> > On 2022-04-12 10:33:28 +1200, Thomas Munro wrote:
> > > Instead of bothering to create N different XXXPending variables for
> > > the different conflict "reasons", I used an array. Other than that,
> > > it's much like existing examples.
> >
> > It kind of bothers me that each pending conflict has its own external function
> > call. It doesn't actually cost anything, because it's quite unlikely that
> > there's more than one pending conflict. Besides being aesthetically displeasing,
> > it also leads to an unnecessarily large amount of code needed, because the
> > calls to RecoveryConflictInterrupt() can't be merged...
>
> Ok, in this version there's two levels of flag:
> RecoveryConflictPending, so we do nothing if that's not set, and then
> the loop over RecoveryConflictPendingReasons is moved into
> ProcessRecoveryConflictInterrupts(). Better?

I think so.

I don't particularly like the Handle/ProcessRecoveryConflictInterrupt() split, naming-wise. I don't think Handle vs Process indicates something meaningful? Maybe s/Interrupt/Signal/ for the signal handler one could help?

It *might* look a tad cleaner to have the loop in a separate function from the existing code. I.e. a ProcessRecoveryConflictInterrupts() that calls ProcessRecoveryConflictInterrupt().

> > What might actually make more sense is to just have a bitmask or something?
>
> Yeah, in fact I'm exploring something like that in later bigger
> redesign work[1] that gets rid of signal handlers. Here I'm looking
> for something simple and potentially back-patchable and I don't want
> to have to think about async signal safety of bit-level manipulations.

Makes sense.

> 	/*
> @@ -3146,6 +3192,9 @@ ProcessInterrupts(void)
> 		return;
> 	InterruptPending = false;
>
> +	if (RecoveryConflictPending)
> +		ProcessRecoveryConflictInterrupts();
> +
> 	if (ProcDiePending)
> 	{
> 		ProcDiePending = false;

Should the ProcessRecoveryConflictInterrupts() call really be before the ProcDiePending check?

Greetings,

Andres Freund
On Sun, May 22, 2022 at 05:10:01PM -0700, Andres Freund wrote:
> On 2022-05-10 16:39:11 +1200, Thomas Munro wrote:
>> Ok, in this version there's two levels of flag:
>> RecoveryConflictPending, so we do nothing if that's not set, and then
>> the loop over RecoveryConflictPendingReasons is moved into
>> ProcessRecoveryConflictInterrupts(). Better?
>
> I think so.
>
> I don't particularly like the Handle/ProcessRecoveryConflictInterrupt() split,
> naming-wise. I don't think Handle vs Process indicates something meaningful?
> Maybe s/Interrupt/Signal/ for the signal handler one could help?

Handle is more consistent with the other types of interruptions in the SIGUSR1 handler, so the name proposed in the patch is not that confusing to me. And so is Process, in the spirit of ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt(). While on it, is postgres.c the best home for HandleRecoveryConflictInterrupt()? That's a very generic file, for starters. Not related to the actual bug, just asking.

> It *might* look a tad cleaner to have the loop in a separate function from the
> existing code. I.e. a ProcessRecoveryConflictInterrupts() that calls
> ProcessRecoveryConflictInterrupt().

Agreed that it would be a bit cleaner to keep the internals in a different routine.

>> Yeah, in fact I'm exploring something like that in later bigger
>> redesign work[1] that gets rid of signal handlers. Here I'm looking
>> for something simple and potentially back-patchable and I don't want
>> to have to think about async signal safety of bit-level manipulations.
>
> Makes sense.

+1.

Also note that bufmgr.c mentions RecoveryConflictInterrupt() in the top comment of HoldingBufferPinThatDelaysRecovery().

Should the processing of PROCSIG_RECOVERY_CONFLICT_DATABASE mention that FATAL is used because we are never going to retry the conflict as the database has been dropped? Getting rid of RecoveryConflictRetryable makes the code easier to go through.
--
Michael
On Wed, Jun 15, 2022 at 1:51 AM Michael Paquier <michael@paquier.xyz> wrote: > Handle is more consistent with the other types of interruptions in the > SIGUSR1 handler, so the name proposed in the patch in not that > confusing to me. And so does Process, in spirit with > ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt(). > While on it, is postgres.c the best home for > HandleRecoveryConflictInterrupt()? That's a very generic file, for > starters. Not related to the actual bug, just asking. Yeah, there's existing precedent for this kind of split in, for example, HandleCatchupInterrupt() and ProcessCatchupInterrupt(). I think the idea is that "process" is supposed to sound like the more involved part of the operation, whereas "handle" is supposed to sound like the initial response to the signal. I'm not sure it's the clearest possible naming, but naming things is hard, and this patch is apparently not inventing a new way to do it. -- Robert Haas EDB: http://www.enterprisedb.com
Here's a new version, but there's something wrong that I haven't figured out yet (see CI link below).

Here's one thing I got a bit confused about along the way, but it seems the comment was just wrong:

+			/*
+			 * If we can abort just the current subtransaction then we are OK
+			 * to throw an ERROR to resolve the conflict. Otherwise drop
+			 * through to the FATAL case.
...
+			if (!IsSubTransaction())
...
+				ereport(ERROR,

Surely this was meant to say, "If we're not in a subtransaction then ...", right? Changed.

I thought of a couple more simplifications for the big switch statement in ProcessRecoveryConflictInterrupt(). The special case for DoingCommandRead can be changed to fall through, instead of needing a separate ereport(FATAL). I also folded the two ereport(FATAL) calls in the CONFLICT_DATABASE case into one, since they differ only in errcode().

+			 (errcode(reason == PROCSIG_RECOVERY_CONFLICT_DATABASE ?
+					  ERRCODE_DATABASE_DROPPED :
+					  ERRCODE_T_R_SERIALIZATION_FAILURE),

Now we're down to just one ereport(FATAL), one ereport(ERROR), and a couple of ways to give up without erroring. I think this makes the logic a lot easier to follow?

I'm confused about proc->recoveryConflictPending: the startup process sets it, and sometimes the interrupt receiver sets it too, and it causes errdetail() to be clobbered on abort (for any reason), even though we bothered to set it carefully for the recovery conflict ereport calls. Or something like that. I haven't changed anything about that in this patch, though.

Problem: I saw 031_recovery_conflict.pl time out while waiting for a buffer pin conflict, but so far once only, on CI:

https://cirrus-ci.com/task/5956804860444672

timed out waiting for match: (?^:User was holding shared buffer pin for too long) at t/031_recovery_conflict.pl line 367.

Hrmph. Still trying to reproduce that, which may be a bug in this patch, a bug in the test or a pre-existing problem. Note that recovery didn't say something like:

2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting after 11.197 ms: recovery conflict on buffer pin

(That's what I'd expect to see in https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log if the startup process had decided to send the signal).

... so it seems like the problem in that run is upstream of the interrupt stuff.

Other things changed in response to feedback (quoting from several recent messages):

On Thu, Jun 16, 2022 at 5:01 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Jun 15, 2022 at 1:51 AM Michael Paquier <michael@paquier.xyz> wrote:
> > Handle is more consistent with the other types of interruptions in the
> > SIGUSR1 handler, so the name proposed in the patch is not that
> > confusing to me. And so is Process, in the spirit of
> > ProcessProcSignalBarrier() and ProcessLogMemoryContextInterrupt().
> > While on it, is postgres.c the best home for
> > HandleRecoveryConflictInterrupt()? That's a very generic file, for
> > starters. Not related to the actual bug, just asking.
>
> Yeah, there's existing precedent for this kind of split in, for
> example, HandleCatchupInterrupt() and ProcessCatchupInterrupt(). I
> think the idea is that "process" is supposed to sound like the more
> involved part of the operation, whereas "handle" is supposed to sound
> like the initial response to the signal.

Thanks both for looking.
Yeah, I was trying to keep with the existing convention here (though admittedly we're not 100% consistent on this, something to tidy up separately perhaps). On Wed, Jun 15, 2022 at 5:51 PM Michael Paquier <michael@paquier.xyz> wrote: > On Sun, May 22, 2022 at 05:10:01PM -0700, Andres Freund wrote: > > It *might* look a tad cleaner to have the loop in a separate function from the > > existing code. I.e. a +ProcessRecoveryConflictInterrupts() that calls > > ProcessRecoveryConflictInterrupts(). > > Agreed that it would be a bit cleaner to keep the internals in a > different routine. Alright, I split it into two functions: one with an 's' in the name to do the looping, and one without 's' to process an individual interrupt reason. Makes the patch harder to read because the indentation level changes... > Also note that bufmgr.c mentions RecoveryConflictInterrupt() in the > top comment of HoldingBufferPinThatDelaysRecovery(). Fixed. > Should the processing of PROCSIG_RECOVERY_CONFLICT_DATABASE mention > that FATAL is used because we are never going to retry the conflict as > the database has been dropped? OK, note added. > Getting rid of > RecoveryConflictRetryable makes the code easier to go through. Yeah, all the communication through global variables was really confusing, and also subtly wrong (that global reason gets clobbered with incorrect values), and that retryable variable was hard to follow. On Mon, May 23, 2022 at 12:10 PM Andres Freund <andres@anarazel.de> wrote: > On 2022-05-10 16:39:11 +1200, Thomas Munro wrote: > > @@ -3146,6 +3192,9 @@ ProcessInterrupts(void) > > return; > > InterruptPending = false; > > > > + if (RecoveryConflictPending) > > + ProcessRecoveryConflictInterrupts(); > > + > > if (ProcDiePending) > > { > > ProcDiePending = false; > > Should the ProcessRecoveryConflictInterrupts() call really be before the > ProcDiePending check? I don't think it's important which of (say) a statement timeout and a recovery conflict that arrive around the same time takes priority, but on reflection it was an ugly place to put it, and it seems tidier to move it down the function a bit further, where other various special interrupts are handled after the "main" and original die/cancel ones.
On Tue, Jun 21, 2022 at 05:22:05PM +1200, Thomas Munro wrote:
> Here's one thing I got a bit confused about along the way, but it
> seems the comment was just wrong:
>
> +			/*
> +			 * If we can abort just the current subtransaction then we are OK
> +			 * to throw an ERROR to resolve the conflict. Otherwise drop
> +			 * through to the FATAL case.
> ...
> +			if (!IsSubTransaction())
> ...
> +				ereport(ERROR,
>
> Surely this was meant to say, "If we're not in a subtransaction then
> ...", right? Changed.

Indeed, the code does something other than what the comment says, namely generating an ERROR if the process is not in a subtransaction, ignoring already aborted transactions (aborted subtrans go to FATAL) and throwing a FATAL in the other cases. So your change looks right.

> I thought of a couple more simplifications for the big switch
> statement in ProcessRecoveryConflictInterrupt(). The special case for
> DoingCommandRead can be changed to fall through, instead of needing a
> separate ereport(FATAL).

The extra business with QueryCancelHoldoffCount and DoingCommandRead is the only addition for the snapshot, lock and tablespace conflict handling part. I don't see a reason why that could be wrong on a close lookup. Anyway, why don't you check QueryCancelPending on top of QueryCancelHoldoffCount?

> Now we're down to just one ereport(FATAL), one ereport(ERROR), and a
> couple of ways to give up without erroring. I think this makes the
> logic a lot easier to follow?

Agreed that it looks like a gain in clarity.
--
Michael
On Tue, Jun 21, 2022 at 7:44 PM Michael Paquier <michael@paquier.xyz> wrote: > The extra business with QueryCancelHoldoffCount and DoingCommandRead > is the only addition for the snapshot, lock and tablespace conflict > handling part. I don't see why a reason why that could be wrong on a > close lookup. Anyway, why don't you check QueryCancelPending on top > of QueryCancelHoldoffCount? The idea of this patch is to make ProcessRecoveryConflictInterrupt() throw its own ERROR, instead of setting QueryCancelPending (as an earlier version of the patch did). It still has to respect QueryCancelHoldoffCount, though, to avoid emitting an ERROR at bad times for the fe/be protocol.
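Roughly, the holdoff handling being described is this (a sketch only; variable names follow the patch, but the exact re-arm details are illustrative):

    /* In ProcessRecoveryConflictInterrupt(), before deciding to throw: */
    if (QueryCancelHoldoffCount != 0)
    {
        /*
         * We're in the middle of reading a frontend protocol message, so an
         * ERROR now would confuse the fe/be protocol.  Re-arm the flags so the
         * conflict is reconsidered at the next CHECK_FOR_INTERRUPTS().
         */
        RecoveryConflictPendingReasons[reason] = true;
        RecoveryConflictPending = true;
        InterruptPending = true;
        return;
    }

    ereport(ERROR,
            (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
             errmsg("canceling statement due to conflict with recovery")));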
On Tue, Jun 21, 2022 at 11:02:57PM +1200, Thomas Munro wrote: > On Tue, Jun 21, 2022 at 7:44 PM Michael Paquier <michael@paquier.xyz> wrote: >> The extra business with QueryCancelHoldoffCount and DoingCommandRead >> is the only addition for the snapshot, lock and tablespace conflict >> handling part. I don't see why a reason why that could be wrong on a >> close lookup. Anyway, why don't you check QueryCancelPending on top >> of QueryCancelHoldoffCount? > > The idea of this patch is to make ProcessRecoveryConflictInterrupt() > throw its own ERROR, instead of setting QueryCancelPending (as an > earlier version of the patch did). It still has to respect > QueryCancelHoldoffCount, though, to avoid emitting an ERROR at bad > times for the fe/be protocol. Yeah, I was reading through v3 and my brain questioned the inconsistency, but I can see that v2 already did that and I have also looked at it. Anyway, my concern here is that the code becomes more dependent on the ordering of ProcessRecoveryConflictInterrupt() and the code path checking for QueryCancelPending in ProcessInterrupts(). With the patch, we should always have QueryCancelPending set to false, as long as there are no QueryCancelHoldoffCount. Perhaps an extra assertion for QueryCancelPending could be added at the beginning of ProcessRecoveryConflictInterrupts(), in combination of the one already present for InterruptHoldoffCount. I agree that's a minor point, though. -- Michael
On Wed, Jun 22, 2022 at 1:04 PM Michael Paquier <michael@paquier.xyz> wrote: > With the patch, we should always have QueryCancelPending set to false, > as long as there are no QueryCancelHoldoffCount. Perhaps an extra > assertion for QueryCancelPending could be added at the beginning of > ProcessRecoveryConflictInterrupts(), in combination of the one already > present for InterruptHoldoffCount. I agree that's a minor point, > though. But QueryCancelPending can be set to true at any time by StatementCancelHandler(), if we receive SIGINT.
Hi, On 2022-06-21 17:22:05 +1200, Thomas Munro wrote: > Problem: I saw 031_recovery_conflict.pl time out while waiting for a > buffer pin conflict, but so far once only, on CI: > > https://cirrus-ci.com/task/5956804860444672 > > timed out waiting for match: (?^:User was holding shared buffer pin > for too long) at t/031_recovery_conflict.pl line 367. > > Hrmph. Still trying to reproduce that, which may be a bug in this > patch, a bug in the test or a pre-existing problem. Note that > recovery didn't say something like: > > 2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting > after 11.197 ms: recovery conflict on buffer pin > > (That's what I'd expect to see in > https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log > if the startup process had decided to send the signal). > > ... so it seems like the problem in that run is upstream of the interrupt stuff. Odd. The only theory I have so far is that the manual vacuum on the primary somehow decided to skip the page, and thus didn't trigger a conflict. Because clearly replay progressed past the records of the VACUUM. Perhaps we should use VACUUM VERBOSE? In contrast to pg_regress tests that should be unproblematic? Greetings, Andres Freund
Hi,

On 2022-06-21 19:33:01 -0700, Andres Freund wrote:
> On 2022-06-21 17:22:05 +1200, Thomas Munro wrote:
> > Problem: I saw 031_recovery_conflict.pl time out while waiting for a
> > buffer pin conflict, but so far once only, on CI:
> >
> > https://cirrus-ci.com/task/5956804860444672
> >
> > timed out waiting for match: (?^:User was holding shared buffer pin
> > for too long) at t/031_recovery_conflict.pl line 367.
> >
> > Hrmph. Still trying to reproduce that, which may be a bug in this
> > patch, a bug in the test or a pre-existing problem. Note that
> > recovery didn't say something like:
> >
> > 2022-06-21 17:05:40.931 NZST [57674] LOG: recovery still waiting
> > after 11.197 ms: recovery conflict on buffer pin
> >
> > (That's what I'd expect to see in
> > https://api.cirrus-ci.com/v1/artifact/task/5956804860444672/log/src/test/recovery/tmp_check/log/031_recovery_conflict_standby.log
> > if the startup process had decided to send the signal).
> >
> > ... so it seems like the problem in that run is upstream of the interrupt stuff.
>
> Odd. The only theory I have so far is that the manual vacuum on the primary
> somehow decided to skip the page, and thus didn't trigger a conflict. Because
> clearly replay progressed past the records of the VACUUM. Perhaps we should
> use VACUUM VERBOSE? In contrast to pg_regress tests that should be
> unproblematic?

I saw a couple failures of 031 on CI for the meson patch - which obviously doesn't change anything around this. However it adds a lot more distributions, and the added ones run in docker containers on a shared host, presumably adding a lot of noise. I saw this more frequently when I accidentally had the test runs at the number of CPUs in the host, rather than the allotted CPUs in the container.

That made me look more into these issues. I played with adding a pg_usleep() to RecoveryConflictInterrupt() to simulate slow signal delivery.

Found a couple things:

- the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can completely swamp the target(s) on a busy system. This surely is exacerbated by the usleep I added to RecoveryConflictInterrupt(), but a 5ms signalling pace does seem like a bad idea.

- we process the same recovery conflict (not a single signal, but a single "real conflict") multiple times in the target of a conflict, presumably while handling the error. That includes handling the same interrupt once as an ERROR and once as FATAL.

E.g.

2022-07-01 12:19:14.428 PDT [2000572] LOG: recovery still waiting after 10.032 ms: recovery conflict on buffer pin
2022-07-01 12:19:14.428 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery at character 15
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT: SELECT * FROM test_recovery_conflict_table2;
2022-07-01 12:19:54.778 PDT [2000572] LOG: recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
2022-07-01 12:19:54.778 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG: statement: SELECT 1;

Note that the startup process considers the conflict resolved by the time the backend handles the interrupt.

I also see cases where a FATAL is repeated:

2022-07-01 12:43:18.190 PDT [2054721] LOG: recovery still waiting after 15.410 ms: recovery conflict on snapshot
2022-07-01 12:43:18.190 PDT [2054721] DETAIL: Conflicting process: 2054753.
2022-07-01 12:43:18.190 PDT [2054721] CONTEXT: WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected 18 ndead 0; blkref #0: rel 1663/16385/>
2054753: reporting recovery conflict 9
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
...
2054753: reporting recovery conflict 9
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versions that must be removed.
2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.

The FATAL one seems like it might at least partially be due to RecoveryConflictPending not being reset in at least some of the FATAL recovery conflict paths.

It seems pretty obvious that the proc_exit_inprogress check in RecoveryConflictInterrupt() is misplaced, and needs to be where the errors are thrown. But that won't help, because it turns out, we don't yet set that necessarily.
Look at this stack from an assertion in ProcessInterrupts() ensuring that the same FATAL isn't raised twice:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007fd47897b546 in __GI_abort () at abort.c:79
#2  0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion",
    fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259)
    at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69
#3  0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259
#4  0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8> "internal_flush")
    at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683
#5  0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393
#6  0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340
#7  0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283
#8  0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541
#9  0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9> "ProcessInterrupts")
    at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592
#10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266
#11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497
#12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192)

Reporting a FATAL error in the process of reporting a FATAL error. Yeah.

I assume this could lead to sending out the same message quite a few times.

This is quite the mess.

Greetings,

Andres Freund
Hi,

On 2022-07-01 13:14:23 -0700, Andres Freund wrote:
> I saw a couple failures of 031 on CI for the meson patch - which obviously
> doesn't change anything around this. However it adds a lot more distributions,
> and the added ones run in docker containers on a shared host, presumably
> adding a lot of noise. I saw this more frequently when I accidentally had the
> test runs at the number of CPUs in the host, rather than the allotted CPUs in
> the container.
>
> That made me look more into these issues. I played with adding a pg_usleep()
> to RecoveryConflictInterrupt() to simulate slow signal delivery.
>
> Found a couple things:
>
> - the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can
>   completely swamp the target(s) on a busy system. This surely is exacerbated
>   by the usleep I added to RecoveryConflictInterrupt(), but a 5ms signalling
>   pace does seem like a bad idea.

This one is also implicated in https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql and related issues.

Besides being very short, it also seems like a bad idea to wait when we might not need to? Seems we should only wait if we subsequently couldn't get the lock?

Misleadingly, WaitExceedsMaxStandbyDelay() also contains a usleep, which at least I wouldn't expect given its name.

A minimal fix would be to increase the wait time, similar to how it is done with standbyWait_us?

Medium term it seems we ought to set the startup process's latch when handling a conflict, and use a latch wait. But avoiding races probably isn't quite trivial.

> - we process the same recovery conflict (not a single signal, but a single
>   "real conflict") multiple times in the target of a conflict, presumably
>   while handling the error. That includes handling the same interrupt once as
>   an ERROR and once as FATAL.
>
> E.g.
>
> 2022-07-01 12:19:14.428 PDT [2000572] LOG: recovery still waiting after 10.032 ms: recovery conflict on buffer pin
> 2022-07-01 12:19:14.428 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl ERROR: canceling statement due to conflict with recovery at character 15
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
> 2022-07-01 12:19:54.597 PDT [2000578] 031_recovery_conflict.pl STATEMENT: SELECT * FROM test_recovery_conflict_table2;
> 2022-07-01 12:19:54.778 PDT [2000572] LOG: recovery finished waiting after 40359.937 ms: recovery conflict on buffer pin
> 2022-07-01 12:19:54.778 PDT [2000572] CONTEXT: WAL redo at 0/344E098 for Heap2/PRUNE: latestRemovedXid 0 nredirected 0 ndead 100; blkref #0: rel 1663/16385/1>
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery.
> 2022-07-01 12:19:54.788 PDT [2000578] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to the database and repeat your command.
> 2022-07-01 12:19:54.837 PDT [2001389] 031_recovery_conflict.pl LOG: statement: SELECT 1;
>
> note that the startup process considers the conflict resolved by the time
> the backend handles the interrupt.
I guess the reason we first get an ERROR and then a FATAL is that the second iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit, because we end up there after handling the first error? And that's a FATAL. I suspect that Thomas' fix will address at least part of this, as the check whether we're still waiting for a lock will be made just before the error is thrown. > I also see cases where a FATAL is repeated: > > 2022-07-01 12:43:18.190 PDT [2054721] LOG: recovery still waiting after 15.410 ms: recovery conflict on snapshot > 2022-07-01 12:43:18.190 PDT [2054721] DETAIL: Conflicting process: 2054753. > 2022-07-01 12:43:18.190 PDT [2054721] CONTEXT: WAL redo at 0/344AB90 for Heap2/PRUNE: latestRemovedXid 732 nredirected18 ndead 0; blkref #0: rel 1663/16385/> > 2054753: reporting recovery conflict 9 > 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery > 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versionsthat must be removed. > 2022-07-01 12:43:18.482 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to thedatabase and repeat your command. > ... > 2054753: reporting recovery conflict 9 > 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery > 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl DETAIL: User query might have needed to see row versionsthat must be removed. > 2022-07-01 12:43:19.068 PDT [2054753] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to thedatabase and repeat your command. > > the FATAL one seems like it might at least partially be due to > RecoveryConflictPending not being reset in at least some of the FATAL > recovery conflict paths. > > It seems pretty obvious that the proc_exit_inprogress check in > RecoveryConflictInterrupt() is misplaced, and needs to be where the errors > are thrown. But that won't help, because it turns out, we don't yet set that > necessarily. 
> > Look at this stack from an assertion in ProcessInterrupts() ensuring that > the same FATAL isn't raised twice: > > #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49 > #1 0x00007fd47897b546 in __GI_abort () at abort.c:79 > #2 0x00005594c150b27a in ExceptionalCondition (conditionName=0x5594c16fe746 "!in_fatal", errorType=0x5594c16fd8f6 "FailedAssertion", > fileName=0x5594c16fdac0 "/home/andres/src/postgresql/src/backend/tcop/postgres.c", lineNumber=3259) > at /home/andres/src/postgresql/src/backend/utils/error/assert.c:69 > #3 0x00005594c134f6d2 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3259 > #4 0x00005594c150c671 in errfinish (filename=0x5594c16b8f2e "pqcomm.c", lineno=1393, funcname=0x5594c16b95e0 <__func__.8>"internal_flush") > at /home/andres/src/postgresql/src/backend/utils/error/elog.c:683 > #5 0x00005594c115e059 in internal_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1393 > #6 0x00005594c115df49 in socket_flush () at /home/andres/src/postgresql/src/backend/libpq/pqcomm.c:1340 > #7 0x00005594c15121af in send_message_to_frontend (edata=0x5594c18a5740 <errordata>) at /home/andres/src/postgresql/src/backend/utils/error/elog.c:3283 > #8 0x00005594c150f00e in EmitErrorReport () at /home/andres/src/postgresql/src/backend/utils/error/elog.c:1541 > #9 0x00005594c150c42e in errfinish (filename=0x5594c16fdaed "postgres.c", lineno=3266, funcname=0x5594c16ff5b0 <__func__.9>"ProcessInterrupts") > at /home/andres/src/postgresql/src/backend/utils/error/elog.c:592 > #10 0x00005594c134f770 in ProcessInterrupts () at /home/andres/src/postgresql/src/backend/tcop/postgres.c:3266 > #11 0x00005594c134b995 in ProcessClientReadInterrupt (blocked=true) at /home/andres/src/postgresql/src/backend/tcop/postgres.c:497 > #12 0x00005594c1153417 in secure_read (port=0x5594c2e7d620, ptr=0x5594c189ba60 <PqRecvBuffer>, len=8192) > > reporting a FATAL error in process of reporting a FATAL error. Yeha. > > I assume this could lead to sending out the same message quite a few > times. This seems like it needs to be fixed in elog.c. ISTM that at the very least we ought to HOLD_INTERRUPTS() before the EmitErrorReport() for FATAL. Greetings, Andres Freund
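In elog.c terms the suggestion amounts to something like this (a sketch of the idea, not a reviewed fix; variable names as in errfinish()):

    /* In errfinish(), before emitting the report: */
    if (elevel == FATAL)
    {
        /*
         * Keep the nested CHECK_FOR_INTERRUPTS() reachable from the
         * client-communication code below (e.g. internal_flush()) from
         * recursing into ProcessInterrupts() and raising a second FATAL
         * while this one is still being reported.  No RESUME_INTERRUPTS():
         * we're headed for proc_exit() anyway.
         */
        HOLD_INTERRUPTS();
    }

    EmitErrorReport();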
On Sat, Jul 2, 2022 at 11:18 AM Andres Freund <andres@anarazel.de> wrote: > On 2022-07-01 13:14:23 -0700, Andres Freund wrote: > > - the pg_usleep(5000) in ResolveRecoveryConflictWithVirtualXIDs() can > > completely swamp the target(s) on a busy system. This surely is exascerbated > > by the usleep I added RecoveryConflictInterrupt() but a 5ms signalling pace > > does seem like a bad idea. > > This one is also implicated in > https://postgr.es/m/20220701154105.jjfutmngoedgiad3%40alvherre.pgsql > and related issues. > > Besides being very short, it also seems like a bad idea to wait when we might > not need to? Seems we should only wait if we subsequently couldn't get the > lock? > > Misleadingly WaitExceedsMaxStandbyDelay() also contains a usleep, which at > least I wouldn't expect given its name. > > > A minimal fix would be to increase the wait time, similar how it is done with > standbyWait_us? > > Medium term it seems we ought to set the startup process's latch when handling > a conflict, and use a latch wait. But avoiding races probably isn't quite > trivial. Yeah, I had the same thought; it's easy to criticise the current collateral damage maximising design, but a whole project to come up with a good race-free precise design. We should do that, though. > I guess the reason we first get an ERROR and then a FATAL is that the second > iteration hits the if (RecoveryConflictPending && DoingCommandRead) bit, > because we end up there after handling the first error? And that's a FATAL. > > I suspect that Thomas' fix will address at least part of this, as the check > whether we're still waiting for a lock will be made just before the error is > thrown. That seems right. > > reporting a FATAL error in process of reporting a FATAL error. Yeha. > > > > I assume this could lead to sending out the same message quite a few > > times. > > This seems like it needs to be fixed in elog.c. ISTM that at the very least we > ought to HOLD_INTERRUPTS() before the EmitErrorReport() for FATAL. That seems to make sense. About my patch... even though it solves a couple of problems now identified, I found an architectural problem that I don't have a solution for yet, which stopped me in my tracks a few weeks back. I need to find a way forward that is back-patchable. Recap: The basic concept here is to kick all "real work" out of signal handlers, because that work is unsafe in that context. So instead of deciding whether we need to cancel the current query at the next CFI by setting QueryCancelPending, we defer the whole decision to the next CFI. Sometimes the decision is that we don't need to do anything, and the CFI returns and execution continues normally. The problem is that there are a couple of parts of our tree that don't use a standard CFI, but are interrupted by looking for QueryCancelPending directly. syncrep.c is one, but I don't believe you could be blocked there while recovery is in progress, and regcomp.c is another. (There was a third case relating to that posix_fallocate() problem report you mentioned above, but 4518c798 removed that). The regular expression machinery is capable of consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re) frequently to avoid getting stuck. With the patch as it stands, that would never be true.
Thomas Munro <thomas.munro@gmail.com> writes: > ... The regular expression machinery is capable of > consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re) > frequently to avoid getting stuck. With the patch as it stands, that > would never be true. Surely that can't be too hard to fix. We might have to refactor the code around QueryCancelPending a little bit so that callers can ask "do we need a query cancel now?" without actually triggering a longjmp ... but why would that be problematic? regards, tom lane
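A non-throwing check along those lines might look like this (the helper name and shape are invented here for illustration; it must not ereport() or palloc(), since the regex code would call it from deep inside its malloc'd data structures):

    static bool
    QueryCancelWanted(void)
    {
        /*
         * Report whether CHECK_FOR_INTERRUPTS() would throw, without actually
         * throwing.  With the patch it also has to consider the pending
         * recovery-conflict reasons, not just QueryCancelPending.
         */
        if (!INTERRUPTS_PENDING_CONDITION())
            return false;
        return QueryCancelPending || ProcDiePending || RecoveryConflictPending;
    }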
On Tue, Jul 26, 2022 at 07:22:52PM -0400, Tom Lane wrote: > Thomas Munro <thomas.munro@gmail.com> writes: > > ... The regular expression machinery is capable of > > consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re) > > frequently to avoid getting stuck. With the patch as it stands, that > > would never be true. > > Surely that can't be too hard to fix. We might have to refactor > the code around QueryCancelPending a little bit so that callers > can ask "do we need a query cancel now?" without actually triggering > a longjmp ... but why would that be problematic? It could work. The problems are like those of making code safe to run in a signal handler. You can use e.g. snprintf in rcancelrequested(), but you still can't use palloc() or ereport(). I see at least these strategies: 1. Accept that recovery conflict checks run after a regex call completes. 2. Have rcancelrequested() return true unconditionally if we need a conflict check. If there's no actual conflict, restart the regex. 3. Have rcancelrequested() run the conflict check, including elog-using PostgreSQL code. On longjmp(), accept the leak of regex mallocs. 4. Have rcancelrequested() run the conflict check, including elog-using PostgreSQL code. On longjmp(), escalate to FATAL. 5. Write the conflict check code to dutifully avoid longjmp(). 6. Convert src/backend/regex to use palloc, so longjmp() is fine. I would tend to pick (3). (6) could come later and remove the drawback of (3). Does one of those unblock the patch, or not? === I found this thread because $SUBJECT is causing more buildfarm failures lately. Here are just the ones with symptom "timed out waiting for match: (?^:User was holding a relation lock for too long)": sysname │ snapshot │ branch │ bfurl ───────────┼─────────────────────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────────────────── wrasse │ 2022-09-16 09:19:06 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-09-16%2009%3A19%3A06 francolin │ 2022-09-24 02:02:23 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=francolin&dt=2022-09-24%2002%3A02%3A23 wrasse │ 2022-10-19 08:49:16 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-10-19%2008%3A49%3A16 wrasse │ 2022-11-16 16:59:23 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-16%2016%3A59%3A23 wrasse │ 2022-11-17 09:58:48 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-17%2009%3A58%3A48 wrasse │ 2022-11-21 22:17:20 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-21%2022%3A17%3A20 wrasse │ 2022-11-22 21:52:26 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-22%2021%3A52%3A26 wrasse │ 2022-11-25 09:16:44 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-11-25%2009%3A16%3A44 wrasse │ 2022-12-04 23:33:26 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-04%2023%3A33%3A26 wrasse │ 2022-12-07 11:48:54 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-07%2011%3A48%3A54 wrasse │ 2022-12-07 20:58:49 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-07%2020%3A58%3A49 wrasse │ 2022-12-09 12:19:40 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-09%2012%3A19%3A40 wrasse │ 2022-12-09 
15:29:45 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-09%2015%3A29%3A45 wrasse │ 2022-12-15 09:29:52 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-15%2009%3A29%3A52 wrasse │ 2022-12-23 07:37:06 │ REL_15_STABLE │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-23%2007%3A37%3A06 wrasse │ 2022-12-23 10:32:05 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-23%2010%3A32%3A05 wrasse │ 2022-12-23 17:47:17 │ HEAD │ https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=wrasse&dt=2022-12-23%2017%3A47%3A17 (17 rows) I can reproduce that symptom reliably, on GNU/Linux, with the attached patch adding sleeps. The key log bit: 2022-09-16 11:50:37.338 CEST [15022:4] 031_recovery_conflict.pl LOG: statement: BEGIN; 2022-09-16 11:50:37.339 CEST [15022:5] 031_recovery_conflict.pl LOG: statement: LOCK TABLE test_recovery_conflict_table1IN ACCESS SHARE MODE; 2022-09-16 11:50:37.341 CEST [15022:6] 031_recovery_conflict.pl LOG: statement: SELECT 1; 2022-09-16 11:50:38.076 CEST [14880:17] LOG: recovery still waiting after 11.482 ms: recovery conflict on lock 2022-09-16 11:50:38.076 CEST [14880:18] DETAIL: Conflicting process: 15022. 2022-09-16 11:50:38.076 CEST [14880:19] CONTEXT: WAL redo at 0/34243F0 for Standby/LOCK: xid 733 db 16385 rel 16386 2022-09-16 11:50:38.196 CEST [15022:7] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery 2022-09-16 11:50:38.196 CEST [15022:8] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery. 2022-09-16 11:50:38.196 CEST [15022:9] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to thedatabase and repeat your command. 2022-09-16 11:50:38.197 CEST [15022:10] 031_recovery_conflict.pl LOG: disconnection: session time: 0:00:01.041 user=nm database=test_dbhost=[local] 2022-09-16 11:50:38.198 CEST [14880:20] LOG: recovery finished waiting after 132.886 ms: recovery conflict on lock The second DETAIL should be "User was holding a relation lock for too long." The backend in question is idle in transaction. RecoveryConflictInterrupt() for PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK won't see IsWaitingForLock(), so it will find no conflict. However, RecoveryConflictReason remains clobbered, hence the wrong DETAIL message. Incidentally, the affected test contains comment "# DROP TABLE containing block which standby has in a pinned buffer". The standby holds no pin at that moment; the LOCK TABLE pins system catalog pages, but it drops every pin it acquires.
Attachment
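To make the diagnosed failure mode concrete, here is a heavily simplified sketch of the handler logic; it is illustrative only, not the actual postgres.c code, and only shows how the reason variable can go stale:

/*
 * Simplified, illustrative sketch -- not the actual postgres.c code.
 * The point is that the reason is recorded before we know whether there
 * is a conflict at all, so it can go stale.
 */
static ProcSignalReason RecoveryConflictReason;	/* read later to pick errdetail */

static void
RecoveryConflictInterrupt(ProcSignalReason reason)
{
	RecoveryConflictReason = reason;	/* clobbered unconditionally */

	switch (reason)
	{
		case PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK:
			if (!IsWaitingForLock())
				return;			/* no conflict found, but the stale reason survives */
			break;
			/* ... other cases ... */
		default:
			break;
	}

	RecoveryConflictPending = true;
	/* ... set QueryCancelPending / ProcDiePending and the latch ... */
}

A later genuine conflict (here, on a relation lock) is then reported with whatever reason happens to be left in the variable, which is how the "buffer deadlock" DETAIL shows up for a lock conflict.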
On Thu, Dec 29, 2022 at 9:40 PM Noah Misch <noah@leadboat.com> wrote: > On Tue, Jul 26, 2022 at 07:22:52PM -0400, Tom Lane wrote: > > Thomas Munro <thomas.munro@gmail.com> writes: > > > ... The regular expression machinery is capable of > > > consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re) > > > frequently to avoid getting stuck. With the patch as it stands, that > > > would never be true. > > > > Surely that can't be too hard to fix. We might have to refactor > > the code around QueryCancelPending a little bit so that callers > > can ask "do we need a query cancel now?" without actually triggering > > a longjmp ... but why would that be problematic? > > It could work. The problems are like those of making code safe to run in a > signal handler. You can use e.g. snprintf in rcancelrequested(), but you > still can't use palloc() or ereport(). I see at least these strategies: > > 1. Accept that recovery conflict checks run after a regex call completes. > 2. Have rcancelrequested() return true unconditionally if we need a conflict > check. If there's no actual conflict, restart the regex. > 3. Have rcancelrequested() run the conflict check, including elog-using > PostgreSQL code. On longjmp(), accept the leak of regex mallocs. > 4. Have rcancelrequested() run the conflict check, including elog-using > PostgreSQL code. On longjmp(), escalate to FATAL. > 5. Write the conflict check code to dutifully avoid longjmp(). > 6. Convert src/backend/regex to use palloc, so longjmp() is fine. Thanks! I appreciate the help getting unstuck here. I'd thought about some of these but not all. I also considered a couple more: 7. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is true, and copy the error somewhere to be re-thrown later after the regexp code exits with REG_CANCEL. 8. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is true, and call a new regexp function that will free everything before re-throwing. After Tom's response I spent some time trying to figure out how to make a SOFT_CHECK_FOR_INTERRUPTS(), which would return a value to indicate that it would like to throw. I think it would need to re-arm various flags and introduce a programming rule for all interrupt processing routines that if they fired once under a soft check they must fire again later under a non-soft check. That all seems a bit complicated, and a general mechanism like that seemed like overkill for a single user, which led me to idea #7. Idea #8 is a realisation that twisting oneself into a pretzel to avoid having to change the regexp code or its REG_CANCEL control flow may be a bit silly. If the only thing it really needs to do is free some memory, maybe the regexp module should provide a function that frees everything that is safe to call from our rcancelrequested callback, so we can do so before we longjmp back to Kansas. Then the REG_CANCEL code paths would be effectively unreachable in PostgreSQL. I don't know if it's better or worse than your idea #6, "use palloc instead, it already has garbage collection, duh", but it's a different take on the same realisation that this is just about free(). I guess idea #6 must be pretty easy to try: just point that MALLOC() macro to palloc(), and do a plain old CFI() in rcancelrequested(). Why do you suggest #3 as an interim measure? Do we fear that palloc() might hurt regexp performance? > I can reproduce that symptom reliably, on GNU/Linux, with the attached patch > adding sleeps. 
The key log bit: > > 2022-09-16 11:50:37.338 CEST [15022:4] 031_recovery_conflict.pl LOG: statement: BEGIN; > 2022-09-16 11:50:37.339 CEST [15022:5] 031_recovery_conflict.pl LOG: statement: LOCK TABLE test_recovery_conflict_table1IN ACCESS SHARE MODE; > 2022-09-16 11:50:37.341 CEST [15022:6] 031_recovery_conflict.pl LOG: statement: SELECT 1; > 2022-09-16 11:50:38.076 CEST [14880:17] LOG: recovery still waiting after 11.482 ms: recovery conflict on lock > 2022-09-16 11:50:38.076 CEST [14880:18] DETAIL: Conflicting process: 15022. > 2022-09-16 11:50:38.076 CEST [14880:19] CONTEXT: WAL redo at 0/34243F0 for Standby/LOCK: xid 733 db 16385 rel 16386 > 2022-09-16 11:50:38.196 CEST [15022:7] 031_recovery_conflict.pl FATAL: terminating connection due to conflict with recovery > 2022-09-16 11:50:38.196 CEST [15022:8] 031_recovery_conflict.pl DETAIL: User transaction caused buffer deadlock with recovery. > 2022-09-16 11:50:38.196 CEST [15022:9] 031_recovery_conflict.pl HINT: In a moment you should be able to reconnect to thedatabase and repeat your command. > 2022-09-16 11:50:38.197 CEST [15022:10] 031_recovery_conflict.pl LOG: disconnection: session time: 0:00:01.041 user=nmdatabase=test_db host=[local] > 2022-09-16 11:50:38.198 CEST [14880:20] LOG: recovery finished waiting after 132.886 ms: recovery conflict on lock > > The second DETAIL should be "User was holding a relation lock for too long." > The backend in question is idle in transaction. RecoveryConflictInterrupt() > for PROCSIG_RECOVERY_CONFLICT_STARTUP_DEADLOCK won't see IsWaitingForLock(), > so it will find no conflict. However, RecoveryConflictReason remains > clobbered, hence the wrong DETAIL message. Aha. I'd speculated that RecoveryConflictReason must be capable of reporting bogus errors like that up-thread. > Incidentally, the affected test > contains comment "# DROP TABLE containing block which standby has in a pinned > buffer". The standby holds no pin at that moment; the LOCK TABLE pins system > catalog pages, but it drops every pin it acquires. Oh, I guess the comment is just wrong? There are earlier sections concerned with buffer pins, but the section "RECOVERY CONFLICT 3" is about locks.
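As a concrete illustration of idea #7, the cancel callback could stash the error and re-throw it once the regex engine has unwound. A rough sketch follows; "saved_regex_error" is an invented name, "regexec_result" stands in for whatever the caller's result variable is, and note that PG_CATCH() cannot intercept FATAL, which is part of what makes this approach awkward:

/* Rough sketch of idea #7; names are illustrative, not from any posted patch. */
static ErrorData *saved_regex_error = NULL;

static int
rcancelrequested(void)
{
	if (INTERRUPTS_PENDING_CONDITION())
	{
		MemoryContext oldcontext = CurrentMemoryContext;

		PG_TRY();
		{
			CHECK_FOR_INTERRUPTS();
		}
		PG_CATCH();
		{
			/* Stash the ERROR so the regex code can unwind via REG_CANCEL. */
			MemoryContextSwitchTo(oldcontext);
			saved_regex_error = CopyErrorData();
			FlushErrorState();
		}
		PG_END_TRY();
	}
	return saved_regex_error != NULL;
}

/* Then, at the caller, after pg_regexec()/pg_regcomp() has returned: */
if (regexec_result == REG_CANCEL && saved_regex_error != NULL)
{
	ErrorData  *edata = saved_regex_error;

	saved_regex_error = NULL;
	ReThrowError(edata);
}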
On Sat, Dec 31, 2022 at 10:06:53AM +1300, Thomas Munro wrote: > On Thu, Dec 29, 2022 at 9:40 PM Noah Misch <noah@leadboat.com> wrote: > > On Tue, Jul 26, 2022 at 07:22:52PM -0400, Tom Lane wrote: > > > Thomas Munro <thomas.munro@gmail.com> writes: > > > > ... The regular expression machinery is capable of > > > > consuming a lot of CPU, and does CANCEL_REQUESTED(nfa->v->re) > > > > frequently to avoid getting stuck. With the patch as it stands, that > > > > would never be true. > > > > > > Surely that can't be too hard to fix. We might have to refactor > > > the code around QueryCancelPending a little bit so that callers > > > can ask "do we need a query cancel now?" without actually triggering > > > a longjmp ... but why would that be problematic? > > > > It could work. The problems are like those of making code safe to run in a > > signal handler. You can use e.g. snprintf in rcancelrequested(), but you > > still can't use palloc() or ereport(). I see at least these strategies: > > > > 1. Accept that recovery conflict checks run after a regex call completes. > > 2. Have rcancelrequested() return true unconditionally if we need a conflict > > check. If there's no actual conflict, restart the regex. > > 3. Have rcancelrequested() run the conflict check, including elog-using > > PostgreSQL code. On longjmp(), accept the leak of regex mallocs. > > 4. Have rcancelrequested() run the conflict check, including elog-using > > PostgreSQL code. On longjmp(), escalate to FATAL. > > 5. Write the conflict check code to dutifully avoid longjmp(). > > 6. Convert src/backend/regex to use palloc, so longjmp() is fine. > > Thanks! I appreciate the help getting unstuck here. I'd thought > about some of these but not all. I also considered a couple more: > > 7. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is > true, and copy the error somewhere to be re-thrown later after the > regexp code exits with REG_CANCEL. > 8. Do a CFI() in a try/catch if INTERRUPTS_PENDING_CONDITION() is > true, and call a new regexp function that will free everything before > re-throwing. > > After Tom's response I spent some time trying to figure out how to > make a SOFT_CHECK_FOR_INTERRUPTS(), which would return a value to > indicate that it would like to throw. I think it would need to re-arm > various flags and introduce a programming rule for all interrupt > processing routines that if they fired once under a soft check they > must fire again later under a non-soft check. That all seems a bit > complicated, and a general mechanism like that seemed like overkill > for a single user, which led me to idea #7. > > Idea #8 is a realisation that twisting oneself into a pretzel to avoid > having to change the regexp code or its REG_CANCEL control flow may be > a bit silly. If the only thing it really needs to do is free some > memory, maybe the regexp module should provide a function that frees > everything that is safe to call from our rcancelrequested callback, so > we can do so before we longjmp back to Kansas. Then the REG_CANCEL > code paths would be effectively unreachable in PostgreSQL. I don't > know if it's better or worse than your idea #6, "use palloc instead, > it already has garbage collection, duh", but it's a different take on > the same realisation that this is just about free(). PG_TRY() isn't free, so it's nice that (6) doesn't add one. If (6) fails in some not-yet-revealed way, (8) could get more relevant. 
> I guess idea #6 must be pretty easy to try: just point that MALLOC() > macro to palloc(), and do a plain old CFI() in rcancelrequested(). > Why do you suggest #3 as an interim measure? No strong reason. I think I suggested it because it's a strict subset of (6), but I didn't think through in detail. (I've never modified src/backend/regex and have barely read its code, for whatever that's worth.) > Do we fear that palloc() might hurt regexp performance? Nah. I don't recall any place in PostgreSQL where performance is an argument for raw malloc() calls. > > Incidentally, the affected test > > contains comment "# DROP TABLE containing block which standby has in a pinned > > buffer". The standby holds no pin at that moment; the LOCK TABLE pins system > > catalog pages, but it drops every pin it acquires. > > Oh, I guess the comment is just wrong? There are earlier sections > concerned with buffer pins, but the section "RECOVERY CONFLICT 3" is > about locks. Yes.
On Sat, Dec 31, 2022 at 6:36 PM Noah Misch <noah@leadboat.com> wrote: > On Sat, Dec 31, 2022 at 10:06:53AM +1300, Thomas Munro wrote: > > Idea #8 is a realisation that twisting oneself into a pretzel to avoid > > having to change the regexp code or its REG_CANCEL control flow may be > > a bit silly. If the only thing it really needs to do is free some > > memory, maybe the regexp module should provide a function that frees > > everything that is safe to call from our rcancelrequested callback, so > > we can do so before we longjmp back to Kansas. Then the REG_CANCEL > > code paths would be effectively unreachable in PostgreSQL. I don't > > know if it's better or worse than your idea #6, "use palloc instead, > > it already has garbage collection, duh", but it's a different take on > > the same realisation that this is just about free(). > > PG_TRY() isn't free, so it's nice that (6) doesn't add one. If (6) fails in > some not-yet-revealed way, (8) could get more relevant. > > > I guess idea #6 must be pretty easy to try: just point that MALLOC() > > macro to palloc(), and do a plain old CFI() in rcancelrequested(). It's not quite so easy: in RE_compile_and_cache we construct objects with arbitrary cache-managed lifetime, which suggests we need a cache memory context, but we could also fail mid construction, which suggests we'd need a dedicated per-regex object memory context that is made permanent with the MemoryContextSetParent() trick (as we see elsewhere for cached things that are constructed by code that might throw), or something like the try/catch thing from idea #8. Thinking...
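For anyone following along, the MemoryContextSetParent() trick mentioned here is the usual "build it in a disposable context, reparent on success" pattern. A minimal sketch, with illustrative names (RegexpCacheMemoryContext and compile_regex() stand in for whatever the eventual patch uses):

/* Minimal sketch of the reparenting pattern; names are illustrative. */
MemoryContext recontext;
MemoryContext oldcontext;
regex_t    *re;

/* Child of the current (short-lived) context, so an error frees it. */
recontext = AllocSetContextCreate(CurrentMemoryContext,
								  "RegexpMemoryContext",
								  ALLOCSET_SMALL_SIZES);
oldcontext = MemoryContextSwitchTo(recontext);

re = compile_regex(pattern, flags);	/* may ereport(ERROR) partway through */

MemoryContextSwitchTo(oldcontext);

/* Construction succeeded: make the context long-lived for the cache. */
MemoryContextSetParent(recontext, RegexpCacheMemoryContext);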
On Mon, Jan 2, 2023 at 8:38 AM Thomas Munro <thomas.munro@gmail.com> wrote: > On Sat, Dec 31, 2022 at 6:36 PM Noah Misch <noah@leadboat.com> wrote: > > On Sat, Dec 31, 2022 at 10:06:53AM +1300, Thomas Munro wrote: > > > Idea #8 is a realisation that twisting oneself into a pretzel to avoid > > > having to change the regexp code or its REG_CANCEL control flow may be > > > a bit silly. If the only thing it really needs to do is free some > > > memory, maybe the regexp module should provide a function that frees > > > everything that is safe to call from our rcancelrequested callback, so > > > we can do so before we longjmp back to Kansas. Then the REG_CANCEL > > > code paths would be effectively unreachable in PostgreSQL. I don't > > > know if it's better or worse than your idea #6, "use palloc instead, > > > it already has garbage collection, duh", but it's a different take on > > > the same realisation that this is just about free(). > > > > PG_TRY() isn't free, so it's nice that (6) doesn't add one. If (6) fails in > > some not-yet-revealed way, (8) could get more relevant. > > > > > I guess idea #6 must be pretty easy to try: just point that MALLOC() > > > macro to palloc(), and do a plain old CFI() in rcancelrequested(). > > It's not quite so easy: in RE_compile_and_cache we construct objects > with arbitrary cache-managed lifetime, which suggests we need a cache > memory context, but we could also fail mid construction, which > suggests we'd need a dedicated per-regex object memory context that is > made permanent with the MemoryContextSetParent() trick (as we see > elsewhere for cached things that are constructed by code that might > throw), ...

Here's an experiment-grade attempt at idea #6 done that way, for discussion. You can see how much memory is wasted by each regex_t, which I guess is probably on the order of a couple of hundred KB if you use all 32 regex cache slots, using ALLOCSET_SMALL_SIZES as I did here:

postgres=# select 'x' ~ 'hello world .*';
-[ RECORD 1 ]
?column? | f

postgres=# select * from pg_backend_memory_contexts where name = 'RegexpMemoryContext';
-[ RECORD 1 ]-+-------------------------
name          | RegexpMemoryContext
ident         | hello world .*
parent        | RegexpCacheMemoryContext
level         | 2
total_bytes   | 13376
total_nblocks | 5
free_bytes    | 5144
free_chunks   | 8
used_bytes    | 8232

There's some more memory allocated in regc_pg_locale.c with raw malloc() that could probably benefit from pallocisation just to be able to measure it, but I didn't touch that here. The recovery conflict patch itself is unchanged, except that I removed the claim in the commit message that this would be back-patched. It's pretty clear that this work would need to spend a decent amount of time maturing on master only.
Attachment
Hi, On 2022-12-29 00:40:52 -0800, Noah Misch wrote: > Incidentally, the affected test contains comment "# DROP TABLE containing > block which standby has in a pinned buffer". The standby holds no pin at > that moment; the LOCK TABLE pins system catalog pages, but it drops every > pin it acquires. I guess that comment survived from an earlier version of that test (or another test where it was copied from). I'm inclined to just delete it. Greetings, Andres Freund
Hi,

On 2023-01-04 16:46:05 +1300, Thomas Munro wrote:
> postgres=# select 'x' ~ 'hello world .*';
> -[ RECORD 1 ]
> ?column? | f
>
> postgres=# select * from pg_backend_memory_contexts where name = 'RegexpMemoryContext';
> -[ RECORD 1 ]-+-------------------------
> name          | RegexpMemoryContext
> ident         | hello world .*
> parent        | RegexpCacheMemoryContext
> level         | 2
> total_bytes   | 13376
> total_nblocks | 5

Hm, if a trivial re uses 13kB, using ALLOCSET_SMALL_SIZES might actually increase memory usage by increasing the number of blocks.

> free_bytes    | 5144
> free_chunks   | 8
> used_bytes    | 8232

Hm. So we actually have a bunch of temporary allocations in here. I assume that's all the stuff from the "non-compact" representation that src/backend/regex/README talks about? It doesn't immediately look trivial to use a separate memory context for the "final" representation and scratch memory though.

> There's some more memory allocated in regc_pg_locale.c with raw malloc() that could probably benefit from a pallocisation just to be able to measure it, but I didn't touch that here.

It might also effectively reduce the overhead of using palloc, by filling the context up further.

> diff --git a/src/backend/regex/regcomp.c b/src/backend/regex/regcomp.c
> index bb8c240598..c0f8e77b49 100644
> --- a/src/backend/regex/regcomp.c
> +++ b/src/backend/regex/regcomp.c
> @@ -2471,17 +2471,17 @@ rfree(regex_t *re)
>  /*
>   * rcancelrequested - check for external request to cancel regex operation
>   *
> - * Return nonzero to fail the operation with error code REG_CANCEL,
> - * zero to keep going
> - *
> - * The current implementation is Postgres-specific.  If we ever get around
> - * to splitting the regex code out as a standalone library, there will need
> - * to be some API to let applications define a callback function for this.
> + * The current implementation always returns 0, if CHECK_FOR_INTERRUPTS()
> + * doesn't exit non-locally via ereport().  Memory allocated while compiling is
> + * expected to be cleaned up by virtue of being allocated using palloc in a
> + * suitable memory context.
>   */
>  static int
>  rcancelrequested(void)
>  {
> -	return InterruptPending && (QueryCancelPending || ProcDiePending);
> +	CHECK_FOR_INTERRUPTS();
> +
> +	return 0;
>  }

Hm. Seems confusing for this to continue being called rcancelrequested() and to be called via if(CANCEL_REQUESTED()), if we're not even documenting that it's intended to be usable that way? Seems at the minimum we ought to keep more of the old comment, to explain the somewhat odd API?

> +	/* Set up the cache memory on first go through. */
> +	if (unlikely(RegexpCacheMemoryContext == NULL))
> +		RegexpCacheMemoryContext =
> +			AllocSetContextCreate(TopMemoryContext,
> +								  "RegexpCacheMemoryContext",
> +								  ALLOCSET_SMALL_SIZES);

I think it might be nicer to create this below CacheMemoryContext? Just so the "memory context tree" stays nicely ordered.

Greetings,

Andres Freund
Andres Freund <andres@anarazel.de> writes: > Hm. Seems confusing for this to continue being called rcancelrequested() and > to be called via if(CANCEL_REQUESTED()), if we're not even documenting that > it's intended to be usable that way? Yeah. I'm not very happy with this line of development at all, because I think we are painting ourselves into a corner by not allowing code to detect whether a cancel is pending without having it happen immediately. (That is, I do not believe that backend/regex/ is the only code that will ever wish for that.) But if that is the direction we're going to go in, we should probably revise these APIs to make them less odd. I'm not sure why we'd keep the REG_CANCEL error code at all. > I think it might be nicer to create this below CacheMemoryContext? Meh ... CacheMemoryContext might not exist yet, especially for the use-cases in the login logic. regards, tom lane
Hi, On 2023-01-04 17:55:43 -0500, Tom Lane wrote: > I'm not very happy with this line of development at all, because I think we > are painting ourselves into a corner by not allowing code to detect whether > a cancel is pending without having it happen immediately. (That is, I do > not believe that backend/regex/ is the only code that will ever wish for > that.) I first wrote that this is hard to make work without introducing overhead (like a PG_TRY in rcancelrequested()), for a bunch of reasons discussed upthread (see [1]). But now I wonder if we didn't recently introduce most of the framework to make this less hard / expensive. What about using a version of errsave() that can save FATALs too? We could have something roughly like the ProcessInterrupts() in the proposed patch that is used from within rcancelrequested(). But instead of actually throwing the error, we'd just remember the to-be-thrown-later error, that the next "real" CFI would throw. That still leaves us with some increased likelihood of erroring out within the regex machinery, e.g. if there's an out-of-memory error within elog.c processing. But I'd not be too worried about leaking memory in that corner case. Which also could be closed using the approach in Thomas' patch, except that it normally would still return in rcancelrequested(). Insane? Greetings, Andres Freund [1] https://postgr.es/m/CA%2BhUKG%2BqtNxDQAzC20AnUxuigKYb%3D7shtmsuSyMekjni%3Dik6BA%40mail.gmail.com
On Thu, Jan 5, 2023 at 12:33 PM Andres Freund <andres@anarazel.de> wrote: > What about using a version of errsave() that can save FATALs too? We could > have something roughly like the ProcessInterrupts() in the proposed patch that > is used from within rcancelrequested(). But instead of actually throwing the > error, we'd just remember the to-be-thrown-later error, that the next > "real" CFI would throw. Right, I contemplated variations on that theme. I'd be willing to code something like that to kick the tyres, but it seems like it would make back-patching more painful? We're trying to fix bugs here... Deciding to proceed with #6 (palloc) wouldn't mean we can't eventually also implement two phase/soft CFI() when we have a potential user, so I don't really get the painted-into-a-corner argument. However, it's all moot if the #6 isn't good enough on its own merits independent of other hypothetical future users (eg if the per regex_t MemoryContext overheads are considered too high and can't be tuned acceptably).
Hi, On 2023-01-05 13:21:54 +1300, Thomas Munro wrote: > Right, I contemplated variations on that theme. I'd be willing to > code something like that to kick the tyres, but it seems like it would > make back-patching more painful? We're trying to fix bugs here... I think we need to accept that this mess can't be fixed in the back branches. I'd rather get a decent fix sometime in PG 16 than a crufty fix in PG 17 that we then backpatch a while later. > Deciding to proceed with #6 (palloc) wouldn't mean we can't eventually > also implement two phase/soft CFI() when we have a potential user, so > I don't really get the painted-into-a-corner argument. I think that's a fair point. > However, it's all moot if the #6 isn't good enough on its own merits > independent of other hypothetical future users (eg if the per regex_t > MemoryContext overheads are considered too high and can't be tuned > acceptably). I'm not too worried about that, particularly because it looks like it'd not be too hard to lower the overhead further. Arguably allocating memory outside of mcxt.c is actually a bad thing independent of error handling, because it's effectively "invisible" to our memory-usage-monitoring facilities. Greetings, Andres Freund
On Thu, Jan 5, 2023 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Andres Freund <andres@anarazel.de> writes: > > Hm. Seems confusing for this to continue being called rcancelrequested() and > > to be called via if(CANCEL_REQUESTED()), if we're not even documenting that > > it's intended to be usable that way? > > Yeah. I'm not very happy with this line of development at all, > because I think we are painting ourselves into a corner by not allowing > code to detect whether a cancel is pending without having it happen > immediately. (That is, I do not believe that backend/regex/ is the > only code that will ever wish for that.) But if that is the direction > we're going to go in, we should probably revise these APIs to make them > less odd. I'm not sure why we'd keep the REG_CANCEL error code at all. Ah, OK. I had the impression from the way the code is laid out with a wall between "PostgreSQL" bits and "vendored library" bits that we might have some reason to want to keep that callback interface the same (ie someone else is using this code and we want to stay in sync?), but your reactions are a clue that maybe I imagined a requirement that doesn't exist.
Thomas Munro <thomas.munro@gmail.com> writes: > On Thu, Jan 5, 2023 at 11:55 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> ... But if that is the direction >> we're going to go in, we should probably revise these APIs to make them >> less odd. I'm not sure why we'd keep the REG_CANCEL error code at all. > Ah, OK. I had the impression from the way the code is laid out with a > wall between "PostgreSQL" bits and "vendored library" bits that we > might have some reason to want to keep that callback interface the > same (ie someone else is using this code and we want to stay in > sync?), but your reactions are a clue that maybe I imagined a > requirement that doesn't exist. The rcancelrequested API is something that I devised out of whole cloth awhile ago. It's not in Tcl's copy of the code, which AFAIK is the only other project using this regex engine. I do still have vague hopes of someday seeing the engine as a standalone project, which is why I'd prefer to keep a bright line between the engine and Postgres. But there's no very strong reason to think that any hypothetical future external users who need a cancel API would want this API as opposed to one that requires exit() or longjmp() to get out of the engine. So if we're changing the way we use it, I think it's perfectly reasonable to redesign that API to make it simpler and less of an impedance mismatch. regards, tom lane
On Thu, Jan 5, 2023 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > The rcancelrequested API is something that I devised out of whole cloth > awhile ago. It's not in Tcl's copy of the code, which AFAIK is the > only other project using this regex engine. I do still have vague > hopes of someday seeing the engine as a standalone project, which is > why I'd prefer to keep a bright line between the engine and Postgres. > But there's no very strong reason to think that any hypothetical future > external users who need a cancel API would want this API as opposed to > one that requires exit() or longjmp() to get out of the engine. So if > we're changing the way we use it, I think it's perfectly reasonable to > redesign that API to make it simpler and less of an impedance mismatch. Thanks for that background. Alright then, here's a new iteration exploring this direction. It gets rid of CANCEL_REQUESTED() -> REG_CANCEL and the associated error and callback function, and instead has just "INTERRUPT(re);" at those cancellation points, which is a macro that defaults to nothing (for Tcl's benefit). Our regcustom.h defines it as CHECK_FOR_INTERRUPTS(). I dunno if it's worth passing the "re" argument... I was imagining that someone who wants to free memory explicitly and then longjmp would probably need it? (It might even be possible to expand to something that sets an error and returns, not investigated.) Better name or design very welcome. Another decision is to use the no-OOM version of palloc. (Not explored: could we use throwing palloc with attribute returns_nonnull to teach GCC and Clang to prune the failure handling from generated regex code?) (As for STACK_TOO_DEEP(): why follow a function pointer, when it could be macro-only too? But that's getting off track.) I split the patch in two: memory and interrupts. I also found a place in contrib/pg_trgm that did no-longer-needed try/finally.
Attachment
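In regcustom.h terms, the direction described above amounts to mappings roughly like the sketch below. This is illustrative only; the attached patches are authoritative, and the real file maps a few more macros (REALLOC and friends) that are omitted here:

/* Sketch of the PostgreSQL-side regcustom.h mappings under this design. */
#define MALLOC(n)		palloc_extended((n), MCXT_ALLOC_NO_OOM)	/* NULL on OOM, like malloc() */
#define FREE(p)			pfree(VS(p))

/* Cancellation points inside the engine simply become CFI(): */
#define INTERRUPT(re)	CHECK_FOR_INTERRUPTS()

/* ... while regguts.h would supply a no-op default for other users (e.g. Tcl): */
#ifndef INTERRUPT
#define INTERRUPT(re)
#endif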
On Sat, Jan 14, 2023 at 3:23 PM Thomas Munro <thomas.munro@gmail.com> wrote: > On Thu, Jan 5, 2023 at 2:14 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > The rcancelrequested API is something that I devised out of whole cloth > > awhile ago. It's not in Tcl's copy of the code, which AFAIK is the > > only other project using this regex engine. I do still have vague > > hopes of someday seeing the engine as a standalone project, which is > > why I'd prefer to keep a bright line between the engine and Postgres. > > But there's no very strong reason to think that any hypothetical future > > external users who need a cancel API would want this API as opposed to > > one that requires exit() or longjmp() to get out of the engine. So if > > we're changing the way we use it, I think it's perfectly reasonable to > > redesign that API to make it simpler and less of an impedance mismatch. > > Thanks for that background. Alright then, here's a new iteration > exploring this direction. It gets rid of CANCEL_REQUESTED() -> > REG_CANCEL and the associated error and callback function, and instead > has just "INTERRUPT(re);" at those cancellation points, which is a > macro that defaults to nothing (for Tcl's benefit). Our regcustom.h > defines it as CHECK_FOR_INTERRUPTS(). I dunno if it's worth passing > the "re" argument... I was imagining that someone who wants to free > memory explicitly and then longjmp would probably need it? (It might > even be possible to expand to something that sets an error and > returns, not investigated.) Better name or design very welcome. I think this experiment worked out pretty well. I think it's a nice side-effect that you can see what memory the regexp subsystem is using, and that's likely to lead to more improvements. (Why is it limited to caching 32 entries? Why is it a linear search, not a hash table? Why is LRU implemented with memmove() and not a list? Could we have a GUC regex_cache_memory, so someone who uses a lot of regexes can opt into a large cache?) On the other hand it also uses a bit more RAM, like other code using the reparenting trick, which is a topic for future research. I vote for proceeding with this approach. I wish we didn't have to tackle either a regexp interface/management change (done here) or a CFI() redesign (not done, but probably also a good idea for other reasons) before getting this signal stuff straightened out, but here we are. This approach seems good to me. Anyone have a different take?
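For context on those questions, the cache being discussed (in src/backend/utils/adt/regexp.c) is roughly the scheme sketched below; this is a simplified reconstruction, so field names and details may differ from the real code:

/* Simplified sketch of the backend regex cache; details may differ. */
#define MAX_CACHED_RES	32			/* the fixed cache size being questioned */

typedef struct cached_re_str
{
	char	   *cre_pat;			/* original pattern string */
	int			cre_pat_len;
	int			cre_flags;			/* compile flags */
	Oid			cre_collation;		/* collation used at compile time */
	regex_t		cre_re;				/* the compiled regular expression */
} cached_re_str;

static int	num_res = 0;
static cached_re_str re_array[MAX_CACHED_RES];	/* searched linearly */

/* On a cache hit at index i, LRU order is maintained by shuffling with memmove(): */
if (i > 0)
{
	cached_re_str t = re_array[i];

	memmove(&re_array[1], &re_array[0], i * sizeof(cached_re_str));
	re_array[0] = t;
}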
Thomas Munro <thomas.munro@gmail.com> writes: > I think this experiment worked out pretty well. I think it's a nice > side-effect that you can see what memory the regexp subsystem is > using, and that's likely to lead to more improvements. (Why is it > limited to caching 32 entries? Why is it a linear search, not a hash > table? Why is LRU implemented with memmove() and not a list? Could > we have a GUC regex_cache_memory, so someone who uses a lot of regexes > can opt into a large cache?) On the other hand it also uses a bit > more RAM, like other code using the reparenting trick, which is a > topic for future research. > I vote for proceeding with this approach. I wish we didn't have to > tackle either a regexp interface/management change (done here) or a > CFI() redesign (not done, but probably also a good idea for other > reasons) before getting this signal stuff straightened out, but here > we are. This approach seems good to me. Anyone have a different > take?

Sorry for not looking at this sooner. I am okay with the regex changes proposed in v5-0001 through 0003, but I think you need to take another mopup pass there. Some specific complaints:

* header comment for pg_regprefix has been falsified (s/malloc/palloc/)
* in spell.c, regex_affix_deletion_callback could be got rid of
* check other callers of pg_regerror for now-useless CHECK_FOR_INTERRUPTS

In general there's a lot of comments referring to regexes being malloc'd. I'm disinclined to change the ones inside the engine, because as far as it knows it is still using malloc, but maybe we should work harder on our own comments. In particular, it'd likely be useful to have something somewhere pointing out that pg_regfree is only needed when you can't get rid of the regex by context cleanup. Maybe write a short section about memory management in backend/regex/README?

I've not really looked at 0004.

regards, tom lane
On Tue, Apr 4, 2023 at 1:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Sorry for not looking at this sooner. I am okay with the regex > changes proposed in v5-0001 through 0003, but I think you need to > take another mopup pass there. Some specific complaints: > * header comment for pg_regprefix has been falsified (s/malloc/palloc/) Thanks. Fixed. > * in spell.c, regex_affix_deletion_callback could be got rid of Done in a separate patch. I wondered if regex_t should be included directly as a member of that union inside AFFIX, but decided it should keep using a pointer (just without the extra wrapper struct). A direct member would make the AFFIX slightly larger, and it would require us to assume that regex_t is movable which it probably actually is in practice I guess but that isn't written down anywhere and it seemed strange to rely on it. > * check other callers of pg_regerror for now-useless CHECK_FOR_INTERRUPTS I found three of these to remove (jsonpath_gram.y, varlena.c, test_regex.c). > In general there's a lot of comments referring to regexes being malloc'd. There is also some remaining direct use of malloc() in regc_pg_locale.c because "we mustn't lose control on out-of-memory". At that time (2012) there was no MCXT_NO_OOM (2015), so we could presumably bring that cache into an observable MemoryContext now too. I haven't written a patch for that, though, because it's not in the way of my recovery conflict mission. > I'm disinclined to change the ones inside the engine, because as far as > it knows it is still using malloc, but maybe we should work harder on > our own comments. In particular, it'd likely be useful to have something > somewhere pointing out that pg_regfree is only needed when you can't > get rid of the regex by context cleanup. Maybe write a short section > about memory management in backend/regex/README? I'll try to write something for the README tomorrow. Here's a new version of the code changes. > I've not really looked at 0004. I'm hoping to get just the regex changes in ASAP, and then take a little bit longer on the recovery conflict patch itself (v6-0005) on the basis that it's bugfix work and not subject to the feature freeze.
Attachment
On Sat, Apr 08, 2023 at 01:32:22AM +1200, Thomas Munro wrote: > I'm hoping to get just the regex changes in ASAP, and then take a > little bit longer on the recovery conflict patch itself (v6-0005) on > the basis that it's bugfix work and not subject to the feature freeze. Agreed. It would be good to check with the RMT, but as long as that's not at the middle/end of the beta cycle I guess that's OK for this one, even if it is only for HEAD. -- Michael
Attachment
Michael Paquier <michael@paquier.xyz> writes: > On Sat, Apr 08, 2023 at 01:32:22AM +1200, Thomas Munro wrote: >> I'm hoping to get just the regex changes in ASAP, and then take a >> little bit longer on the recovery conflict patch itself (v6-0005) on >> the basis that it's bugfix work and not subject to the feature freeze. > Agreed. It would be good to check with the RMT, but as long as that's > not at the middle/end of the beta cycle I guess that's OK for this > one, even if it is only for HEAD. Right. regex changes pass an eyeball check here. regards, tom lane
Here is a rebase over 26669757, which introduced PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT. I got a bit confused about why this new conflict reason didn't follow the usual ERROR->FATAL promotion rules and pinged Andres who provided: "Logical decoding slots are only acquired while performing logical decoding. During logical decoding no user controlled code is run. During [sub]transaction abort, the slot is released. Therefore user controlled code cannot intercept an error before the replication slot is released." That's included in a comment in the attached to explain the special treatment.
Attachment
On Sat, Aug 5, 2023 at 1:39 PM Thomas Munro <thomas.munro@gmail.com> wrote: > Here is a rebase over 26669757, which introduced > PROCSIG_RECOVERY_CONFLICT_LOGICALSLOT. Oops, please disregard v7 (somehow lost a precious line of code). V8 is better.