Thread: Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Rahila Syed
Date:
Hi,
Please find the attached updated and rebased patch.
I have added a test in the test_dsa module that uses a function
to create a dsa area. This function is called after
the resowner->releasing is set to true, using an injection point.
I have added a test in the test_dsa module that uses a function
to create a dsa area. This function is called after
the resowner->releasing is set to true, using an injection point.
Thank you,
Rahila Syed
Rahila Syed
Attachment
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Daniel Gustafsson
Date:
> On 24 Mar 2025, at 20:31, Rahila Syed <rahilasyed90@gmail.com> wrote: > Please find the attached updated and rebased patch. Thanks for this rebase, as was mentioned in the other thread I plan to get this committed fairly soon. A few comments on the code, all performed in the attached v3. + else + { + /* Log failure of unpinning */ + elog(DEBUG2, "unable to unpin the segment %u as CurrentResourceOwner is NULL or releasing", + seg); + seg->resowner = NULL; + } I removed the elog() calls since I can't see it adding enough value, and the assignment to NULL can be removed as well since we've already asserted that seg->resowner is NULL. + INJECTION_POINT("dsa_create_on_res_release"); This feels like a name which limits its use to this one test, whereas it is a general purpose injection point. Renamed, and also moved to using dashes rather than underscore as the former is project style. +void +test_dsa_inj(const char *name, const void *private_data) Rather than duplicating the code I created an internal function for this test which can be called from the existing basic test as well as this new test. I also did a little bit of renaming to make it more readable. As it can only really be tested with an injection point I plan on only backpatching to 17 initially. Searching the archives I didn't find any mention of this bug ever being hit so it seems safe to let it prove itself in testable versions before going further back with it. -- Daniel Gustafsson
Attachment
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Rahila Syed
Date:
Hi Daniel,
Thank you for your review and code improvements.
Please find below some observations.
1. dsm_unpin_mapping(dsm_segment *seg)
+ if (CurrentResourceOwner && IsResourceOwnerReleasing(CurrentResourceOwner))
+ return;
Given that the function can return without setting resowner to a
CurrentResourceOwner which is not NULL, shall we change the function
signature to return true when "unpin" is successful and false when not?
This behavior existed earlier too, but adding the check has made it explicit.
Although this function is not currently in use, it would be beneficial to make
the API more verbose.
2. If value of IsResourceOwnerReleasing changes between dsm_create_descriptor
and attach_internal, the dsm segment and dsa area will end up with different resource owners.
Earlier the chances of CurrentResourceOwner changing between the two functions were zero.
May be something can be done to keep resowner assignments under both these functions
in sync.
Thank you,
Rahila Syed
Please find below some observations.
1. dsm_unpin_mapping(dsm_segment *seg)
+ if (CurrentResourceOwner && IsResourceOwnerReleasing(CurrentResourceOwner))
+ return;
Given that the function can return without setting resowner to a
CurrentResourceOwner which is not NULL, shall we change the function
signature to return true when "unpin" is successful and false when not?
This behavior existed earlier too, but adding the check has made it explicit.
Although this function is not currently in use, it would be beneficial to make
the API more verbose.
2. If value of IsResourceOwnerReleasing changes between dsm_create_descriptor
and attach_internal, the dsm segment and dsa area will end up with different resource owners.
Earlier the chances of CurrentResourceOwner changing between the two functions were zero.
May be something can be done to keep resowner assignments under both these functions
in sync.
Thank you,
Rahila Syed
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Daniel Gustafsson
Date:
> On 11 Apr 2025, at 21:08, Rahila Syed <rahilasyed90@gmail.com> wrote: > > Hi Daniel, > > Thank you for your review and code improvements. > > Please find below some observations. > > 1. dsm_unpin_mapping(dsm_segment *seg) > + if (CurrentResourceOwner && IsResourceOwnerReleasing(CurrentResourceOwner)) > + return; > > Given that the function can return without setting resowner to a > CurrentResourceOwner which is not NULL, shall we change the function > signature to return true when "unpin" is successful and false when not? Maybe, but at this point in the cycle we cannot change the prototype like that. Food for thought for v19 though. > 2. If value of IsResourceOwnerReleasing changes between dsm_create_descriptor > and attach_internal, the dsm segment and dsa area will end up with different resource owners. > Earlier the chances of CurrentResourceOwner changing between the two functions were zero. > May be something can be done to keep resowner assignments under both these functions > in sync. Hmm, that's a good question, not sure. Do you have a repro for this handy? Attached is a current v4 with a few small tweaks. -- Daniel Gustafsson
Attachment
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Robert Haas
Date:
On Wed, Apr 30, 2025 at 5:24 PM Daniel Gustafsson <daniel@yesql.se> wrote: > Attached is a current v4 with a few small tweaks. Sorry to turn up late here, but I strongly disagree with the notion that this is a bug in the DSM or DSA code. It seems to me that it is the caller's responsibility to provide a valid resource owner, not the job of the called code to ignore the resource owner when it's unusable. I suspect that there are many other parts of the system that rely on the ResourceOwner machinery which likewise assume that the ResourceOwner that they are passed is valid. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Michael Paquier
Date:
On Wed, Apr 30, 2025 at 06:03:49PM -0400, Robert Haas wrote: > Sorry to turn up late here, but I strongly disagree with the notion > that this is a bug in the DSM or DSA code. It seems to me that it is > the caller's responsibility to provide a valid resource owner, not the > job of the called code to ignore the resource owner when it's > unusable. I suspect that there are many other parts of the system that > rely on the ResourceOwner machinery which likewise assume that the > ResourceOwner that they are passed is valid. Yeah, dshash would be one, I think. It feels to me that if you want to enforce this kind of policy to be checked, this is something that should be done in the shape of one or more assertion based the state of the resource owner expected in these low-level paths rather than tweaking the DSA and DSM code to do what you are expecting here, and only enforce such new policies on HEAD to avoid disruption with existing systems. I'm actually rather scared of the patch, isn't there a risk of breaking existing patterns that worked out of the box by forcing the resowner to not be set? My spidey sense tingles when I see such patterns, because this is enforcing assumptions directly hidden to the callers. -- Michael
Attachment
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Daniel Gustafsson
Date:
> On 1 May 2025, at 00:04, Robert Haas <robertmhaas@gmail.com> wrote: > > On Wed, Apr 30, 2025 at 5:24 PM Daniel Gustafsson <daniel@yesql.se> wrote: >> Attached is a current v4 with a few small tweaks. > > Sorry to turn up late here, but I strongly disagree with the notion > that this is a bug in the DSM or DSA code. It seems to me that it is > the caller's responsibility to provide a valid resource owner, not the > job of the called code to ignore the resource owner when it's > unusable. I suspect that there are many other parts of the system that > rely on the ResourceOwner machinery which likewise assume that the > ResourceOwner that they are passed is valid. Thanks for review, I’m currently travelling over the extended weekend but will rework the approach shortly when back at theoffice. ./daniel
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Robert Haas
Date:
On Wed, Apr 30, 2025 at 6:56 PM Michael Paquier <michael@paquier.xyz> wrote: > I'm actually rather scared of the patch, isn't there a risk of > breaking existing patterns that worked out of the box by forcing the > resowner to not be set? My spidey sense tingles when I see such > patterns, because this is enforcing assumptions directly hidden to the > callers. I'm not worried about this particular problem. We use resowner == NULL in various places to mean "for the lifetime of the backend," which makes a lot of sense if you think about it: the job of the resowner is to release resources when the resowner is cleaned up; if you don't ever want to release the resources, then you don't need a resowner. What I'm concerned about is that I think that (as I said on the other thread) is that ProcessGetMemoryContextInterrupt is not really at all safe to execute at an arbitrary CHECK_FOR_INTERRUPTS(). In other places where we use resource owners, we don't run into the problem that happened here because we are careful about the timing of our resource owner operations. You can see that caution in, for example, AbortTransaction() and AbortSubTransaction(), where we make sure to close down higher-level subsystems first, so that when we get to lower-level cleanup, we know that no more resources are going to be allocated afterward. But here, we are executing complex operations that require a "clean" state at pretty much any point in the code, where we don't actually know that things are sane enough for this code to execute successfully. This particular error is just a symptom of that more general problem. In my mind, the possible fixes here are (1) revert that patch, (2) redesign things so that the problematic code can only get called when we know that the backend state is sane, or (3) redesign the code so that it has fewer requirements for correct operation. Regarding (3), this particular problem could maybe be solved by not relying on the resowner machinery at all: we always want the DSA to be pinned, so if we could create it pre-pinned rather than pinning it as an after-the-fact step, we wouldn't need the resowner at all. Except that I don't really think that would be entirely safe, because that seems to open up the possibility that we'd just leak the DSA area entirely: say you enter dsa_create with no resowner and then fail someplace inside that function after you've allocated resources. There's nothing to clean up those resources, but since you never returned the DSA handle, it's not stored anywhere, so those resources have just been leaked. I think this is actually why the interface works the way it does. Maybe there's some way to fix it by creating a temporary resource owner that is used by just this code ... but in general I'm skeptical that we can really set up something that is OK to do in an aborted transaction, because our ability to handle any further errors at that point is extremely limited, and this code is definitely complex enough that it could error out. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Rahila Syed
Date:
Hi,
On Thu, May 1, 2025 at 4:26 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Apr 30, 2025 at 06:03:49PM -0400, Robert Haas wrote:
> Sorry to turn up late here, but I strongly disagree with the notion
> that this is a bug in the DSM or DSA code. It seems to me that it is
> the caller's responsibility to provide a valid resource owner, not the
> job of the called code to ignore the resource owner when it's
> unusable. I suspect that there are many other parts of the system that
> rely on the ResourceOwner machinery which likewise assume that the
> ResourceOwner that they are passed is valid.
Yeah, dshash would be one, I think. It feels to me that if you want
to enforce this kind of policy to be checked, this is something that
should be done in the shape of one or more assertion based the state
of the resource owner expected in these low-level paths rather than
tweaking the DSA and DSM code to do what you are expecting here, and
only enforce such new policies on HEAD to avoid disruption with
existing systems.
I believe it would be particularly useful to add an assertion in dsm_unpin_mapping().
This function relies on CurrentResourceOwner being non-NULL and not releasing to
successfully unpin a mapping.
This function relies on CurrentResourceOwner being non-NULL and not releasing to
successfully unpin a mapping.
I'm actually rather scared of the patch, isn't there a risk of
breaking existing patterns that worked out of the box by forcing the
resowner to not be set?
In this context, resowner not being set means that the dsm segment
would remain attached until the session ends or until it is explicitly
detached. IIUC, the key difference is that a segment will stay mapped
for longer than expected in cases where the mapping was created
when a transaction was aborting.
Thank you for the review comments, it makes sense to include this
check in the callers of the DSA API.
Wrapping the dsa_create/dsa_attach call in the following snippet helps
resolve the issue in case of ProcessGetMemoryContextInterrupt.
+ ResourceOwner current_owner;
+ bool res_releasing = false;
+
+ if (CurrentResourceOwner && IsResourceOwnerReleasing(CurrentResourceOwner))
+ {
+ current_owner = CurrentResourceOwner;
+ CurrentResourceOwner = NULL;
+ res_releasing = true;
+ }
+
MemoryStatsDsaArea = dsa_create(memCxtArea->lw_lock.tranche);
+ if (res_releasing)
+ CurrentResourceOwner = current_owner;
Kindly let me know your views.
Thank you,
Rahila Syed
would remain attached until the session ends or until it is explicitly
detached. IIUC, the key difference is that a segment will stay mapped
for longer than expected in cases where the mapping was created
when a transaction was aborting.
Thank you for the review comments, it makes sense to include this
check in the callers of the DSA API.
Wrapping the dsa_create/dsa_attach call in the following snippet helps
resolve the issue in case of ProcessGetMemoryContextInterrupt.
+ ResourceOwner current_owner;
+ bool res_releasing = false;
+
+ if (CurrentResourceOwner && IsResourceOwnerReleasing(CurrentResourceOwner))
+ {
+ current_owner = CurrentResourceOwner;
+ CurrentResourceOwner = NULL;
+ res_releasing = true;
+ }
+
MemoryStatsDsaArea = dsa_create(memCxtArea->lw_lock.tranche);
+ if (res_releasing)
+ CurrentResourceOwner = current_owner;
Kindly let me know your views.
Thank you,
Rahila Syed
Re: Prevent an error on attaching/creating a DSM/DSA from an interrupt handler.
From
Mihail Nikalayeu
Date:
Hello, everyone. Not sure if it helps but encountered the same problem in relation to injection points: https://www.postgresql.org/message-id/flat/CANtu0oiTgFW47QgpTwrMOVm3Bq4N0Y5bjvTy5sP0gYWLQuVgjw%40mail.gmail.com