Re: POC: Lock updated tuples in tuple_update() and tuple_delete() - Mailing list pgsql-hackers
From: Alexander Korotkov
Subject: Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
Msg-id: CAPpHfdtwKb5UVXKkDQZYW8nQCODy0fY_S7mV5Z+cg7urL=zDEA@mail.gmail.com
In response to: Re: POC: Lock updated tuples in tuple_update() and tuple_delete() (Andres Freund <andres@anarazel.de>)
Responses: Re: POC: Lock updated tuples in tuple_update() and tuple_delete()
List: pgsql-hackers
Hi, Andres.

Thank you for your review. Sorry for the late reply. It took me some time
to figure out how to revise the patch. The revised patchset is attached.

I decided to split the patch into two:
1) Avoid re-fetching the "original" row version during update and delete.
2) Save the effort by re-using the existing context of
   tuple_update()/tuple_delete() for locking the tuple.
These are two separate optimizations, so let's evaluate their performance
separately.

On Tue, Jan 10, 2023 at 4:07 AM Andres Freund <andres@anarazel.de> wrote:
> I'm a bit worried that this is optimizing the rare case while hurting the
> common case. See e.g. my point below about creating additional slots in the
> happy path.

This makes sense. It's worth allocating the slot only if we're going to
store a tuple there. I implemented this by passing a callback for slot
allocation instead of the slot itself.

> It's also not clear that change is right directionally. If we want to avoid
> re-fetching the "original" row version, why don't we provide that
> functionality via table_tuple_lock()?

These are two distinct optimizations. Now, they come as two distinct patches.
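To illustrate the callback idea in isolation (this is a toy model, not the actual PostgreSQL code — the types, names, and the `toy_tuple_delete` function are invented for the sketch): the table AM entry point takes a function pointer that it invokes only on the conflict path, so the common case allocates nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-ins for PostgreSQL's TM_Result and TupleTableSlot. */
typedef enum { TM_Ok, TM_Updated } TM_Result;
typedef struct { int dummy; } Slot;

/* Callback that lazily allocates a slot; invoked only when needed. */
typedef Slot *(*SlotAllocFn)(void *arg);

static int slots_allocated = 0;

static Slot *
alloc_slot(void *arg)
{
    (void) arg;
    slots_allocated++;
    return malloc(sizeof(Slot));
}

/*
 * Model of a tuple_delete()-style function: on the happy path (no
 * concurrent update) the callback is never called, so no slot is
 * created.  Only when the delete hits a concurrently updated tuple
 * do we allocate a slot to hold the locked row version.
 */
static TM_Result
toy_tuple_delete(int concurrently_updated, SlotAllocFn alloc, void *arg,
                 Slot **locked_slot)
{
    *locked_slot = NULL;
    if (!concurrently_updated)
        return TM_Ok;               /* common case: no slot allocated */
    *locked_slot = alloc(arg);      /* conflict path: allocate lazily */
    return TM_Updated;
}
```

The point of the shape is that the caller pays for slot creation only on the retry path, addressing the "additional slots in the happy path" concern.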
> On 2023-01-09 13:29:18 +0300, Alexander Korotkov wrote:
> > @@ -53,6 +53,12 @@ static bool SampleHeapTupleVisible(TableScanDesc scan, Buffer buffer,
> >                                     HeapTuple tuple,
> >                                     OffsetNumber tupoffset);
> >
> > +static TM_Result heapam_tuple_lock_internal(Relation relation, ItemPointer tid,
> > +                                            Snapshot snapshot, TupleTableSlot *slot,
> > +                                            CommandId cid, LockTupleMode mode,
> > +                                            LockWaitPolicy wait_policy, uint8 flags,
> > +                                            TM_FailureData *tmfd, bool updated);
> > +
> >  static BlockNumber heapam_scan_get_blocks_done(HeapScanDesc hscan);
> >
> >  static const TableAmRoutine heapam_methods;
> > @@ -299,14 +305,39 @@ heapam_tuple_complete_speculative(Relation relation, TupleTableSlot *slot,
> >  static TM_Result
> >  heapam_tuple_delete(Relation relation, ItemPointer tid, CommandId cid,
> >                      Snapshot snapshot, Snapshot crosscheck, bool wait,
> > -                    TM_FailureData *tmfd, bool changingPart)
> > +                    TM_FailureData *tmfd, bool changingPart,
> > +                    TupleTableSlot *lockedSlot)
> >  {
> > +    TM_Result    result;
> > +
> >      /*
> >       * Currently Deleting of index tuples are handled at vacuum, in case if
> >       * the storage itself is cleaning the dead tuples by itself, it is the
> >       * time to call the index tuple deletion also.
> >       */
> > -    return heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > +    result = heap_delete(relation, tid, cid, crosscheck, wait, tmfd, changingPart);
> > +
> > +    /*
> > +     * If the tuple has been concurrently updated, get lock already so that on
> > +     * retry it will succeed, provided that the caller asked to do this by
> > +     * providing a lockedSlot.
> > +     */
> > +    if (result == TM_Updated && lockedSlot != NULL)
> > +    {
> > +        result = heapam_tuple_lock_internal(relation, tid, snapshot,
> > +                                            lockedSlot, cid, LockTupleExclusive,
> > +                                            LockWaitBlock,
> > +                                            TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > +                                            tmfd, true);
>
> You're ignoring the 'wait' parameter here, no? I think the modification to
> heapam_tuple_update() has the same issue.

Yep.
I didn't catch this, because currently we also call
tuple_update()/tuple_delete() with wait == true. Fixed.

> > +        if (result == TM_Ok)
> > +        {
> > +            tmfd->traversed = true;
> > +            return TM_Updated;
> > +        }
> > +    }
> > +
> > +    return result;
>
> Doesn't this mean that the caller can't easily distinguish between
> heapam_tuple_delete() and heapam_tuple_lock_internal() returning a failure
> state?

Exactly. But currently nodeModifyTable.c handles these failure states in a
similar way, and I don't see why that should change in the future.

> > @@ -350,213 +402,8 @@ heapam_tuple_lock(Relation relation, ItemPointer tid, Snapshot snapshot,
> >                    LockWaitPolicy wait_policy, uint8 flags,
> >                    TM_FailureData *tmfd)
> >  {
>
> Moving the entire body of the function around makes it harder to review
> this change, because the code movement is intermingled with "actual" changes.

OK, fixed.

> > +/*
> > + * This routine does the work for heapam_tuple_lock(), but also support
> > + * `updated` to re-use the work done by heapam_tuple_update() or
> > + * heapam_tuple_delete() on fetching tuple and checking its visibility.
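The control flow under discussion — delete, and on a concurrent-update conflict immediately lock the latest row version so the caller's retry can succeed — can be modeled standalone roughly like this (toy types and stub functions invented for the sketch; `stub_heap_delete` and `stub_lock_last_version` stand in for heap_delete() and the lock-latest-version step):

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TM_Ok, TM_Updated } TM_Result;
typedef struct { bool traversed; } TM_FailureData;

/* Stub for heap_delete(): reports whether a concurrent update occurred. */
static TM_Result
stub_heap_delete(bool concurrently_updated)
{
    return concurrently_updated ? TM_Updated : TM_Ok;
}

/* Stub for locking the latest row version; assume it succeeds here. */
static TM_Result
stub_lock_last_version(void)
{
    return TM_Ok;
}

/*
 * Shape of the patched delete path: if the caller asked for the locked
 * tuple (want_lock) and the delete failed with TM_Updated, lock the
 * latest version right away.  A successful lock is still reported as
 * TM_Updated with tmfd->traversed set, so the caller knows to run
 * EvalPlanQual and retry.
 */
static TM_Result
toy_delete(bool concurrently_updated, bool want_lock, TM_FailureData *tmfd)
{
    TM_Result result = stub_heap_delete(concurrently_updated);

    if (result == TM_Updated && want_lock)
    {
        result = stub_lock_last_version();
        if (result == TM_Ok)
        {
            tmfd->traversed = true;
            return TM_Updated;
        }
    }
    return result;
}
```

This also makes the ambiguity Andres points out visible: after the call, TM_Updated may come either from the delete itself or from the follow-up lock attempt; the `traversed` flag is what tells the caller the lock path was taken.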
> > + */
> > +static TM_Result
> > +heapam_tuple_lock_internal(Relation relation, ItemPointer tid, Snapshot snapshot,
> > +                           TupleTableSlot *slot, CommandId cid, LockTupleMode mode,
> > +                           LockWaitPolicy wait_policy, uint8 flags,
> > +                           TM_FailureData *tmfd, bool updated)
> > +{
> > +    BufferHeapTupleTableSlot *bslot = (BufferHeapTupleTableSlot *) slot;
> > +    TM_Result    result;
> > +    Buffer       buffer = InvalidBuffer;
> > +    HeapTuple    tuple = &bslot->base.tupdata;
> > +    bool         follow_updates;
> > +
> > +    follow_updates = (flags & TUPLE_LOCK_FLAG_LOCK_UPDATE_IN_PROGRESS) != 0;
> > +    tmfd->traversed = false;
> > +
> > +    Assert(TTS_IS_BUFFERTUPLE(slot));
> > +
> > +tuple_lock_retry:
> > +    tuple->t_self = *tid;
> > +    if (!updated)
> > +        result = heap_lock_tuple(relation, tuple, cid, mode, wait_policy,
> > +                                 follow_updates, &buffer, tmfd);
> > +    else
> > +        result = TM_Updated;
>
> To make sure I understand: You're basically trying to have
> heapam_tuple_lock_internal() work as before, except that you want to omit
> fetching the first row version, assuming that the caller already tried to lock
> it?
>
> I think at the very least this needs an assert verifying that the slot
> actually contains a tuple in the "updated" path.

This part was re-written.

> > +    if (result == TM_Updated &&
> > +        (flags & TUPLE_LOCK_FLAG_FIND_LAST_VERSION))
> > +    {
> > +        if (!updated)
> > +        {
> > +            /* Should not encounter speculative tuple on recheck */
> > +            Assert(!HeapTupleHeaderIsSpeculative(tuple->t_data));
>
> Why shouldn't this be checked in the updated case as well?
>
> > @@ -1490,7 +1492,16 @@ ExecDelete(ModifyTableContext *context,
> >       * transaction-snapshot mode transactions.
> >       */
> >  ldelete:
> > -    result = ExecDeleteAct(context, resultRelInfo, tupleid, changingPart);
> > +
> > +    /*
> > +     * Ask ExecDeleteAct() to immediately place the lock on the updated
> > +     * tuple if we will need EvalPlanQual() in that case to handle it.
> > +     */
> > +    if (!IsolationUsesXactSnapshot())
> > +        slot = ExecGetReturningSlot(estate, resultRelInfo);
> > +
> > +    result = ExecDeleteAct(context, resultRelInfo, tupleid, changingPart,
> > +                           slot);
>
> I don't like that 'slot' is now used for multiple things. I think this could
> best be addressed by simply moving the slot variable inside the blocks using
> it. And here it should be named more accurately.

I didn't do that refactoring. But now the edits introduced by the 1st patch of
the set are more granular and don't affect the usage of the 'slot' variable.

> Is there a potential conflict with other uses of the ExecGetReturningSlot()?

Yep. The current revision avoids this ad-hoc usage of slots.

> Given that we now always create the slot, doesn't this increase the overhead
> for the very common case of not needing EPQ? We'll create unnecessary slots
> all the time, no?

Yes. This is addressed by allocating the EPQ slot only once it is needed, via
a callback. I'm thinking about wrapping this into some abstraction called
'LazySlot'.

> >              */
> >             EvalPlanQualBegin(context->epqstate);
> >             inputslot = EvalPlanQualSlot(context->epqstate, resultRelationDesc,
> >                                          resultRelInfo->ri_RangeTableIndex);
> > +           ExecCopySlot(inputslot, slot);
> >
> > -           result = table_tuple_lock(resultRelationDesc, tupleid,
> > -                                     estate->es_snapshot,
> > -                                     inputslot, estate->es_output_cid,
> > -                                     LockTupleExclusive, LockWaitBlock,
> > -                                     TUPLE_LOCK_FLAG_FIND_LAST_VERSION,
> > -                                     &context->tmfd);
> > +           Assert(context->tmfd.traversed);
> > +           epqslot = EvalPlanQual(context->epqstate,
> > +                                  resultRelationDesc,
> > +                                  resultRelInfo->ri_RangeTableIndex,
> > +                                  inputslot);
>
> The only point of using EvalPlanQualSlot() is to avoid copying the tuple from
> one slot to another. Given that we're not benefiting from that anymore (due to
> your manual ExecCopySlot() call), it seems we could just pass 'slot' to
> EvalPlanQual() and not bother with EvalPlanQualSlot().

This makes sense.
Now the usage pattern of the slots is clearer.

> > @@ -1449,6 +1451,8 @@ table_multi_insert(Relation rel, TupleTableSlot **slots, int nslots,
> >  * tmfd - filled in failure cases (see below)
> >  * changingPart - true iff the tuple is being moved to another partition
> >  * table due to an update of the partition key. Otherwise, false.
> > + * lockedSlot - slot to save the locked tuple if should lock the last row
> > + * version during the concurrent update. NULL if not needed.
>
> The grammar in the new comments is off ("if should lock").
>
> I think this also needs to mention that this *significantly* changes the
> behaviour of table_tuple_delete(). That's not at all clear from the comment.

Let's see the performance results for the patchset. I'll properly revise the
comments if the results are good.

Pavel, could you please re-run your tests over the revised patchset?

------
Regards,
Alexander Korotkov