Re: Remove HeapTuple and Buffer dependency for predicate locking functions - Mailing list pgsql-hackers
From | Ashwin Agrawal |
---|---|
Subject | Re: Remove HeapTuple and Buffer dependency for predicate locking functions |
Date | |
Msg-id | CALfoeivQ6xJdwmgSfcP+_NjTh8yjgjoHqrRqpbXxG_fRaytU4Q@mail.gmail.com Whole thread Raw |
In response to | Re: Remove HeapTuple and Buffer dependency for predicate lockingfunctions (Andres Freund <andres@anarazel.de>) |
Responses |
Re: Remove HeapTuple and Buffer dependency for predicate locking functions
Re: Remove HeapTuple and Buffer dependency for predicate locking functions |
List | pgsql-hackers |
On Wed, Jul 31, 2019 at 2:06 PM Andres Freund <andres@anarazel.de> wrote:
Looking at the code as of master, we currently have:
Super awesome feedback and insights, thank you!
- PredicateLockTuple() calls SubTransGetTopmostTransaction() to figure
out a whether the tuple has been locked by the current
transaction. That check afaict just should be
TransactionIdIsCurrentTransactionId(), without all the other
stuff that's done today.
Agree. v1-0002 patch attached does that now. Please let me know if that's what you meant.
TransactionIdIsCurrentTransactionId() imo ought to be optimized to
always check for the top level transactionid first - that's a good bet
today, but even moreso for the upcoming AMs that won't have separate
xids for subtransactions. Alternatively we shouldn't make that a
binary search for each subtrans level, but just have a small
simplehash hashtable for xids.
v1-0001 patch checks for GetTopTransactionIdIfAny() first in TransactionIdIsCurrentTransactionId() which seems yes better in general and more for future. That mostly meets the needs for current discussion.
The alternative of not using binary search seems bigger refactoring and should be handled as separate optimization exercise outside of this thread.
- CheckForSerializableConflictOut() wants to get the toplevel xid for
the tuple, because that's the one the predicate hashtable stores.
In your patch you've already moved the HTSV() call etc out of
CheckForSerializableConflictOut(). I'm somewhat inclined to think that
the SubTransGetTopmostTransaction() call ought to go along with that.
I don't really think that belongs in predicate.c, especially if
most/all new AMs don't use subtransaction ids.
The only downside is that currently the
TransactionIdEquals(xid, GetTopTransactionIdIfAny()) check
avoids the SubTransGetTopmostTransaction() check.
But again, the better fix for that seems to be to improve the generic
code. As written the check won't prevent a subtrans lookup for heap
when subtransactions are in use, and it's IME pretty common for tuples
to get looked at again in the transaction that has created them. So
I'm somewhat inclined to think that SubTransGetTopmostTransaction()
should have a fast-path for the current transaction - probably just
employing TransactionIdIsCurrentTransactionId().
That optimization, as Kuntal also mentioned, seems something which can be done on-top afterwards on current patch.
I don't really see what we gain by having the subtrans handling in the
predicate code. Especially given that we've already moved the HTSV()
handling out, it seems architecturally the wrong place to me - but I
admit that that's a fuzzy argument. The relevant mapping should be one
line in the caller.
Okay, I moved the sub transaction handling out of CheckForSerializableConflictOut() and have it along side HTSV() now.
The reason I felt leaving subtransaction handling in generic place, was it might be premature to thing no future AM will need it. Plus, all serializable function api's having same expectations is easier. Like PredicateLockTuple() can be passed top or subtransaction id and it can handle it but with the change CheckForSerializableConflictOut() only be feed top transaction ID. But its fine and can see the point of AM needing it can easily get top transaction ID and feed it as heap.
I wonder if it'd be wroth to combine the
TransactionIdIsCurrentTransactionId() calls in the heap cases that
currently do both, PredicateLockTuple() and
HeapCheckForSerializableConflictOut(). The heap_fetch() case probably
isn't commonly that hot a pathq, but heap_hot_search_buffer() is.
Maybe, will give thought to it separate from the current patch.
Minor notes:
- I don't think 'insert_xid' is necessarily great - it could also be the
updating xid etc. And while you can argue that an update is an insert
in the current heap, that's not the case for future AMs.
- to me
@@ -1621,7 +1622,8 @@ heap_hot_search_buffer(ItemPointer tid, Relation relation, Buffer buffer,
if (valid)
{
ItemPointerSetOffsetNumber(tid, offnum);
- PredicateLockTuple(relation, heapTuple, snapshot);
+ PredicateLockTID(relation, &(heapTuple)->t_self, snapshot,
+ HeapTupleHeaderGetXmin(heapTuple->t_data));
if (all_dead)
*all_dead = false;
return true;
What are those parens - as placed they can't do anything. Did you
intend to write &(heapTuple->t_self)? Even that is pretty superfluous,
but it at least clarifies the precedence.
Fixed. No idea what I was thinking there, mostly feel I intended to have it as like &(heapTuple->t_self).
I'm also a bit confused why we don't need to pass in the offset of the
current tuple, rather than the HOT root tuple here. That's not related
to this patch. But aren't we locking the wrong tuple here, in case of
HOT?
Yes, root is being locked here instead of the HOT. But I don't have full context on the same. If we wish to fix it though, can be easily done now with the patch by passing "tid" instead of &(heapTuple->t_self).
- I wonder if CheckForSerializableConflictOutNeeded() shouldn't have a
portion of it's code as a static inline. In particular, it's a shame
that we currently perform external function calls at quite the
frequency when serializable isn't even in use.
I understand that with refactor, HeapCheckForSerializableConflictOut() is called which calls CheckForSerializableConflictOutNeeded(). If that's the problem, for addressing the same, I had proposed alternative way to refactor. CheckForSerializableConflictOut() can take callback function and void* callback argument for AM specific check instead. So, the flow would be AM calling CheckForSerializableConflictOut() as today and only if serializable in use will invoke the callback to check with AM if more work should be performed or not. Essentially HeapCheckForSerializableConflictOut() will become callback function instead. Due to void* callback argument aspect I didn't like that solution and felt AM performing checks and calling CheckForSerializableConflictOut() seems more straight forward.
Attachment
pgsql-hackers by date: