Re: Remove HeapTuple and Buffer dependency for predicate locking functions - Mailing list pgsql-hackers
From: Thomas Munro
Subject: Re: Remove HeapTuple and Buffer dependency for predicate locking functions
Date:
Msg-id: CA+hUKGLF+iE3MVR6LtenbTu4Y_CYbxRZrVzQMSgdzu8stm2YKQ@mail.gmail.com
In response to: Re: Remove HeapTuple and Buffer dependency for predicate locking functions (Andres Freund <andres@anarazel.de>)
Responses:
  Re: Remove HeapTuple and Buffer dependency for predicate locking functions
  Re: Remove HeapTuple and Buffer dependency for predicate locking functions
List: pgsql-hackers
On Tue, Aug 6, 2019 at 4:35 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-08-05 20:58:05 +1200, Thomas Munro wrote:
> > 1. Commit dafaa3efb75 "Implement genuine serializable isolation
> > level." (2011) locked the root tuple, because it set t_self to *tid.
> > Possibly due to confusion about the effect of the preceding line
> > ItemPointerSetOffsetNumber(tid, offnum).
> >
> > 2. Commit 81fbbfe3352 "Fix bugs in SSI tuple locking." (2013)
> > fixed that by adding ItemPointerSetOffsetNumber(&heapTuple->t_self,
> > offnum).
>
> Hm. It's not at all sure that it's ok to report the non-root tuple tid
> here. I.e. I'm fairly sure there was a reason to not just set it to the
> actual tid. I think I might have written that up on the list at some
> point. Let me dig in memory and list. Obviously possible that that was
> also obsoleted by parallel changes.

Adding Heikki and Kevin.

I haven't found your earlier discussion about that yet, but would be keen to read it if you can find it. I wondered if your argument might have had something to do with heap pruning, but I can't find a problem there. It's not as though the TID of any visible tuple changes; it's just that some dead stuff goes away and the root line pointer is changed to LP_REDIRECT if it wasn't already.

As for the argument for locking the tuple we emit rather than the HOT root, I think the questions are: (1) how exactly do we get away with locking only one version in a regular non-HOT update chain, and (2) is it OK to do that with a HOT root?

The answer to the first question is given in README-SSI[1]. Unfortunately it doesn't discuss HOT directly, but I suspect the answer is no, HOT is not special here. By my reading, it relies on the version you lock being the version visible to your snapshot, which is important because later updates have to touch that tuple to write the next version. That doesn't apply to some arbitrarily older tuple that happens to be a HOT root.
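To make the failure mode concrete, here is a toy Python model of SIREAD locks keyed by TID (illustrative only; the names predicate_lock_tuple and check_for_serializable_conflict_in merely echo the PostgreSQL functions, and the lock-table layout is invented for this sketch). If the reader locks the HOT root's TID instead of the TID of the version it actually emitted, a later update of the visible version finds no matching lock and the rw-conflict is missed:

```python
# Toy model of SIREAD predicate locks keyed by tuple TID.
# TIDs are (block, offset) pairs; a HOT chain shares one index entry
# pointing at the root, but each version has its own offset.

siread_locks = {}  # tid -> set of reader transaction ids


def predicate_lock_tuple(reader, tid):
    """Reader takes an SIREAD lock on the given TID."""
    siread_locks.setdefault(tid, set()).add(reader)


def check_for_serializable_conflict_in(writer, tid):
    """Writer checks for rw-conflicts on the exact version it updates,
    as heap_update() does via CheckForSerializableConflictIn()."""
    return {r for r in siread_locks.get(tid, set()) if r != writer}


root_tid = (1, 1)     # HOT chain root (dead version)
visible_tid = (1, 3)  # version visible to T1's snapshot

# Buggy behaviour: T1's index scan emits the visible version but
# locks the root TID.
predicate_lock_tuple("T1", root_tid)
# T2 updates the visible version; it never considers the root again.
missed = check_for_serializable_conflict_in("T2", visible_tid)
print(missed)  # set() -- the T1 -> T2 rw-conflict edge is missed

# Fixed behaviour: lock the version actually emitted to the user.
predicate_lock_tuple("T1", visible_tid)
detected = check_for_serializable_conflict_in("T2", visible_tid)
print(detected)  # {'T1'} -- conflict detected
```

The same TID mismatch is what makes the tuple lock leak in CheckTargetForConflictsIn(): a lock tagged with the root TID can never match the TID of the version being updated.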
Concretely, heap_update() does CheckForSerializableConflictIn(relation, &oldtup, buffer), which is only going to produce a rw-conflict if T1 took an SIREAD lock on precisely the version T2 locks in that path, not some arbitrarily older version that happens to be a HOT root. A HOT root might never be considered again by concurrent writers, no?

As a minor consequence, the optimisation in CheckTargetForConflictsIn() assumes that a tuple being updated has the same tag as the one we locked when reading it, which isn't the case if we locked the root while reading but now have the TID of the version we actually read; so in master we leak a tuple lock unnecessarily until end-of-transaction when we update a HOT tuple.

> > This must be in want of an isolation test, but I haven't yet tried to
> > get my head around how to write a test that would show the difference.
>
> Indeed.

One practical problem is that the only way to reach PredicateLockTuple() is from an index scan, and the index scan locks the index page (or the whole index, depending on rd_indam->ampredlocks). So I think if you want to see a serialization anomaly you'll need multiple indexes (so that index page locks don't hide the problem), a long enough HOT chain, and then probably several transactions to be able to miss a cycle that should be picked up by the logic in [1]. I'm out of steam for this problem today though.

The simple test from the report[3] that resulted in commit 81fbbfe3352 doesn't work for me (ie with permutation "r1" "r2" "w1" "w2" "c1" "c2" twice in a row). The best I've come up with so far is an assertion that we predicate-lock the same row version that we emitted to the user, when reached via an index lookup that visits a HOT row. The test outputs 'f' for master, but 't' with the change to heapam.c.

[1] Excerpt from README-SSI:

===

* PostgreSQL does not use "update in place" with a rollback log for its MVCC implementation.
Where possible it uses "HOT" updates on the same page (if there is room and no indexed value is changed). For non-HOT updates the old tuple is expired in place and a new tuple is inserted at a new location. Because of this difference, a tuple lock in PostgreSQL doesn't automatically lock any other versions of a row. We don't try to copy or expand a tuple lock to any other versions of the row, based on the following proof that any additional serialization failures we would get from that would be false positives:

  o If transaction T1 reads a row version (thus acquiring a predicate lock on it) and a second transaction T2 updates that row version (thus creating a rw-conflict graph edge from T1 to T2), must a third transaction T3 which re-updates the new version of the row also have a rw-conflict in from T1 to prevent anomalies? In other words, does it matter whether we recognize the edge T1 -> T3?

  o If T1 has a conflict in, it certainly doesn't. Adding the edge T1 -> T3 would create a dangerous structure, but we already had one from the edge T1 -> T2, so we would have aborted something anyway. (T2 has already committed, else T3 could not have updated its output; but we would have aborted either T1 or T1's predecessor(s). Hence no cycle involving T1 and T3 can survive.)

  o Now let's consider the case where T1 doesn't have a rw-conflict in. If that's the case, for this edge T1 -> T3 to make a difference, T3 must have a rw-conflict out that induces a cycle in the dependency graph, i.e. a conflict out to some transaction preceding T1 in the graph. (A conflict out to T1 itself would be problematic too, but that would mean T1 has a conflict in, the case we already eliminated.)

  o So now we're trying to figure out if there can be an rw-conflict edge T3 -> T0, where T0 is some transaction that precedes T1. For T0 to precede T1, there has to be some edge, or sequence of edges, from T0 to T1. At least the last edge has to be a wr-dependency or ww-dependency rather than a rw-conflict, because T1 doesn't have a rw-conflict in. And that gives us enough information about the order of transactions to see that T3 can't have a rw-conflict to T0:

    - T0 committed before T1 started (the wr/ww-dependency implies this)
    - T1 started before T2 committed (the T1->T2 rw-conflict implies this)
    - T2 committed before T3 started (otherwise, T3 would get aborted because of an update conflict)

  o That means T0 committed before T3 started, and therefore there can't be a rw-conflict from T3 to T0.

  o So in all cases, we don't need the T1 -> T3 edge to recognize cycles. Therefore it's not necessary for T1's SIREAD lock on the original tuple version to cover later versions as well.

===

[2] https://www.postgresql.org/message-id/52527E4D.4060302%40vmware.com
[3] https://www.postgresql.org/message-id/flat/523C29A8.20904%40vmware.com

--
Thomas Munro
https://enterprisedb.com
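The three ordering facts at the heart of the proof can be checked mechanically. Here is a small sketch of my own (not PostgreSQL code): model each transaction as a (start, commit) pair on one logical clock, note that a rw-conflict from a reader to a writer requires the writer to commit after the reader starts, and chain the three facts to rule out T3 -> T0:

```python
# Model transactions as (start, commit) timestamps on one logical clock.
# A rw-conflict reader -> writer requires the two to be concurrent:
# the writer must not have committed before the reader started.

def rw_conflict_possible(reader, writer):
    """reader and writer are (start, commit) pairs."""
    reader_start, _ = reader
    _, writer_commit = writer
    return writer_commit > reader_start


# Any timestamps consistent with the proof's three ordering facts:
#   T0 committed before T1 started (wr/ww-dependency T0 -> T1)
#   T1 started before T2 committed (rw-conflict T1 -> T2)
#   T2 committed before T3 started (else T3 aborts on update conflict)
T0 = (0, 1)
T1 = (2, 9)
T2 = (3, 4)
T3 = (5, 6)

assert T0[1] < T1[0] < T2[1] < T3[0]  # the three facts, chained

# Hence T0 committed before T3 started: no rw-conflict T3 -> T0.
print(rw_conflict_possible(T3, T0))  # False
```

The transitive chain is the whole argument: whatever concrete timestamps you pick, the three inequalities force T0's commit below T3's start, so the T3 -> T0 edge needed to close a cycle cannot exist.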