Re: Potential G2-item cycles under serializable isolation - Mailing list pgsql-bugs
From | Kyle Kingsbury |
---|---|
Subject | Re: Potential G2-item cycles under serializable isolation |
Date | |
Msg-id | e1c7919d-2dc5-4099-e825-9bda876f2e82@jepsen.io Whole thread Raw |
In response to | Re: Potential G2-item cycles under serializable isolation (Peter Geoghegan <pg@bowt.ie>) |
Responses |
Re: Potential G2-item cycles under serializable isolation
|
List | pgsql-bugs |
On 6/4/20 8:20 PM, Peter Geoghegan wrote: > Can you explain the anomaly with reference to the actual SQL queries > executed in the log? Is the information that I've provided sufficient? Elle automatically explains these anomalies, so it might be easier to interpret with the visualization and/or interpretation of the anomaly--you'll find those in store/latest/elle/. Looking at the data structure representation of the anomaly, here's what I see in jepsen-failure-2.txt. I'm looking at the first (and only) anomaly in :workload/:anomalies/:G2-item. {:cycle [{:type :ok, :f :txn, :value [[:r 100 nil] [:append 101 1] [:r 101 [1]] [:r 96 [1 2]]], :time 750303448, :process 23, :index 727} {:type :ok, :f :txn, :value [[:append 100 1] [:r 99 nil] [:r 101 nil] [:append 4 4]], :time 751922048, :process 5, :index 729} {:type :ok, :f :txn, :value [[:r 100 nil] [:append 101 1] [:r 101 [1]] [:r 96 [1 2]]], :time 750303448, :process 23, :index 727}], :steps ({:type :rw, :key 100, :value :elle.list-append/init, :value' 1, :a-mop-index 0, :b-mop-index 0} {:type :rw, :key 101, :value :elle.list-append/init, :value' 1, :a-mop-index 2, :b-mop-index 1}), :type :G2-item} We have a cycle involving two transactions (:cycle)--the first transaction is shown at the end as well to make it easier to see that it closes the loop. Let's call them T1 (:index 727) and T2 (index :729). The first step (:steps) is an anti-dependency: T1 read key 100 and observed the initial state ([:r 100 nil]), but T2 appended 1 to 100, so we know T1 < T2. The second step is also an anti-dependency: T2 read key 101 and observed the initial state ([:r 101 nil]). Both of these transactions completed successfully (:type :ok), which means we have two committed transactions, both of which failed to observe the other's writes. These can't be serializable, because if we choose T1 < T2, T2 fails to observe a write that T1 performed, and if we choose T2 < T1, T1 fails to observe a write that T2 performed. You'll probably see something similar to this explanation in store/latest/elle/G2-item.txt, and a corresponding diagram in store/latest/elle/G2-item/0.svg. Elle infers, from the presence of G2-item, that this history cannot satisfy repeatable read (:not #{:repeatable-read}). By extension, it knows (:also-not) that it can't be serializable, strict serializable, or strong session serializable either. Now, to the SQL. T1 was performed by process 23, and T2 by process 5--you've helpfully restricted the postgres logs to just those two processes, so you're one step ahead of me. I searched for $1 = '100' to find the read of key 100 from process 23: 1591312493.044 171762 0 BEGIN 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: BEGIN 1591312493.044 171762 0 SELECT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '100' 1591312493.044 171762 0 SELECT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: select (val) from txn2 where sk = $1 So, process 23 begins a transaction and reads key 100 from table txn2, using the secondary key `sk`. This is new, BTW--earlier versions of the test did every operation by primary key; I've been making the test harder over time. That read returns no rows: [:r 100 nil]. Process 23, at least in this log, goes on to set the session to SERIALIZABLE. This is a bit weird, because that statement should be executed by JDBC when we started the transaction, right? Makes me wonder if maybe the log lines are out of order? Now that I look... these are suspiciously alphabetically ordered. If you sorted the file, that might explain it. :) 1591312493.044 171762 0 SET 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE 1591312493.044 171762 0 SHOW 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL Next (in this sorted log, but not in reality), process 23 appends 1 to key 101, in table txn0: 1591312493.045 171762 0 INSERT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '101', $2 = '101', $3 = '1' 1591312493.045 171762 0 INSERT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: insert into txn0 (id, sk, val) values ($1, $2, $3) There should have been an update first, but it got sorted later in this file. Here's the update--this must have happened first, failed, and caused process 23 to back off to an insert. 1591312493.045 171762 0 SAVEPOINT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: savepoint upsert 1591312493.045 171762 0 UPDATE 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '1', $2 = '101' 1591312493.045 171762 0 UPDATE 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: update txn0 set val = CONCAT(val, ',', $1) where id = $2 Next, process 5 begins its transaction: 1591312493.046 171734 0 BEGIN 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: BEGIN 1591312493.046 171734 0 SET 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: SET SESSION CHARACTERISTICS AS TRANSACTION ISOLATION LEVEL SERIALIZABLE 1591312493.046 171734 0 SHOW 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: SHOW TRANSACTION ISOLATION LEVEL ... and tries to append 1 to key 100 by primary key `id`: 1591312493.046 171734 0 UPDATE 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '1', $2 = '100' 1591312493.046 171734 0 UPDATE 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: update txn2 set val = CONCAT(val, ',', $1) where id = $2 Process 23 goes on to release its upsert savepoint, and perform reads of key 101 and 96, observing [1] and [1 2] respectively, before committing. Again, out-of-order log due to sorting. 1591312493.046 171762 1197960 COMMIT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: COMMIT 1591312493.046 171762 1197960 RELEASE 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: release savepoint upsert 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '101' 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 DETAIL: parameters: $1 = '96' 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: select (val) from txn0 where id = $1 1591312493.046 171762 1197960 SELECT 5ed9806c.29ef2 jepsen process 23 LOG: execute <unnamed>: select (val) from txn0 where sk = $1 Process 5's update affects zero rows, so it similarly backs off to an insert statement: 1591312493.047 171734 0 INSERT 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '100', $2 = '100', $3 = '1' 1591312493.047 171734 0 INSERT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: insert into txn2 (id, sk, val) values ($1, $2, $3) 1591312493.047 171734 0 SAVEPOINT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: savepoint upsert 1591312493.047 171734 1197963 RELEASE 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: release savepoint upsert ... followed by reads of 99 and 101. 1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '101' 1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 DETAIL: parameters: $1 = '99' 1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: select (val) from txn0 where sk = $1 1591312493.047 171734 1197963 SELECT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: select (val) from txn2 where id = $1 We know the read of 101 observed no rows, which makes sense because process 5 (T2)'s transaction began during process 23's transaction, *before* the commit. What *doesn't* make sense is... 1591312493.048 171734 1197963 COMMIT 5ed9806c.29ed6 jepsen process 5 LOG: execute <unnamed>: COMMIT Process 5 manages to *commit* despite the anti-dependency cycle! That's legal under SI, but not under serializability. Phew! I know it's a lot. Does this help? --Kyle
pgsql-bugs by date: