Thread: How to reproduce serialization failure for a read only transaction.
If two transactions both read and write, I can easily reproduce the following: "could not serialize access due to read/write dependencies among transactions". However, the 9.3 documentation says that "When relying on Serializable transactions to prevent anomalies, it is important that any data read from a permanent user table not be considered valid until the transaction which read it has successfully committed. This is true even for read-only transactions".

I cannot make a read-only transaction fail because of serialization anomalies. Can someone show me a working example, please?

--
View this message in context: http://postgresql.1045698.n5.nabble.com/How-to-reproduce-serialization-failure-for-a-read-only-transaction-tp5785569.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
On Jan6, 2014, at 20:41 , AK <alkuzo@gmail.com> wrote:
> If two transactions both read and write, I can easily reproduce the
> following: "could not serialize access due to read/write dependencies among
> transactions". However, the 9.3 documentation says that [...] "This is true
> even for read-only transactions".
>
> I cannot have a read-only transaction fail because of serialization
> anomalies. Can someone show me a working example please?

A read-only transaction will abort due to a serialization failure if it observes a state of the database which doesn't exist in any serial transaction schedule. Here's an example (default isolation level is assumed to be serializable, of course):

W1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W1: UPDATE t SET count=count+1 WHERE id=1; -- (*2)
W1: SELECT data FROM t WHERE id=2;         -- (*1)
W2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W2: UPDATE t SET count=count+1 WHERE id=2; -- (*1, *2)
W2: COMMIT;
R : START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
R : SELECT data FROM t WHERE id IN (1,2);  -- (*2)
W1: COMMIT; -- R will now report a serialization error!

Due to (*1), W1 must execute before W2 in any serial schedule, since W1 reads record 2 which is later modified by W2. Due to (*2), R must execute after W2 but before W1, since it reads record 2 previously modified by W2 and record 1 later modified by W1. (Note that W1 hasn't committed at the time R acquires its snapshot.)

The dependencies induced by (*1) or (*2) alone are satisfiable by a serial schedule, but both together aren't - if W1 must execute before W2 as required by (*1), then surely every transaction that runs after W2 in such a schedule also runs after W1, thus contradicting (*2). Now since (*1) alone isn't contradictory, committing W1 succeeds.
That leaves only the last line, the COMMIT of R, to fail, which it does.

The gist of this example is that whether the state observed by R exists in any serial transaction schedule or not is only certain after all concurrent read-write transactions (W1 and W2) have committed.

You can avoid the error above by specifying DEFERRABLE in R's START TRANSACTION command. The session will then acquire a snapshot and wait for all possibly interfering read-write transactions to commit. If the snapshot turns out to be observable in some serial schedule, the session will continue; otherwise the database will acquire a new snapshot and wait again. Thus, once the START TRANSACTION command with the DEFERRABLE flag has completed, you can be sure that the transaction won't later be aborted due to a serialization error.

BTW, since this is a question about how to use postgres rather than how to extend it, it actually belongs on pgsql-general, not on the hackers list.

best regards,
Florian Pflug
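A deferrable version of R's transaction, restated for the example above, might look like this (a sketch against the same hypothetical table t; START TRANSACTION will block until a safe snapshot is available):

```sql
-- R, restated so that it can never fail with a serialization error.
-- The DEFERRABLE flag makes START TRANSACTION wait for a snapshot
-- that is known to be consistent with some serial schedule:
START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY DEFERRABLE;
SELECT data FROM t WHERE id IN (1,2);
COMMIT;  -- cannot raise a serialization failure
```

The trade-off is latency: the session may wait an unbounded time for the safe snapshot, which makes DEFERRABLE best suited to long-running reports rather than interactive queries.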
Hi Florian,

Can you explain why you state that "default isolation level is assumed to be serializable, of course", when you explicitly specify the isolation level for every session - why should the default matter at all?

When I am trying to reproduce the scenario which you have posted, I am observing different results. Here is my full scenario:

Session 1. Setting up:

CREATE TABLE cars(
  license_plate VARCHAR NOT NULL,
  reserved_by VARCHAR NULL
);
INSERT INTO cars(license_plate)
VALUES ('SUPRUSR'),('MIDLYPH');

Session 2: W1

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE cars SET reserved_by = 'Julia'
WHERE license_plate = 'SUPRUSR'
AND reserved_by IS NULL;
SELECT * FROM Cars
WHERE license_plate IN ('SUPRUSR','MIDLYPH');

Session 3: W2

BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE cars SET reserved_by = 'Ryan'
WHERE license_plate = 'MIDLYPH'
AND reserved_by IS NULL;
COMMIT;

Session 4: R

BEGIN ISOLATION LEVEL SERIALIZABLE READ ONLY;
SELECT * FROM Cars
WHERE license_plate IN ('SUPRUSR','MIDLYPH');

Session 2: W1

COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions

What am I doing wrong? Thank you for your help!
From: Florian Pflug
On Jan6, 2014, at 23:28 , AK <alkuzo@gmail.com> wrote:
> can you explain why do you state that "default isolation level is assumed to
> be serializable, of course", when you explicitly specify isolation level for
> every session - why should the default matter at all?

Sorry, that was a leftover - I initially wrote just START TRANSACTION, without specifying an isolation level.

> When I am trying to reproduce the scenario which you have posted, I am
> observing different results.

Hm, yeah, I missed two things.

First, dependency tracking can produce false positives, i.e. assume that dependencies exist between transactions which are actually independent. In my example, postgres fails to realize that W2 can be executed after W1, unless it uses an index scan for the UPDATE in W2. You can avoid that either by creating an index on the id column and forcing W2 to use it by setting enable_seqscan to off, or by creating two tables t1 and t2 instead of one table t with two records (you'll have to modify the SELECT to scan both tables too).

Second, since R executes its SELECT before W1 commits, postgres is already aware that R poses a problem when W1 commits, and it chooses to cancel W1 instead of R. To avoid that, R needs to do the SELECT after W1 has committed. To still force R to acquire a snapshot *before* that commit (without that, there's no serialization failure, since R then simply executes after W1 and W2), you'll need to issue e.g. SELECT 1 right after R's START TRANSACTION command.
I think the following should work (or, rather, fail):

CREATE TABLE t (id INT PRIMARY KEY, count INT);
INSERT INTO t (id, count) SELECT i, 0 FROM generate_series(1,2) AS i;

W1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W1: UPDATE t SET count=count+1 WHERE id=1;
W1: SELECT count FROM t WHERE id=2;
W2: SET enable_seqscan=off;
W2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W2: UPDATE t SET count=count+1 WHERE id=2;
W2: COMMIT;
R : START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
R : SELECT 1;
W1: COMMIT;
R : SELECT count FROM t WHERE id IN (1,2); -- Should fail

best regards,
Florian Pflug
On 1/6/14, 5:27 PM, Florian Pflug wrote:
> On Jan6, 2014, at 23:28 , AK <alkuzo@gmail.com> wrote:
> First, dependency tracking can produce false positives, i.e. assume that
> dependencies exist between transactions which are actually independent.
> [...]
> Second, since R executes its SELECT before W1 commits, postgres is already
> aware that R poses a problem when W1 commits, and it chooses to cancel W1
> instead of R.
> [...]
> I think the following should work (or, rather, fail)

This email and the previous one are an awesome bit of information; can we add it to the docs somehow? Even if it's just dumping the emails into a wiki page and referencing it?

--
Jim C. Nasby, Data Architect   jim@nasby.net
512.569.9461 (cell)            http://jim.nasby.net
This worked for me - thank you so much! The SELECT did fail.

Also, I cannot reproduce a scenario where "applications must not depend on results read during a transaction that later aborted": in this example the SELECT itself failed. Can you show an example where a SELECT completes, but the COMMIT blows up?
On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote:
> Also I cannot reproduce a scenario when "applications must not depend on
> results read during a transaction that later aborted;". In this example the
> SELECT itself has failed.
> Can you show an example where a SELECT completes, but the COMMIT blows up?

Actually, no, not for a read-only transaction. It happens that the final serialization failure check executed on COMMIT only affects read/write transactions, not read-only ones. That's a pretty specific implementation detail, though, so I wouldn't necessarily rely on it...

Here's an example of why applications must not depend on results read during a transaction that later aborted:

W2: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W2: UPDATE t SET count=1 WHERE id=1;
W1: BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
W1: SELECT * FROM t WHERE id=1;
W2: COMMIT;
R : BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
R : SELECT * FROM t;
R : COMMIT;
! W1: UPDATE t SET count=1 WHERE id=2;
W1: COMMIT;

If you try this, it'll cause a serialization failure on the line marked with a '!'. W1 saw (1,0) in the table, so W1 appears to have executed before W2. But R saw both (1,1) and (2,0) in the table, and that has to be a consistent snapshot of the database state, meaning W2 appears to have executed before W1. That's an inconsistency, so something has to be rolled back. This particular anomaly requires all three of the transactions, and so it can't be detected until W1 does its UPDATE. Postgres detects the conflict at that point and rolls back W1.

So what does this have to do with relying on the results of read-only transactions that abort? Well, what if you had R issue ROLLBACK instead of COMMIT -- maybe because you expected ROLLBACK and COMMIT to be equivalent for transactions that don't modify the database, or maybe because something else caused the transaction to abort?
When W1 does its update, it will be checked for serialization failures, but aborted transactions are (intentionally) not included in those checks. W1 is therefore allowed to commit; the apparent serial order of execution is W1 followed by W2, and the results of the aborted transaction R aren't consistent with that.

Dan

--
Dan R. K. Ports    UW CSE    http://drkp.net/
AK <alkuzo@gmail.com> wrote:
> I cannot have a read-only transaction fail because of
> serialization anomalies. Can someone show me a working example
> please?

A common case is a read-only transaction reading a closed batch without seeing all of its entries.

http://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
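The batch pattern Kevin refers to can be sketched roughly like this (the control/receipt schema and all names here are illustrative, not taken from the wiki page; see the link above for the canonical example):

```sql
-- Setup (hypothetical schema): control holds the currently open batch.
CREATE TABLE control (deposit_no INT NOT NULL);
INSERT INTO control VALUES (1);
CREATE TABLE receipt (deposit_no INT NOT NULL, amount NUMERIC NOT NULL);

-- T1 inserts a receipt into the batch it believes is still open:
T1: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
T1: INSERT INTO receipt (deposit_no, amount)
        SELECT deposit_no, 100.00 FROM control;

-- T2 closes batch 1 and commits first:
T2: START TRANSACTION ISOLATION LEVEL SERIALIZABLE;
T2: UPDATE control SET deposit_no = deposit_no + 1;
T2: COMMIT;

-- T3, read only, sees batch 1 as closed but misses T1's receipt:
T3: START TRANSACTION ISOLATION LEVEL SERIALIZABLE READ ONLY;
T3: SELECT deposit_no FROM control;              -- returns 2: batch 1 "closed"
T3: SELECT * FROM receipt WHERE deposit_no = 1;  -- T1's row not visible
T1: COMMIT;
```

SSI must cancel one of these transactions, because otherwise T3 would have observed batch 1 as both closed and incomplete, a state that exists in no serial ordering of T1 and T2.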
On Jan7, 2014, at 00:38 , Jim Nasby <jim@nasby.net> wrote:
> This email and the previous one are an awesome bit of information,
> can we add it to the docs somehow? Even if it's just dumping the
> emails into a wiki page and referencing it?

Most of what I wrote there can be found in README-SSI, I think, under "Apparent Serial Order of Execution", "Heap locking" and "Index AM implementations".

I guess it'd be nice if we explained these things in the docs somewhere, though I'm not sure what level of detail would be appropriate. Maybe a good compromise would be to explain dependency graphs, but skip over the different kinds of dependencies (ww, rw, wr). Instead we could say that whenever a transaction *does see* another transaction's modifications, it must appear after that transaction in any serial schedule, and whenever a transaction *might see* another transaction's modifications but doesn't due to begin/commit ordering, it must appear before that transaction.

best regards,
Florian Pflug
From: Kevin Grittner
Dan Ports <drkp@csail.mit.edu> wrote:
> On Mon, Jan 06, 2014 at 05:14:12PM -0800, AK wrote:
> If you try this, it'll cause a serialization failure on the line
> marked with a '!'. W1 saw (1,0) in the table, so W1 appears to
> have executed before W2. But R saw both (1,1) and (2,0) in the
> table, and that has to be a consistent snapshot of the database
> state, meaning W2 appears to have executed before W1. That's an
> inconsistency, so something has to be rolled back. This
> particular anomaly requires all three of the transactions, and so
> it can't be detected until W1 does its UPDATE. Postgres detects
> the conflict at that point and rolls back W1.

Yeah, neither of the provided examples rolled back the read-only transaction itself; the read-only transaction caused a situation where something needed to be rolled back, but since we try to roll back a transaction which has a good chance of succeeding on retry, the read-only transaction is not usually a good candidate. I created a new example on the Wiki page where the read-only transaction itself must be rolled back, because both of the other transactions involved have already committed:

https://wiki.postgresql.org/wiki/SSI#Rollover

Regarding other questions on the thread: I have no objections to moving the Wiki examples into the docs, but it seemed like a lot to include, and I'm not sure where it belongs. Ideas?

Regarding the different results AK got: I set default_transaction_isolation = 'serializable' on my connections before running these, for two reasons. (1) It keeps the examples more concise. (2) I think most people using serializable transactions in PostgreSQL set the default and don't set the transaction isolation level on each transaction, since (unlike strategies which rely on blocking, like S2PL) all transactions must be participating in the stricter isolation level for it to be reliable.
In fact, given the performance benefits of declaring transactions READ ONLY when possible, I have seen shops that make *that* a default, too, and override it for transactions which need to write.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
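The defaults described above can be set per session or persisted per database; a sketch (the database name "mydb" is illustrative):

```sql
-- Per session:
SET default_transaction_isolation = 'serializable';
SET default_transaction_read_only = on;

-- Or persistently, for every new connection to one database:
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';
ALTER DATABASE mydb SET default_transaction_read_only = on;

-- A transaction that needs to write then opts out explicitly:
START TRANSACTION READ WRITE;
-- ... updates ...
COMMIT;
```

With read-only as the default, forgetting the READ WRITE override fails fast with "cannot execute ... in a read-only transaction" rather than silently taking the more expensive read-write path.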
From: Kevin Grittner
AK <alkuzo@gmail.com> wrote:
> Session 1. Setting up:
>
> CREATE TABLE cars(
>   license_plate VARCHAR NOT NULL,
>   reserved_by VARCHAR NULL
> );
> INSERT INTO cars(license_plate)
> VALUES ('SUPRUSR'),('MIDLYPH');
> [...]
> Session 2: W1
>
> COMMIT;
> ERROR: could not serialize access due to read/write dependencies
> among transactions
>
> What am I doing wrong?

Even without the read-only transaction, the W1 and W2 transactions are a classic case of write skew. It looks like it might actually be benign, since neither transaction is updating license_plate, but serializable logic works at the row level, not the column level. After both transactions update the table there is write skew, which must be resolved by cancelling one of the transactions. The first to commit "wins" and the other one will be cancelled when it attempts to run its next statement, which may or may not be a COMMIT.

If, for purposes of demonstration, you add a unique index on license_plate and set enable_seqscan = off, you eliminate the simple write skew and get into more complex ways of breaking things. With that tweak you can run all of those transactions if W1 skips the SELECT. You can let W1 do the SELECT as long as you don't run R. The problem is that the SELECT in W1 sees the work of W1 but not W2, and the SELECT in R sees the work of W2 but not W1. We can't allow that.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
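Applied to AK's cars schema, the demonstration tweak described above might look like this (the index name is my own; enable_seqscan is a per-session planner setting, shown here for demonstration only):

```sql
-- Let the UPDATEs use index scans, so SSI predicate locks cover only
-- the rows each transaction actually touches rather than the whole table:
CREATE UNIQUE INDEX cars_license_plate_idx ON cars (license_plate);

-- In each writing session, discourage sequential scans so the planner
-- actually picks the index scan (a demonstration aid, not a production setting):
SET enable_seqscan = off;
```

Without the index, each UPDATE's sequential scan takes a predicate lock on the whole relation, so the two writers appear to conflict even though they touch different rows, producing the false-positive write skew Kevin describes.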
Regarding this:

> So what does this have to do with relying on the results of read-only
> transactions that abort? Well, what if you had R issue ROLLBACK instead of
> COMMIT [...] When W1 does its update, it will be checked for serialization
> failures, but aborted transactions are (intentionally) not included in those
> checks. W1 is therefore allowed to commit; the apparent serial order of
> execution is W1 followed by W2, and the results of the aborted transaction R
> aren't consistent with that.

So if I am reading the data and then commit, I should always be fine, correct?
From: Kevin Grittner
AK <alkuzo@gmail.com> wrote:
> So if I am reading the data and then commit, I should be always
> fine, correct?

If a serializable transaction successfully commits, that means that all data read within that transaction can be trusted.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On Jan7, 2014, at 20:11 , Kevin Grittner <kgrittn@ymail.com> wrote:
> Yeah, neither of the provided examples rolled back the read only
> transaction itself;

Actually, the fixed version [1] of my example does.

[1] http://www.postgresql.org/message-id/8721AAD3-7A3A-4576-B10E-F2CBD1E5337A@phlo.org

best regards,
Florian Pflug
Florian Pflug <fgp@phlo.org> wrote:
> On Jan7, 2014, at 20:11 , Kevin Grittner <kgrittn@ymail.com> wrote:
>> Yeah, neither of the provided examples rolled back the read only
>> transaction itself;
>
> Actually, the fixed version [1] of my example does.
>
> [1] http://www.postgresql.org/message-id/8721AAD3-7A3A-4576-B10E-F2CBD1E5337A@phlo.org

Due to my lame email provider, that post didn't show for me until I had already replied. :-( You had already shown an example almost exactly like what I described in my post. I tweaked it a bit more for the Wiki page to show more clearly why SSI has to care about what the writing transaction reads. For all the database engine knows, what was read contributed to whether the application allowed it to successfully commit. By using the value from the SELECT in the UPDATE, it is easier to see why it matters, although it needs to be considered either way.

In other words, we seem to be in full agreement, just using different language to describe it. :-)

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company