BUG #11732: Non-serializable outcomes under serializable isolation - Mailing list pgsql-bugs
From:      pbailis@cs.berkeley.edu
Subject:   BUG #11732: Non-serializable outcomes under serializable isolation
Msg-id:    20141021071458.2678.9080@wrigleys.postgresql.org
Responses: Re: BUG #11732: Non-serializable outcomes under serializable isolation
           Re: BUG #11732: Non-serializable outcomes under serializable isolation
List:      pgsql-bugs
The following bug has been logged on the website:

Bug reference:      11732
Logged by:          Peter Bailis
Email address:      pbailis@cs.berkeley.edu
PostgreSQL version: 9.3.5
Operating system:   Ubuntu 14.04.1 LTS
Description:

When issuing a high-contention workload under SERIALIZABLE isolation, I am able to produce non-serializable outcomes. I am able to reproduce this behavior both via a Ruby on Rails deployment and via a standalone Python script that generates concurrent transactions (https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-py).

BASIC WORKLOAD:

The basic workload consists of a series of concurrent transactions performing read-modify-insert operations on a non-indexed varchar column. The workload repeatedly issues a set of K transactions in parallel, incrementing a sequence number for each wave. Each transaction within a wave checks whether the non-indexed varchar column contains the sequence number encoded as a string (via a SELECT ... WHERE query). If so, the transaction rolls back. If not, it attempts to insert a new row with the sequence number. (This is effectively a poor man's version of enforcing a uniqueness constraint.)

Under SERIALIZABLE isolation, only one transaction should successfully insert an entry into the varchar column per sequence number. (Reasoning: If, within a wave, a transaction T_1 observes that another transaction T_2 has already updated the column, it will not attempt to insert. If, within a wave, a transaction T_1 does not observe another transaction T_2's successful, committed insertion for the same sequence number, T_1 should abort to preserve serializability and will not insert. In the event that all transactions within a wave abort, there will be no insertions for that sequence number.)

As a minimal working example, consider the following table, with the "key" column holding the sequence number varchar column from above:

    TABLE "stress" ("id" serial primary key, "key" character varying(255))

and the following transaction, executed in parallel and parameterized by K, a varchar:

    BEGIN TRANSACTION;

    result = 'SELECT 1 AS one FROM "stress" WHERE "stress"."key" = K;'

    if result is NULL:
        'INSERT INTO "stress" ("key") VALUES (K) RETURNING "id";'
        COMMIT TRANSACTION;
    else:
        ROLLBACK TRANSACTION;

(In both implementations I've built, the logic is written in the application, and the single-quoted queries run in PostgreSQL.)

EXPECTED BEHAVIOR:

Under serializable isolation, there should only be one record per 'key':

    SELECT key, COUNT(*) FROM stress GROUP BY key HAVING COUNT(*) > 1;
     key | count
    -----+-------
    (0 rows)

ACTUAL BEHAVIOR:

Under serializable isolation, there are duplicates in the "key" column:

    SELECT key, COUNT(*) FROM stress GROUP BY key HAVING COUNT(*) > 1;
     key | count
    -----+-------
     745 |     2
     274 |     2
      55 |     2
    (3 rows)

I realize this is a weird workload, but it's derived from a set of queries in Rails's ORM.
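In Python with psycopg2, the per-transaction logic above looks roughly like the following. This is a simplified sketch, not the exact code in the linked gist; the connection parameters, the helper name try_insert(), and the error handling are illustrative.

    # Simplified sketch of one transaction attempt (not the exact gist code).
    # Connection parameters and the helper name try_insert() are illustrative.
    import psycopg2
    import psycopg2.extensions

    def try_insert(conn, seq):
        """Read-then-insert for one sequence number; returns True if we inserted."""
        key = str(seq)
        try:
            cur = conn.cursor()
            cur.execute('SELECT 1 AS one FROM "stress" WHERE "stress"."key" = %s', (key,))
            if cur.fetchone() is None:
                # No committed row observed for this key: try to insert it.
                cur.execute('INSERT INTO "stress" ("key") VALUES (%s) RETURNING "id"', (key,))
                conn.commit()
                return True
            conn.rollback()  # key already present: roll back, as in the workload above
            return False
        except psycopg2.extensions.TransactionRollbackError:
            conn.rollback()  # serialization failure raised by PostgreSQL under SSI
            return False

    conn = psycopg2.connect(dbname="stress", user="stress", password="stress")
    conn.set_session(isolation_level=psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
                     autocommit=False)
    print(try_insert(conn, 1))
    conn.close()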
ENVIRONMENT DETAILS:

I have reproduced this behavior in two separate environments:

Environment 1:
  PostgreSQL version (from version()): PostgreSQL 9.3.5 on x86_64-unknown-linux-gnu, compiled by gcc (Ubuntu 4.8.2-19ubuntu1) 4.8.2, 64-bit
  PostgreSQL installation method: vanilla install via apt ('postgres-9.3')
  OS: Ubuntu 14.04 LTS
  Hardware: EC2 m2.4xlarge instance with 8 cores, 68.4GB memory, and 840GB local ephemeral storage

Environment 2:
  PostgreSQL version (from version()): PostgreSQL 9.3.5 on x86_64-apple-darwin13.3.0, compiled by Apple LLVM version 5.1 (clang-503.0.40) (based on LLVM 3.4svn), 64-bit
  PostgreSQL installation method: vanilla install via brew ('postgres')
  OS: Apple OS X 10.9.4
  Hardware: Macbook Pro, 2.8GHz Intel i5, 16GB RAM, SSD

Configuration changes: I've changed default_transaction_isolation in postgresql.conf to 'serializable' and increased the connection limit and memory settings to accommodate multiple parallel connections (128 in the script I've provided):

    default_transaction_isolation = 'serializable'
    max_connections = 200
    shared_buffers = 1GB

I originally thought this must be a bug in Rails, but I've now implemented the transaction logic in both Rails 4 and Python (using psycopg2) and reproduced this behavior in both environments. Autocommit is disabled, and the logs (see https://gist.github.com/pbailis/503430309104e3f7ab79#file-postgres-output-txt) demonstrate that serializable isolation is enabled (i.e., "Reason code: Canceled on identification as a pivot, during write"). This leads me to believe it is a consequence of the SSI implementation.

REPRODUCING:

Set up a new PostgreSQL database. The Python script assumes a database called 'stress' and a user called 'stress'. The following command will set up a new database in '/tmp/pgdata':

    export PGDATA=/tmp/pgdata; rm -rf $PGDATA; pg_ctl init; printf "default_transaction_isolation = 'serializable'\nshared_buffers=1GB\nmax_connections=200\n" >> $PGDATA/postgresql.conf; postgres -D /tmp/pgdata & sleep 2; psql postgres -c "CREATE USER stress WITH PASSWORD 'stress'"; psql postgres -c "CREATE DATABASE stress"; psql postgres -c "GRANT ALL PRIVILEGES ON DATABASE stress to stress"; fg

Run a series of concurrent sets of transactions as described above, or just run the Python tool (https://gist.github.com/pbailis/503430309104e3f7ab79#file-pg-stress-py). Then check whether the 'key' column contains duplicate entries (e.g., https://gist.github.com/pbailis/503430309104e3f7ab79#file-state-of-database-after-pg-stress-py). This may take a few tries; I'd estimate that over half of my runs in both environments (going through the whole setup -- database setup plus Python script) reproduce this behavior. The current script is a bit of a hack, leading to high variance in timing behavior.

OTHER NOTES:

I haven't had much luck reproducing with a non-varchar key (e.g., (key integer)) and without RETURNING "id". Putting a unique index on "key" makes the workload behave as expected. The behavior seems to appear more often when run on a new PostgreSQL installation. If the script hangs, check the contents of the table -- there may be a duplicate. However, not all duplicates freeze the script. I'm happy to debug further, but I'm sending what I have so far.
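As a rough outline of the reproduction procedure without the linked gist, the driver loop looks approximately like this. It is a simplified sketch that builds on the illustrative try_insert() above; the thread count, wave count, and connect() helper are assumptions, and the real pg-stress script differs.

    # Simplified wave driver (the linked pg-stress script differs): run K concurrent
    # attempts per sequence number, then check for duplicate keys. Reuses the
    # illustrative try_insert() from the sketch earlier in this report.
    import threading
    import psycopg2
    import psycopg2.extensions

    K = 16        # concurrent transactions per wave; the linked script uses up to 128 connections
    WAVES = 500   # number of sequence numbers to attempt

    def connect():
        conn = psycopg2.connect(dbname="stress", user="stress", password="stress")
        conn.set_session(isolation_level=psycopg2.extensions.ISOLATION_LEVEL_SERIALIZABLE,
                         autocommit=False)
        return conn

    def worker(seq):
        conn = connect()
        try:
            try_insert(conn, seq)   # per-transaction logic sketched above
        finally:
            conn.close()

    for seq in range(WAVES):
        threads = [threading.Thread(target=worker, args=(seq,)) for _ in range(K)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()

    # Under SERIALIZABLE isolation this should print an empty list.
    conn = connect()
    cur = conn.cursor()
    cur.execute('SELECT key, COUNT(*) FROM stress GROUP BY key HAVING COUNT(*) > 1')
    print(cur.fetchall())
    conn.close()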