incoherent view of serializable transactions - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | incoherent view of serializable transactions |
Date | |
Msg-id | 494F7365.EE98.0025.0@wicourts.gov Whole thread Raw |
Responses |
Re: incoherent view of serializable transactions
Re: incoherent view of serializable transactions Re: incoherent view of serializable transactions |
List | pgsql-hackers |
As I've understood limitations of the PostgreSQL implementation of SERIALIZABLE transactions, at least the only example given in the documentation, revolve around a rather unlikely situation: Given concurrent transactions T1 and T2 and non-overlapping sets of data A and B, T1 reads data including A and uses the data to modify B while T2 reads data including B and uses that data to modify A, where the modifications performed by either would affect the modifications made by the other, if they were visible. For reasons I'll omit here, that scenario didn't worry me for my current uses of PostgreSQL. I've found another form of deviation from the standard SERIALIZABLE behavior, though, which does worry me. Although the above appears to be the only situation where the end result after everything commits is inconsistent with standard SERIALIZABLE behavior, the PostgreSQL implementation allows transactions to view the data in states which would never be possible during the application of the transactions in series in the order they will appear to have been applied after the commit. Imagine, as an example, a system which involves recording receipts, each of which must go into a daily deposit. There is a control table with one row containing the current deposit date for receipts. Somewhere mid-afternoon that date is updated, all subsequent receipts fall into the new day, and a report is run listing the receipts for the day and giving the deposit total. Under a standard-compliant implementation of SERIALIZABLE, this is straightforward: a transaction which is inserting a receipt selects the deposit date to use in its transaction, and any SELECT of receipts for a date prior to the current deposit date will see the accurate, final data. Under the PostgreSQL implementation, although data eventually gets to a coherent state, there can be a window of time where a SELECT can return an incomplete list of receipts for a date which appears to be closed, even if all transactions for modifying and viewing data are SERIALIZABLE. -- setup create table ctl (k text not null primary key, deposit_date date not null); insert into ctl values ('receipt', date '2008-12-22'); create table receipt (receipt_no int not null primary key, deposit_date date not null, amount numeric(13,2)); insert into receipt values (1, (select deposit_date from ctl where k = 'receipt'), 1.00); insert into receipt values (2, (select deposit_date from ctl where k = 'receipt'), 2.00); -- connection 1 start transaction isolation level serializable ; insert into receipt values (3, (select deposit_date from ctl where k = 'receipt'), 4.00); -- connection 2 start transaction isolation level serializable ; update ctl set deposit_date = date '2008-12-23' where k = 'receipt'; commit transaction; start transaction isolation level serializable ; select * from ctl; -- (deposit_date shows as 2008-12-23) select * from receipt; -- (Only receipts 1 and 2 show for 2008-12-22.) commit; -- connection 1 commit transaction; -- connection 2 start transaction isolation level serializable ; select * from receipt; -- (All receipts for the 2008-12-22 deposit date now show.) commit transaction; At this point, SERIALIZABLE transactions appear to have worked, with receipt 3 happening before the update of deposit_date; however, there was a window of time when the update to deposit_date was visible and receipt 3 was not. This absolutely can't happen in a standard-compliant implementation. At a minimum, this window where visible data lacks coherency should be noted in the documentation. I don't know if there's any way to fix this without killing performance. -Kevin
pgsql-hackers by date: