Thread: Update on true serializable techniques in MVCC
Just to make those who care aware of it, here is Michael Cahill's Doctoral Thesis based on implementing Serializable Snapshot Isolation in InnoDB using a refined version of the techniques previously used in Berkeley DB (and previously discussed on this list): http://hdl.handle.net/2123/5353 For those who missed the previous discussion, or don't remember it well, this technique uses non-blocking SIREAD locks, which allow true serializable behavior without increasing blocking beyond what is present in normal MVCC snapshot isolation (readers do not block writers, and vice versa), but it generates some false positive serialization errors. Apparently the number of false positives in InnoDB is lower than in Berkeley DB because of the more fine-grained locking in InnoDB. Seriously, this post is just for the benefit of those who may be interested in following these developments -- I don't have the inclination or energy for another round of debate on the topic just now. :-/ -Kevin
Kevin Grittner wrote: > Just to make those who care aware of it, here is Michael Cahill's > Doctoral Thesis based on implementing Serializable Snapshot > Isolation in InnoDB using a refined version of the techniques > previously used in the Berkeley DB (and previously discussed on this > list): > > http://hdl.handle.net/2123/5353 > > Seriously, this post is just for the benefit of those who may be > interested in following these developments -- I don't have the > inclination or energy for another round of debate on the topic just > now. :-/ I understand that, and thank you for the information. Although it may have seemed that I was out to shoot the idea down, I am interested in the topic. I guess my way of understanding something is trying to find holes in it... I read into the text, and I was particularly interested in how he solved the problem of phantom reads. Quote: The problem [of phantom reads] was identified in (Eswaran et al., 1976), but the general purpose "predicate locking" solution suggested there has not been widely adopted because of the difficulty in testing mutual satisfiability of predicates. Instead, locking DBMS implementations commonly use algorithms based on "next-key locking". In these, a range of key space is protected against concurrent insertion or deletion by acquiring a shared lock on the next row in order, as a scan is made to check whether rows match a predicate. The scan might be through the data records or through an index. Inserts and deletes follow the same protocol, obtaining an exclusive lock on the row after the one being inserted or deleted. The result of this locking protocol is that a range scan prevents concurrent inserts or deletes within the range of the scan, and vice versa. That sounds like it should actually work. Yours, Laurenz Albe
2009/12/16 Albe Laurenz <laurenz.albe@wien.gv.at>: > Quote: > The problem [of phantom reads] was identified in (Eswaran et al., 1976), > but the general purpose "predicate locking" solution suggested there > has not been widely adopted because of the difficulty in testing mutual > satisfiability of predicates. > > Instead, locking DBMS implementations commonly use algorithms based on > "next-key locking". In these, a range of key space is protected against > concurrent insertion or deletion by acquiring a shared lock on the next > row in order, as a scan is made to check whether rows match a predicate. > The scan might be through the data records or through an index. > > Inserts and deletes follow the same protocol, obtaining an exclusive > lock on the row after the one being inserted or deleted. The result > of this locking protocol is that a range scan prevents concurrent > inserts or deletes within the range of the scan, and vice versa. > > That sounds like it should actually work. That boils down to 2PL, using a granularity that is somewhere between table locks and single-row locks (note that the latter doesn't correctly enforce serializability, hence something more coarse which also locks not-yet-existing rows is needed). Disadvantages: 1. Unstable latency: Depending on whether indexes or table scans are used (i.e., the plan), other transactions may be blocked for long durations or not. 2. Unstable susceptibility to deadlocks: Idem; it is possible that once the planner starts to use another kind of scan, your transactions start to deadlock. It seems that the proposed SIREAD method fixes at least (1), because there is no additional blocking involved. I am not sure whether the serialization failures that it may cause are dependent on the plan used. If so, that would be very similar to (2). Nicolas
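[Editor's sketch: the next-key protocol from the quoted passage, reduced to a toy model. This is not PostgreSQL or InnoDB code; all names are invented, locks never block (they just raise), and the "EOF" sentinel stands in for the lock past the last key.]

```python
# Toy model of next-key locking over a sorted key space.  A range scan
# shared-locks every key it touches *plus* the next key after the range,
# so a concurrent insert into any gap inside the range must conflict.
import bisect

class NextKeyLockTable:
    def __init__(self, keys):
        self.keys = sorted(keys)   # committed rows, by key
        self.locks = {}            # key -> set of (txn, mode)

    def _lock(self, txn, key, mode):
        # S is compatible with S; any pairing involving X conflicts.
        for other, omode in self.locks.get(key, set()):
            if other != txn and (mode == 'X' or omode == 'X'):
                raise RuntimeError(f'{txn} blocks on key {key}')
        self.locks.setdefault(key, set()).add((txn, mode))

    def range_scan(self, txn, lo, hi):
        """Shared-lock all keys in [lo, hi] plus the next key after hi."""
        i = bisect.bisect_left(self.keys, lo)
        while i < len(self.keys) and self.keys[i] <= hi:
            self._lock(txn, self.keys[i], 'S')
            i += 1
        nxt = self.keys[i] if i < len(self.keys) else 'EOF'
        self._lock(txn, nxt, 'S')  # the "next-key" lock protecting the gap
        return [k for k in self.keys if lo <= k <= hi]

    def insert(self, txn, key):
        """X-lock the row after the insertion point, then insert."""
        i = bisect.bisect_right(self.keys, key)
        nxt = self.keys[i] if i < len(self.keys) else 'EOF'
        self._lock(txn, nxt, 'X')
        self.keys.insert(i, key)
```

With rows {10, 20, 30, 40}, a scan of [10, 30] by T1 blocks a concurrent insert of 25 by T2 (T2 needs an X lock on 30, which T1 holds shared), while an insert of 45 goes through untouched; this is exactly the "range scan prevents concurrent inserts within the range, and vice versa" behavior from the quote.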
* Albe Laurenz: > That sounds like it should actually work. If you have got an index, yes. It seems to me that it would make locking behavior dependent on your query plan, too. BTW, PostgreSQL could raise a different error when a unique constraint violation is detected which involves a row which is not visible at the current snapshot. At least in my limited experience, that would allow applications to recover more easily if small transactions fail (similar to what you have to do on deadlock). Right now (well, at least with 8.3, haven't checked 8.4 yet), it's not possible to tell a unique constraint violation caused by a phantom from an application bug. (We currently fake this by retrying a fixed number of times and bailing out if the error returned by PostgreSQL looks like a unique constraint violation.) -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
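[Editor's sketch of the retry workaround Florian describes, with invented names and a stand-in exception class instead of a real driver; in PostgreSQL, SQLSTATE 40001 is serialization_failure and 23505 is unique_violation.]

```python
# Since a unique-key violation caused by a phantom is indistinguishable
# from an application bug, retry it a fixed number of times and bail out
# if it persists (at which point a real bug is the likelier explanation).
class DBError(Exception):
    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate

def run_with_retry(txn_fn, max_unique_retries=3):
    unique_retries = 0
    while True:
        try:
            return txn_fn()
        except DBError as e:
            if e.sqlstate == '40001':
                continue               # explicit serialization failure: retry
            if e.sqlstate == '23505' and unique_retries < max_unique_retries:
                unique_retries += 1    # *might* be a phantom; retry a few times
                continue
            raise                      # persistent: give up, surface the error
```

The point of Florian's complaint is precisely that the `23505` branch is a guess: a distinct error code for "violation against an invisible row" would let the application retry only genuine phantoms.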
Nicolas Barbier wrote: >> Quote: [...] >> >> That sounds like it should actually work. > > That boils down to 2PL, using a granularity that is somewhere between > table locks and single-row locks (note that the latter doesn't > correctly enforce serializability, hence something more coarse which > also locks not-yet-existing rows is needed). That's how I understood it too. > Disadvantages: > > 1. Unstable latency: Depending on whether indexes or table scans are > used (i.e., the plan), other transactions may be blocked for long > durations or not. > 2. Unstable susceptibility to deadlocks: Idem; it is possible that > once the planner starts to use another kind of scans, that your > transactions start to deadlock. > > It seems that the proposed SIREAD method fixes at least (1), because > there is no additional blocking involved. I am not sure whether the > serialization failures that it may cause are dependent on the plan > used. If so, that would be very similar to (2). Well, I guess that you have to pay somehow for serializability - there will be more locks and more lock management. I did not think of that, but it is really unpleasant if your transactions suddenly start receiving serialization errors because the plan has been changed. And the thesis says that the tests did not reveal too many false positives, so maybe it is not that bad. But maybe that's a price worth paying for serializability? Yours, Laurenz Albe
On Wed, Dec 16, 2009 at 4:52 AM, Albe Laurenz <laurenz.albe@wien.gv.at> wrote: > Kevin Grittner wrote: >> Just to make those who care aware of it, here is Michael Cahill's >> Doctoral Thesis based on implementing Serializable Snapshot >> Isolation in InnoDB using a refined version of the techniques >> previously used in the Berkley DB (and previously discussed on this >> list): >> >> http://hdl.handle.net/2123/5353 >> >> Seriously, this post is just for the benefit of those who may be >> interested in following these developments -- I don't have the >> inclination or energy for another round of debate on the topic just >> now. :-/ > > I understand that, and thank you for the information. > Although it may have seemed that I was out to shoot the idea down, > I am interested in the topic. I guess my way of understanding something > is trying to find holes in it... > > I read into the text, and I was particularly interested how he solved > the problem of phantom reads. > > Quote: > The problem [of phantom reads] was identified in (Eswaran et al., 1976), > but the general purpose "predicate locking" solution suggested there > has not been widely adopted because of the difficulty in testing mutual > satisfiability of predicates. > > Instead, locking DBMS implementations commonly use algorithms based on > "next-key locking". In these, a range of key space is protected against > concurrent insertion or deletion by acquiring a shared lock on the next > row in order, as a scan is made to check whether rows match a predicate. > The scan might be through the data records or through an index. > > Inserts and deletes follow the same protocol, obtaining an exclusive > lock on the row after the one being inserted or deleted. The result > of this locking protocol is that a range scan prevents concurrent > inserts or delete within the range of the scan, and vice versa. > > That sounds like it should actually work. 
Only if you can guarantee that the database will access the rows using some particular index. If it gets to the data some other way it might accidentally circumvent the lock. That's kind of a killer in terms of making this work for PostgreSQL. ...Robert
Moin, On Wednesday 16 December 2009 16:24:42 Robert Haas wrote: > > Inserts and deletes follow the same protocol, obtaining an exclusive > > lock on the row after the one being inserted or deleted. The result > > of this locking protocol is that a range scan prevents concurrent > > inserts or delete within the range of the scan, and vice versa. > > > > That sounds like it should actually work. > > Only if you can guarantee that the database will access the rows using > some particular index. If it gets to the data some other way it might > accidentally circumvent the lock. That's kind of a killer in terms of > making this work for PostgreSQL. Isn't the whole topic only relevant for write access? There you have to access the index anyway. Andres
On Wed, Dec 16, 2009 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote: > On Wednesday 16 December 2009 16:24:42 Robert Haas wrote: >> > Inserts and deletes follow the same protocol, obtaining an exclusive >> > lock on the row after the one being inserted or deleted. The result >> > of this locking protocol is that a range scan prevents concurrent >> > inserts or delete within the range of the scan, and vice versa. >> > >> > That sounds like it should actually work. >> >> Only if you can guarantee that the database will access the rows using >> some particular index. If it gets to the data some other way it might >> accidentally circumvent the lock. That's kind of a killer in terms of >> making this work for PostgreSQL. > Isnt the whole topic only relevant for writing access? There you have to > access the index anyway. Yeah, I guess you have to insert the new tuple. I guess while you were at it you might check whether the next tuple is locked... ...Robert
"Albe Laurenz" <laurenz.albe@wien.gv.at> wrote: > Although it may have seemed that I was out to shoot the idea down, > I am interested in the topic. I guess my way of understanding > something is trying to find holes in it... No problem. That's how ideas are explored and improved. The brick wall was that there seemed to be overwhelming opposition to any new technique which causes serialization failures under conditions where the cause isn't blindingly obvious, regardless of the resulting improvements in data reliability. > I was particularly interested how he solved the problem of phantom > reads. > That sounds like it should actually work. Well, that's certainly not the novel part -- those techniques were described and proven theoretically decades ago and have been in use by many popular products for almost as long. It's that non-blocking SIREAD lock which is the fun part. ;-) I just wish I had time to read all the documents referenced in the footnotes.... -Kevin
Nicolas Barbier <nicolas.barbier@gmail.com> wrote: > I am not sure whether the serialization failures that it may cause > are dependent on the plan used. They are. -Kevin
Robert Haas escribió: > On Wed, Dec 16, 2009 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote: > > On Wednesday 16 December 2009 16:24:42 Robert Haas wrote: > >> > Inserts and deletes follow the same protocol, obtaining an exclusive > >> > lock on the row after the one being inserted or deleted. The result > >> > of this locking protocol is that a range scan prevents concurrent > >> > inserts or delete within the range of the scan, and vice versa. > >> > > >> > That sounds like it should actually work. > >> > >> Only if you can guarantee that the database will access the rows using > >> some particular index. If it gets to the data some other way it might > >> accidentally circumvent the lock. That's kind of a killer in terms of > >> making this work for PostgreSQL. > > Isnt the whole topic only relevant for writing access? There you have to > > access the index anyway. > > Yeah, I guess you have to insert the new tuple. I guess while you > were at it you might check whether the next tuple is locked... So you'd have to disable HOT updates when true serializability was active? -- Alvaro Herrera http://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc.
On Wed, Dec 16, 2009 at 1:14 PM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > Robert Haas escribió: >> On Wed, Dec 16, 2009 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote: >> > On Wednesday 16 December 2009 16:24:42 Robert Haas wrote: >> >> > Inserts and deletes follow the same protocol, obtaining an exclusive >> >> > lock on the row after the one being inserted or deleted. The result >> >> > of this locking protocol is that a range scan prevents concurrent >> >> > inserts or delete within the range of the scan, and vice versa. >> >> > >> >> > That sounds like it should actually work. >> >> >> >> Only if you can guarantee that the database will access the rows using >> >> some particular index. If it gets to the data some other way it might >> >> accidentally circumvent the lock. That's kind of a killer in terms of >> >> making this work for PostgreSQL. >> > Isnt the whole topic only relevant for writing access? There you have to >> > access the index anyway. >> >> Yeah, I guess you have to insert the new tuple. I guess while you >> were at it you might check whether the next tuple is locked... > > So you'd have to disable HOT updates when true serializability was > active? I thought about that, but I don't think so. HOT only applies to updates, and predicate locking only applies to inserts. Unless I have my head in the sand? ...Robert
On Wed, Dec 16, 2009 at 1:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: > On Wed, Dec 16, 2009 at 1:14 PM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: >> Robert Haas escribió: >>> On Wed, Dec 16, 2009 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote: >>> > On Wednesday 16 December 2009 16:24:42 Robert Haas wrote: >>> >> > Inserts and deletes follow the same protocol, obtaining an exclusive >>> >> > lock on the row after the one being inserted or deleted. The result >>> >> > of this locking protocol is that a range scan prevents concurrent >>> >> > inserts or delete within the range of the scan, and vice versa. >>> >> > >>> >> > That sounds like it should actually work. >>> >> >>> >> Only if you can guarantee that the database will access the rows using >>> >> some particular index. If it gets to the data some other way it might >>> >> accidentally circumvent the lock. That's kind of a killer in terms of >>> >> making this work for PostgreSQL. >>> > Isnt the whole topic only relevant for writing access? There you have to >>> > access the index anyway. >>> >>> Yeah, I guess you have to insert the new tuple. I guess while you >>> were at it you might check whether the next tuple is locked... >> >> So you'd have to disable HOT updates when true serializability was >> active? > > I thought about that, but I don't think so. HOT only applies to > updates, and predicate locking only applies to inserts. Unless I have > my head in the sand? Err, no, wait. Predicate locking can apply to updates, but since HOT updates never update an indexed column, I think we might still be OK? ...Robert
Robert Haas írta: > On Wed, Dec 16, 2009 at 1:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: > >> On Wed, Dec 16, 2009 at 1:14 PM, Alvaro Herrera >> <alvherre@commandprompt.com> wrote: >> >>> Robert Haas escribió: >>> >>>> On Wed, Dec 16, 2009 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote: >>>> >>>>> On Wednesday 16 December 2009 16:24:42 Robert Haas wrote: >>>>> >>>>>>> Inserts and deletes follow the same protocol, obtaining an exclusive >>>>>>> lock on the row after the one being inserted or deleted. The result >>>>>>> of this locking protocol is that a range scan prevents concurrent >>>>>>> inserts or delete within the range of the scan, and vice versa. >>>>>>> >>>>>>> That sounds like it should actually work. >>>>>>> >>>>>> Only if you can guarantee that the database will access the rows using >>>>>> some particular index. If it gets to the data some other way it might >>>>>> accidentally circumvent the lock. That's kind of a killer in terms of >>>>>> making this work for PostgreSQL. >>>>>> >>>>> Isnt the whole topic only relevant for writing access? There you have to >>>>> access the index anyway. >>>>> >>>> Yeah, I guess you have to insert the new tuple. I guess while you >>>> were at it you might check whether the next tuple is locked... >>>> >>> So you'd have to disable HOT updates when true serializability was >>> active? >>> >> I thought about that, but I don't think so. HOT only applies to >> updates, and predicate locking only applies to inserts. Unless I have >> my head in the sand? >> > > Err, no, wait. Predicate locking can apply to updates, but since HOT > updates never update an indexed column, I think we might still be OK? > A predicate can include columns from an index plus others. Am I missing something? > ...Robert > > -- Bible has answers for everything. Proof: "But let your communication be, Yea, yea; Nay, nay: for whatsoever is more than these cometh of evil." (Matthew 5:37) - basics of digital technology. 
"May your kingdom come" - superficial description of plate tectonics ---------------------------------- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH http://www.postgresql.at/
Alvaro Herrera <alvherre@commandprompt.com> wrote: > So you'd have to disable HOT updates when true serializability was > active? I wouldn't think so; but someone familiar with HOT logic could probably determine whether the unmodified algorithm could be used by reviewing the "simplifying assumptions" near the bottom of page 42, and the page about "Generalizing to other database engines" (section 4.8). I think those portions might stand on their own without reading the rest of the document. -Kevin
On Wed, Dec 16, 2009 at 1:29 PM, Boszormenyi Zoltan <zb@cybertec.at> wrote: > Robert Haas írta: >> On Wed, Dec 16, 2009 at 1:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >>> On Wed, Dec 16, 2009 at 1:14 PM, Alvaro Herrera >>> <alvherre@commandprompt.com> wrote: >>> >>>> Robert Haas escribió: >>>> >>>>> On Wed, Dec 16, 2009 at 10:29 AM, Andres Freund <andres@anarazel.de> wrote: >>>>> >>>>>> On Wednesday 16 December 2009 16:24:42 Robert Haas wrote: >>>>>> >>>>>>>> Inserts and deletes follow the same protocol, obtaining an exclusive >>>>>>>> lock on the row after the one being inserted or deleted. The result >>>>>>>> of this locking protocol is that a range scan prevents concurrent >>>>>>>> inserts or delete within the range of the scan, and vice versa. >>>>>>>> >>>>>>>> That sounds like it should actually work. >>>>>>>> >>>>>>> Only if you can guarantee that the database will access the rows using >>>>>>> some particular index. If it gets to the data some other way it might >>>>>>> accidentally circumvent the lock. That's kind of a killer in terms of >>>>>>> making this work for PostgreSQL. >>>>>>> >>>>>> Isnt the whole topic only relevant for writing access? There you have to >>>>>> access the index anyway. >>>>>> >>>>> Yeah, I guess you have to insert the new tuple. I guess while you >>>>> were at it you might check whether the next tuple is locked... >>>>> >>>> So you'd have to disable HOT updates when true serializability was >>>> active? >>>> >>> I thought about that, but I don't think so. HOT only applies to >>> updates, and predicate locking only applies to inserts. Unless I have >>> my head in the sand? >>> >> >> Err, no, wait. Predicate locking can apply to updates, but since HOT >> updates never update an indexed column, I think we might still be OK? >> > > A predicate can include columns from an index plus others. > Am I missing something? Hmm, interesting point. 
In that case you couldn't use the index to enforce predicate locking under MVCC without disabling HOT. But there will be other cases where that wouldn't help anyway - a predicate could also include unindexed columns exclusively. For those, the traditional approach (not the one discussed in this paper) probably requires locking against any heap insert, or checking each new heap insert against the constraint, or... something. ...Robert
On Wed, Dec 16, 2009 at 1:30 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote: > Alvaro Herrera <alvherre@commandprompt.com> wrote: > >> So you'd have to disable HOT updates when true serializability was >> active? > > I wouldn't think so; but someone familiar with HOT logic could > probably determine whether the unmodified algorithm could be used by > reviewing the "simplifying assumptions" near the bottom of page 42, > and the page about "Generalizing to other database engines" (section > 4.8). I think those portions might stand on their own without > reading the rest of the document. This thread veered off into a discussion of the traditional technique, rather than the one in the paper. I think that's the part Alvaro was responding to. ...Robert
Robert, Please forgive a couple editorial inserts to your statement -- I hope it clarifies. If I've distorted your meaning, feel free to straighten me out. :-) Robert Haas <robertmhaas@gmail.com> wrote: > This thread veered off into a discussion of the traditional > [predicate locking] technique, rather than the [serializable] one > in the paper. I think that's the part Alvaro was responding to. If you're right about Alvaro's concern -- my rough understanding is that HOT creates a linked list of tuples which are mutations of one another without altering any value which is part of an index. If that's anywhere near a correct understanding, I can't see how there would be a problem with using the described locking techniques with any tuple on the list. As an aside, the thesis mentions smart use of multiple locking granularities only under the "future work" section. I can't see an implementation being considered production quality without that, as without it there would be no way to constrain the space required to track the locks. But there is no shortage of literature on how to do that, so I view such discussions as more appropriate to low-level implementation discussions should we ever get serious about using the techniques which are the main thrust of Dr. Cahill's thesis. If anyone is interested in reviewing recent literature on these techniques, Dr. Cahill seemed to like (Hellerstein et al., 2007), to the point where I may well track it down when I'm done pondering the work which I referenced at the start of this thread. -Kevin
Robert Haas wrote: > > A predicate can include columns from an index plus others. > > Am I missing something? > > Hmm, interesting point. In that case you couldn't use the index to > enforce predicate locking under MVCC without disabling HOT. But there > will be other cases where that wouldn't help anyway - a predicate > could also include unindexed columns exclusively. For those, the > traditional approach (not the one discussed in this paper) probably > requires locking against any heap insert, or checking each new heap > insert against the constraint, or... something. If I understand that correctly > [...] by acquiring a shared lock on the next > row in order, as a scan is made to check whether rows match a predicate. > The scan might be through the data records or through an index I would say that in the case of a table scan, the whole table will be SILOCKed. I guess that's pretty much unavoidable if you want serializability. Yours, Laurenz Albe
[ Forgot the list, resending. ] 2009/12/16 Boszormenyi Zoltan <zb@cybertec.at>: > Robert Haas írta: > >> On Wed, Dec 16, 2009 at 1:25 PM, Robert Haas <robertmhaas@gmail.com> wrote: >> >>> On Wed, Dec 16, 2009 at 1:14 PM, Alvaro Herrera >>> <alvherre@commandprompt.com> wrote: >>> >>>> So you'd have to disable HOT updates when true serializability was >>>> active? >>> >>> I thought about that, but I don't think so. HOT only applies to >>> updates, and predicate locking only applies to inserts. Unless I have >>> my head in the sand? >> >> Err, no, wait. Predicate locking can apply to updates, but since HOT >> updates never update an indexed column, I think we might still be OK? > > A predicate can include columns from an index plus others. > Am I missing something? This whole concept ("next-key locking") also applies in case there are no indexes. In the case of a table scan, the "next key" is either the next row relative to the scanned range (if the DBMS supports the notion of non-full table scans, for example if the table contents are themselves stored in sorted order), or something that indicates that the whole table was scanned (i.e., a table lock). Therefore, with next-key locking you had better not have too many table scans if you want to have any concurrent transactions. Nicolas
On 16.12.09 16:40 , Kevin Grittner wrote: > Nicolas Barbier<nicolas.barbier@gmail.com> wrote: > >> I am not sure whether the serialization failures that it may cause >> are dependent on the plan used. > > They are. But so are failures due to deadlocks even today, no? The processing order of UPDATEs which involve joins isn't any more well-defined than the order of rows returned by a similarly complex SELECT, I think. Actually, I think the whole SIREAD-lock idea can be seen as a kind of 2PL with opportunistic locking and deferred deadlock detection. Instead of making readers and writers block each other, you let them proceed in parallel, and check if that resulted in any mutual toe-stepping later. The surprising part is that SIREAD locks and those inConflict, outConflict flags actually provide enough information to detect possible problems. Or at least this is the idea I got after skimming through the thesis for an hour or so. best regards, Florian Pflug
"Albe Laurenz" wrote: > I would say that in the case of a table scan, the whole table will > be SILOCKed. I guess that's pretty much unavoidable if you want > serializability. Agreed. -Kevin
Nicolas Barbier wrote: > Therefore, with next-key locking you had better not have too many table > scans if you want to have any concurrent transactions. Well, I would say that you don't want too many table scans on heavily updated tables if you don't want too many serialization failures. Keep in mind that the SIREAD locks would not block anything. A dangerous structure, involving two adjacent rw dependencies, must be found before anything is rolled back. -Kevin
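[Editor's sketch of the inConflict/outConflict bookkeeping Kevin and Florian are referring to, reduced to a toy model with invented names: each transaction keeps two boolean flags, and a transaction that acquires both sits in the middle of two adjacent rw dependencies (the "dangerous structure"), so something must be aborted. The real algorithm in the thesis tracks more state; booleans rather than full graph edges are one source of the false positives.]

```python
# Toy model: a transaction with both an incoming and an outgoing
# rw-dependency is the pivot of a dangerous structure T_in -rw-> T -rw-> T_out.
class Txn:
    def __init__(self, name):
        self.name = name
        self.in_conflict = False   # someone read what this txn wrote
        self.out_conflict = False  # this txn read what someone else wrote

def record_rw_dependency(reader, writer):
    """reader read an item that writer concurrently changed (edge
    reader -rw-> writer).  Returns a transaction to abort, if any."""
    reader.out_conflict = True
    writer.in_conflict = True
    for t in (reader, writer):
        if t.in_conflict and t.out_conflict:
            return t               # pivot of a dangerous structure
    return None
```

A single rw edge (T1 reads something T2 writes) is harmless; only when a second adjacent edge appears (T0 reads something T1 writes, so T0 -rw-> T1 -rw-> T2) does the pivot T1 get flagged. Note this flags some harmless histories too, since two adjacent edges need not actually form a cycle.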
* Florian Pflug: > On 16.12.09 16:40 , Kevin Grittner wrote: >> Nicolas Barbier<nicolas.barbier@gmail.com> wrote: >> >>> I am not sure whether the serialization failures that it may cause >>> are dependent on the plan used. >> >> They are. > > But so are failures due to deadlocks even today, no? They are detected. In this context, "serialization failure" means that PostgreSQL generates a history which lacks one-copy serializability, without reporting any errors. (In the general case, the unique constraint violation which bugs me personally is a different beast and does result in an error.) -- Florian Weimer <fweimer@bfk.de> BFK edv-consulting GmbH http://www.bfk.de/ Kriegsstraße 100 tel: +49-721-96201-1 D-76133 Karlsruhe fax: +49-721-96201-99
2009/12/18 Florian Weimer <fweimer@bfk.de>: > * Florian Pflug: > >> On 16.12.09 16:40 , Kevin Grittner wrote: >> >>> Nicolas Barbier<nicolas.barbier@gmail.com> wrote: >>> >>>> I am not sure whether the serialization failures that it may cause >>>> are dependent on the plan used. >>> >>> They are. >> >> But so are failures due to deadlocks even today, no? > > They are detected. In this context, "serialization failure" means > that PostgreSQL generates a history which lacks one-copy > serializability, without reporting any errors. (In the general case, > the unique constraint violation which bugs me personally is a > different beast and does result in an error.) FYI (hoping to avoid confusion): When I used the term "serialization failure" above, I surely meant the kind of failures that would be detected by the new optimistic algorithm. I would guess that currently, whether deadlocks can be triggered by a set of transactions (i.e., sequences of SQL statements) depends on the plan only marginally*. E.g., if two plans happen to use the same index, rows may always get locked in the same order by "FOR UPDATE", thus preventing certain deadlocks; if the plans were different, those deadlocks might become possible. Therefore, I don't think that it is currently very typical for plan-changes to trigger a massive change in the number of deadlocks that happen. The new method might change that property. This instability problem is often seen on DBMSs that use 2PL on "blocks read" or "rows inspected" as their main concurrency control mechanism (e.g., MS-SQL). It is mostly not seen on DBMSs that use MVCC (because no locks are taken that depend on the peculiarities of the plan; see caveat above at [*]), and would also not occur when using the most literal implementation of predicate locking (because the locks taken only depend on the SQL statements' conditions and not on the plan). Nicolas
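[Editor's illustration of Nicolas's lock-ordering point, as a toy predicate rather than a simulation: two transactions can reach a circular wait only if their plans visit some pair of common rows in opposite orders. All names are invented.]

```python
# Whether two transactions *can* deadlock depends on the order in which
# their plans acquire row locks (e.g. via FOR UPDATE), not just on the
# set of rows they lock.
def can_deadlock(order_a, order_b):
    """True iff some interleaving yields a circular wait: A holds a row
    B needs while B holds a row A needs.  That happens exactly when the
    two lock-acquisition orders disagree on some pair of common rows."""
    common = [r for r in order_a if r in order_b]
    for i, x in enumerate(common):
        for y in common[i + 1:]:
            # x comes before y for plan A; deadlock possible if B reverses them
            if order_b.index(x) > order_b.index(y):
                return True
    return False
```

If both plans use the same index, the lock orders match and no deadlock is possible; once the planner switches one query to a scan that visits the rows in a different order, previously impossible deadlocks appear, which is the plan-dependence being discussed.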
On 18.12.09 16:42 , Florian Weimer wrote: > * Florian Pflug: >> On 16.12.09 16:40 , Kevin Grittner wrote: >>> Nicolas Barbier<nicolas.barbier@gmail.com> wrote: >>> >>>> I am not sure whether the serialization failures that it may cause >>>> are dependent on the plan used. >>> >>> They are. >> >> But so are failures due to deadlocks even today, no? > > They are detected. In this context, "serialization failure" means > that PostgreSQL generates a history which lacks one-copy > serializability, without reporting any errors. No, the whole point of this SIREAD-lock technique is to prevent that once and for all, and make SERIALIZABLE transaction really serializable (or fail with a "serialization error"). > (In the general case, > the unique constraint violation which bugs me personally is a > different beast and does result in an error.) I'm not sure I understand what you are referring to here. best regards, Florian Pflug
On 18.12.09 17:33 , Nicolas Barbier wrote: > I would guess that currently, whether deadlocks can be triggered by > a set of transactions (i.e., sequences of SQL statements) depends on > the plan only marginally*. E.g., if two plans happen to use the same > index, rows may always get locked in the same order by "FOR UPDATE", > thus preventing certain deadlocks; if the plans were those deadlocks > might become possible. > > Therefore, I don't think that it is currently very typical for > plan-changes to trigger a massive change in the number of deadlocks > that happen. The new method might change that property. Hm, I think that's true if you assume that most applications issue pretty complex SELECT statements but only either pretty simple UPDATEs/DELETEs, or complex ones but only seldom. Once you start hitting a table with a lot of concurrent UPDATEs/DELETEs involving joins and subselects, the picture changes considerably I think. I must admit, however, that it's hard to imagine a real application that actually does this... But actually, now that I think about this, I fail to see why the false-positive serialization errors the SIREAD-lock approach generates would depend on the plan. The existence or non-existence of rw dependencies does *not* depend on whether the read or write *physically* happens first, only on their logical ordering (T1 read an item that T2 changed, but T2 did not commit before T1 took its snapshot). Plus, the way I read the thesis, the false positives of the SIREAD-lock approach have nothing to do with the SIREAD locks per se. They are introduced by the approximate way in which cycles contained in the transaction dependency graph are detected (the inConflict, outConflict business). best regards, Florian Pflug
Florian Weimer <fweimer@bfk.de> wrote: > * Florian Pflug: >> On 16.12.09 16:40 , Kevin Grittner wrote: >>> Nicolas Barbier<nicolas.barbier@gmail.com> wrote: >>> >>>> I am not sure whether the serialization failures that it may >>>> cause are dependent on the plan used. >>> >>> They are. >> >> But so are failures due to deadlocks even today, no? > > They are detected. In this context, "serialization failure" means > that PostgreSQL generates a history which lacks one-copy > serializability, without reporting any errors. (In the general > case, the unique constraint violation which bugs me personally is > a different beast and does result in an error.) I don't understand what you're saying here. Could you rephrase or expand on this? Thanks, -Kevin
I wrote: > [for a description of traditional techniques for providing various > isolation levels, including serializable], Dr. Cahill seemed to > like (Hellerstein et al., 2007) If anyone else is interested in this paper, here is additional information: Architecture of a Database System. (Joseph M. Hellerstein, Michael Stonebraker and James Hamilton). Foundations and Trends in Databases 1(2). http://db.cs.berkeley.edu/papers/fntdb07-architecture.pdf It covers a lot of ground, not just locking and latching issues. While the discussion seems very good and very clear, it doesn't get down to low level locking details -- instead referring people to: J. Gray, R. A. Lorie, G. R. Putzolu, and I. L. Traiger, "Granularity of locks and degrees of consistency in a shared data base," in IFIP Working Conference on Modelling in Data Base Management Systems, pp. 365-394, 1976. http://www.seas.upenn.edu/~zives/05s/cis650/papers/granularity-locks.pdf This 1976 paper is the one which gets down to the nitty gritty details of how to effectively implement predicate locking with reasonable performance using index range locks, etc. This is the paper which should cover most of the questions people raise on this list where Dr. Cahill has just assumed that the "traditional" techniques he seeks to improve upon are well known to his audience. These techniques, or some variation on them, have been implemented in almost every database I've used or investigated. -Kevin