Re: Global Deadlock Information - Mailing list pgsql-cluster-hackers
From | Satoshi Nagayasu |
---|---|
Subject | Re: Global Deadlock Information |
Date | |
Msg-id | 4B6D934E.1000204@gmail.com |
In response to | Global Deadlock Information (Markus Wanner <markus@bluegap.ch>) |
Responses |
Re: Global Deadlock Information
Re: Global Deadlock Information |
List | pgsql-cluster-hackers |
Hi Markus,

I tried two approaches to resolving global deadlocks during
PostgresForest development.

(1) Use lock_timeout to avoid a global deadlock.

The lock_timeout feature is a very simple way to avoid a global
deadlock. I also disagree that "statement_timeout is the way to avoid
global deadlocks", because statement_timeout also kills healthy,
long-running transactions when it expires.

Some developers (including me!) have proposed a lock_timeout GUC option.

http://archives.postgresql.org/pgsql-hackers/2004-06/msg00935.php
http://archives.postgresql.org/pgsql-hackers/2010-01/msg01167.php

I still believe a "lock timeout" feature could help resolve global
deadlocks in a cluster environment.

(2) Use a global wait-for graph to detect a global deadlock.

I had an experimental implementation that used a global wait-for graph
to prevent global deadlocks.

http://en.wikipedia.org/wiki/Deadlock#Distributed_deadlock

I used the node (server) identifiers and the pg_locks information to
build the global wait-for graph, and the kill signal (or pg_cancel()?)
to abort a victim transaction causing the deadlock.

I don't think the callback function is needed to replace the current
deadlock resolution feature, but I agree we need a consensus on how to
avoid global deadlocks in the cluster.

Thanks,

On 2010/02/06 18:13, Markus Wanner wrote:
> Hi,
>
> I'd like to start a thread for discussion of the second item on the
> ClusterFeatures [1] list: Global Deadlock Information.
>
> IIRC there are two aspects to this item: a) the plain notification of a
> deadlock and b) some way to control or intercept deadlock resolution.
>
> The problem this item seems to address is the potential for deadlocks
> between transactions on different nodes. Or put another way: between a
> local transaction and one that's to be applied from a remote node (or
> even between two remote ones - similar issue, though). To ensure
> congruency between nodes, they must take the same measures to resolve
> the deadlock, i.e. abort the same transaction(s).
>
> I certainly disagree with the statement on the wiki that the
> "statement_timeout is the way to avoid global deadlocks", because I
> don't want to have to wait that long until a deadlock gets resolved.
> Further it doesn't even guarantee congruency, depending on the
> implementation of your clustering solution.
>
> I fail to see how a plain notification API would help much. After all,
> this could result in one node notifying having aborted transaction A to
> resolve a deadlock while another node notifies having aborted
> transaction B. You'd end up having to abort two (or more) transaction
> instead of just one to resolve a conflict.
>
> It could get more useful, if enabling such a notification would turn off
> the existing deadlock resolver and leave the resolution of the deadlock
> to the clustering solution. I'd call that an interception.
>
> Such an interception API should IMO provide a way to register a
> callback, which replaces the current deadlock resolver. Upon detection
> of a deadlock, the callback should get a list of transaction ids that
> are part of the lock cycle. It's then up to that callback, to chose one
> and abort that to resolve the conflict.
>
> And now, Greg's List:
>
> 1) What feature does this help add from a user perspective?
>
> Preventing cluster-wide deadlocks (while maintaining congruency of
> replicas).
>
> > 2) Which replication projects would be expected to see an improvement
> > from this addition?
>
> I suspect all multi-master solutions are affected, certainly Postgres-R
> would benefit. Single-master ones certainly don't need it.
>
> > 3) What makes it difficult to implement?
>
> I don't see any real stumbling block. Deciding on an API needs consensus.
>
> > 4) Are there any other items on the list this depends on, or that it
> > is expected to have a significant positive/negative interaction with?
>
> Not that I know of.
>
> > 5) What replication projects include a feature like this already, or a
> > prototype of a similar one, that might be used as a proof of concept
> > or example implementation?
>
> Old Postgres-R versions once had such an interception, but it currently
> lacks a solution for this problem. I don't know of any other project
> that's already solved this.
>
> > 6) Who is already working on it/planning to work on it/needs it for
> > their related project?
>
> I'm not currently working on it and don't plan to do so (at least) until
> PgCon 2010.
>
>
> Cluster hackers, is this a good summary which covers your needs as well?
> Something missing?
>
> Regards
>
> Markus Wanner
>
> [1]: feature wish list of cluster hackers:
> http://wiki.postgresql.org/wiki/ClusterFeatures
>

--
NAGAYASU Satoshi <satoshi.nagayasu@gmail.com>
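
The wait-for graph approach described in (2) can be sketched roughly as follows. This is a minimal Python illustration, not the PostgresForest implementation: it assumes each node has already reported its "waiter waits for holder" pairs (for example, derived from pg_locks), and the names build_wait_for_graph, find_cycle and choose_victim are hypothetical. Picking the victim by a fixed rule (here the largest (node, xid) pair) is also an assumption, made so that every node would select the same transaction, which is the congruency concern raised in the quoted message.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple

# A global transaction id: (node identifier, local transaction id).
# Both fields are illustrative stand-ins, not a real PostgreSQL structure.
Txn = Tuple[str, int]


def build_wait_for_graph(edges: Iterable[Tuple[Txn, Txn]]) -> Dict[Txn, Set[Txn]]:
    """Merge per-node (waiter, holder) pairs into one global graph."""
    graph: Dict[Txn, Set[Txn]] = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, set()).add(holder)
        graph.setdefault(holder, set())
    return graph


def find_cycle(graph: Dict[Txn, Set[Txn]]) -> Optional[List[Txn]]:
    """Return the transactions on one wait-for cycle, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {txn: WHITE for txn in graph}
    path: List[Txn] = []

    def visit(txn: Txn) -> Optional[List[Txn]]:
        color[txn] = GREY
        path.append(txn)
        for nxt in sorted(graph[txn]):  # sorted for a deterministic walk
            if color[nxt] == GREY:
                # Back edge: everything from nxt onward on the path is a cycle.
                return path[path.index(nxt):]
            if color[nxt] == WHITE:
                cycle = visit(nxt)
                if cycle is not None:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(graph):
        if color[txn] == WHITE:
            cycle = visit(txn)
            if cycle is not None:
                return cycle
    return None


def choose_victim(cycle: List[Txn]) -> Txn:
    """Assumed policy: abort the largest (node, xid) so all nodes agree."""
    return max(cycle)


if __name__ == "__main__":
    # node1/100 waits for node2/200 and vice versa; node1/101 is merely blocked.
    edges = [
        (("node1", 100), ("node2", 200)),
        (("node2", 200), ("node1", 100)),
        (("node1", 101), ("node1", 100)),
    ]
    cycle = find_cycle(build_wait_for_graph(edges))
    if cycle is not None:
        print("deadlock:", cycle, "victim:", choose_victim(cycle))
```

In a real cluster the chosen victim would then be cancelled on its own node. How the edges are collected and how the abort is delivered are left out here because the message does not specify them.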