Global Deadlock Information - Mailing list pgsql-cluster-hackers
From | Markus Wanner |
---|---|
Subject | Global Deadlock Information |
Date | |
Msg-id | 4B6D329E.6050308@bluegap.ch |
List | pgsql-cluster-hackers |
Hi,

I'd like to start a thread for discussion of the second item on the ClusterFeatures [1] list: Global Deadlock Information.

IIRC there are two aspects to this item: a) the plain notification of a deadlock and b) some way to control or intercept deadlock resolution.

The problem this item seems to address is the potential for deadlocks between transactions on different nodes. Or put another way: between a local transaction and one that's to be applied from a remote node (or even between two remote ones - similar issue, though). To ensure congruency between nodes, they must take the same measures to resolve the deadlock, i.e. abort the same transaction(s).

I certainly disagree with the statement on the wiki that the "statement_timeout is the way to avoid global deadlocks", because I don't want to have to wait that long until a deadlock gets resolved. Further, it doesn't even guarantee congruency, depending on the implementation of your clustering solution.

I fail to see how a plain notification API would help much. After all, this could result in one node reporting that it aborted transaction A to resolve a deadlock while another node reports that it aborted transaction B. You'd end up having to abort two (or more) transactions instead of just one to resolve a conflict.

It could get more useful if enabling such a notification would turn off the existing deadlock resolver and leave the resolution of the deadlock to the clustering solution. I'd call that an interception.

Such an interception API should IMO provide a way to register a callback, which replaces the current deadlock resolver. Upon detection of a deadlock, the callback should get a list of transaction ids that are part of the lock cycle. It's then up to that callback to choose one and abort it to resolve the conflict.

And now, Greg's List:

> 1) What feature does this help add from a user perspective?

Preventing cluster-wide deadlocks (while maintaining congruency of replicas).
> 2) Which replication projects would be expected to see an improvement
> from this addition?

I suspect all multi-master solutions are affected, certainly Postgres-R would benefit. Single-master ones certainly don't need it.

> 3) What makes it difficult to implement?

I don't see any real stumbling block. Deciding on an API needs consensus.

> 4) Are there any other items on the list this depends on, or that it
> is expected to have a significant positive/negative interaction with?

Not that I know of.

> 5) What replication projects include a feature like this already, or a
> prototype of a similar one, that might be used as a proof of concept
> or example implementation?

Old Postgres-R versions once had such an interception, but Postgres-R currently lacks a solution for this problem. I don't know of any other project that's already solved this.

> 6) Who is already working on it/planning to work on it/needs it for
> their related project?

I'm not currently working on it and don't plan to do so (at least) until PgCon 2010.

Cluster hackers, is this a good summary which covers your needs as well? Something missing?

Regards

Markus Wanner

[1]: feature wish list of cluster hackers:
http://wiki.postgresql.org/wiki/ClusterFeatures
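
As a rough illustration of the interception API proposed in this message (a registered callback that receives the transaction ids in the lock cycle and picks the one to abort), here is a minimal C sketch. All names in it (GlobalXid, DeadlockVictimHook, register_deadlock_victim_hook, choose_largest_gxid, resolve_deadlock) are hypothetical and exist in neither PostgreSQL nor Postgres-R. It also assumes that the ids handed to the callback are cluster-wide identifiers that compare the same way on every node, which the message does not specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: a transaction id that is meaningful on every node. */
typedef uint64_t GlobalXid;

/*
 * Hypothetical callback type. It receives the n transactions that form
 * the lock cycle and returns the index of the one to abort.
 */
typedef size_t (*DeadlockVictimHook)(const GlobalXid *cycle, size_t n);

/* NULL means "no interception registered, use the built-in resolver". */
static DeadlockVictimHook deadlock_victim_hook = NULL;

/* Registering a callback replaces the built-in choice of victim. */
void register_deadlock_victim_hook(DeadlockVictimHook hook)
{
    deadlock_victim_hook = hook;
}

/*
 * Example policy: abort the transaction with the largest global id.
 * The result depends only on the set of ids in the cycle, not on their
 * order or on which node detected the deadlock, so all nodes that see
 * the same cycle pick the same victim.
 */
size_t choose_largest_gxid(const GlobalXid *cycle, size_t n)
{
    assert(cycle != NULL && n > 0);
    size_t victim = 0;
    for (size_t i = 1; i < n; i++) {
        if (cycle[i] > cycle[victim])
            victim = i;
    }
    return victim;
}

/*
 * Stand-in for the deadlock detector's resolution step. It returns the
 * id of the transaction to abort. Without a hook it returns the first
 * entry, which is only a placeholder for the existing local behaviour.
 */
GlobalXid resolve_deadlock(const GlobalXid *cycle, size_t n)
{
    assert(cycle != NULL && n > 0);
    if (deadlock_victim_hook != NULL)
        return cycle[deadlock_victim_hook(cycle, n)];
    return cycle[0];
}
```

A clustering solution would call register_deadlock_victim_hook(choose_largest_gxid) once at startup. The point of the policy is the one made above: two nodes that detect the same cycle, possibly listing its members in a different order, must end up aborting the same transaction.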