Re: eXtensible Transaction Manager API - Mailing list pgsql-hackers

From: Konstantin Knizhnik
Subject: Re: eXtensible Transaction Manager API
Msg-id: 56405B9F.6030406@postgrespro.ru
In response to: Re: eXtensible Transaction Manager API (Amit Kapila <amit.kapila16@gmail.com>)
List: pgsql-hackers
On 09.11.2015 07:46, Amit Kapila wrote:
> On Sat, Nov 7, 2015 at 10:23 PM, Konstantin Knizhnik <k.knizhnik@postgrespro.ru> wrote:
>> Lock manager is one of the tasks we are currently working on.
>> There are still a lot of open questions:
>> 1. Should distributed lock manager (DLM) do something else except detection of distributed deadlock?
>
> I think so. Basically DLM should be responsible for maintaining all the lock information, which in turn means that any backend process that needs to acquire/release a lock needs to interact with DLM; without that I don't think even global deadlock detection can work (to detect deadlocks among all the nodes, it needs to know the lock info of all nodes).
I hope that it will not be needed, otherwise it will add a significant performance penalty.
Unless I missed something, locks can still be managed locally, but we need the DLM to detect global deadlocks.
Deadlock detection in PostgreSQL is performed only after the timeout for lock waiting expires. Only then is the lock graph analyzed for the presence of loops. At this moment we can send our local lock graph to the DLM. If it really is a distributed deadlock, then sooner or later
all nodes involved in this deadlock will send their local graphs to the DLM, and the DLM will be able to build the global graph and detect the distributed deadlock. In this scenario the DLM will be accessed very rarely: only when a lock can not be granted within deadlock_timeout.
One possible problem is false global deadlock detection: since the local lock graphs correspond to different moments in time, we can find a "false" loop. I am still thinking about the best solution to this problem.
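The scheme above can be sketched as follows. This is a minimal illustration, not project code: a DLM object that merges the local wait-for graphs sent by nodes after deadlock_timeout and searches the union for a cycle. All names and identifiers are hypothetical; a real DLM would also have to re-request fresh graphs before declaring a deadlock, to guard against the "false loop" problem mentioned above.

```python
# Sketch (hypothetical names): a DLM merging local wait-for graphs
# reported by nodes and detecting a cycle in their union.

class GlobalDeadlockDetector:
    def __init__(self):
        # Union of all reported wait-for edges: waiter -> set of holders.
        self.edges = {}

    def report_local_graph(self, local_edges):
        """Merge one node's wait-for edges (sent after deadlock_timeout)."""
        for waiter, holder in local_edges:
            self.edges.setdefault(waiter, set()).add(holder)

    def find_cycle(self):
        """Depth-first search for a cycle in the merged graph."""
        visited, on_stack = set(), set()

        def dfs(xid, path):
            visited.add(xid)
            on_stack.add(xid)
            path.append(xid)
            for nxt in self.edges.get(xid, ()):
                if nxt in on_stack:          # back edge: cycle found
                    return path[path.index(nxt):]
                if nxt not in visited:
                    cycle = dfs(nxt, path)
                    if cycle:
                        return cycle
            on_stack.discard(xid)
            path.pop()
            return None

        for xid in list(self.edges):
            if xid not in visited:
                cycle = dfs(xid, [])
                if cycle:
                    return cycle
        return None

dlm = GlobalDeadlockDetector()
dlm.report_local_graph([("T1", "T2")])   # node A reports: T1 waits for T2
dlm.report_local_graph([("T2", "T1")])   # node B reports: T2 waits for T1
print(dlm.find_cycle())                  # ['T1', 'T2']
```

Note that each node only reports when its own lock wait has exceeded deadlock_timeout, so in the common case the detector is never contacted at all.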
> This is somewhat in line with what we currently do during lock conflicts,
> i.e. if the backend incurs a lock conflict and decides to sleep, before
> sleeping it tries to detect an early deadlock, so if we make DLM responsible
> for managing locks the same could even be achieved in a distributed system.
>
>> 2. Should DLM be part of XTM API or should it be a separate API?
>
> We might be able to do it either way (make it part of XTM API or devise a
> separate API). I think the more important point here is to first get the
> high-level design for Distributed Transactions (which primarily includes
> consistent Commit/Rollback, snapshots, distributed lock manager (DLM) and
> recovery).
>
>> 3. Should DLM be implemented by a separate process or should it be part of the arbiter (dtmd)?
>
> That's an important decision. I think it will depend on which kind of design
> we choose for the distributed transaction manager (arbiter based solution or
> non-arbiter based solution, something like tsDTM). I think DLM should be
> separate, else the arbiter will become a hot-spot with respect to contention.
There are pros and cons.
Pros for integrating DLM in DTM:
1. The DTM arbiter has information about the local-to-global transaction ID mapping, which may be needed by the DLM.
2. If my assumptions about the DLM are correct, then it will be accessed relatively rarely and should not cause a significant impact on performance.
Cons:
1. tsDTM doesn't need a centralized arbiter but still needs a DLM.
2. Logically DLM and DTM are independent components.
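To make pro #1 concrete, here is a tiny sketch (all names hypothetical, not project code) of the kind of local-to-global transaction ID mapping an arbiter maintains, which a DLM co-located with it could reuse to correlate lock waiters across nodes:

```python
# Sketch (hypothetical names): local-to-global transaction ID mapping
# held by a DTM arbiter; a co-located DLM could reuse it to translate
# (node, local xid) pairs in wait-for graphs into global transactions.

class XidMap:
    def __init__(self):
        self._to_global = {}    # (node_id, local_xid) -> global_xid

    def register(self, node_id, local_xid, global_xid):
        self._to_global[(node_id, local_xid)] = global_xid

    def to_global(self, node_id, local_xid):
        # A local-only transaction has no global XID; fall back to a
        # node-qualified name so it can still appear in a wait-for graph.
        return self._to_global.get((node_id, local_xid),
                                   f"{node_id}:{local_xid}")

m = XidMap()
m.register("nodeA", 1001, "G42")
m.register("nodeB", 2007, "G42")   # same global transaction on another node
print(m.to_global("nodeA", 1001))  # G42
print(m.to_global("nodeB", 9999))  # nodeB:9999 (local-only transaction)
```

A standalone DLM (as tsDTM would need) would have to receive or rebuild this mapping itself, which is part of the trade-off discussed above.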
> Can you please explain more about the tsDTM approach: how timestamps
> are used, what exactly CSN is (is it Commit Sequence Number), and
> how it is used in the prepare phase? Is CSN used as a timestamp?
> Is the coordinator also one of the PostgreSQL instances?
In the tsDTM approach system time (in microseconds) is used as the CSN (commit sequence number).
We also enforce that the assigned CSN is unique and monotonic within a PostgreSQL instance.
CSNs are assigned locally and do not require interaction with other cluster nodes.
This is why, in theory, the tsDTM approach should provide good scalability.
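A minimal sketch of such a CSN generator, under the assumptions stated above (CSN = microsecond wall-clock timestamp, forced to be unique and monotonic per instance); the class and method names are illustrative, not the actual tsDTM code:

```python
# Sketch (assumption: CSN = microsecond timestamp, made unique and
# monotonic per instance, as described for tsDTM above).
import threading
import time

class CsnGenerator:
    def __init__(self):
        self._last = 0
        self._lock = threading.Lock()

    def next_csn(self):
        with self._lock:
            now_us = int(time.time() * 1_000_000)
            # If the clock has not advanced (or has stepped backwards),
            # bump by one microsecond to keep CSNs unique and monotonic.
            self._last = max(now_us, self._last + 1)
            return self._last

gen = CsnGenerator()
a, b, c = gen.next_csn(), gen.next_csn(), gen.next_csn()
assert a < b < c   # strictly increasing even within one microsecond
```

Because no cross-node communication is involved, CSN assignment itself cannot become a cluster-wide bottleneck; the price is that global ordering is only as good as clock synchronization between nodes.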
> I think in this patch, it is important to see the completeness of all the
> APIs that need to be exposed for the implementation of distributed
> transactions, and that is difficult to visualize without having a complete
> picture of all the components that have some interaction with the distributed
> transaction system. On the other hand we can do it in incremental fashion
> as and when more parts of the design are clear.
That is exactly what we are going to do: we are trying to integrate DTM with existing systems (pg_shard, postgres_fdw, BDR) and find out what is missing and should be added. In parallel we are trying to compare the efficiency and scalability of different solutions.

> One thing I have noticed is that in the DTM approach, you seem to
> be considering a centralized (Arbiter) transaction manager and a
> centralized lock manager, which seems to be a workable approach,
> but I think such an approach won't scale in write-heavy or mixed
> read-write workloads. Have you thought about distributing the responsibility
> of global transaction and lock management?
From my point of view there are two different scenarios:
1. When most transactions are local and only a few of them are global (for example, most operations in a branch of the bank are
performed with accounts of clients of this branch, but there are a few transactions involving accounts from different branches).
2. When most or all transactions are global.
It seems to me that the first scenario is more common in real life, and good performance of a distributed system can actually be achieved only when most transactions are local (involve only one node). There are several approaches allowing local transactions to be optimized, for example the one used in SAP HANA (http://pi3.informatik.uni-mannheim.de/~norman/dsi_jour_2014.pdf).
We also have a DTM implementation based on this approach, but it is not working yet.
If most transactions are global, they affect random subsets of cluster nodes (so it is not possible to logically split the cluster into groups of tightly coupled nodes), and if the number of nodes is not very large (<10), then I do not think there can be a better alternative (from the performance point of view) than a centralized arbiter.
But these are only my speculations, and it would be really interesting for me to learn the access patterns of real customers using distributed systems.
> I think such a system might be somewhat difficult to design, but the scalability will be better.