Re: Replication Ideas - Mailing list pgsql-general
From | Jan Wieck |
---|---|
Subject | Re: Replication Ideas |
Date | |
Msg-id | 3F4C0CAE.3030901@Yahoo.com |
In response to | Re: Replication Ideas (Chris Travers <chris@travelamericas.com>) |
Responses |
Re: Replication Ideas
|
List | pgsql-general |
WARNING: This is getting long ...

Postgres-R is a very interesting and inspiring idea, and I've been kicking that concept around for a while now. What I don't like about it is that it requires fundamental changes in the lock mechanism and that it is based on the assumption of very low lock conflict.

<explain-PG-R>
In Postgres-R a committing transaction sends its workset (WS - a list of all updates done in this transaction) to the group communication system (GC). The GC guarantees total order, meaning that all nodes will receive all WSs in the same order, no matter how they have been sent.

If a node receives back its own WS before any error occurred, it goes ahead and finalizes the commit. If it receives a foreign WS, it has to apply the whole WS and commit it before it can process anything else. If a local transaction, either still in progress or waiting for its WS to come back, holds a lock that is required to process such a remote WS, the local transaction needs to be aborted to release its resources ... it lost the total order race.
</explain-PG-R>

Postgres-R requires that all remote WSs are applied and committed before a local transaction can commit. Otherwise it couldn't correctly detect a lock conflict. So there will not be any read ahead. And since the total order really counts here, it cannot apply any two remote WSs in parallel: a race condition could exist where a later WS in the total order runs faster and locks out an earlier one. So we have to squeeze all remote WSs through one single replication worker process. And all the locally parallel executed transactions that wait for their WSs to come back have to wait until that poor little worker is done with the whole pile. Bye bye concurrency. And I don't know how the GC will deal with the backlog either. Could well choke on it.

I do not see how this will scale well in a multi-SMP-system cluster.
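The delivery rule explained above can be sketched from the point of view of a single node. This is only an illustration, not Postgres-R code: the `WriteSet` and `Node` classes, the row keys, and the idea of modelling locks as sets of row keys are all assumptions made for the example. How the other nodes learn that a transaction was aborted is left out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class WriteSet:
    origin: str       # name of the node that produced the workset (hypothetical)
    txid: int         # transaction id, local to the origin node
    rows: frozenset   # keys of the rows the transaction updated


@dataclass
class Node:
    name: str
    # txid -> rows locked by local transactions that have not finished yet
    local_locks: dict = field(default_factory=dict)
    committed: list = field(default_factory=list)  # (origin, txid) in commit order
    aborted: list = field(default_factory=list)    # local txids that lost the race

    def begin_local(self, txid, rows):
        """Start a local transaction and return the workset it will send."""
        self.local_locks[txid] = frozenset(rows)
        return WriteSet(self.name, txid, frozenset(rows))

    def deliver(self, ws):
        """Handle one workset, strictly in the order chosen by the GC."""
        if ws.origin == self.name:
            # Our own workset came back. Commit only if no foreign workset
            # that was ordered earlier has already forced an abort.
            if ws.txid in self.local_locks:
                del self.local_locks[ws.txid]
                self.committed.append((ws.origin, ws.txid))
            return
        # Foreign workset: every local transaction holding a conflicting
        # lock is aborted, then the remote changes are applied and committed.
        for txid, rows in list(self.local_locks.items()):
            if rows & ws.rows:
                del self.local_locks[txid]
                self.aborted.append(txid)
        self.committed.append((ws.origin, ws.txid))


if __name__ == "__main__":
    node = Node("A")
    ws_a1 = node.begin_local(1, {"district:1"})
    ws_a2 = node.begin_local(2, {"stock:7"})
    ws_b1 = WriteSet("B", 1, frozenset({"district:1"}))
    # The GC happens to order the foreign workset first.
    for ws in (ws_b1, ws_a1, ws_a2):
        node.deliver(ws)
    print(node.committed, node.aborted)
```

Because `deliver` handles one workset at a time, the sketch also shows the single-worker bottleneck: nothing local can commit until every earlier foreign workset has been applied.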
At least the serialization of WSs will become a horror if there is significant lock contention, like in a standard TPC-C on the district row containing the order number counter. I don't know for sure, but I suspect that with this kind of bottleneck, Postgres-R will have to roll back more than 50% of its transactions when there are more than 4 nodes under heavy load (like in a benchmark run). That will suck ...

But ... initially I said that it is an inspiring concept ... soooo ...

I am currently hacking around with some C+PL/TclU+Spread constructs that might form a crude kind of prototype creature.

My changes to the Postgres-R concept are that there will be as many replicating slave processes as there are masters in total out in the cluster ... yes, it will try to utilize all the CPUs in the cluster!

For failover reliability, a committing transaction will hold before finalizing the commit and send its "I'm ready" to the GC. Every replicator that reaches the same state sends "I'm ready" too. Spread guarantees in SAFE_MESS mode that messages are delivered to all nodes in a group, or that at least LEAVE/DISCONNECT messages are delivered before. So if a node commits once it has received "I'm ready" from more than 50% of the nodes, there would be only a very small gap in which multiple nodes would have to fail in the same split second for the majority of nodes NOT to commit. A node that reported "I'm ready" but lost more than 50% of the cluster before committing has to roll back and rejoin or wait for operator intervention.

Now the idea is to split up the communication into GC distribution groups per transaction. So working master backends and associated replication backends will join/leave a unique group for every transaction in the cluster. This way, the per-process communication is reduced to the required minimum.

As said, I am hacking on some code ...

Jan

Chris Travers wrote:
> Tom Lane wrote:
>
>>Chris Travers <chris@travelamericas.com> writes:
>>
>>>Yes I have. Postgres-r is not a high-availability solution which is
>>>capable of transparent failover,
>>
>>What makes you say that? My understanding is it's supposed to survive
>>loss of individual servers.
>>
>> regards, tom lane
>
> My mistake. I must have gotten them confused with another
> (asynchronous) replication project.
>
> Best Wishes,
> Chris Travers
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
> joining column's datatypes do not match

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
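The "I'm ready" majority rule from the message above can be sketched as a small decision function. This is only an illustration under stated assumptions: the function name `commit_decision`, its arguments, and the three return values are hypothetical, and the Spread calls that would actually deliver the "I'm ready" and LEAVE/DISCONNECT messages are not shown.

```python
def commit_decision(cluster_size, ready_nodes, lost_nodes):
    """Decide what a node that has already sent "I'm ready" should do.

    cluster_size -- number of nodes taking part in the transaction
    ready_nodes  -- names of nodes whose "I'm ready" has been delivered
    lost_nodes   -- names of nodes reported gone (LEAVE/DISCONNECT)
    """
    ready = len(set(ready_nodes))
    lost = len(set(lost_nodes))
    if 2 * ready > cluster_size:
        # More than 50% of the cluster is ready: finalize the commit.
        return "commit"
    if 2 * lost > cluster_size:
        # More than 50% of the cluster was lost before committing:
        # roll back, then rejoin or wait for operator intervention.
        return "rollback"
    # Neither majority has been reached yet: keep holding the commit.
    return "wait"


if __name__ == "__main__":
    print(commit_decision(5, {"a", "b", "c"}, set()))
    print(commit_decision(5, {"a"}, {"b", "c", "d"}))
```

Strict inequalities are used on purpose: exactly half of the cluster is not a majority, so a 4-node cluster with two "I'm ready" messages keeps waiting.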