Re: Feature Request for 7.5 - Mailing list pgsql-general

From: Jan Wieck
Subject: Re: Feature Request for 7.5
Msg-id: 3FCDFF23.7060404@Yahoo.com
In response to: Re: Feature Request for 7.5 ("Chris Travers" <chris@travelamericas.com>)
Responses: Re: Feature Request for 7.5
List: pgsql-general
The following is more or less a brain dump ... not fully thought out and not to be considered a proposal at this time.

The synchronous multi-master solution I have in mind needs a few currently nonexistent support features in the backend. One is non-blocking locks; another is a callback mechanism invoked just before a transaction is marked committed in clog. It will use reliable group communication (GC) that can guarantee total order.

There is an AFTER trigger on all replicated tables. A daemon started for every database creates a number of threads/subprocesses. Each of these workers has its own DB connection and is a member of a different group in the GC. The number of these groups determines the maximum number of concurrent UPDATE transactions the cluster can handle.

At the first call of the trigger inside a transaction (that is, on the first modifying statement), the trigger allocates one of the replication groups (possibly waiting for one to become free). It now communicates with one daemon thread on every database in the cluster. The triggers send the replication data into this group. It is not necessary to wait for the other cluster members, as long as the GC guarantees FIFO order by sender.

At the time the transaction commits, it sends a commit message into the group. This message uses another service type level: total order. The sender then waits for all members in the replication group to reply with the same. When every member in the group has replied, all have agreed to commit and are just before stamping clog. Since the service type is total order, the GC guarantees that either all members get the messages in the same order, or, if one cannot get a message, a corresponding LEAVE message will be generated.

Also, all the replication threads use non-blocking locking. If any of them ever finds a locked row, it sends an ABORT message into the group, causing the whole group to roll back. This way, either all members of the group reach the "just before stamping clog" state together and know that everyone got there, or they get an abort or leave message from one of their co-workers and roll back.
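To make the message flow concrete, here is a toy sketch of one worker's loop. This is illustration only: the ToyGC class, the message kinds, and the locked_rows check are made-up stand-ins, not a real GC API. An actual implementation would sit on a real GC system (something like Spread) plus the nonexistent backend hooks mentioned above.

import queue
import threading

class ToyGC:
    """Fake group communication: one lock serializes broadcasts, which
    fakes total order; per-member queues keep FIFO order by sender."""
    def __init__(self, n_members):
        self.queues = [queue.Queue() for _ in range(n_members)]
        self._lock = threading.Lock()

    def broadcast(self, msg):
        with self._lock:
            for q in self.queues:
                q.put(msg)

def worker(gc, member, n_members, locked_rows):
    """One replication worker: own DB connection (faked here), member of
    one replication group."""
    applied, ready = [], 0
    q = gc.queues[member]
    while True:
        kind, payload = q.get()
        if kind == "DATA":
            # Non-blocking lock attempt; a locked row means a local
            # transaction holds it, so the whole group must roll back.
            if payload in locked_rows:
                gc.broadcast(("ABORT", member))
            else:
                applied.append(payload)
        elif kind == "COMMIT":
            # Total order: every member sees COMMIT at the same point in
            # the stream, after the same DATA messages.
            gc.broadcast(("READY", member))    # "just before stamping clog"
        elif kind == "READY":
            ready += 1
            if ready == n_members:             # everyone is ready to stamp
                print("member %d: commit %r" % (member, applied))
                return
        else:                                  # ABORT or LEAVE
            print("member %d: rollback" % member)
            return

if __name__ == "__main__":
    N = 3
    gc = ToyGC(N)
    threads = [threading.Thread(target=worker, args=(gc, i, N, set()))
               for i in range(N)]
    for t in threads:
        t.start()
    gc.broadcast(("DATA", "row 42"))    # trigger streams changes (FIFO)
    gc.broadcast(("COMMIT", None))      # commit message, total order
    for t in threads:
        t.join()

The point of the sketch is only that every member counts the same set of READY messages before it would stamp clog, and that a single ABORT or LEAVE takes the whole group to rollback.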
There is a gap between reporting "ready" and really stamping clog in which a database might crash. This will cause all other cluster members to go ahead and commit while the crashed DB does not. But this is limited to crashes only, and a restarting database must rejoin/resync with the cluster anyway and doubt its own data. So this is not really a problem.

With this synchronous model, read-only transactions can be handled on every node independently of replication at all; this is the scaling part. The total amount of UPDATE transactions is limited by the slowest cluster member and does not scale, but that is true for all synchronous solutions.

Jan

Chris Travers wrote:
> Interesting feedback.
>
> It strikes me that, for many sorts of databases, multi-master synchronous
> replication is not the best solution, for the reasons that Scott, Jan, et
> al. have raised. I am wondering how commercial RDBMSs get around this
> problem. There are several possibilities that I can think of: have a write
> master and many read-only slaves (available at the moment, IIRC).
> Replication could then occur at the tuple level using linked databases,
> triggers, etc. Rewrite rules could then allow one to use the slaves to
> "funnel" the queries back up to the master. It seems to me that latency
> would be a killer on this sort of solution, though everything would
> effectively occur on all databases in the same order; and recovering from a
> crash of the master could be complicated and result in additional
> downtime...
>
> The other solution (still not "guaranteed" to work in all cases) is that
> every proxy could be hardwired to attempt to contact databases in a set
> order. This would also avoid deadlocks. Note that if sufficient business
> logic is built into the database, one would be guaranteed that a single
> "consistent" view would be maintained at any given time (conflicts would
> result in a minority of up to 50 percent of the servers needing to go
> through the recovery process -- not killing uptime, but certainly killing
> performance).
>
> However, it seems to me that the only solution for many of these databases
> is to have a "cluster in a box" solution, where you have a system comprised
> entirely of redundant, hot-swappable hardware so that nearly anything can be
> swapped out if it breaks. In this case, we should be able to just run
> PostgreSQL as is....

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #