Re: Replication documentation addition - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Replication documentation addition |
Date | |
Msg-id | 200610251931.k9PJV9416962@momjian.us |
In response to | Re: Replication documentation addition (Richard Troy <rtroy@ScienceTools.com>) |
Responses | Re: Replication documentation addition |
List | pgsql-hackers |
Richard Troy wrote:
>
> > Here is a new replication documentation section I want to add for 8.2:
> >
> > ftp://momjian.us/pub/postgresql/mypatches/replication
> >
>
> ...Read the document, as promised...
>
> First paragraph, "(fail over)" is inconsistent with title, "failover", as
> are other spots throughout the document. The whole document should be
> consistent and I vote for "failover" and not "fail over."

OK. Fixed to "failover".

> Fourth paragraph, "This "sync problem" is the fundamental difficulty for
> servers working together"; "Sync problem" hasn't been defined. Actually,
> you're talking about the consistent attribute of the "acid" properties of
> all competent databases: Atomic, Consistency, Isolation, and Durability.
> At least define the term you are using - probably most easily done in the
> preceding paragraph.

OK, "sync problem" term removed, and spelled out fully.

> The fifth paragraph needs a lot more help, I think. How about this
> alternative:
>
> So called "two-phase commit" was developed as a strategy in which two or
> more databases are updated simultaneously and none of the data is
> committed until all are committed. This guarantees consistency between the
> databases with all propagation delay being absorbed by the writer at write
> time. There are times when this propagation delay is large, so sometimes
> alternatives are worked out which we'll call here "asynchronous updates,"
> however, in these cases, there is always a window of time in which some
> transaction can be lost should a failure occur. For this reason,
> asynchronous updates are only used when the possibility of such losses is
> acceptable.

I have modified the paragraph to use some of your terms.

> Paragraphs six through to "shared disk failover" seem very awkward to me.
> I don't like them at all.
>
> "Shared disk failover" has nothing to do with "the sync problem" as it's
> not a multiple-database solution. It's an uptime, "24 X 7 X 365" issue.
> Further, it also has nothing to do with disk arrays, though it is often
> used with RAID to help avoid disk based corruption problems.

Yes, please see updated version. I removed the sync problem term from
there.

> The point about Warm Standby needs to include a warning about WAL that it
> MUST be sensitive to the semantics of the database design or else it's
> fatally flawed. I'm talking about "referential integrity". That is to say,
> it's inappropriate to capture updates on a table by table basis, as some
> such systems do, (I have no idea what's done by anyone in the PG world on
> this right now) because an update to one table (esp. inserts) very often
> go hand in glove with updates in other tables and to get one without the
> other can corrupt a database.

We don't have that problem. We recover only full transactions.

> The description of "Continuously running replication server" should
> include the critical caveat - repeated if you think it's already said
> elsewhere - that it is ONLY suitable for applications in which a loss of
> (missing) update data doesn't matter. For example, an airline reservation
> system would be an inappropriate application for such a "solution" because
> what seats are available cannot be guaranteed to be correct.

I have added a note about data loss for the Slony item.

> Regarding data partitioning, I strongly disagree with the opening sentence
> in that it doesn't split a database into sets, it splits tables into sets.

OK, changed.

> Data partitioning is often done within a single database on a single
> server and therefore, as a concept, has nothing whatsoever to do with
> different servers. Similarly, the second paragraph of this section is

Uh, why would someone split things up like that on a single server?

> problematic. Please define your term first, then talk about some
> implementations - this is muddying the water. Further, there are both
> vertical and horizontal partitioning - you mention neither - and each has
> its own distinct uses. If partitioning is mentioned, it should be more
> complete.

Uh, what exactly needs to be defined?

> Next, Query Broadcast Load Balancing... also needs a lot of work. First,
> it's foremost in my memory that sending read queries everywhere and
> returning the first result set back is a key way to improve application
> performance at the cost of additional load on other systems - I guess
> that's not at all what the document is after here, but it's a worthy part
> of a dialogue on broadcasting queries. In other words, this has more parts
> to it than just what the document now entertains. Secondly, the document

Uh, do we want to go into that here? I guess I could.

> doesn't address _at_all_ whether this is a two-phase-commit environment
> or not. If not, how are updates managed? If each server operates
> independently and one of them fails, what do you do then? How do you know
> _any_ server got an insert/update? ... Each server _can't_ operate
> independently unless the application does its own insert/update commits to
> every one of them - and that can't be fast, nor does it load balance,
> though it may contribute to superior uptime performance by the
> application.

I think having the application middle layer do the commits is how it
works now. Can someone explain how pgpool works, or should we mention
how two-phase commit has to be done here? pgpool2 has additional
features.

> Next up; I'm not aware of any current products or projects that provide
> parallel query execution, though Informix might - I can ask a colleague or
> two. Either way, it's probably best to simply define the term (perhaps in
> a little more detail), and not mention solutions - they change with time
> anyway.

Actually, Bizgres MPP, based on PostgreSQL, does this, but mostly for
read-only queries.
> While I've never used Oracle's clustering tools, I've read up on them and
> have customers who use them, and I think this description of Oracle
> clustering is a mis-read on what the Oracle system actually does. A check
> with a true Oracle clustering expert is in order here.

OK, would someone please comment?

> Hope this helps. If asked, I'm willing to (re)write some of the bits
> discussed above.

Yes, please review the URL and let me know what else to change. Thanks.

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
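
For readers following the thread, the rule quoted above, that none of the data is committed until all are committed, can be illustrated with a small sketch. This is only an illustration under stated assumptions: the Participant class and the two_phase_commit function are hypothetical in-memory stand-ins, not PostgreSQL, pgpool, or Slony APIs, and a real implementation would also need durable logging and recovery if the coordinator itself fails.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Participant:
    """Hypothetical stand-in for one database server (not a real API)."""
    name: str
    healthy: bool = True
    committed: dict = field(default_factory=dict)
    pending: Optional[dict] = None

    def prepare(self, changes: dict) -> bool:
        # Phase 1: stage the changes and vote; nothing is visible yet.
        if not self.healthy:
            return False
        self.pending = dict(changes)
        return True

    def commit(self) -> None:
        # Phase 2, success path: make the staged changes visible.
        self.committed.update(self.pending or {})
        self.pending = None

    def rollback(self) -> None:
        # Phase 2, failure path: discard anything that was staged.
        self.pending = None


def two_phase_commit(participants, changes) -> bool:
    """Apply changes on every participant, or on none of them."""
    prepared = []
    for p in participants:
        if p.prepare(changes):
            prepared.append(p)
        else:
            # One "no" vote aborts the whole transaction everywhere.
            for q in prepared:
                q.rollback()
            return False
    # Every participant voted yes, so the writer can now commit on all.
    for p in participants:
        p.commit()
    return True
```

The writer waits for every server's vote before anything becomes visible, which is the propagation delay "absorbed by the writer at write time" mentioned in the quoted paragraph. An asynchronous scheme would skip that wait and accept the window in which a transaction can be lost.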