Re: Replication documentation addition - Mailing list pgsql-hackers
From | Richard Troy |
---|---|
Subject | Re: Replication documentation addition |
Date | |
Msg-id | Pine.LNX.4.33.0610250919130.30114-100000@denzel.in |
In response to | Re: Replication documentation addition (Bruce Momjian <bruce@momjian.us>) |
Responses | Re: Replication documentation addition |
List | pgsql-hackers |
Hi Hannu, everyone,

I apologize for not having read the document in question - will do shortly. My comments are brought about by the dialogue I read on list this morning...

> > Here is a new replication documentation section I want to add for 8.2:
> >
> > ftp://momjian.us/pub/postgresql/mypatches/replication
>
> > Data Partitioning
> > -----------------
> >
> > Data partitioning splits the database into data sets. To achieve
> > replication, each data set can only be modified by one server. For
> > example, data can be partitioned by offices, e.g. London and Paris.
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records. Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current. Slony can also be used in such a setup. While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access from both tables
> > using a single table name.
>
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.
>
> I think the "official" term for this kind of "replication" is
> Shared-Nothing Clustering.

"Data partitioning" has two fundamental flavors, "horizontal" and "vertical", quite a handful of implementations, and even more motivations behind why one uses either strategy and whatever implementation. The same is true for "clustering" - a few fundamental strategies, with a larger number of implementations and yet more motivations. Replication, meanwhile, is yet another beast altogether, sharing the same fundamentals of multiple flavors, implementations and motivations.

I strongly urge keeping any documentation on these (and related) topics strictly distinct and separate. In my view, one should define the terms first, separately, distinctly, and as succinctly as possible, and, following this, a dialogue on how these may be combined can be entertained. The definitions of each should be both complete and academic in flavor and may include implementation and motivational information, but never "muddy the water" by mixing with other concepts - not yet, not until after all the fundamentals have been introduced.

I don't know much about what PostgreSQL has been doing in these areas of late - nothing, I gather from someone's post this morning - but I'll try to help out as I can with a paragraph or two - whatever you want, whatever's welcome. "I was there" when Randy Eash created the first commercial RDBMS replicator - for Ingres - and I created the first commercial RDBMS front-end failover technology, also for Ingres, so I have a pretty good handle on all the issues.

Also, I liked what Markus Schiltknecht wrote, but will have to read the original before I can comment on his specific points.

>> I am not inclined to add commercial offerings. If people wanted
>> commercial database offerings, they can get them from companies that
>> advertize. People are coming to PostgreSQL for open source solutions,
>> and I think mentioning commercial ones doesn't make sense.
>>
>> If we are to add them, I need to hear that from people who haven't
>> worked in PostgreSQL commercial replication companies.
>
> I'm not coming to PostgreSQL for open source solutions. I'm coming
> to PostgreSQL for _good_ solutions.
>
> I want to see what solutions might be available for a problem I have.
> I certainly want to know whether they're freely available, commercial
> or some flavour of open source, but I'd like to know about all of them.
>
> A big part of the value of Postgresql is the applications and extensions
> that support it. Hiding the existence of some subset of those just
> because of the way they're licensed is both underselling postgresql
> and doing something of a disservice to the user of the document.

> If potential new users look through the docs and it says no options
> available for what they want or consider they will need in the future
> then they go elsewhere, if they know that some options are available
> then they will look further if they want that feature.

I agree that people look through the materials on the web site, documentation especially, and make choices based upon what they see. Many of us don't have time to spend a day searching the web for things we don't even know exist. By including more information, whether in the documentation or on the web site, more users will be attracted to PostgreSQL. I have been SURE that certain things must exist in the PG world, but haven't known about them with certainty due to time constraints; I would gladly point our customers at Postgres solutions if only I knew about them. Count this paragraph as praise for doing _something_more_ to help get more information to (prospective) users.

Consider someone like me; my company supports five RDBMSes, one of them being Postgres. We are probably not unique in that we've written an SQL dialect translator so we could write our own code in one code line to run anywhere, against any RDBMS (it can learn new dialects) - or perhaps others keep multiple code lines containing variant dialects. Either way, we "don't care" whether our customer has Oracle or PostgreSQL, so long as they buy our stuff. But when our customers - or prospects - come to us with a given scenario, the more we know about Postgres - and its community - the more likely we can steer them to a PG solution, which we would prefer anyway, for lots of reasons, historical, personal, and technical - not to mention cost. The trouble is, Oracle, for example, has already told them (sold them?) on whatever, and we need a rebuttal ready at hand or they'll go with Oracle. We just don't have the time to fight that battle, nor do we wish to risk the sale when we can work with Oracle just fine.

In sum, I agree with Tom Lane and the others who chimed in with "keep the docs clean, use the web site for mentioning other projects/products." And again I applaud this new effort.

Regards,
Richard

--
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
rtroy@ScienceTools.com, http://ScienceTools.com/