Re: Replication documentation addition - Mailing list pgsql-hackers

From	Richard Troy
Subject	Re: Replication documentation addition
Date	October 25, 2006 13:14:07
Msg-id	Pine.LNX.4.33.0610250919130.30114-100000@denzel.in
In response to	Re: Replication documentation addition (Bruce Momjian <bruce@momjian.us>)
Responses	Re: Replication documentation addition
List	pgsql-hackers

Hi Hannu, everyone,

I apologize for not having read the document in question - will do
shortly. My comments are brought about by the dialogue I read on list this
morning...

> > Here is a new replication documentation section I want to add for 8.2:
> >
> >     ftp://momjian.us/pub/postgresql/mypatches/replication
>

> > Data Partitioning
> > -----------------
> >
> > Data partitioning splits the database into data sets.  To achieve
> > replication, each data set can only be modified by one server.  For
> > example, data can be partitioned by offices, e.g. London and Paris.
> > While London and Paris servers have all data records, only London can
> > modify London records, and Paris can only modify Paris records.  Such
> > partitioning is usually accomplished in application code, though rules
> > and triggers can help enforce partitioning and keep the read-only data
> > sets current.  Slony can also be used in such a setup.  While Slony
> > replicates only entire tables, London and Paris can be placed in
> > separate tables, and inheritance can be used to access from both tables
> > using a single table name.
>
> Maybe another use of partitioning should also be mentioned. That is,
> when partitioning is used to overcome limitations of single servers
> (especially IO and memory, but also CPU), and only a subset of data is
> stored and processed on each server.

> > I think the "official" term for this kind of "replication" is
> > Shared-Nothing Clustering.

"Data partitioning" has two fundamental flavors, "horizontal" and
"vertical", quite a handful of implementations, and even more motivations
behind why one uses either strategy and whatever implementation. The same
is true for "clustering" - a few fundamental strategies, with a larger
number of implementations and yet more motivations. Replication,
meanwhile, is yet another beast altogether, sharing the same fundamentals
of multiple flavors, implementations and motivations.  I strongly urge
keeping any documentation on these (and related) topics strictly distinct
and separate.
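
To make the two flavors concrete, here is a minimal sketch in Python. It
is not tied to any PostgreSQL feature or to any particular implementation;
the sample rows, the column names, and both function names are
hypothetical and exist only for illustration.

```python
# Hypothetical sample data; the columns are illustrative only.
rows = [
    {"id": 1, "office": "London", "name": "Ann", "salary": 50000},
    {"id": 2, "office": "Paris", "name": "Luc", "salary": 48000},
    {"id": 3, "office": "London", "name": "Raj", "salary": 52000},
]


def partition_horizontally(rows, key):
    """Split by row: each partition holds whole rows sharing one key value."""
    parts = {}
    for row in rows:
        parts.setdefault(row[key], []).append(row)
    return parts


def partition_vertically(rows, pk, column_groups):
    """Split by column: each partition keeps the primary key plus a subset
    of the remaining columns, so the rows can be rejoined on the key."""
    return [
        [{c: row[c] for c in (pk, *cols)} for row in rows]
        for cols in column_groups
    ]


# Horizontal: London rows in one set, Paris rows in another.
by_office = partition_horizontally(rows, "office")

# Vertical: descriptive columns in one set, salary in another.
public, private = partition_vertically(
    rows, "id", [("office", "name"), ("salary",)]
)
```

Horizontal partitioning keeps every column but divides the rows;
vertical partitioning keeps every row but divides the columns. Neither
says anything yet about replication or clustering.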

In my view, one should define the terms first, separately, distinctly, and
as succinctly as possible, and, following this, a dialogue on how these
may be combined can be entertained. The definitions of each should be both
complete and academic in flavor and may include implementation and
motivational  information, but never "muddy the water" by mixing with
other concepts - not yet, not until after all the fundamentals have been
introduced.

I don't know much about what PostgreSQL has been doing in these areas of
late - nothing, I gather from someone's post this morning - but I'll try
to help out as I can with a paragraph or two - whatever you want,
whatever's welcome. "I was there" when Randy Eash created the first
commercial RDBMS replicator - for Ingres - and I created the first
commercial RDBMS front-end failover technology, also for Ingres, so I have
a pretty good handle on all the issues.

Also, I liked what Markus Schiltknecht wrote, but will have to read the
original before I can comment on his specific points.

>> I am not inclined to add commercial offerings.  If people wanted
>> commercial database offerings, they can get them from companies that
>> advertize.  People are coming to PostgreSQL for open source solutions,
>> and I think mentioning commercial ones doesn't make sense.
>>
>> If we are to add them, I need to hear that from people who haven't
>> worked in PostgreSQL commercial replication companies.
>
> I'm not coming to PostgreSQL for open source solutions. I'm coming
> to PostgreSQL for _good_ solutions.
>
> I want to see what solutions might be available for a problem I have.
> I certainly want to know whether they're freely available, commercial
> or some flavour of open source, but I'd like to know about all of them.
>
> A big part of the value of PostgreSQL is the applications and extensions
> that support it. Hiding the existence of some subset of those just
> because of the way they're licensed is both underselling PostgreSQL
> and doing something of a disservice to the user of the document.

> If potential new users look through the docs and it says no options
> available for what they want or consider they will need in the future
> then they go elsewhere, if they know that some options are available
> then they will look further if they want that feature.


I agree that people look through the materials on the web site,
documentation especially, and make choices based upon what they see. Many
of us don't have time to spend a day searching the web for things we don't
even know exist. Including more information, whether in the documentation
or on the web site, will attract more users to PostgreSQL. I have been SURE
that certain things must exist in the PG world, but for lack of time have
not known about them with certainty; I would gladly point our customers at
Postgres solutions if only I knew about them. Count this paragraph as
praise for doing _something_more_ to help get more information to
(prospective) users.

Consider someone like me; my company supports five RDBMSes, one of them
being Postgres. We are probably not unique in that we've written an SQL
dialect translator so we could write our own code in one code line to run
anywhere, against any RDBMS (it can learn new dialects) - or perhaps
others keep multiple code lines containing variant dialects. Either way,
we "don't care" whether our customer has Oracle or PostgreSQL, so long as
they buy our stuff. But when our customers - or prospects - come to us
with a given scenario, the more we know about Postgres - and its community
- the more likely we can steer them to a PG solution, which we would
prefer anyway, for lots of reasons, historical, personal, and technical -
not to mention cost. The trouble is that Oracle, for example, has already
told them about (sold them on?) whatever, and we need a rebuttal ready at hand or
they'll go with Oracle. We just don't have the time to fight that battle,
nor do we wish to risk the sale when we can work with Oracle just fine.

In sum, I agree with Tom Lane and the others who chimed in with "keep the
docs clean, use the web site for mentioning other projects/products." And
again I applaud this new effort.

Regards,
Richard

-- 
Richard Troy, Chief Scientist
Science Tools Corporation
510-924-1363 or 202-747-1263
rtroy@ScienceTools.com, http://ScienceTools.com/
