Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture - Mailing list pgsql-hackers
From: Steve Singer
Subject: Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture
Msg-id: BLU0-SMTP11640920EE4C34A020A443DCFA0@phx.gbl
In response to: Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture
List: pgsql-hackers
On 12-06-15 04:03 PM, Robert Haas wrote:
> On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I don't plan to throw in loads of conflict resolution smarts. The aim is
>> to get to the place where all the infrastructure is there so that a MM
>> solution can be built by basically plugging in a conflict resolution
>> mechanism. Maybe providing a very simple one.
>> I think without in-core support it's really, really hard to build a
>> sensible MM implementation. Which doesn't mean it has to live entirely
>> in core.
>
> Of course, several people have already done it, perhaps most notably Bucardo.
>
> Anyway, it would be good to get opinions from more people here. I am
> sure I am not the only person with an opinion on the appropriateness
> of trying to build a multi-master replication solution in core or,
> indeed, the only person with an opinion on any of these other issues.

This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change
records that can be replayed on other databases, without relying on triggers
and log tables, would be good to have. I think we want something flexible
enough that people can write consumers of the LCRs to do conflict resolution
for multi-master, but I am not sure that the conflict-resolution support
itself actually belongs in core.

Most of the complexity of slony (both in terms of lines of code and in terms
of the issues people encounter using it) comes not from the log triggers or
the replay of the logged data, but from the configuration of the cluster.
Controlling things like:

* Which tables replicate from a node to which other nodes
* How you change the cluster configuration on a running system (adding
  nodes, removing nodes, moving the origin of a table, adding tables to
  replication, etc.)

This is the harder part of the problem. I think we need to first get the
infrastructure for capturing, transporting and translating the LCRs into the
system committed (which is what the current patch set deals with) before we
get too caught up in the configuration aspects. I think we will have a hard
time agreeing on behaviours for some of that other stuff that are both
flexible enough for a wide range of use cases and simple enough for
administrators. I'd like to see in-core support for a lot of that stuff, but
I'm not holding my breath.
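To make the shape of what I'm hoping for a bit more concrete: a
consumer-facing change record might look something like the sketch below.
This is entirely my own illustration -- the names and layout are not taken
from Andres's patch -- but it shows where a multi-master conflict handler
could plug in without the resolution policy itself living in core.

#include "postgres.h"
#include "access/htup.h"

/* Illustrative only; these names are mine, not from the patch. */
typedef enum LCRAction
{
    LCR_INSERT,
    LCR_UPDATE,
    LCR_DELETE
} LCRAction;

typedef struct LogicalChangeRecord
{
    LCRAction     action;       /* what happened */
    Oid           relation;     /* table the change applies to */
    TransactionId xid;          /* originating transaction */
    /* old tuple (key or full row) for UPDATE/DELETE, new tuple for
     * INSERT/UPDATE; how much of the old tuple gets shipped is exactly
     * what is being debated below */
    HeapTuple     oldtup;
    HeapTuple     newtup;
} LogicalChangeRecord;

/* A multi-master consumer would register something like this to decide
 * what happens when an incoming LCR collides with a local change: core
 * supplies the records, the consumer supplies the policy. */
typedef bool (*lcr_conflict_handler) (LogicalChangeRecord *local,
                                      LogicalChangeRecord *remote);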
> It is not good for those other opinions to be saved for a later date.
>
>> Hm. Yes, you could do that. But I have to say I don't really see a point.
>> Maybe the fact that I do envision multimaster systems at some point is
>> clouding my judgement though, as it's far less easy in that case.
>
> Why? I don't think that particularly changes anything.
>
>> It also complicates the WAL format, as you now need to specify whether
>> you transport a full or a primary-key-only tuple...
>
> Why? If the schemas are in sync, the target knows what the PK is
> perfectly well. If not, you're probably in trouble anyway.
>
>> I think though that we do not want to enforce that mode of operation for
>> tightly coupled instances. For those I was thinking of using command
>> triggers to synchronize the catalogs.
>> One of the big screwups of the current replication solutions is exactly
>> that you cannot sensibly do DDL, which is not a big problem if you have a
>> huge system with loads of different databases and very knowledgeable
>> people et al., but at the beginning it really sucks. I have no problem
>> with making one of the nodes the "schema master" in that case.
>> Also I would like to avoid the overhead of the proxy instance for
>> use-cases where you really want one node replicated as fully as possible,
>> with the slight exception of being able to have summing tables, different
>> indexes, et al.
>
> In my view, a logical replication solution is precisely one in which
> the catalogs don't need to be in sync. If the catalogs have to be in
> sync, it's not logical replication. ISTM that what you're talking
> about is sort of a hybrid between physical replication (pages) and
> logical replication (tuples) - you want to ship around raw binary
> tuple data, but not entire pages. The problem with that is it's going
> to be tough to make robust. Users could easily end up with answers
> that are total nonsense, or probably even crash the server.

I see three catalogs in play here:

1. The catalog on the origin.
2. The catalog on the proxy system (this is the catalog used to translate
   the WAL records to LCRs). The proxy system will need essentially the
   same pgsql binaries (same architecture, important compile flags, etc.)
   as the origin.
3. The catalog on the destination system(s).

Catalog 2 must be in sync with catalog 1; catalog 3 shouldn't need to be in
sync with catalog 1. I think catalogs 2 and 3 are combined in the current
patch set (though I haven't yet looked at the code closely). I think the
performance optimizations Andres has implemented to update tuples through
low-level functions should be left for later, and that we should be
generating SQL in the apply cache so we don't start assuming much about
catalog 3.
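To illustrate what I mean by generating SQL in the apply cache, here is a
rough sketch (again entirely mine, purely illustrative, not code from the
patch) of rendering an UPDATE-type LCR as plain SQL text keyed on the
primary key:

#include "postgres.h"
#include "lib/stringinfo.h"
#include "utils/builtins.h"

/* Illustrative only: turn a single-column update into SQL text. Because
 * the statement goes through the normal parser/executor on the
 * destination, catalog 3 is free to have extra indexes, summing tables
 * maintained by triggers, and so on. */
static char *
lcr_update_to_sql(const char *qualified_relname,
                  const char *pk_col, const char *pk_val,
                  const char *col, const char *newval)
{
    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfo(&buf, "UPDATE %s SET %s = %s WHERE %s = %s;",
                     qualified_relname,
                     quote_identifier(col),
                     quote_literal_cstr(newval),
                     quote_identifier(pk_col),
                     quote_literal_cstr(pk_val));
    return buf.data;
}

Applying text like that is slower than poking tuples in through low-level
functions, but it makes almost no assumptions about catalog 3, which is why
I'd leave the low-level path as a later optimization.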
> guarantee. And, without such a guarantee, I don't believe that we can
> create a high-performance, robust, in-core replication solution.

Part of what people expect from a robust in-core solution is that it should
work with the other in-core features. If we have to list a bunch of in-core
types as being incompatible with logical replication, then people will look
at logical replication with the same 'there be dragons here' attitude that
scares many people away from the existing third-party replication solutions.
Non-core or third-party user-defined types are a slightly different matter,
because we can't control what they do.

Steve