Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture - Mailing list pgsql-hackers
From: Steve Singer
Subject: Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture
Msg-id: BLU0-SMTP11640920EE4C34A020A443DCFA0@phx.gbl
In response to: Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: [RFC][PATCH] Logical Replication/BDR prototype and architecture
List: pgsql-hackers
On 12-06-15 04:03 PM, Robert Haas wrote:
> On Thu, Jun 14, 2012 at 4:13 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> I don't plan to throw in loads of conflict resolution smarts. The aim is
>> to get to the place where all the infrastructure is there so that a MM
>> solution can be built by basically plugging in a conflict resolution
>> mechanism. Maybe providing a very simple one.
>> I think without in-core support it's really, really hard to build a
>> sensible MM implementation. Which doesn't mean it has to live entirely
>> in core.
>
> Of course, several people have already done it, perhaps most notably Bucardo.
>
> Anyway, it would be good to get opinions from more people here. I am
> sure I am not the only person with an opinion on the appropriateness
> of trying to build a multi-master replication solution in core or,
> indeed, the only person with an opinion on any of these other issues.

This sounds like a good place for me to chime in.

I feel that in-core support to capture changes and turn them into change
records that can be replayed on other databases, without relying on triggers
and log tables, would be good to have. I think we want something flexible
enough that people can write consumers of the LCRs to do conflict resolution
for multi-master, but I am not sure that the conflict-resolution support
itself actually belongs in core.

Most of the complexity of slony (both in terms of lines of code and in terms
of the issues people encounter using it) comes not from the log triggers or
the replay of the logged data, but from the configuration of the cluster.
Controlling things like:

* Which tables replicate from a node to which other nodes
* How you change the cluster configuration on a running system (adding
  nodes, removing nodes, moving the origin of a table, adding tables to
  replication, etc.)

This is the harder part of the problem. I think we need to first get the
infrastructure for capturing, transporting and translating the LCRs into the
system committed (which is what the current patch set deals with) before we
get too caught up in the configuration aspects. I think we will have a hard
time agreeing on behaviours for some of that other stuff that are both
flexible enough for a wide range of use cases and simple enough for
administrators. I'd like to see in-core support for a lot of that stuff, but
I'm not holding my breath.
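To make the shape of what I'm hoping for a bit more concrete: a
consumer-facing change record might look something like the sketch below.
This is entirely my own illustration -- the names and layout are not taken
from Andres's patch -- but it shows where a multi-master conflict handler
could plug in without the resolution policy itself living in core.

#include "postgres.h"
#include "access/htup.h"

/* Illustrative only; these names are mine, not from the patch. */
typedef enum LCRAction
{
    LCR_INSERT,
    LCR_UPDATE,
    LCR_DELETE
} LCRAction;

typedef struct LogicalChangeRecord
{
    LCRAction     action;       /* what happened */
    Oid           relation;     /* table the change applies to */
    TransactionId xid;          /* originating transaction */
    /* old tuple (key or full row) for UPDATE/DELETE, new tuple for
     * INSERT/UPDATE; how much of the old tuple gets shipped is exactly
     * what is being debated below */
    HeapTuple     oldtup;
    HeapTuple     newtup;
} LogicalChangeRecord;

/* A multi-master consumer would register something like this to decide
 * what happens when an incoming LCR collides with a local change: core
 * supplies the records, the consumer supplies the policy. */
typedef bool (*lcr_conflict_handler) (LogicalChangeRecord *local,
                                      LogicalChangeRecord *remote);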
> It is not good for those other opinions to be saved for a later date.
>
>> Hm. Yes, you could do that. But I have to say I don't really see a point.
>> Maybe the fact that I do envision multimaster systems at some point is
>> clouding my judgement though, as it's far less easy in that case.
>
> Why? I don't think that particularly changes anything.
>
>> It also complicates the WAL format, as you now need to specify whether
>> you transport a full or a primary-key-only tuple...
>
> Why? If the schemas are in sync, the target knows what the PK is
> perfectly well. If not, you're probably in trouble anyway.
>
>> I think though that we do not want to enforce that mode of operation for
>> tightly coupled instances. For those I was thinking of using command
>> triggers to synchronize the catalogs.
>> One of the big screwups of the current replication solutions is exactly
>> that you cannot sensibly do DDL, which is not a big problem if you have a
>> huge system with loads of different databases and very knowledgeable
>> people et al., but at the beginning it really sucks. I have no problem
>> with making one of the nodes the "schema master" in that case.
>> Also I would like to avoid the overhead of the proxy instance for
>> use-cases where you really want one node replicated as fully as possible,
>> with the slight exception of being able to have summing tables, different
>> indexes, et al.
>
> In my view, a logical replication solution is precisely one in which
> the catalogs don't need to be in sync. If the catalogs have to be in
> sync, it's not logical replication. ISTM that what you're talking
> about is sort of a hybrid between physical replication (pages) and
> logical replication (tuples) - you want to ship around raw binary
> tuple data, but not entire pages. The problem with that is it's going
> to be tough to make robust. Users could easily end up with answers
> that are total nonsense, or probably even crash the server.

I see three catalogs in play here:

1. The catalog on the origin.
2. The catalog on the proxy system (this is the catalog used to translate
   the WAL records to LCRs). The proxy system will need essentially the
   same pgsql binaries (same architecture, important compile flags, etc.)
   as the origin.
3. The catalog on the destination system(s).

Catalog 2 must be in sync with catalog 1; catalog 3 shouldn't need to be in
sync with catalog 1. I think catalogs 2 and 3 are combined in the current
patch set (though I haven't yet looked at the code closely). I think the
performance optimizations Andres has implemented to update tuples through
low-level functions should be left for later, and that we should be
generating SQL in the apply cache so we don't start assuming much about
catalog 3.
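To illustrate what I mean by generating SQL in the apply cache, here is a
rough sketch (again entirely mine, purely illustrative, not code from the
patch) of rendering an UPDATE-type LCR as plain SQL text keyed on the
primary key:

#include "postgres.h"
#include "lib/stringinfo.h"
#include "utils/builtins.h"

/* Illustrative only: turn a single-column update into SQL text. Because
 * the statement goes through the normal parser/executor on the
 * destination, catalog 3 is free to have extra indexes, summing tables
 * maintained by triggers, and so on. */
static char *
lcr_update_to_sql(const char *qualified_relname,
                  const char *pk_col, const char *pk_val,
                  const char *col, const char *newval)
{
    StringInfoData buf;

    initStringInfo(&buf);
    appendStringInfo(&buf, "UPDATE %s SET %s = %s WHERE %s = %s;",
                     qualified_relname,
                     quote_identifier(col),
                     quote_literal_cstr(newval),
                     quote_identifier(pk_col),
                     quote_literal_cstr(pk_val));
    return buf.data;
}

Applying text like that is slower than poking tuples in through low-level
functions, but it makes almost no assumptions about catalog 3, which is why
I'd leave the low-level path as a later optimization.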
> guarantee. And, without such a guarantee, I don't believe that we can
> create a high-performance, robust, in-core replication solution.

Part of what people expect from a robust in-core solution is that it should
work with the other in-core features. If we have to list a bunch of in-core
types as being incompatible with logical replication, then people will look
at logical replication with the same 'there be dragons here' attitude that
scares many people away from the existing third-party replication solutions.
Non-core or third-party user-defined types are a slightly different matter,
because we can't control what they do.

Steve