Re: Add CREATE support to event triggers - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Add CREATE support to event triggers |
Date | |
Msg-id | CA+TgmoZXgy_sayHPaCR7z=Gka=r1Grhqi0pgygRZGBnn0_LjdQ@mail.gmail.com Whole thread Raw |
In response to | Re: Add CREATE support to event triggers (Andres Freund <andres@2ndquadrant.com>) |
Responses |
Re: Add CREATE support to event triggers
|
List | pgsql-hackers |
On Wed, Nov 12, 2014 at 4:58 PM, Andres Freund <andres@2ndquadrant.com> wrote: > That's already the situation today with all the logical replication > solutions. They *constantly* break in the field. Most commonly because > of DDL differences. Right. And that's why it's cool that logical decoding can operate through DDL differences. The apply side might not be able to cope with what pops out, but that's not logical decoding's fault, and different apply-sides can adopt different policies as to how to deal with whatever problems crop up. > I don't understand why you think it's likely for logical replication to > break due to this? You mean because deparse yielded a invalid statement? > In a normal single master setup there really shouldn't be scenarios > where that happens? Except bugs - but as you know we had more than in > HS/SR as well? I don't know exactly what you mean by "a normal single master setup". Surely the point of logical decoding is that the replica might not be identical to the master. And if it isn't, then a command that succeeded on the master might fail on the standby - for example, because an object by that name already exists there, or because a type doesn't exist there. (Even if you replicate a CREATE EXTENSION command, there's no guarantee that the .so exists on the target.) Then what? This is basically the same problem as multi-master replication conflicts, except with DDL. Resolving replication conflicts is not a very easy thing to get right even if you're only concerned about the rows in the tables. It's probably harder if you're worried about the schema, too. > We are thinking about extending 2PC to be usable across logical > decoding. That's a relatively simple patch. Then it's possible to do the > DDL on the primary, ship it to the standby, apply it there, and only > afterwards commit the prepared xact if that was successfull. That's > quite cool - but somewhat in the remit of the replication solution. That would certainly make the user aware of a quite a few kinds of errors that might otherwise go undetected. > The solution here doesn't force you to do that, does it? It's something > that can be used by more than replication solution? In theory, yes. > I just don't see the alternative you're proposing? I've so far not even > seen a credible *sketch* of an alternative design that also can handle > ALTER. The only current alternatives are 1) the user inserts some > events into the queue manually. If they depend on any local state you're > screwed. If they have syntax errors they're often screwed. 2). The user > does all actions on the standby first. Then on the primary. That's hard > for ALTER ADD COLUMN and similar, and just about impossible for renaming > things. It's a really hard problem. I don't think it's possible to make statement-based replication no-fail. Physical replication is basically no-fail because it just says, hey, go write these bytes into this page, and we can pretty much always do that. But statement-based logical replication means basically executing arbitrary chunks of code all over the backend, and there is just no way to guarantee that code won't throw an error. So the best we can do is to hope that those errors will get reported back to the user, which is going to require some kind of distributed transaction. Your idea to just run the replicated DDL statements on the standby before committing on the master is one approach to that problem, and probably the simplest one, but not the only one - one can imagine something that resembles true clustering, for example. By the way, the fact that you're planning to do log-based replication of DML and trigger-based replication of DDL scares the crap out of me. I'm not sure how that's going to work at all if the two are interleaved in the same transaction. Also, relying on triggers for replication is generally not awesome, because it increases administrative complexity. Event triggers are probably better in that regard than ordinary triggers, because they're database-wide, but I don't think they solve the problem completely. But the thing that scares me even more is that the DDL replication is not only trigger-based, but statement-based. Why don't we do logical replication by recording all of the SQL statements that run on the master and re-executing them on the standby? Well, because we all know that there will be plenty of important cases where that doesn't yield the same results on both servers. There's no intrinsic reason why that shouldn't also be a problem for DDL replication, and indeed it is. This patch set is trying to patch around that by finding a way to emit a revised DDL statement that is guaranteed to do exactly the same thing on both machines, and it's probably possible to do that in most cases, but it's probably not possible to do that in all cases. To be clear, none of this is a reason to reject this patch set; I've explained my reasons for being unhappy with it elsewhere, and I think they are valid, but these are not them. These are just ruminations on the difficulty of doing truly robust DDL replication. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: