Re: Transactions involving multiple postgres foreign servers - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: Transactions involving multiple postgres foreign servers
Msg-id: CA+TgmoY=VkHrzXD=jw5DA+Pp-ePW_6_v5n+TJk40s5Q9VXY-Pw@mail.gmail.com
In response to: Re: Transactions involving multiple postgres foreign servers (Amit Langote <Langote_Amit_f8@lab.ntt.co.jp>)
Responses: Re: Transactions involving multiple postgres foreign servers
           Re: Transactions involving multiple postgres foreign servers
List: pgsql-hackers
On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote <Langote_Amit_f8@lab.ntt.co.jp> wrote:
> However, when I briefly read the description in "Transaction Management in
> the R* Distributed Database Management System (C. Mohan et al)" [2], it
> seems that what Ashutosh is saying might be a correct way to proceed after
> all:

I think Ashutosh is mostly right, but I think there's a lot of room to doubt whether the design of this patch is good enough that we should adopt it.

Consider two possible designs. In design #1, the leader performs the commit locally, then tries to send COMMIT PREPARED to every remote server involved, and only then acknowledges the commit to the client. In design #2, the leader performs the commit locally and acknowledges the commit to the client at once, leaving the task of running COMMIT PREPARED to some background process.

Design #2 involves a race condition, because it's possible that the background process might not complete COMMIT PREPARED on every node before the user submits the next query, and that query might then fail to see supposedly-committed changes. This can't happen in design #1. On the other hand, there's always the possibility that the leader's session is forcibly killed, even perhaps by pulling the plug. If the background process contemplated by design #2 is well-designed, it can recover and finish sending COMMIT PREPARED to each relevant server after the next restart. In design #1, that background process doesn't necessarily exist, so inevitably there is a possibility of orphaning prepared transactions on the remote servers, which is not good. Even if the DBA notices them, it won't be easy to figure out whether to commit them or roll them back.

I think this thought experiment shows that, on the one hand, there is a point to waiting for commits on the foreign servers, because it can avoid the anomaly of not seeing the effects of your own commits. On the other hand, it's ridiculous to suppose that every case can be handled by waiting, because that just isn't true. You can't be sure that you'll be able to wait long enough for COMMIT PREPARED to complete, and even if that works out, you may not want to wait indefinitely for a dead server. Waiting for a ROLLBACK PREPARED has no value whatsoever unless the system design is such that failing to wait for it results in the ROLLBACK PREPARED never getting performed -- which is a pretty poor excuse.
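To make the difference concrete, here's a rough libpq-level sketch of the sequence for a single foreign server. The connection string, table, and GID are made up, the remote server would need max_prepared_transactions > 0, and of course the real logic would live inside the backend rather than in a client program:

#include <stdio.h>
#include <libpq-fe.h>

static void
run(PGconn *conn, const char *sql)
{
    PGresult *res = PQexec(conn, sql);

    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "\"%s\" failed: %s", sql, PQerrorMessage(conn));
    PQclear(res);
}

int
main(void)
{
    /* one hypothetical foreign server; a real transaction may touch several */
    PGconn *remote = PQconnectdb("host=remote1 dbname=postgres");

    if (PQstatus(remote) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(remote));
        PQfinish(remote);
        return 1;
    }

    /* Phase 1: do the remote work and prepare it, before committing locally. */
    run(remote, "BEGIN");
    run(remote, "UPDATE t SET x = x + 1");                    /* made-up remote work */
    run(remote, "PREPARE TRANSACTION 'fdw_xact_42_remote1'");

    /*
     * ... the leader commits its local transaction here; once that commit
     * record is durable, the distributed transaction is logically committed ...
     */

    /*
     * Phase 2: resolve the prepared transaction.
     *
     * Design #1: run this before acknowledging the commit to the client, so
     * the client can never fail to see its own supposedly-committed changes
     * -- but the acknowledgement now depends on the remote server.
     *
     * Design #2: acknowledge the commit now and leave this command to a
     * background process, accepting the visibility race described above.
     */
    run(remote, "COMMIT PREPARED 'fdw_xact_42_remote1'");

    PQfinish(remote);
    return 0;
}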
Moreover, there are good reasons to think that doing this kind of cleanup work in post-commit hooks is never going to be acceptable. Generally, post-commit hooks need to be no-fail, because it's too late to throw an ERROR. But there's very little hope that a connection to a remote server can be no-fail; anything that involves a network connection is, by definition, prone to failure. We can try to guarantee that every single bit of code that runs in the path that sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an ERROR, but that's going to be quite difficult to do: even palloc() can throw an error. And what about interrupts? We don't want to be stuck inside this code for a long time without any hope of the user recovering control of the session by pressing ^C, but of course the way that works is that ^C throws an ERROR, which we can't handle here. We fixed a similar issue for synchronous replication in 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a WARNING in that case and then return control to the user. But if we do that here, all of the code that every FDW runs in this path has to be aware of that rule and follow it, and it just adds to the list of ways that the user backend can escape this code without having cleaned up all of the prepared transactions on the remote side.

It seems to me that the only way to really make this feature robust is to have a background worker as part of the equation. The background worker launches at startup and looks around for local state that tells it whether there are any COMMIT PREPARED or ROLLBACK PREPARED operations pending that weren't completed during the last server lifetime, whether because of a local crash or remote unavailability. It attempts to complete those and retries periodically. When a new transaction needs this type of coordination, it adds the necessary crash-proof state and then signals the background worker. If appropriate, it can wait for the background worker to complete, just like a CHECKPOINT waits for the checkpointer to finish -- but if the CHECKPOINT command is interrupted, the actual checkpoint is unaffected.

More broadly, the question has been raised as to whether it's right to try to handle atomic commit and atomic visibility as two separate problems. The XTM API proposed by Postgres Pro aims to address both with a single stroke. I don't think that API was well-designed, but maybe the idea is good even if the code is not.

Generally, there are two ways in which you could imagine that a distributed version of PostgreSQL might work. One possibility is that one node makes everything work by going around and giving instructions to the other nodes, which are more or less unaware that they are part of a cluster. That is basically the design of Postgres-XC and certainly the design being proposed here. The other possibility is that the nodes are actually clustered in some way and agree on things like whether a transaction committed or what snapshot is current using some kind of consensus protocol. It is obviously possible to get a fairly long way using the first approach, but it seems likely that the second one is fundamentally more powerful: among other things, because the first approach is so centralized, the leader is apt to become a bottleneck. And, quite apart from that, can a centralized architecture with the leader manipulating the other workers ever allow for atomic visibility? If atomic visibility can be built on top of atomic commit, then it makes sense to do atomic commit first, but if we build this infrastructure and then find that we need an altogether different solution for atomic visibility, that will be unfortunate.

I know I was one of the people initially advocating this approach, but I'm no longer convinced that it's going to work out well. I don't mean that we should abandon all work on this topic, let alone all discussion, but I think we should be careful not to get so sucked into the details of perfecting this particular patch that we ignore the bigger design questions here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
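P.S. For concreteness, the retry loop such a resolver might run for one leftover prepared transaction could look roughly like the sketch below. The server name and GID are made up and would really come from the crash-proof local state described above, as would the decision to commit rather than roll back; a real implementation would be a background worker inside the server, not a libpq client.

#include <stdio.h>
#include <unistd.h>
#include <libpq-fe.h>

/* Try once to resolve the prepared transaction; return 1 on success. */
static int
resolve_once(const char *conninfo, const char *sql)
{
    PGconn   *conn = PQconnectdb(conninfo);
    PGresult *res;
    int       ok;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "server unreachable, will retry: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }

    res = PQexec(conn, sql);
    ok = (PQresultStatus(res) == PGRES_COMMAND_OK);
    if (!ok)
        fprintf(stderr, "will retry: %s", PQerrorMessage(conn));

    /*
     * A real resolver would also treat "prepared transaction does not exist"
     * as already resolved, so that an earlier attempt that succeeded but
     * crashed before updating the local state doesn't cause retries forever.
     */
    PQclear(res);
    PQfinish(conn);
    return ok;
}

int
main(void)
{
    /* made-up values; really read from durable local state written at commit */
    const char *conninfo = "host=remote1 dbname=postgres";
    const char *sql = "COMMIT PREPARED 'fdw_xact_42_remote1'";

    /* keep retrying until the remote server is reachable and the command works */
    while (!resolve_once(conninfo, sql))
        sleep(10);

    return 0;
}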