Re: Proposal: Commit timestamp - Mailing list pgsql-hackers
From | Jan Wieck |
---|---|
Subject | Re: Proposal: Commit timestamp |
Date | |
Msg-id | 45CA9A66.3090105@Yahoo.com Whole thread Raw |
In response to | Re: Proposal: Commit timestamp (Markus Schiltknecht <markus@bluegap.ch>) |
Responses |
Re: Proposal: Commit timestamp
|
List | pgsql-hackers |
On 2/7/2007 9:27 PM, Markus Schiltknecht wrote: > Hi, > > Jan Wieck wrote: >> Then let me give you a little puzzle just for the fun of it. >> >> A database containing customer contact information (among other things) >> is a two node multimaster system. One is serving the customer web >> portal, the other is used by the company staff including the call >> center. At 13:45 the two servers lose connectivity to each other, yet >> the internal staff can access the internal server while the web portal >> is accessible from the outside. At 13:50 customer A updates their credit >> card information through the web portal, while customer B does the same >> through the call center. At 13:55 both customers change their mind to >> use yet another credit card, now customer A phones the call center while >> customer B does it via the internet. > > Phew, a mind twister... one customer would already be enough to trigger > that sort of conflict... > >> At 14:00 the two servers reconnect and go through the conflict >> resolution. How do you intend to solve both conflicts without using any >> "clock", because that seems to be a stopword causing instant rejection >> of whatever you propose. Needless to say, both customers will be >> dissatisfied if you charge the "wrong" credit card during your next >> billing cycle. > > Correct. But do these cases satisfy storing timestamps to each and every > transaction you do? That's what I doubt, not the usefulness of time > based conflict resolution for certain cases. > > You can always add a time based conflict resolution, by adding a > timestamp column and decide upon that one. I'd guess that the overall > costs are lower that way. Yes, yes, and yes ... but aside from the problem that you use the very ambiguous word "timestamp" (which somehow suggests using a "clock" of some sort), isn't the "begin" timestamp of a long running transaction worse than the "commit" timestamp, when all its work got visible to the outside world instantaneously? > > But you've withdrawn that proposal already, so... > >> Which is a good discussion because one of the reasons why I stopped >> looking into Postgres-R is the fact that is based on the idea to push >> all the replication information through a system that generates a global >> serialized message queue. That by itself isn't the problem, but the fact >> that implementing a global serialized message queue has serious >> throughput issues that are (among other details) linked to the speed of >> light. > > Agreed. Nevertheless, there are use cases for such systems, because they > put less limitations to the application. One could even argue, that your > above example would be one ;-) Now we're in sync :-) > >> I am trying to start with a system, that doesn't rely on such a >> mechanism for everything. I do intend to add an option later, that >> allows to declare a UNIQUE NOT NULL constraint to be synchronous. What >> that means is, that any INSERT, UPDATE, DELETE and SELECT FOR UPDATE >> will require the node to currently be a member of the (quorum or >> priority defined) majority of the cluster. > > Sounds reasonable. > >> An advisory lock system, >> based on a total order group communication, will grant the lock to the >> unique key values on a first come, first serve base. Every node in the >> cluster will keep those keys as "locked" until the asynchronous >> replication stream reports the locking transaction as ended. If another >> remote transaction in the meantime requires updating such key, the >> incoming stream from that node will be on hold until the lock is >> cleared. This is to protect agains node B replicating a transaction from >> node A and a later update on node B arrives on C before C got the first >> event from A. A node that got disconnected from the cluster must rebuild >> the current advisory lock list upon reconnecting to the cluster. > > Yeah, this is a convenient way to replicate sequences via a GCS. > >> I think that this will be a way to overcome Postgres-R's communication >> bottleneck, as well as allowing limited update activity even during a >> completely disconnected state of a node. Synchronous or group >> communication messages are reduced to the cases, where the application >> cannot be implemented in a conflict free way, like allocating a natural >> primary key. There is absolutely no need to synchronize for example >> creating a sales order. > > Agreed, such cases can easily be optimized. But you have to be aware of > he limitations these optimizations cause. Postgres-R is much more > targeted at very general use cases. I am, if for no other reason than that I am familiar with the concepts underneath Postgres-R for more than 3 years. What I realized is that the "general use" case (for arbitrary complex applications)is very likely to be in conflict with any king of "good default performance" case. > >> An application can use global unique ID's for >> the order number. And everything possibly referenced by an order (items, >> customers, ...) is stored in a way that the references are never >> updated. Deletes to those possibly referenced objects are implemented in >> a two step process, where they are first marked obsolete, and later on >> things that have been marked obsolete for X long are deleted. A REPLICA >> TRIGGER on inserting an order will simply reset the obsolete flag of >> referenced objects. If a node is disconnected longer than X, you have a >> problem - hunt down the guy who defined X. > > Yeah, that's another very nice optimization. Again, as long as you know > the limitations, that's all well and fine. > >> Merging certain ideas to come up with an async/sync hybrid? Seems to me >> we have similar enough ideas to need conflict resolution, because we had >> them simultaneously but communicate them asynchronously. > > Huh? Sorry, I didn't get what you're trying to say here. Out of sync again ... we'll get there tomorrow ... unless your clock is way back and tomorrow will never come. Jan -- #======================================================================# # It's easier to get forgiveness for being wrong than for being right. # # Let's break this rule - forgive me. # #================================================== JanWieck@Yahoo.com #
pgsql-hackers by date: