Home > mailing lists
Re: Proposal: Commit timestamp - Mailing list pgsql-hackers

From	Jan Wieck
Subject	Re: Proposal: Commit timestamp
Date	February 7, 2007 23:35:53
Msg-id	45CA9A66.3090105@Yahoo.com Whole thread Raw
In response to	Re: Proposal: Commit timestamp (Markus Schiltknecht <markus@bluegap.ch>)
Responses	Re: Proposal: Commit timestamp
List	pgsql-hackers
Tree view
On 2/7/2007 9:27 PM, Markus Schiltknecht wrote:
> Hi,
> 
> Jan Wieck wrote:
>> Then let me give you a little puzzle just for the fun of it.
>> 
>> A database containing customer contact information (among other things) 
>> is a two node multimaster system. One is serving the customer web 
>> portal, the other is used by the company staff including the call 
>> center. At 13:45 the two servers lose connectivity to each other, yet 
>> the internal staff can access the internal server while the web portal 
>> is accessible from the outside. At 13:50 customer A updates their credit 
>> card information through the web portal, while customer B does the same 
>> through the call center. At 13:55 both customers change their mind to 
>> use yet another credit card, now customer A phones the call center while 
>> customer B does it via the internet.
> 
> Phew, a mind twister... one customer would already be enough to trigger 
> that sort of conflict...
> 
>> At 14:00 the two servers reconnect and go through the conflict 
>> resolution. How do you intend to solve both conflicts without using any 
>> "clock", because that seems to be a stopword causing instant rejection 
>> of whatever you propose. Needless to say, both customers will be 
>> dissatisfied if you charge the "wrong" credit card during your next 
>> billing cycle.
> 
> Correct. But do these cases satisfy storing timestamps to each and every 
> transaction you do? That's what I doubt, not the usefulness of time 
> based conflict resolution for certain cases.
> 
> You can always add a time based conflict resolution, by adding a 
> timestamp column and decide upon that one. I'd guess that the overall 
> costs are lower that way.

Yes, yes, and yes ... but aside from the problem that you use the very 
ambiguous word "timestamp" (which somehow suggests using a "clock" of 
some sort), isn't the "begin" timestamp of a long running transaction 
worse than the "commit" timestamp, when all its work got visible to the 
outside world instantaneously?

> 
> But you've withdrawn that proposal already, so...
> 
>> Which is a good discussion because one of the reasons why I stopped 
>> looking into Postgres-R is the fact that is based on the idea to push 
>> all the replication information through a system that generates a global 
>> serialized message queue. That by itself isn't the problem, but the fact 
>> that implementing a global serialized message queue has serious 
>> throughput issues that are (among other details) linked to the speed of 
>> light.
> 
> Agreed. Nevertheless, there are use cases for such systems, because they 
> put less limitations to the application. One could even argue, that your 
> above example would be one ;-)

Now we're in sync :-)

> 
>> I am trying to start with a system, that doesn't rely on such a 
>> mechanism for everything. I do intend to add an option later, that 
>> allows to declare a UNIQUE NOT NULL constraint to be synchronous. What 
>> that means is, that any INSERT, UPDATE, DELETE and SELECT FOR UPDATE 
>> will require the node to currently be a member of the (quorum or 
>> priority defined) majority of the cluster.
> 
> Sounds reasonable.
> 
>> An advisory lock system, 
>> based on a total order group communication, will grant the lock to the 
>> unique key values on a first come, first serve base. Every node in the 
>> cluster will keep those keys as "locked" until the asynchronous 
>> replication stream reports the locking transaction as ended. If another 
>> remote transaction in the meantime requires updating such key, the 
>> incoming stream from that node will be on hold until the lock is 
>> cleared. This is to protect agains node B replicating a transaction from 
>> node A and a later update on node B arrives on C before C got the first 
>> event from A. A node that got disconnected from the cluster must rebuild 
>> the current advisory lock list upon reconnecting to the cluster.
> 
> Yeah, this is a convenient way to replicate sequences via a GCS.
> 
>> I think that this will be a way to overcome Postgres-R's communication 
>> bottleneck, as well as allowing limited update activity even during a 
>> completely disconnected state of a node. Synchronous or group 
>> communication messages are reduced to the cases, where the application 
>> cannot be implemented in a conflict free way, like allocating a natural 
>> primary key. There is absolutely no need to synchronize for example 
>> creating a sales order. 
> 
> Agreed, such cases can easily be optimized. But you have to be aware of 
> he limitations these optimizations cause. Postgres-R is much more 
> targeted at very general use cases.

I am, if for no other reason than that I am familiar with the concepts 
underneath Postgres-R for more than 3 years. What I realized is that the  "general use" case (for arbitrary complex
applications)is very likely 
 
to be in conflict with any king of "good default performance" case.

> 
>> An application can use global unique ID's for 
>> the order number. And everything possibly referenced by an order (items, 
>> customers, ...) is stored in a way that the references are never 
>> updated. Deletes to those possibly referenced objects are implemented in 
>> a two step process, where they are first marked obsolete, and later on 
>> things that have been marked obsolete for X long are deleted. A REPLICA 
>> TRIGGER on inserting an order will simply reset the obsolete flag of 
>> referenced objects. If a node is disconnected longer than X, you have a 
>> problem - hunt down the guy who defined X.
> 
> Yeah, that's another very nice optimization. Again, as long as you know 
> the limitations, that's all well and fine.
> 
>> Merging certain ideas to come up with an async/sync hybrid? Seems to me 
>> we have similar enough ideas to need conflict resolution, because we had 
>> them simultaneously but communicate them asynchronously.
> 
> Huh? Sorry, I didn't get what you're trying to say here.

Out of sync again ... we'll get there tomorrow ... unless your clock is 
way back and tomorrow will never come.


Jan


-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
pgsql-hackers by date:
From: Jan Wieck
Date: 07 February 2007, 23:14:03
Subject: Re: Proposal: Commit timestamp
From: Bruce Momjian
Date: 07 February 2007, 23:36:41
Subject: Re: Proposal: Commit timestamp
Re: Proposal: Commit timestamp - Mailing list pgsql-hackers

Previous

Next