Re: Proposal: Commit timestamp - Mailing list pgsql-hackers
From | Jan Wieck |
---|---|
Subject | Re: Proposal: Commit timestamp |
Date | |
Msg-id | 45C507FC.4010405@Yahoo.com |
In response to | Re: Proposal: Commit timestamp (Theo Schlossnagle <jesus@omniti.com>) |
Responses |
Re: Proposal: Commit timestamp
Re: Proposal: Commit timestamp |
List | pgsql-hackers |
On 2/3/2007 4:58 PM, Theo Schlossnagle wrote:
> On Feb 3, 2007, at 4:38 PM, Jan Wieck wrote:
>
>> On 2/3/2007 4:05 PM, Theo Schlossnagle wrote:
>>> On Feb 3, 2007, at 3:52 PM, Jan Wieck wrote:
>>>> On 2/1/2007 11:23 PM, Jim Nasby wrote:
>>>>> On Jan 25, 2007, at 6:16 PM, Jan Wieck wrote:
>>>>>> If a per database configurable tslog_priority is given, the
>>>>>> timestamp will be truncated to milliseconds and the increment
>>>>>> logic is done on milliseconds. The priority is added to the
>>>>>> timestamp. This guarantees that no two timestamps for commits
>>>>>> will ever be exactly identical, even across different servers.
>>>>> Wouldn't it be better to just store that information
>>>>> separately, rather than mucking with the timestamp?
>>>>> Though, there's another issue here... I don't think NTP is good
>>>>> for any better than a few milliseconds, even on a local network.
>>>>> How exact does the conflict resolution need to be, anyway?
>>>>> Would it really be a problem if transaction B committed 0.1
>>>>> seconds after transaction A yet the cluster thought it was the
>>>>> other way around?
>>>>
>>>> Since the timestamp is basically a Lamport counter which is just
>>>> bumped by the clock as well, it doesn't need to be too precise.
>>> Unless I'm missing something, you are _treating_ the counter as a
>>> Lamport timestamp, when in fact it is not and thus does not
>>> provide semantics of a Lamport timestamp. As such, any
>>> algorithms that use Lamport timestamps as a basis or assumption
>>> for the proof of their correctness will not translate (provably)
>>> to this system.
>>> How is your counter semantically equivalent to Lamport timestamps?
>>
>> Yes, you must be missing something.
>>
>> The last used timestamp is remembered. When a remote transaction is
>> replicated, the remembered timestamp is set to max(remembered,
>> remote). For a local transaction, the remembered timestamp is set
>> to max(remembered+1ms, systemclock) and that value is used as the
>> transaction commit timestamp.
>
> A Lamport clock, IIRC, requires a cluster wide tick. This seems based
> only on activity and is thus an observational tick only which means
> various nodes can have various perspectives at different times.
>
> Given that time skew is prevalent, why is the system clock involved
> at all?

This question was already answered.

> As is usual with distributed systems problems, they are very hard to
> explain casually and also hard to review from a theoretical angle
> without a proof. Are you basing this off a paper? If so which one?
> If not, have you written a rigorous proof of correctness for this
> approach?

I don't have any such paper and the proof of concept will be the
implementation of the system. I do however see enough resistance against
this proposal to withdraw the commit timestamp at this time. The new
replication system will therefore require the installation of a patched,
non-standard PostgreSQL version, compiled from sources cluster wide in
order to be used. I am aware that this will dramatically reduce its
popularity but it is impossible to develop this essential feature as an
external module.

I thank everyone for their attention.

Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
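
The timestamp rule quoted in this message (remote transaction: max(remembered, remote); local transaction: max(remembered+1ms, systemclock)) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not code from the proposed patch: the class name CommitClock, its method names, the integer microsecond encoding, and the idea that tslog_priority occupies the sub-millisecond digits (0..999) are all hypothetical choices made for the example.

```python
import time


class CommitClock:
    """Hypothetical per-node commit timestamp generator (illustrative only)."""

    def __init__(self, priority=0, clock_ms=None):
        # Assumption: the priority fits below one millisecond, so adding it
        # to a millisecond-truncated timestamp never changes the ordering
        # of timestamps that differ by a full millisecond.
        if not 0 <= priority < 1000:
            raise ValueError("priority must be in the range 0..999")
        self.priority = priority
        # The system clock, truncated to whole milliseconds.
        self.clock_ms = clock_ms or (lambda: time.time_ns() // 1_000_000)
        # Last used timestamp, kept in whole milliseconds.
        self.remembered_ms = 0

    def observe_remote(self, remote_us):
        """A remote transaction is replicated: remembered = max(remembered, remote)."""
        self.remembered_ms = max(self.remembered_ms, remote_us // 1000)

    def local_commit(self):
        """A local transaction commits: remembered = max(remembered + 1ms, systemclock)."""
        self.remembered_ms = max(self.remembered_ms + 1, self.clock_ms())
        # Commit timestamp in microseconds: milliseconds plus node priority.
        return self.remembered_ms * 1000 + self.priority


# Two nodes with fixed, skewed clocks; node b's clock is behind node a's.
a = CommitClock(priority=1, clock_ms=lambda: 1000)
b = CommitClock(priority=2, clock_ms=lambda: 900)

t1 = a.local_commit()   # clock wins: 1000 ms
t2 = a.local_commit()   # clock has not advanced, so remembered + 1 ms wins
b.observe_remote(t2)    # b replicates a's second transaction
t3 = b.local_commit()   # later than t2 even though b's clock is behind
```

In this sketch a node that has replicated a remote transaction always assigns its next local commit a strictly later timestamp, regardless of clock skew, and two nodes with different priorities can never produce the same value.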