Thread: Re: Clock-skew management in logical replication
Nisha Moond <nisha.moond412@gmail.com> writes:
> Thoughts? Looking forward to hearing others' opinions!

Had a productive conversation with Amit Kapila today about time skew in distributed systems, and wanted to share some thoughts.

Essentially, we're grappling with the classic distributed snapshot problem. In a multi-active environment, where multiple nodes can independently process transactions, it becomes crucial to determine the visibility of these transactions across the system. Time skew, where different machines report different timestamps, makes this a hard problem. How can we ensure consistent transaction ordering and visibility when time itself is unreliable?

As you mentioned, there are several ways to tackle the time skew problem in distributed systems. These approaches generally fall into four main categories:

1. Centralized Timestamps (Timestamp Oracle)
   Mechanism: A dedicated server acts as a single source of truth for time, eliminating skew by providing timestamps to all nodes. Google Percolator and TiDB use this approach.
   Consistency level: Serializable
   Pros: Simple to implement.
   Cons: High latency for cross-geo transactions due to reliance on a central server. Can become a bottleneck.

2. Atomic Clocks (True Time)
   Mechanism: Utilizes highly accurate atomic clocks to provide a globally consistent view of time, as seen in Google Spanner.
   Consistency level: External Serializable
   Pros: Very high consistency level (externally consistent).
   Cons: Requires specialized and expensive hardware. Adds some latency to transactions, though less than centralized timestamps.

3. Hybrid Logical Clocks
   Mechanism: Combines NTP for rough time synchronization with logical clocks for finer-grained ordering. Yugabyte and CockroachDB employ this strategy.
   Consistency level: Serializable
   Pros: Avoids the need for specialized hardware.
   Cons: Can introduce significant latency to transactions.

4. Local Clocks
   Mechanism: Just use a logical clock.
   Consistency level: Eventual consistency
   Pros: Simple implementation.
   Cons: The consistency level is very low.

Of the four implementations considered, only local clocks and the HLC approach offer a 'pure database' solution. Given PostgreSQL's practical use cases, I recommend starting with a local clock implementation. However, recognizing the increasing prevalence of distributed clock services, we should also implement a pluggable time access method. This allows users to integrate with different time services as needed.

In the mid-term, implementing the HLC approach would provide highly consistent snapshot reads. This offers a significant advantage for many use cases.

Long-term, we should consider integrating with a distributed time service like AWS Time Sync Service. This ensures high accuracy and scalability for demanding applications.

Thanks,
Shihao
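To make the Hybrid Logical Clocks option more concrete, here is a minimal sketch in Python. It is not taken from any patch in this thread and is not how Yugabyte or CockroachDB implement it; the class name, method names, and the millisecond granularity are assumptions chosen for illustration. The idea is that each timestamp is a pair (physical time, logical counter), and the counter breaks ties whenever the physical clock has not advanced or lags behind a timestamp received from another node.

```python
import time
from typing import Callable, Tuple

Timestamp = Tuple[int, int]  # (physical milliseconds, logical counter)


def _system_ms() -> int:
    """Default physical clock: wall-clock milliseconds since the Unix epoch."""
    return time.time_ns() // 1_000_000


class HybridLogicalClock:
    """Hypothetical HLC; the physical clock is injectable for testing."""

    def __init__(self, physical_ms: Callable[[], int] = _system_ms) -> None:
        self.physical_ms = physical_ms
        self.wall = 0      # highest physical time observed so far
        self.logical = 0   # tie-breaker within the same wall value

    def now(self) -> Timestamp:
        """Timestamp a local event or an outgoing message."""
        pt = self.physical_ms()
        if pt > self.wall:
            self.wall, self.logical = pt, 0
        else:
            # Physical clock did not advance (or went backwards): bump the counter.
            self.logical += 1
        return (self.wall, self.logical)

    def update(self, remote: Timestamp) -> Timestamp:
        """Merge a timestamp received from another node."""
        pt = self.physical_ms()
        r_wall, r_logical = remote
        new_wall = max(self.wall, r_wall, pt)
        if new_wall == self.wall and new_wall == r_wall:
            new_logical = max(self.logical, r_logical) + 1
        elif new_wall == self.wall:
            new_logical = self.logical + 1
        elif new_wall == r_wall:
            new_logical = r_logical + 1
        else:
            new_logical = 0
        self.wall, self.logical = new_wall, new_logical
        return (self.wall, self.logical)
```

Because timestamps are plain tuples, they compare lexicographically, so an event that causally follows another always gets a larger timestamp even when the local physical clock lags behind the sender's.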
On 9/21/24 01:31, shihao zhong wrote:
> Nisha Moond <nisha.moond412@gmail.com> writes:
>> Thoughts? Looking forward to hearing others' opinions!
>
> Had a productive conversation with Amit Kapila today about time skew
> in distributed systems, and wanted to share some thoughts.
> Essentially, we're grappling with the classic distributed snapshot
> problem. In a multi-active environment, where multiple nodes can
> independently process transactions, it becomes crucial to determine
> the visibility of these transactions across the system. Time skew,
> where different machines report different timestamps, makes this a hard
> problem. How can we ensure consistent transaction ordering and
> visibility when time itself is unreliable?
>
> As you mentioned, there are several ways to tackle the time skew
> problem in distributed systems. These approaches generally fall into
> four main categories:
>
> 1. Centralized Timestamps (Timestamp Oracle)
> 2. Atomic Clocks (True Time)
> 3. Hybrid Logical Clocks
> 4. Local Clocks

> I recommend ...<snip>... implement a pluggable time access method. This
> allows users to integrate with different time services as needed.

Huge +1

> In the mid-term, implementing the HLC approach would provide highly
> consistent snapshot reads. This offers a significant advantage for
> many use cases.

agreed

> Long-term, we should consider integrating with a distributed time
> service like AWS Time Sync Service. This ensures high accuracy and
> scalability for demanding applications.

I think the pluggable access method should make this possible, no?

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
>> Long-term, we should consider integrating with a distributed time
>> service like AWS Time Sync Service. This ensures high accuracy and
>> scalability for demanding applications.
>
> I think the pluggable access method should make this possible, no?

I am sorry that I did not explain this clearly in my previous email. What I mean is that the pluggable time access method should provide the mechanism for using a custom time service, but there would still be no out-of-the-box solution for customers who want to use one. I am suggesting that we provide default implementations for popular time services such as AWS Time Sync Service. Maybe that should be done outside of mainline, but it would provide a better user experience.
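One way to picture the pluggable time access method discussed above, with default implementations shipped alongside it, is a small registry of named time sources. The sketch below is in Python purely for illustration; every name in it (`register_time_source`, `current_timestamp_us`, the "system" source) is hypothetical and does not correspond to any existing PostgreSQL API or to a proposed patch.

```python
import time
from typing import Callable, Dict

# A time source returns microseconds since the Unix epoch.
TimeSource = Callable[[], int]

# Hypothetical registry mapping a source name to its implementation.
_time_sources: Dict[str, TimeSource] = {}


def register_time_source(name: str, source: TimeSource) -> None:
    """Register a time source under a name; refuses silent overwrites."""
    if name in _time_sources:
        raise ValueError(f"time source {name!r} is already registered")
    _time_sources[name] = source


def current_timestamp_us(source_name: str = "system") -> int:
    """Read the current time from the named source."""
    try:
        source = _time_sources[source_name]
    except KeyError:
        raise LookupError(f"unknown time source {source_name!r}") from None
    return source()


def system_clock_us() -> int:
    """Default implementation: the local OS clock (kept in sync by NTP, if configured)."""
    return time.time_ns() // 1_000


# The built-in default; a vendor-specific source would be registered the same way.
register_time_source("system", system_clock_us)
```

Under this shape, a default implementation for a particular time service would be one more `register_time_source` call living in a separate module, which matches the suggestion that such implementations could be maintained outside the core tree.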
On Sun, Sep 22, 2024 at 7:24 PM Joe Conway <mail@joeconway.com> wrote:
>
> On 9/21/24 01:31, shihao zhong wrote:
> > Nisha Moond <nisha.moond412@gmail.com> writes:
> >> Thoughts? Looking forward to hearing others' opinions!
> >
> > Had a productive conversation with Amit Kapila today about time skew
> > in distributed systems, and wanted to share some thoughts.
> > Essentially, we're grappling with the classic distributed snapshot
> > problem. In a multi-active environment, where multiple nodes can
> > independently process transactions, it becomes crucial to determine
> > the visibility of these transactions across the system. Time skew,
> > where different machines report different timestamps, makes this a hard
> > problem. How can we ensure consistent transaction ordering and
> > visibility when time itself is unreliable?
> >
> > As you mentioned, there are several ways to tackle the time skew
> > problem in distributed systems. These approaches generally fall into
> > four main categories:
> >
> > 1. Centralized Timestamps (Timestamp Oracle)
> > 2. Atomic Clocks (True Time)
> > 3. Hybrid Logical Clocks
> > 4. Local Clocks
>
> > I recommend ...<snip>... implement a pluggable time access method. This
> > allows users to integrate with different time services as needed.
>
> Huge +1
>

One idea for giving users control over the timestamps used by the 'latest_write_wins' strategy is to let them supply the values in a special column of the table, which is then used to resolve conflicts.

    CREATE TABLE foo(c1 int, c2 timestamp default conflict_fn, CHECK CONFLICTS(c2));

Here, for column c2 the user can supply a function (conflict_fn) that provides, for each row, the value used to resolve conflicts. If a table-level conflict column is specified, it is used to resolve conflicts; otherwise, the default commit timestamp provided by the commit_ts module is used.

On the apply side, we will use a condition like:

    if ((source_new_column_value > replica_current_column_value) ||
        operation.type == "delete")
        apply_update();

In the above example, source_new_column_value and replica_current_column_value are the values of column c2 on the publisher and the subscriber, respectively. Note that in the above case we allow deletes to always win, as the delete operation doesn't update the column values. We could choose a different strategy for applying deletes, such as comparing the existing column values as well.

Note that MySQL [1] and Oracle's TimesTen [2] provide a similar strategy at the table level for conflict resolution to avoid reliance on system clocks.

Though this provides a way for users to control the values required for conflict resolution, I prefer a simple approach, at least for the first version, which is to document that users should ensure time synchronization via NTP. Even Oracle mentions the same in their docs [3] (See from: "It is critical to ensure that the clocks on all databases are identical to one another and it’s recommended that all database servers are configured to maintain accurate time through a time server using the network time protocol (NTP). Even in environments where databases span different time zones, all database clocks must be set to the same time zone or Coordinated Universal Time (UTC) must be used to maintain accurate time. Failure to maintain accurate and synchronized time across the databases in an active-active replication environment will result in data integrity issues.")

[1] - https://dev.mysql.com/doc/refman/9.0/en/mysql-cluster-replication-schema.html#ndb-replication-ndb-replication
[2] - https://docs.oracle.com/en/database/other-databases/timesten/22.1/replication/configuring-timestamp-comparison.html#GUID-C8B0580B-B577-435F-8726-4AF341A09806
[3] - https://www.oracle.com/cn/a/tech/docs/technical-resources/wp-oracle-goldengate-activeactive-final2-1.pdf

--
With Regards,
Amit Kapila.
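The apply-side rule described above can be sketched as a small decision function. This is an illustration in Python only, not code from the conflict resolution patch; the `Change` type, its field names, and `should_apply` are hypothetical. It follows the behaviour stated in the mail: deletes always win, the user-supplied conflict column is compared when both sides have one, and otherwise the commit timestamp is used as the fallback.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Change:
    """A replicated change arriving from the publisher (hypothetical shape)."""
    op: str                                    # "insert", "update" or "delete"
    commit_ts: datetime                        # commit timestamp on the origin node
    conflict_value: Optional[datetime] = None  # value of the conflict column (c2), if any


def should_apply(remote: Change,
                 local_commit_ts: datetime,
                 local_conflict_value: Optional[datetime] = None) -> bool:
    """Decide whether the remote change wins over the existing local row."""
    # Deletes always win here, since a delete carries no new column value to compare.
    if remote.op == "delete":
        return True
    # A table-level conflict column takes precedence when both sides have a value.
    if remote.conflict_value is not None and local_conflict_value is not None:
        return remote.conflict_value > local_conflict_value
    # Otherwise fall back to comparing commit timestamps.
    return remote.commit_ts > local_commit_ts
```

Note that with the strict `>` comparison, a tie leaves the local row in place; a real implementation would need an explicit tie-breaking rule, which this thread does not specify.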
Dear hackers,

> Though this provides a way for users to control values required for
> conflict resolution, I prefer a simple approach at least for the first
> version which is to document that users should ensure time
> synchronization via NTP. Even Oracle mentions the same in their docs

I researched some cloud services and found that their time-sync services are exposed through NTP or a direct PTP connection. This means that there are no service-specific APIs for synchronizing the machine clock. Based on that, I also agree with the simple approach (just document it). I feel that synchronization can be regarded as a low-level task that can be left to the OS.

The part below shows the status of cloud vendors and Oracle.

## AWS case

AWS provides a "Time Sync Service" [1] that can be used via NTP. The source server is at 169.254.169.123; users can modify the configuration file to refer to it as shown below.

```
server 169.254.169.123 prefer iburst
```

Or users can even directly connect to the local and accurate hardware clock.

## GCP case

GCP compute engines must use an NTP server on the GCP cloud [2], located at metadata.google.internal, or other public NTP servers. The configuration will look like this:

```
server metadata.google.internal iburst
```

## Oracle case

Oracle RAC requires that all participants are well synchronized by NTP. Formerly, it had an automatic synchronization feature called "Cluster Time Synchronization Service (CTSS)." It is de-supported in Oracle Database 23ai [3].

[1]: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/configure-ec2-ntp.html
[2]: https://cloud.google.com/compute/docs/instances/configure-ntp
[3]: https://docs.oracle.com/en/database/oracle/oracle-database/23/cwlin/server-configuration-checklist-for-oracle-grid-infrastructure.html

Best regards,
Hayato Kuroda
FUJITSU LIMITED
On Wed, Sep 25, 2024 at 3:09 PM Hayato Kuroda (Fujitsu) <kuroda.hayato@fujitsu.com> wrote:
>
> > Though this provides a way for users to control values required for
> > conflict resolution, I prefer a simple approach at least for the first
> > version which is to document that users should ensure time
> > synchronization via NTP. Even Oracle mentions the same in their docs
>
> I researched some cloud services and found that the time-sync services on the
> cloud are integrated with the NTP or PTP direct connection. This means that there
> are no specific APIs to synchronize the machine clock. Based on that,
> I also agree with the simple approach (just document). I feel the synchronization
> can be regarded as the low-layer task and can rely on the OS.
>
> The below part shows the status of cloud vendors and Oracle.
>
> ## AWS case
>
> AWS provides a "Time Sync Service" [1] that can be used via NTP. The source server
> is at 169.254.169.123; users can modify the configuration file to refer to it shown below.
>
> ```
> server 169.254.169.123 prefer iburst
> ```
>
> Or users can even directly connect to the local and accurate hardware clock.
>
> ## GCP case
>
> GCP compute engines must use an NTP server on the GCP cloud [2], located at
> metadata.google.internal, or other public NTP servers. The configuration will
> look like this:
>
> ```
> server metadata.google.internal iburst
> ```
>

If NTP already provides a way to configure other time-sync services as shown by you then I don't think we need to do more at this stage except to document it with the conflict resolution patch. In the future, we may want to provide an additional column in the table with a special meaning that can help in conflict resolution.

--
With Regards,
Amit Kapila.