Why Vector Clock are Easy or Hard?
Amazon's Dynamo, 4.4版本化的数据
常用的版本化技术
Timestamps seem to be an obvious solution for developing a chronological order. However, timestamps “rely on synchronized clocks and don’t capture causality” as Lipcon points out
Optimistic Locking implies that a unique counter or clock value is saved for each piece of data. When a client tries to update a dataset it has to provide the counter/clock-value of the revision it likes to update (cf. [K+10b]). As a downside of this procedure, the Project Voldemort development team notes that it does not work well in a distributed and dynamic scenario where servers show up and go away often and without prior notice.
悲观锁和乐观锁http://www.cnblogs.com/guyufei/archive/2011/01/10/1931632.html悲观锁:假定会发生并发冲突(悲观), 屏蔽一切可能违反数据完整性的操作
虽然简单安全, 但是当一个用户锁定记录的时候,会block所有其它用户乐观锁: 假设不会发生并发冲突(乐观), 只在提交操作时检查是否违反数据完整性
可以随意的并发读取和修改, 只在数据提交的时候通过数据版本判断是否存在冲突两种锁并没有好坏之说, 在于使用的场景更符合哪种假设, 如果符合悲观假设, 那么使用乐观锁可能效率更低
当然两者都必须依赖master来保证锁机制, 本质上都是主从结构, 所以都会有单点问题
当然好处是两者都能保证全序
Vector Clocks are an alternative approach to capture order and allow reasoning between updates in a distributed system.
向量时钟也是一种乐观锁, 只是更通用, 因为不依赖于master, 基于因果关系的狭义相对论
所以他只能保证偏序
发生冲突时, 必须依靠client去做resolve以达成全序
Multiversion Storage means to store a timestamp for each table cell.
These timestamps “don’t necessarily need to correspond to real life”, but can also be some artificial values that can be brought into a definite order. For a given row multiple versions can exist concurrently. Besides the most recent version a reader may also request the “most recent before T” version. This provides “optimistic concurrency control with compare-and-swap on timestamps” and also allows to take snapshots of datasets (cf. [Lip09, slide 20])
其实多版本存储和乐观锁的区别在于, 提交的时候它允许冲突, 并不会试图解决冲突.
而是采用另外一种方法, 把所有版本都保存下来, 即保存版本之间的偏序关系
但实际上这并不真正解决问题, 只是把冲突解决的时机延迟, 当用户使用时, 自己选择使用什么版本, 即做冲突解决
Vector Clocks
Vector clocks is an algorithm for generating a partial ordering of events in a distributed system and detecting causality violations.
其实vector clocks就是Lamport的偏序理论的一个实际应用
具体的使用方法, 参考, Why Vector Clock are Easy or Hard?
A vector clock is defined as a tuple V [0], V [1], ..., V [n] of clock values from each node (cf. [Lip09, slide18]).
In a distributed scenario node i maintains such a tuple of clock values, which represent the state of itself and the other (replica) nodes’ state as it is aware about at a given time (Vi[0] for the clock value of the first node, Vi[1] for the clock value of the second node, . . . Vi[i] for itself, . . . Vi[n] for the clock value of the last node).
时钟向量, 表示的含义由下面这个图很好的表示出来.
通过一个向量, 不但记录了当前节点上该数据的version, 还记录了其他节点该数据的版本情况(不一定是最新的)