Lately at work I've been sticking to the principle of simplicity when building features: sometimes I'll even sacrifice a little efficiency to keep the code readable, since most code doesn't affect server performance anyway. Simplicity is beauty; optimize when efficiency actually becomes a problem. Still, it's worth studying these systems that trade complexity (caching, distribution) for efficiency. Let me start by translating the documentation.
Even after translating it I still don't quite know what it does. To be continued...
ZooKeeper: A Distributed Coordination Service for Distributed Applications
(Forgive my poor English!)
Literally: a distributed coordination service for distributed applications.
Design Goals
ZooKeeper is simple.
ZooKeeper is replicated. (Multiple servers; it runs as a cluster.)
ZooKeeper is ordered.
ZooKeeper is fast.
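A point worth pausing on with "replicated": the service stays available as long as a majority of the ensemble is up, so an ensemble of 2f+1 servers tolerates f failures. A quick sketch of that arithmetic (my own illustration, not ZooKeeper code; the function names are made up):

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1

def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can fail while a majority remains."""
    return (ensemble_size - 1) // 2

# A 5-server ensemble needs 3 servers for quorum and survives 2 failures.
# Note that 6 servers tolerate no more failures than 5, which is why
# odd-sized ensembles are the usual choice.
print(quorum_size(5), tolerated_failures(5))  # 3 2
print(quorum_size(6), tolerated_failures(6))  # 4 2
```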
Data model and the hierarchical namespace
Nodes and ephemeral nodes
Conditional updates and watches
Guarantees
- Sequential Consistency - Updates from a client will be applied in the order that they were sent. (Ordered.)
- Atomicity - Updates either succeed or fail. No partial results.
- Single System Image - A client will see the same view of the service regardless of the server that it connects to. (Consistency: every server looks the same.)
- Reliability - Once an update has been applied, it will persist from that time forward until a client overwrites the update.
- Timeliness - The client's view of the system is guaranteed to be up-to-date within a certain time bound. (The client's view is guaranteed to be recent within some time bound?)
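To make "Sequential Consistency" concrete for myself, here is a toy model (my own, not ZooKeeper internals): the leader stamps every update with an increasing transaction id (ZooKeeper calls this a zxid), and each replica applies updates strictly in that order, so all replicas converge to the same state.

```python
class Replica:
    """Toy replica: applies updates strictly in transaction-id order."""
    def __init__(self):
        self.state = {}
        self.last_zxid = 0  # zxid: ZooKeeper's name for a transaction id

    def apply(self, zxid, key, value):
        # Refuse out-of-order updates; a real server would buffer or resync.
        assert zxid == self.last_zxid + 1, "updates must arrive in order"
        self.state[key] = value
        self.last_zxid = zxid

# The leader stamps each write with the next zxid, and every replica
# applies the same sequence, so all replicas end up identical.
updates = [(1, "/app/config", b"v1"),
           (2, "/app/config", b"v2"),
           (3, "/app/lock", b"owner-a")]
replicas = [Replica() for _ in range(3)]
for r in replicas:
    for zxid, key, value in updates:
        r.apply(zxid, key, value)

assert all(r.state == replicas[0].state for r in replicas)
```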
Simple API
- create: creates a node at a location in the tree
- delete: deletes a node
- exists: tests if a node exists at a location
- get data: reads the data from a node
- set data: writes data to a node
- get children: retrieves a list of children of a node
- sync: waits for data to be propagated
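To get a feel for these seven operations, I sketched the data model as a toy in-memory tree. This is my own illustration of the hierarchical namespace, not the real client API, and it has no sync because there is only a single copy with nothing to propagate:

```python
class ZNodeTree:
    """Toy in-memory znode tree mirroring the basic operations.
    An illustration of the data model, not the real ZooKeeper API."""
    def __init__(self):
        self.nodes = {"/": b""}  # path -> data; "/" is the root

    def _parent(self, path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b""):
        if path in self.nodes:
            raise KeyError("node already exists: " + path)
        if self._parent(path) not in self.nodes:
            raise KeyError("parent does not exist: " + path)
        self.nodes[path] = data

    def delete(self, path):
        if self.get_children(path):
            raise ValueError("node has children: " + path)
        del self.nodes[path]

    def exists(self, path):
        return path in self.nodes

    def get_data(self, path):
        return self.nodes[path]

    def set_data(self, path, data):
        if path not in self.nodes:
            raise KeyError("no such node: " + path)
        self.nodes[path] = data

    def get_children(self, path):
        return [p for p in self.nodes
                if p != path and self._parent(p) == path]

t = ZNodeTree()
t.create("/app")
t.create("/app/config", b"v1")
t.set_data("/app/config", b"v2")
```

The real operations also carry version numbers and watch flags; the toy skips both.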
Implementation
Every ZooKeeper server services clients. Clients connect to exactly one server to submit requests. Read requests are serviced from the local replica of each server database. Requests that change the state of the service, write requests, are processed by an agreement protocol. The replicated database is an in-memory database containing the entire data tree. Updates are logged to disk for recoverability, and writes are serialized to disk before they are applied to the in-memory database.
As part of the agreement protocol all write requests from clients are forwarded to a single server, called the leader. The rest of the ZooKeeper servers, called followers, receive message proposals from the leader and agree upon message delivery. The messaging layer takes care of replacing leaders on failures and syncing followers with leaders.
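My rough mental model of that routing, as a toy sketch (my own simplification; the real agreement protocol is far more involved): reads are answered from the server's local replica, while writes go through the leader and end up on every replica.

```python
class Server:
    """Toy ZooKeeper server: one local replica of the whole data tree."""
    def __init__(self):
        self.replica = {}

class Ensemble:
    def __init__(self, n):
        self.servers = [Server() for _ in range(n)]
        self.leader = self.servers[0]

    def read(self, server, path):
        # Read requests are served from that server's local replica.
        return server.replica.get(path)

    def write(self, path, value):
        # Write requests are forwarded to the leader, which runs the
        # agreement protocol; here we just apply to every replica.
        for s in self.servers:
            s.replica[path] = value

e = Ensemble(3)
e.write("/config", b"v1")
assert all(e.read(s, "/config") == b"v1" for s in e.servers)
```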
ZooKeeper uses a custom atomic messaging protocol. Since the messaging layer is atomic, ZooKeeper can guarantee that the local replicas never diverge. When the leader receives a write request, it calculates what the state of the system is when the write is to be applied and transforms this into a transaction that captures this new state.
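The last sentence is the subtle part: rather than replicating the request itself ("add 5"), the leader replicates the resulting state ("the value is now 5"), which makes the transaction idempotent and safe to reapply during recovery. A toy illustration under that reading (my own code):

```python
def leader_transform(state, request):
    """Turn a relative request into an absolute, idempotent transaction."""
    path, delta = request
    new_value = state.get(path, 0) + delta
    return (path, new_value)  # the transaction captures the NEW state

def apply_txn(state, txn):
    path, value = txn
    state[path] = value

leader_state = {}
follower_state = {}
txn = leader_transform(leader_state, ("/counter", 5))
apply_txn(leader_state, txn)
# Reapplying the same transaction (e.g. during recovery) is harmless:
apply_txn(follower_state, txn)
apply_txn(follower_state, txn)
assert follower_state == leader_state == {"/counter": 5}
```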
Uses
The programming interface to ZooKeeper is deliberately simple. With it, however, you can implement higher order operations, such as synchronization primitives, group membership, ownership, etc. Some distributed applications have used it to: [tbd: add uses from white paper and video presentation.] For more information, see [tbd]
Performance
ZooKeeper is designed to be highly performant. But is it? The results of the ZooKeeper development team at Yahoo! Research indicate that it is. (See ZooKeeper Throughput as the Read-Write Ratio Varies.) It is especially high performance in applications where reads outnumber writes, since writes involve synchronizing the state of all servers. (Reads outnumbering writes is typically the case for a coordination service.)
The figure ZooKeeper Throughput as the Read-Write Ratio Varies is a throughput graph of ZooKeeper release 3.2 running on servers with dual 2GHz Xeon and two SATA 15K RPM drives. One drive was used as a dedicated ZooKeeper log device. The snapshots were written to the OS drive. Write requests were 1K writes and the reads were 1K reads. "Servers" indicate the size of the ZooKeeper ensemble, the number of servers that make up the service. Approximately 30 other servers were used to simulate the clients. The ZooKeeper ensemble was configured such that leaders do not allow connections from clients.
Reliability
To show the behavior of the system over time as failures are injected we ran a ZooKeeper service made up of 7 machines. We ran the same saturation benchmark as before, but this time we kept the write percentage at a constant 30%, which is a conservative ratio of our expected workloads.
There are a few important observations from this graph. First, if followers fail and recover quickly, then ZooKeeper is able to sustain a high throughput despite the failure. But maybe more importantly, the leader election algorithm allows for the system to recover fast enough to prevent throughput from dropping substantially. In our observations, ZooKeeper takes less than 200ms to elect a new leader. Third, as followers recover, ZooKeeper is able to raise throughput again once they start processing requests.
The ZooKeeper Project
ZooKeeper has been successfully used in many industrial applications. It is used at Yahoo! as the coordination and failure recovery service for Yahoo! Message Broker, which is a highly scalable publish-subscribe system managing thousands of topics for replication and data delivery. It is used by the Fetching Service for Yahoo! crawler, where it also manages failure recovery. A number of Yahoo! advertising systems also use ZooKeeper to implement reliable services.
All users and developers are encouraged to join the community and contribute their expertise. See the ZooKeeper Project on Apache for more information.