Open-Source Service Discovery
Service discovery is a key component of most distributed systems and service-oriented architectures. The problem seems simple at first: how do clients determine the IP and port for a service that exists on multiple hosts? Usually, you start off with some static configuration which gets you pretty far. Things get more complicated as you start deploying more services. With a live system, service locations can change quite frequently due to auto or manual scaling, new deployments of services, as well as hosts failing or being replaced. Dynamic service registration and discovery becomes much more important in these scenarios in order to avoid service interruption.
This problem has been addressed in many different ways and is continuing to evolve. We're going to look at some open-source or openly-discussed solutions to this problem to understand how they work. Specifically, we'll look at how each solution uses strongly or weakly consistent storage, runtime dependencies, client integration options and what the tradeoffs of those features might be. We'll start with some strongly consistent projects such as Zookeeper, Doozer and Etcd which are typically used as coordination services but are also used for service registries as well. We'll then look at some interesting solutions specifically designed for service registration and discovery. We'll examine Airbnb's SmartStack, Netflix's Eureka, Bitly's NSQ, Serf, Spotify and DNS, and finally SkyDNS.
The Problem

There are two sides to the problem of locating services: service registration and service discovery.
Any service registration and discovery solution also has other development and operational aspects to consider, such as client integration effort, runtime dependencies, and how it behaves when parts of the system fail.
General Purpose Registries

These first three registries use strongly consistent protocols and are actually general-purpose, consistent datastores. Although we're looking at them as service registries, they are typically used for coordination services to aid in leader election or centralized locking with a distributed set of clients.
Zookeeper

Zookeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. It's written in Java, is strongly consistent (CP) and uses the Zab protocol to coordinate changes across the ensemble (cluster).

Zookeeper is typically run with three, five or seven members in the ensemble. Clients use language-specific bindings in order to access the ensemble. Access is typically embedded into the client applications and services.

Service registration is implemented with ephemeral nodes under a namespace. Ephemeral nodes only exist while the client is connected, so typically a backend service registers itself, after startup, with its location information. If it fails or disconnects, the node disappears from the tree.

Service discovery is implemented by listing and watching the namespace for the service. Clients receive all the currently registered services as well as notifications when a service becomes unavailable or new ones register. Clients also need to handle any load balancing or failover themselves.

The Zookeeper API can be difficult to use properly and language bindings might have subtle differences that could cause problems. If you're using a JVM-based language, the Curator Service Discovery Extension might be of some use.

Since Zookeeper is a CP system, when a partition occurs, some of your system will not be able to register or find existing registrations even if they could function properly during the partition. Specifically, on any non-quorum side, reads and writes will return an error.
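To make the ephemeral-node pattern concrete, here is a minimal sketch using the community go-zookeeper client. The ensemble addresses, the /services/web namespace (assumed to already exist) and the instance data are all hypothetical; a production client would also re-establish its registration after session expiry.

```go
package main

import (
	"fmt"
	"time"

	"github.com/go-zookeeper/zk"
)

func main() {
	// Connect to the ensemble (addresses are hypothetical).
	conn, _, err := zk.Connect([]string{"10.0.0.1:2181", "10.0.0.2:2181", "10.0.0.3:2181"}, 5*time.Second)
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	// Register: create an ephemeral node under the service's namespace
	// (assumes /services/web already exists). The node disappears
	// automatically if this client disconnects.
	path, err := conn.Create("/services/web/instance-1", []byte("10.0.1.5:8080"),
		zk.FlagEphemeral, zk.WorldACL(zk.PermAll))
	if err != nil {
		panic(err)
	}
	fmt.Println("registered at", path)

	// Discover: list the namespace and watch it for changes.
	children, _, events, err := conn.ChildrenW("/services/web")
	if err != nil {
		panic(err)
	}
	fmt.Println("current instances:", children)

	// Block until membership changes; a real client would then re-list
	// and rebalance its connections.
	<-events
}
```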
Doozer

Doozer is a consistent, distributed data store. It's written in Go, is strongly consistent and uses Paxos to maintain consensus. The project has been around for a number of years but has stagnated for a while and now has close to 160 forks. Unfortunately, this makes it difficult to know what the actual state of the project is and whether it is suitable for production use.

Doozer is typically run with three, five or seven nodes in the cluster. Clients use language-specific bindings to access the cluster and, similar to Zookeeper, integration is embedded into the client and services.

Service registration is not as straightforward as with Zookeeper because Doozer does not have any concept of ephemeral nodes. A service can register itself under a path, but if the service becomes unavailable, it won't be removed automatically.

There are a number of ways to address this issue. One option might be to add a timestamp and heartbeating mechanism to the registration process and handle expired entries during the discovery process or with another cleanup process.

Service discovery is similar to Zookeeper in that you can list all the entries under a path and then wait for any changes to that path. If you use a timestamp and heartbeat during registration, you would ignore or delete any expired entries during discovery.

Like Zookeeper, Doozer is also a CP system and has the same consequences when a partition occurs.
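Because Doozer has no ephemeral nodes, the timestamp-and-heartbeat workaround has to live in the clients. The sketch below shows that idea generically in Go, without using the actual Doozer client API: each registration value carries the time of its last heartbeat, and discovery filters out anything older than a TTL. The entry layout and the 30-second TTL are illustrative assumptions.

```go
package main

import (
	"fmt"
	"time"
)

// entry is what a service would write under its registration path:
// its address plus the time of its last heartbeat.
type entry struct {
	Addr          string
	LastHeartbeat time.Time
}

const ttl = 30 * time.Second // illustrative TTL; tune to the heartbeat interval

// liveInstances filters registrations the way a discovery or cleanup
// process would: anything whose heartbeat is older than the TTL is
// treated as expired and skipped (or deleted).
func liveInstances(entries []entry) []string {
	var live []string
	for _, e := range entries {
		if time.Since(e.LastHeartbeat) < ttl {
			live = append(live, e.Addr)
		}
	}
	return live
}

func main() {
	now := time.Now()
	entries := []entry{
		{Addr: "10.0.1.5:8080", LastHeartbeat: now.Add(-5 * time.Second)}, // still heartbeating
		{Addr: "10.0.1.6:8080", LastHeartbeat: now.Add(-2 * time.Minute)}, // stale, ignored
	}
	fmt.Println("live:", liveInstances(entries))
}
```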
Etcd

Etcd is a highly-available, key-value store for shared configuration and service discovery. Etcd was inspired by Zookeeper and Doozer. It's written in Go, uses Raft for consensus and has an HTTP+JSON based API.

Etcd, similar to Doozer and Zookeeper, is usually run with three, five or seven nodes in the cluster. Clients use a language-specific binding or implement one using an HTTP client.

Service registration relies on using a key TTL along with heartbeating from the service to ensure the key remains available. If a service fails to update the key's TTL, Etcd will expire it. If a service becomes unavailable, clients will need to handle the connection failure and try another service instance.

Service discovery involves listing the keys under a directory and then waiting for changes on the directory. Since the API is HTTP based, the client application keeps a long-polling connection open with the Etcd cluster.

Since Etcd uses Raft, it should be a strongly-consistent system. Raft requires a leader to be elected and all client requests are handled by the leader. However, Etcd also seems to support reads from non-leaders using an undocumented consistent parameter, which would improve availability in the read case. Writes would still need to be handled by the leader during a partition and could fail.
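A rough sketch of the TTL-plus-heartbeat registration over Etcd's HTTP keys API, using only the Go standard library. It assumes the v2-style /v2/keys endpoint and default port 4001 that were current around the time this article was written; the service name, instance id, address and intervals are hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
	"strings"
	"time"
)

const etcdBase = "http://127.0.0.1:4001" // hypothetical etcd endpoint

// register PUTs the instance address under a directory with a TTL.
// The service must keep re-registering (heartbeating) before the TTL
// expires, otherwise etcd removes the key.
func register(service, instance, addr string, ttl int) error {
	form := url.Values{"value": {addr}, "ttl": {fmt.Sprint(ttl)}}
	req, err := http.NewRequest(http.MethodPut,
		fmt.Sprintf("%s/v2/keys/services/%s/%s", etcdBase, service, instance),
		strings.NewReader(form.Encode()))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return nil
}

func main() {
	// Heartbeat loop: refresh the key well before its 30-second TTL lapses.
	for {
		if err := register("web", "instance-1", "10.0.1.5:8080", 30); err != nil {
			fmt.Println("register failed:", err)
		}
		time.Sleep(10 * time.Second)
	}
	// Discovery (not shown): GET /v2/keys/services/web lists instances, and
	// GET /v2/keys/services/web?wait=true&recursive=true long-polls for changes.
}
```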
Single Purpose Registries

These next few registration services and approaches are specifically tailored to service registration and discovery. Most have come about from actual production use cases while others are interesting and different approaches to the problem. Whereas Zookeeper, Doozer and Etcd could also be used for distributed coordination, these solutions don't have that capability.

Airbnb's SmartStack

Airbnb's SmartStack is a combination of two custom tools, Nerve and Synapse, that leverage haproxy and Zookeeper to handle service registration and discovery. Both Nerve and Synapse are written in Ruby.

Nerve is a sidekick-style process that runs as a separate process alongside the application service. Nerve is responsible for registering services in Zookeeper. Applications expose a /health endpoint, as an HTTP service, for Nerve to continuously monitor; provided the service is available, it will be registered in Zookeeper.

The sidekick model eliminates the need for a service to interact with Zookeeper. It simply needs a monitoring endpoint in order to be registered. This makes it much easier to support services in different languages where a robust Zookeeper binding might not exist. It also provides many of the benefits of the Hollywood principle.

Synapse is also a sidekick-style process that runs as a separate process alongside the service. Synapse is responsible for service discovery. It does this by querying Zookeeper for currently registered services and reconfiguring a locally running haproxy instance. Any clients on the host that need to access another service always access the local haproxy instance, which will route the request to an available service.

Synapse's design simplifies service implementations in that they do not need to implement any client-side load balancing or failover, and they do not need to depend on Zookeeper or its language bindings.

Since SmartStack relies on Zookeeper, some registrations and discovery may fail during a partition. They point out that Zookeeper is their "Achilles heel" in this setup. Provided a service has been able to discover the other services, at least once, before a partition, it should still have a snapshot of the services after the partition and may be able to continue operating during the partition. This aspect improves the availability and reliability of the overall system.
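To see how little the sidekick model asks of the application itself, here is a minimal sketch in Go of the health endpoint Nerve would poll; the /health path and port are assumptions, since Nerve's check target is configurable. The service code never talks to Zookeeper directly: Nerve handles registration, and the Synapse-managed haproxy handles discovery on its behalf.

```go
package main

import (
	"fmt"
	"net/http"
)

func main() {
	// The application exposes a health endpoint; Nerve polls it and
	// registers or deregisters this instance in Zookeeper accordingly.
	http.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		// Return 200 only if the service can really do work
		// (e.g. its database is reachable); otherwise return 503.
		w.WriteHeader(http.StatusOK)
		fmt.Fprintln(w, "OK")
	})
	http.ListenAndServe(":8080", nil)
}
```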
Netflix's Eureka

Eureka is Netflix's middle-tier, load balancing and discovery service. There is a server component as well as a smart-client that is used within application services. The server and client are written in Java, which means the ideal use case would be for the services to also be implemented in Java or another JVM-compatible language.

The Eureka server is the registry for services. They recommend running one Eureka server in each availability zone in AWS to form a cluster. The servers replicate their state to each other through an asynchronous model, which means each instance may have a slightly different picture of all the services at any given time.

Service registration is handled by the client component. Services embed the client in their application code. At runtime, the client registers the service and periodically sends heartbeats to renew its leases.

Service discovery is handled by the smart-client as well. It retrieves the current registrations from the server and caches them locally. The client periodically refreshes its state and also handles load balancing and failovers.

Eureka was designed to be very resilient during failures. It favors availability over strong consistency and can operate under a number of different failure modes. If there is a partition within the cluster, Eureka transitions to a self-preservation state. It will allow services to be discovered and registered during a partition, and when it heals, the members will merge their state again.
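Inside a JVM service, the Java smart-client handles registration, heartbeating and caching for you. Purely for illustration, the sketch below shows what the lease-renewal heartbeat amounts to as a plain HTTP call in Go against Eureka's documented REST interface; the base URL, application name, instance id and 30-second interval are assumptions, not something this article specifies.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// Assumed Eureka server base URL and identifiers; the initial registration
// (a POST with the instance metadata) is omitted here.
const (
	eurekaBase = "http://localhost:8080/eureka/v2"
	app        = "WEB"
	instanceID = "i-12345"
)

// renewLease sends the periodic heartbeat that keeps this instance's
// registration from expiring on the Eureka server.
func renewLease() error {
	req, err := http.NewRequest(http.MethodPut,
		fmt.Sprintf("%s/apps/%s/%s", eurekaBase, app, instanceID), nil)
	if err != nil {
		return err
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode == http.StatusNotFound {
		return fmt.Errorf("lease expired; instance must re-register")
	}
	return nil
}

func main() {
	for {
		if err := renewLease(); err != nil {
			fmt.Println("heartbeat failed:", err)
		}
		time.Sleep(30 * time.Second) // assumed renewal interval
	}
}
```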
Bitly's NSQ lookupd

NSQ is a realtime, distributed messaging platform. It's written in Go and provides an HTTP-based API. While it's not a general-purpose service registration and discovery tool, they have implemented a novel model of service discovery in their nsqlookupd agent in order for clients to find nsqd instances at runtime.

In an NSQ deployment, the nsqd instances are essentially the service. These are the message stores. nsqlookupd is the service registry. Clients connect directly to nsqd instances, but since these may change at runtime, clients can discover the available instances by querying nsqlookupd instances.

For service registration, each nsqd instance periodically sends a heartbeat of its state to each nsqlookupd instance. Their state includes their address and any queues or topics they have.

For discovery, clients query each nsqlookupd instance and merge the results.

What is interesting about this model is that the nsqlookupd instances do not know about each other. It's the responsibility of the clients to merge the state returned from each stand-alone nsqlookupd instance to determine the overall state. Because each nsqd instance heartbeats its state, each nsqlookupd eventually has the same information, provided each nsqd instance can contact all available nsqlookupd instances.

The previously discussed registry components all form a cluster and use strongly or weakly consistent consensus protocols to maintain their state. The NSQ design is inherently weakly consistent but very tolerant to partitions.
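Here is a rough sketch of the client-side merge in Go, assuming nsqlookupd's /lookup?topic= HTTP endpoint. The lookupd addresses and topic name are hypothetical, and the JSON envelope differs across NSQ versions (older releases wrap the payload in a "data" field), so the decoder below accepts either shape.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// Hypothetical nsqlookupd HTTP endpoints; in practice these come from config.
var lookupds = []string{"http://10.0.2.1:4161", "http://10.0.2.2:4161"}

// producer holds the fields we care about from a /lookup response.
type producer struct {
	BroadcastAddress string `json:"broadcast_address"`
	TCPPort          int    `json:"tcp_port"`
}

type lookupResponse struct {
	Producers []producer `json:"producers"`
	// Older nsqlookupd versions wrap the payload in a "data" envelope.
	Data *struct {
		Producers []producer `json:"producers"`
	} `json:"data"`
}

// discover queries every nsqlookupd independently and merges the results;
// the lookupd instances know nothing about one another.
func discover(topic string) map[string]bool {
	merged := map[string]bool{}
	for _, base := range lookupds {
		resp, err := http.Get(base + "/lookup?topic=" + url.QueryEscape(topic))
		if err != nil {
			continue // a lookupd being down just means fewer results
		}
		var lr lookupResponse
		json.NewDecoder(resp.Body).Decode(&lr)
		resp.Body.Close()
		producers := lr.Producers
		if lr.Data != nil {
			producers = lr.Data.Producers
		}
		for _, p := range producers {
			merged[fmt.Sprintf("%s:%d", p.BroadcastAddress, p.TCPPort)] = true
		}
	}
	return merged
}

func main() {
	for addr := range discover("events") {
		fmt.Println("nsqd instance:", addr)
	}
}
```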
Serf

Serf is a decentralized solution for service discovery and orchestration. It is also written in Go and is unique in that it uses a gossip-based protocol, SWIM, for membership, failure detection and custom event propagation. SWIM was designed to address the unscalability of traditional heart-beating protocols.

Serf consists of a single binary that is installed on all hosts. It can be run as an agent, where it joins or creates a cluster, or as a client where it can discover the members in the cluster.

For service registration, a serf agent is run that joins an existing cluster. The agent is started with custom tags that can identify the host's role, env, ip, ports, etc. Once joined to the cluster, other members will be able to see this host and its metadata.

For discovery, serf is run with the members command, which returns the current members of the cluster. Using the members output, you can find all the hosts for a service based on the tags of the running agents.

Serf is a relatively new project and is evolving quickly. It is the only project in this post that does not have a central registry architectural style, which makes it unique. Since it uses an asynchronous, gossip-based protocol, it is inherently weakly consistent but more fault tolerant and available.
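Besides the members CLI command, the same discovery can be done programmatically. The sketch below queries the local agent over its RPC interface with the hashicorp/serf client package and filters alive members by a tag; the RPC address, tag names and values are assumptions chosen for illustration.

```go
package main

import (
	"fmt"

	"github.com/hashicorp/serf/client"
)

func main() {
	// Connect to the local serf agent's RPC endpoint (default 127.0.0.1:7373).
	rpc, err := client.NewRPCClient("127.0.0.1:7373")
	if err != nil {
		panic(err)
	}
	defer rpc.Close()

	// Ask the agent for the cluster membership it has learned via gossip.
	members, err := rpc.Members()
	if err != nil {
		panic(err)
	}

	// Discover service hosts by filtering on tags set at agent startup,
	// e.g. `serf agent -tag role=web -tag port=8000`.
	for _, m := range members {
		if m.Status == "alive" && m.Tags["role"] == "web" {
			fmt.Printf("web instance: %s:%s\n", m.Addr, m.Tags["port"])
		}
	}
}
```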
Reposted from: http://www.oschina.net/translate/service-discovery-in-the-cloud?cmp&p=1