Undo the work performed by a series of steps, which together define an eventually consistent operation, if one or more of the steps fail. Operations that follow the eventual consistency model are commonly found in cloud-hosted applications that implement complex business processes and workflows.
Context and Problem
Applications running in the cloud frequently modify data. This data may be spread across an assortment of data sources held in a variety of geographic locations. To avoid contention and improve performance in a distributed environment such as this, an application should not attempt to provide strong transactional consistency. Rather, the application should implement eventual consistency. In this model, a typical business operation consists of a series of autonomous steps. While these steps are being performed the overall view of the system state may be inconsistent, but when the operation has completed and all of the steps have been executed the system should become consistent again.
Note:
The Data Consistency Primer provides more information about why distributed transactions do not scale well, and the principles that underpin the eventual consistency model.
A significant challenge in the eventual consistency model is how to handle a step that has failed irrecoverably. In this case it may be necessary to undo all of the work completed by the previous steps in the operation. However, the data cannot simply be rolled back because other concurrent instances of the application may have since changed it. Even in cases where the data has not been changed by a concurrent instance, undoing a step might not simply be a matter of restoring the original state. It may be necessary to apply various business-specific rules (see the travel website described in the Example section).
If an operation that implements eventual consistency spans several heterogeneous data stores, undoing the steps in such an operation will require visiting each data store in turn. The work performed in every data store must be undone reliably to prevent the system from remaining inconsistent.
Not all data affected by an operation that implements eventual consistency might be held in a database. In a Service Oriented Architecture (SOA) environment an operation may invoke an action in a service, and cause a change in the state held by that service. To undo the operation, this state change must also be undone. This may involve invoking the service again and performing another action that reverses the effects of the first.
Solution
Implement a compensating transaction. The steps in a compensating transaction must undo the effects of the steps in the original operation. A compensating transaction might not be able to simply replace the current state with the state the system was in at the start of the operation because this approach could overwrite changes made by other concurrent instances of an application. Rather, it must be an intelligent process that takes into account any work done by concurrent instances. This process will usually be application-specific, driven by the nature of the work performed by the original operation.
A common approach to implementing an eventually consistent operation that requires compensation is to use a workflow. As the original operation proceeds, the system records information about each step and how the work performed by that step can be undone. If the operation fails at any point, the workflow rewinds through the steps it has completed and performs the work that reverses each one. Note that a compensating transaction does not have to undo the work in the exact reverse order of the original operation, and it may be possible to perform some of the undo steps in parallel.
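The workflow approach described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the class name CompensatingWorkflow and the (name, do, undo) step shape are assumptions made for the example, and the undo log is held in memory, whereas a real system would need to persist it durably so that it survives a crash.

```python
from typing import Callable, List, Tuple

Action = Callable[[], None]


class CompensatingWorkflow:
    """Runs steps in order and remembers how to undo each completed one."""

    def __init__(self) -> None:
        # Undo actions for the steps that have completed so far (in memory only).
        self._undo_log: List[Tuple[str, Action]] = []

    def run(self, steps: List[Tuple[str, Action, Action]]) -> bool:
        """Each step is (name, do, undo). Returns True if every step succeeded."""
        for name, do, undo in steps:
            try:
                do()
            except Exception:
                # The failed step did not complete, so only earlier steps are undone.
                self._compensate()
                return False
            self._undo_log.append((name, undo))
        return True

    def _compensate(self) -> None:
        # Rewind through the completed steps, most recent first.
        while self._undo_log:
            _name, undo = self._undo_log.pop()
            undo()


events: List[str] = []


def fail() -> None:
    raise RuntimeError("hotel unavailable")


workflow = CompensatingWorkflow()
ok = workflow.run([
    ("flight", lambda: events.append("book flight"), lambda: events.append("cancel flight")),
    ("car", lambda: events.append("book car"), lambda: events.append("cancel car")),
    ("hotel", fail, lambda: events.append("cancel hotel")),
])
```

In this sketch the undo actions run in strict reverse order for simplicity. As the text notes, that ordering is a choice rather than a requirement.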
Note:
This approach is similar to the Sagas strategy. A description of this strategy is available online in Clemens Vasters’ blog.
A compensating transaction is itself an eventually consistent operation and it could also fail. The system should be able to resume the compensating transaction at the point of failure and continue. It may be necessary to repeat a step that has failed, so the steps in a compensating transaction should be defined as idempotent commands. For more information about idempotency, see Idempotency Patterns on Jonathan Oliver’s blog.
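To illustrate why idempotent steps make a compensating transaction resumable, the following sketch repeats an undo step whose acknowledgement was lost. The function names, seat identifiers, and the in-memory set of completed steps are all hypothetical. A real system would record progress in durable storage.

```python
from typing import Callable, Dict, List, Set, Tuple


def release_seat(held_seats: Set[str], seat_id: str) -> None:
    """Idempotent undo: releasing a seat that is already free changes nothing."""
    held_seats.discard(seat_id)


def run_compensation(
    undo_steps: List[Tuple[str, Callable[[], None]]],
    completed: Set[str],
) -> bool:
    """Runs every undo step not yet recorded in `completed`.

    Returns True when all steps have completed, or False if a step failed,
    in which case the caller can invoke the function again to resume.
    """
    for step_id, undo in undo_steps:
        if step_id in completed:
            continue  # already undone during an earlier attempt
        try:
            undo()
        except Exception:
            return False
        completed.add(step_id)
    return True


held: Set[str] = {"F1-12A", "F2-03C"}
completed: Set[str] = set()
attempts: Dict[str, int] = {"count": 0}


def flaky_release_f2() -> None:
    attempts["count"] += 1
    release_seat(held, "F2-03C")  # the work is done...
    if attempts["count"] == 1:
        # ...but on the first attempt the acknowledgement is lost.
        raise ConnectionError("acknowledgement lost")


undo_steps = [
    ("undo-F1", lambda: release_seat(held, "F1-12A")),
    ("undo-F2", flaky_release_f2),
]

first = run_compensation(undo_steps, completed)   # stops at undo-F2
second = run_compensation(undo_steps, completed)  # resumes and repeats undo-F2 safely
```

Because release_seat has the same effect however many times it runs, repeating the step after the lost acknowledgement leaves the data in the same state as a single successful run.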
In some cases it may not be possible to recover from a step that has failed except through manual intervention. In these situations the system should raise an alert and provide as much information as possible about the reason for the failure.
Issues and Considerations
Consider the following points when deciding how to implement this pattern:
- It might not be easy to determine when a step in an operation that implements eventual consistency has failed. A step might not fail immediately, but instead it could block. It may be necessary to implement some form of time-out mechanism.
- Compensation logic is not easily generalized. A compensating transaction is application-specific; it relies on the application having sufficient information to be able to undo the effects of each step in a failed operation.
- You should define the steps in a compensating transaction as idempotent commands. This enables the steps to be repeated if the compensating transaction itself fails.
- The infrastructure that handles the steps in the original operation, and the compensating transaction, must be resilient. It must not lose the information required to compensate for a failing step, and it must be able to reliably monitor the progress of the compensation logic.
- A compensating transaction does not necessarily return the data in the system to the state it was in at the start of the original operation. Instead, it compensates for the work performed by the steps that completed successfully before the operation failed.
- The order of the steps in the compensating transaction does not necessarily have to be the mirror opposite of the steps in the original operation. For example, one data store may be more sensitive to inconsistencies than another, and so the steps in the compensating transaction that undo the changes to this store should occur first.
- Placing a short-term timeout-based lock on each resource that is required to complete an operation, and obtaining these resources in advance, can help increase the likelihood that the overall activity will succeed. The work should be performed only after all the resources have been acquired. All actions must be finalized before the locks expire.
- Consider using retry logic that is more forgiving than usual to minimize failures that trigger a compensating transaction. If a step in an operation that implements eventual consistency fails, try handling the failure as a transient exception and repeat the step. Only abort the operation and initiate a compensating transaction if a step fails repeatedly or irrecoverably.
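The last consideration in the list above, retrying a step before resorting to compensation, might look like the following sketch. The TransientError class, the call_with_retries helper, and its default values are assumptions for illustration. A real application would decide for itself which failures count as transient.

```python
import time
from typing import Callable, Dict, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def call_with_retries(
    step: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    """Repeats `step` on TransientError, doubling the delay after each failure.

    Any other exception, or a TransientError on the final attempt, propagates
    to the caller, which can then start the compensating transaction.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except TransientError:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


calls: Dict[str, int] = {"n": 0}


def flaky_booking() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("service busy")
    return "booked"


# base_delay=0.0 keeps the demonstration fast; the third attempt succeeds.
result = call_with_retries(flaky_booking, max_attempts=5, base_delay=0.0)
```

Only when call_with_retries finally raises would the caller abort the operation and run the compensating transaction.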
Note:
Many of the challenges and issues of implementing a compensating transaction are the same as those concerned with implementing eventual consistency. See the section Considerations for Implementing Eventual Consistency in the Data Consistency Primer for more information.
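The first consideration in the list above points out that a step may block rather than fail, so some form of time-out is needed. One possible sketch uses the standard concurrent.futures module. The StepTimedOut exception and run_step_with_timeout helper are hypothetical names. Note the limitation called out in the comments: the time-out stops the wait, not the step itself.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class StepTimedOut(Exception):
    """Hypothetical signal that a step did not finish within its time limit."""


def run_step_with_timeout(step: Callable[[], T], timeout_seconds: float) -> T:
    """Runs `step` on a worker thread and gives up waiting after the time limit.

    The worker thread is not interrupted, so the step may still complete later.
    The undo action for a timed-out step therefore has to cope with both
    outcomes, which is one more reason to make undo actions idempotent.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(step)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        raise StepTimedOut(f"step exceeded {timeout_seconds} seconds") from None
    finally:
        # Do not block here waiting for a step that may never return.
        executor.shutdown(wait=False)


quick = run_step_with_timeout(lambda: "ok", timeout_seconds=1.0)

timed_out = False
try:
    run_step_with_timeout(lambda: time.sleep(0.3), timeout_seconds=0.05)
except StepTimedOut:
    timed_out = True
```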
When to Use this Pattern
Use this pattern only for operations that must be undone if they fail. If possible, design solutions to avoid the complexity of requiring compensating transactions (for more information, see the Data Consistency Primer).
Example
A travel website enables customers to book itineraries. A single itinerary may comprise a series of flights and hotels. A customer traveling from Seattle to London and then on to Paris could perform the following steps when creating an itinerary:
- Book a seat on flight F1 from Seattle to London.
- Book a seat on flight F2 from London to Paris.
- Book a seat on flight F3 from Paris to Seattle.
- Reserve a room at hotel H1 in London.
- Reserve a room at hotel H2 in Paris.
These steps constitute an eventually consistent operation, although each step is essentially a separate atomic action in its own right. Therefore, as well as performing these steps, the system must also record the counter operations necessary to undo each step in case the customer decides to cancel the itinerary. The steps necessary to perform the counter operations can then run as a compensating transaction if necessary.
Notice that the steps in the compensating transaction might not be the exact opposite of the original steps, and the logic in each step in the compensating transaction must take into account any business-specific rules. For example, “unbooking” a seat on a flight might not entitle the customer to a complete refund of any money paid.
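As a small illustration of a compensating step that applies a business rule instead of restoring the original state, consider the sketch below. The 10% cancellation fee, the fare, and the FlightBooking type are invented for the example. The source does not specify any particular refund policy.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class FlightBooking:
    flight: str
    price_paid: Decimal
    cancelled: bool = False


# Hypothetical business rule: the airline keeps 10% of the fare on cancellation.
CANCELLATION_FEE_RATE = Decimal("0.10")


def cancel_flight(booking: FlightBooking) -> Decimal:
    """Compensating step for a flight booking; returns the amount refunded."""
    if booking.cancelled:
        # Already compensated: repeating the step must not refund twice.
        return Decimal("0.00")
    booking.cancelled = True
    refund = booking.price_paid * (Decimal("1") - CANCELLATION_FEE_RATE)
    return refund.quantize(Decimal("0.01"))


f1 = FlightBooking(flight="F1", price_paid=Decimal("500.00"))
first_refund = cancel_flight(f1)   # 450.00: the fare minus the cancellation fee
second_refund = cancel_flight(f1)  # 0.00: repeating the step has no further effect
```

The customer ends up with a cancelled seat and a partial refund, which is a consistent state but not the state that existed before the booking was made.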
Figure 1 - Generating a compensating transaction to undo a long-running transaction to book a travel itinerary
Note:
It may be possible for the steps in the compensating transaction to be performed in parallel, depending on how you have designed the compensating logic for each step.
In many business solutions, failure of a single step does not always necessitate rolling the system back by using a compensating transaction. For example, if—after having booked flights F1, F2, and F3 in the travel website scenario—the customer is unable to reserve a room at hotel H1, it is preferable to offer the customer a room at a different hotel in the same city rather than cancelling the flights. The customer may still elect to cancel (in which case the compensating transaction runs and undoes the bookings made on flights F1, F2, and F3), but this decision should be made by the customer rather than by the system.
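The hotel scenario above can be sketched as a step that looks for an acceptable alternative before any compensation is considered. The choose_hotel function, its callback parameters, and the hotel name "H1-alt" are hypothetical. How availability is checked and how the customer is consulted would depend on the application.

```python
from typing import Callable, Dict, Iterable, Optional


def choose_hotel(
    preferred: str,
    alternatives: Iterable[str],
    is_available: Callable[[str], bool],
    customer_accepts: Callable[[str], bool],
) -> Optional[str]:
    """Returns the hotel to reserve, or None if nothing acceptable is available.

    A None result does not trigger compensation by itself: the decision to
    cancel the flights is left to the customer.
    """
    if is_available(preferred):
        return preferred
    for hotel in alternatives:
        if is_available(hotel) and customer_accepts(hotel):
            return hotel
    return None


availability: Dict[str, bool] = {"H1": False, "H1-alt": True}

hotel = choose_hotel(
    preferred="H1",
    alternatives=["H1-alt"],
    is_available=lambda name: availability.get(name, False),
    customer_accepts=lambda name: True,  # assume the customer agrees to the offer
)
```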
Related Patterns and Guidance
The following patterns and guidance may also be relevant when implementing this pattern:
- Data Consistency Primer. The Compensating Transaction pattern is frequently used to undo operations that implement the eventual consistency model. This primer provides more information on the benefits and tradeoffs of eventual consistency.
- Scheduler-Agent-Supervisor Pattern. This pattern describes how to implement resilient systems that perform business operations that utilize distributed services and resources. In some circumstances, it may be necessary to undo the work performed by an operation by using a compensating transaction.
- Retry Pattern. Compensating transactions can be expensive to perform, and it may be possible to minimize their use by implementing an effective policy of retrying failing operations by following the Retry pattern.
More Information
- The article Sagas on Clemens Vasters’ blog.
- The article Idempotency Patterns on Jonathan Oliver’s blog.