It's common for software systems to make remote calls to software running in different processes, probably on different machines across a network. One of the big differences between in-memory calls and remote calls is that remote calls can fail, or hang without a response until some timeout limit is reached. What's worse, if you have many callers on an unresponsive supplier, then you can run out of critical resources, leading to cascading failures across multiple systems. In his excellent book Release It, Michael Nygard popularized the Circuit Breaker pattern to prevent this kind of catastrophic cascade.
The basic idea behind the circuit breaker is very simple. You wrap a protected function call in a circuit breaker object, which monitors for failures. Once the failures reach a certain threshold, the circuit breaker trips, and all further calls to the circuit breaker return with an error, without the protected call being made at all. Usually you'll also want some kind of monitor alert if the circuit breaker trips.
Here's a simple example of this behavior in Ruby, protecting against timeouts.
I set up the breaker with a block (a lambda), which is the protected call.
cb = CircuitBreaker.new {|arg| @supplier.func arg}
The breaker stores the block, initializes various parameters (for thresholds, timeouts, and monitoring), and resets the breaker into its closed state.
class CircuitBreaker...
  attr_accessor :invocation_timeout, :failure_threshold, :monitor

  def initialize &block
    @circuit = block
    @invocation_timeout = 0.01
    @failure_threshold = 5
    @monitor = acquire_monitor
    reset
  end
Calling the circuit breaker will call the underlying block if the circuit is closed, but return an error if it's open.
# client code
aCircuitBreaker.call(5)
class CircuitBreaker...
  def call args
    case state
    when :closed
      begin
        do_call args
      rescue Timeout::Error
        record_failure
        raise $!
      end
    when :open then raise CircuitBreaker::Open
    else raise "Unreachable Code"
    end
  end

  def do_call args
    result = Timeout::timeout(@invocation_timeout) do
      @circuit.call args
    end
    reset
    return result
  end
Should we get a timeout, we increment the failure counter; successful calls reset it back to zero.
class CircuitBreaker...
  def record_failure
    @failure_count += 1
    @monitor.alert(:open_circuit) if :open == state
  end

  def reset
    @failure_count = 0
    @monitor.alert :reset_circuit
  end
I determine the state of the breaker by comparing the failure count to the threshold.
class CircuitBreaker...
  def state
    (@failure_count >= @failure_threshold) ? :open : :closed
  end
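Pulled together into one file, the simple breaker can be run as-is. This is a sketch rather than the original code: `RecordingMonitor` is a hypothetical stand-in for whatever `acquire_monitor` returns, and the `monitor:` keyword and the `CircuitBreaker::Open` definition are assumptions; `require 'timeout'` supplies `Timeout.timeout`.

```ruby
require 'timeout'

# Hypothetical monitor that just records alerts (stands in for acquire_monitor).
class RecordingMonitor
  attr_reader :alerts

  def initialize
    @alerts = []
  end

  def alert(event)
    @alerts << event
  end
end

class CircuitBreaker
  # Raised instead of making the protected call while the circuit is open.
  class Open < StandardError; end

  attr_accessor :invocation_timeout, :failure_threshold, :monitor

  def initialize(monitor: RecordingMonitor.new, &block)
    @circuit = block
    @invocation_timeout = 0.01
    @failure_threshold = 5
    @monitor = monitor
    reset
  end

  def call(args)
    case state
    when :closed
      begin
        do_call(args)
      rescue Timeout::Error
        record_failure
        raise # re-raise the timeout to the caller
      end
    when :open then raise Open
    else raise "Unreachable Code"
    end
  end

  def state
    @failure_count >= @failure_threshold ? :open : :closed
  end

  private

  # Runs the protected block under a timeout; success clears the failure count.
  def do_call(args)
    result = Timeout.timeout(@invocation_timeout) { @circuit.call(args) }
    reset
    result
  end

  def record_failure
    @failure_count += 1
    @monitor.alert(:open_circuit) if state == :open
  end

  def reset
    @failure_count = 0
    @monitor.alert(:reset_circuit)
  end
end
```

Because an open breaker never reaches `record_failure`, each trip raises exactly one `:open_circuit` alert, and this version stays open until something outside it intervenes.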
This simple circuit breaker avoids making the protected call when the circuit is open, but would need an external intervention to reset it when things are well again. This is a reasonable approach with electrical circuit breakers in buildings, but for software circuit breakers we can have the breaker itself detect if the underlying calls are working again. We can implement this self-resetting behavior by trying the protected call again after a suitable interval, and resetting the breaker should it succeed.
Creating this kind of breaker means adding a threshold for trying the reset and setting up a variable to hold the time of the last error.
class ResetCircuitBreaker...
  def initialize &block
    @circuit = block
    @invocation_timeout = 0.01
    @failure_threshold = 5
    @monitor = BreakerMonitor.new
    @reset_timeout = 0.1
    reset
  end

  def reset
    @failure_count = 0
    @last_failure_time = nil
    @monitor.alert :reset_circuit
  end
There is now a third state, half open, meaning the circuit is ready to make a real call as a trial to see if the problem is fixed.
class ResetCircuitBreaker...
  def state
    case
    when (@failure_count >= @failure_threshold) &&
        (Time.now - @last_failure_time) > @reset_timeout
      :half_open
    when (@failure_count >= @failure_threshold)
      :open
    else
      :closed
    end
  end
Calling the breaker in the half-open state results in a trial call, which will either reset the breaker if successful or restart the timeout if not.
class ResetCircuitBreaker...
  def call args
    case state
    when :closed, :half_open
      begin
        do_call args
      rescue Timeout::Error
        record_failure
        raise $!
      end
    when :open
      raise CircuitBreaker::Open
    else
      raise "Unreachable"
    end
  end

  def record_failure
    @failure_count += 1
    @last_failure_time = Time.now  # stamp the failure before checking state
    @monitor.alert(:open_circuit) if :open == state
  end
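A runnable sketch of the self-resetting breaker follows. To make the half-open transition testable without real waiting, it takes a hypothetical `clock:` lambda (defaulting to `Time.now`) and collects alerts in an `alerts` array instead of a `BreakerMonitor`; both are assumptions, not part of the original. `record_failure` stamps `@last_failure_time` before checking the state, so a tripping failure or a failed trial is seen as `:open` rather than as a stale `:half_open`.

```ruby
require 'timeout'

class ResetCircuitBreaker
  class Open < StandardError; end

  attr_accessor :invocation_timeout, :failure_threshold, :reset_timeout
  attr_reader :alerts

  # clock: is a hypothetical hook so time can be controlled in tests.
  def initialize(clock: -> { Time.now }, &block)
    @circuit = block
    @clock = clock
    @invocation_timeout = 0.01
    @failure_threshold = 5
    @reset_timeout = 0.1
    @alerts = [] # stands in for the monitor
    reset
  end

  def state
    if @failure_count >= @failure_threshold
      # Open until reset_timeout has passed since the last failure.
      @clock.call - @last_failure_time > @reset_timeout ? :half_open : :open
    else
      :closed
    end
  end

  def call(args)
    case state
    when :closed, :half_open # half open: let a trial call through
      begin
        do_call(args)
      rescue Timeout::Error
        record_failure
        raise
      end
    when :open then raise Open
    end
  end

  private

  def do_call(args)
    result = Timeout.timeout(@invocation_timeout) { @circuit.call(args) }
    reset # a successful call (including a trial) closes the circuit
    result
  end

  def record_failure
    @failure_count += 1
    @last_failure_time = @clock.call # restarts the reset timeout
    @alerts << :open_circuit if state == :open
  end

  def reset
    @failure_count = 0
    @last_failure_time = nil
    @alerts << :reset_circuit
  end
end
```

As in the article's version, nothing stops several concurrent callers from all making trial calls while half open; jrugged's `canAttempt` in the appendix guards against that with a synchronized flag.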
This is a simple explanatory example; in practice, circuit breakers provide a good many more features and parameters. Often they will protect against a range of errors that the protected call could raise, such as network connection failures. Not all errors should trip the circuit: some reflect normal failures and should be dealt with as part of regular logic.
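One way to separate errors that signal an unhealthy supplier from ordinary business failures is to give the breaker an explicit list of exception classes that count toward the threshold. The sketch below assumes a hypothetical `trip_on:` option; the default list is an illustrative choice, not a recommendation for any particular service.

```ruby
require 'timeout'

class ClassifyingBreaker
  class Open < StandardError; end

  # Errors treated as "supplier unhealthy" (an assumption; tune per service).
  DEFAULT_TRIP_ON = [Timeout::Error, Errno::ECONNREFUSED, Errno::ECONNRESET].freeze

  attr_reader :failure_count

  def initialize(failure_threshold: 5, trip_on: DEFAULT_TRIP_ON, &block)
    @circuit = block
    @failure_threshold = failure_threshold
    @trip_on = trip_on
    @failure_count = 0
  end

  def open?
    @failure_count >= @failure_threshold
  end

  def call(*args)
    raise Open if open?
    result = @circuit.call(*args)
    @failure_count = 0
    result
  rescue *@trip_on
    # Only listed errors count; anything else propagates untouched.
    @failure_count += 1
    raise
  end
end
```

Anything outside `trip_on`, such as a validation or business-rule error, propagates to the caller without moving the breaker toward open.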
With lots of traffic, you can have problems with many calls just waiting for the initial timeout. Since remote calls are often slow, it's often a good idea to put each call on a different thread using a future or promise to handle the results when they come back. By drawing these threads from a thread pool, you can arrange for the circuit to break when the thread pool is exhausted.
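Ruby's standard library has no thread pool, so the sketch below approximates the idea with a fixed number of in-flight slots guarded by a `Mutex`. Each call runs on its own `Thread`, whose `value` method serves as a simple future, and the breaker rejects new work with `Open` once every slot is busy. `PoolLimitedBreaker`, `max_in_flight`, and `call_async` are hypothetical names.

```ruby
class PoolLimitedBreaker
  class Open < StandardError; end

  def initialize(max_in_flight:, &block)
    @circuit = block
    @max_in_flight = max_in_flight
    @in_flight = 0
    @lock = Mutex.new
  end

  # Claims a slot synchronously, then runs the call on a new thread.
  # The returned Thread acts as a future: Thread#value waits for the result.
  def call_async(*args)
    @lock.synchronize do
      raise Open if @in_flight >= @max_in_flight # "pool" exhausted: break
      @in_flight += 1
    end
    Thread.new do
      begin
        @circuit.call(*args)
      ensure
        @lock.synchronize { @in_flight -= 1 } # release the slot
      end
    end
  end

  def in_flight
    @lock.synchronize { @in_flight }
  end
end
```

Slots are claimed before the thread starts, so the limit holds even if the threads have not been scheduled yet.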
The example shows a simple way to trip the breaker: a count that resets on a successful call. A more sophisticated approach might look at the frequency of errors, tripping once you get, say, a 50% failure rate. You might also have different thresholds for different errors, such as a threshold of 10 for timeouts but 3 for connection failures.
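A sketch of such a policy, assuming a hypothetical `TripPolicy` object that a breaker would consult instead of comparing a single counter. It trips when the failure rate over the last `window` calls reaches `max_failure_rate` (once at least `min_calls` have been seen), or when any error class reaches its own threshold; the defaults reuse the 10-for-timeouts, 3-for-connection-failures split above.

```ruby
require 'timeout'

class TripPolicy
  def initialize(window: 20, min_calls: 10, max_failure_rate: 0.5,
                 thresholds: { Timeout::Error => 10, Errno::ECONNREFUSED => 3 })
    @window = window
    @min_calls = min_calls
    @max_failure_rate = max_failure_rate
    @thresholds = thresholds
    @outcomes = []        # sliding window: true = success, false = failure
    @counts = Hash.new(0) # failures per exact error class since last success
  end

  def record_success
    push(true)
    @counts.clear
  end

  def record_failure(error)
    push(false)
    @counts[error.class] += 1
  end

  def failure_rate
    return 0.0 if @outcomes.empty?
    @outcomes.count(false).fdiv(@outcomes.size)
  end

  def trip?
    rate_exceeded = @outcomes.size >= @min_calls && failure_rate >= @max_failure_rate
    rate_exceeded || @thresholds.any? { |klass, limit| @counts[klass] >= limit }
  end

  private

  # Keep only the most recent @window outcomes.
  def push(outcome)
    @outcomes << outcome
    @outcomes.shift if @outcomes.size > @window
  end
end
```

The `min_calls` guard keeps a single early failure (a 100% rate over one call) from tripping the breaker. Per-class counts match the exact error class, so subclasses would need their own entries.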
The example I've shown is a circuit breaker for synchronous calls, but circuit breakers are also useful for asynchronous communications. A common technique here is to put all requests on a queue, which the supplier consumes at its own pace, a useful way to avoid overloading servers. In this case the circuit breaks when the queue fills up.
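For the queued case, a bounded queue gives the trip condition directly. This sketch assumes an in-process `SizedQueue` standing in for a real message queue: `submit` pushes without blocking and turns Ruby's `ThreadError` on a full queue into an `Open` error, while the supplier drains requests with `take` at its own pace. `QueueBreaker` is a hypothetical name.

```ruby
class QueueBreaker
  class Open < StandardError; end

  def initialize(capacity)
    @queue = SizedQueue.new(capacity)
  end

  # Client side: enqueue without blocking; a full queue means the circuit is open.
  def submit(request)
    @queue.push(request, true) # non_block = true raises ThreadError when full
    true
  rescue ThreadError
    raise Open, "request queue full (#{@queue.max} pending)"
  end

  # Supplier side: blocks until a request is available.
  def take
    @queue.pop
  end

  def pending
    @queue.size
  end
end
```

Once the supplier catches up and frees space, `submit` succeeds again, so this breaker closes on its own without a separate reset timer.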
On their own, circuit breakers help reduce resources tied up in operations which are likely to fail. You avoid waiting on timeouts for the client, and a broken circuit avoids putting load on a struggling server. I talk here about remote calls, which are a common case for circuit breakers, but they can be used in any situation where you want to protect parts of a system from failures in other parts.
Circuit breakers are a valuable place for monitoring. Any change in breaker state should be logged and breakers should reveal details of their state for deeper monitoring. Breaker behavior is often a good source of warnings about deeper troubles in the environment. Operations staff should be able to trip or reset breakers.
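The sketch below shows one way to make breaker state observable and controllable: every transition is logged through Ruby's standard `Logger` and passed to registered listeners, `status` exposes the details for monitoring, and hypothetical `force_open!`/`force_close!` methods let operators override the breaker. It tracks state only; wrapping the protected call works as in the article's examples.

```ruby
require 'logger'

class ObservableBreaker
  attr_reader :state, :failure_count, :last_changed_at

  def initialize(failure_threshold: 5, logger: Logger.new($stdout))
    @failure_threshold = failure_threshold
    @logger = logger
    @listeners = []
    @failure_count = 0
    @state = :closed
    @forced = false
  end

  # Register a callback invoked as listener.call(old_state, new_state).
  def on_change(&listener)
    @listeners << listener
  end

  def record_failure
    @failure_count += 1
    transition(:open) if @failure_count >= @failure_threshold
  end

  def record_success
    return if @forced # a manual trip holds until an operator closes it
    @failure_count = 0
    transition(:closed)
  end

  # Manual controls for operations staff.
  def force_open!
    @forced = true
    transition(:open)
  end

  def force_close!
    @forced = false
    @failure_count = 0
    transition(:closed)
  end

  # Details for deeper monitoring (e.g. a health endpoint).
  def status
    { state: @state, failures: @failure_count, forced: @forced, since: @last_changed_at }
  end

  private

  def transition(new_state)
    return if new_state == @state
    old, @state = @state, new_state
    @last_changed_at = Time.now
    @logger.warn("circuit #{old} -> #{new_state}")
    @listeners.each { |listener| listener.call(old, new_state) }
  end
end
```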
Breakers on their own are valuable, but clients using them need to react to breaker failures. As with any remote invocation, you need to consider what to do in case of failure. Does it fail the operation you're carrying out, or are there workarounds you can do? A credit card authorization could be put on a queue to deal with later; failure to get some data may be mitigated by showing some stale data that's good enough to display.
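A sketch of the stale-data fallback, assuming the breaker signals an open circuit with an exception (here a hypothetical `BreakerOpen`) and that the last successful response is acceptable to show. `StaleFallback` remembers each good value and returns it, tagged `:stale`, when the call fails; with nothing cached it re-raises so the caller can fail the operation instead.

```ruby
require 'timeout'

# Hypothetical error a breaker raises while open.
class BreakerOpen < StandardError; end

class StaleFallback
  def initialize(&fetch)
    @fetch = fetch
    @last_good = {}
  end

  # Returns [value, :fresh] from the supplier, or [value, :stale] from the
  # cache when the breaker is open or the call times out.
  def get(key)
    value = @fetch.call(key)
    @last_good[key] = value
    [value, :fresh]
  rescue BreakerOpen, Timeout::Error
    raise unless @last_good.key?(key) # nothing to fall back on
    [@last_good[key], :stale]
  end
end
```

Returning the freshness tag lets the display layer flag stale data instead of silently passing it off as current.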
Further Reading
The Netflix tech blog contains a lot of useful information on improving the reliability of systems with lots of services. Their Dependency Command talks about using circuit breakers and a thread pool limit.
Netflix have open-sourced Hystrix, a sophisticated tool for dealing with latency and fault tolerance for distributed systems. It includes an implementation of the circuit breaker pattern with the thread pool limit.
There are other open-source implementations of the circuit breaker pattern in Ruby, Java, Grails Plugin, C#, AspectJ, and Scala.
Reference: http://martinfowler.com/bliki/CircuitBreaker.html
Appendix (source: https://github.com/Comcast/jrugged/blob/master/jrugged-core/src/main/java/org/fishwife/jrugged/CircuitBreaker.java)
/* CircuitBreaker.java
 *
 * Copyright 2009-2012 Comcast Interactive Media, LLC.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *      http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.fishwife.jrugged;

import java.io.PrintWriter;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.atomic.AtomicLong;

/** A {@link CircuitBreaker} can be used with a service to throttle traffic
 * to a failed subsystem (particularly one we might not be able to monitor,
 * such as a peer system which must be accessed over the network). Service
 * calls are wrapped by the <code>CircuitBreaker</code>.
 * <p>
 * When everything is operating normally, the <code>CircuitBreaker</code>
 * is CLOSED and the calls are allowed through.
 * <p>
 * When a call fails, however, the <code>CircuitBreaker</code> "trips" and
 * moves to an OPEN state. Client calls are not allowed through while
 * the <code>CircuitBreaker</code> is OPEN.
 * <p>
 * After a certain "cooldown" period, the <code>CircuitBreaker</code> will
 * transition to a HALF_CLOSED state, where one call is allowed to go through
 * as a test. If that call succeeds, the <code>CircuitBreaker</code> moves
 * back to the CLOSED state; if it fails, it moves back to the OPEN state
 * for another cooldown period.
 * <p>
 * Sample usage:
 * <pre>
    public class Service implements Monitorable {
        private CircuitBreaker cb = new CircuitBreaker();
        public String doSomething(final Object arg) throws Exception {
            return cb.invoke(new Callable<String>() {
                public String call() {
                    // make the call ...
                }
            });
        }
        public Status getStatus() { return cb.getStatus(); }
    }
 * </pre>
 */
public class CircuitBreaker implements MonitoredService, ServiceWrapper {
    /**
     * Represents whether a {@link CircuitBreaker} is OPEN, HALF_CLOSED,
     * or CLOSED.
     */
    protected enum BreakerState {
        /** An OPEN breaker has tripped and will not allow requests through. */
        OPEN,

        /** A HALF_CLOSED breaker has completed its cooldown period and
            will allow one request through as a "test request." */
        HALF_CLOSED,

        /** A CLOSED breaker is operating normally and allowing requests through. */
        CLOSED
    }

    private Throwable tripException = null;

    /**
     * Returns the last exception that caused the breaker to trip, NULL if never tripped.
     *
     * @return Throwable
     */
    public Throwable getTripException() {
        return tripException;
    }

    /**
     * Returns the last exception that caused the breaker to trip, empty <code>String </code>
     * if never tripped.
     *
     * @return String
     */
    public String getTripExceptionAsString() {
        if (tripException == null) {
            return "";
        } else {
            return getFullStackTrace(tripException);
        }
    }

    /** Current state of the breaker. */
    protected volatile BreakerState state = BreakerState.CLOSED;

    /** The time the breaker last tripped, in milliseconds since the epoch. */
    protected AtomicLong lastFailure = new AtomicLong(0L);

    /** How many times the breaker has tripped during its lifetime. */
    protected AtomicLong openCount = new AtomicLong(0L);

    /** How long the cooldown period is in milliseconds. */
    protected AtomicLong resetMillis = new AtomicLong(15 * 1000L);

    /** The {@link FailureInterpreter} to use to determine whether a
        given failure should cause the breaker to trip.
     */
    protected FailureInterpreter failureInterpreter =
        new DefaultFailureInterpreter();

    /** Helper class to allow throwing an application-specific
     * exception rather than the default {@link
     * CircuitBreakerException}. */
    protected CircuitBreakerExceptionMapper<? extends Exception> exceptionMapper;

    protected List<CircuitBreakerNotificationCallback> cbNotifyList =
        Collections.synchronizedList(new ArrayList<CircuitBreakerNotificationCallback>());

    private boolean isHardTrip;

    /**
     * Bypass this CircuitBreaker - used for testing, or other operational
     * situations where verification of the Break might be required.
     */
    protected boolean byPass = false;

    /**
     * Whether the "test" attempt permitted in the HALF_CLOSED state
     * is currently in-flight.
     */
    protected boolean isAttemptLive = false;

    /** The default name if none is provided. */
    private static final String DEFAULT_NAME="CircuitBreaker";

    /** The name for the CircuitBreaker. */
    private String name = DEFAULT_NAME;

    /** Creates a {@link CircuitBreaker} with a {@link
     * DefaultFailureInterpreter} and the default "tripped" exception
     * behavior (throwing a {@link CircuitBreakerException}). */
    public CircuitBreaker() {
    }

    /** Creates a {@link CircuitBreaker} with a {@link
     * DefaultFailureInterpreter} and the default "tripped" exception
     * behavior (throwing a {@link CircuitBreakerException}).
     * @param name the name for the {@link CircuitBreaker}.
     */
    public CircuitBreaker(String name) {
        this.name = name;
    }

    /** Creates a {@link CircuitBreaker} with the specified {@link
     * FailureInterpreter} and the default "tripped" exception
     * behavior (throwing a {@link CircuitBreakerException}).
     * @param fi the <code>FailureInterpreter</code> to use when
     *   determining whether a specific failure ought to cause the
     *   breaker to trip
     */
    public CircuitBreaker(FailureInterpreter fi) {
        failureInterpreter = fi;
    }

    /** Creates a {@link CircuitBreaker} with the specified {@link
     * FailureInterpreter} and the default "tripped" exception
     * behavior (throwing a {@link CircuitBreakerException}).
     * @param name the name for the {@link CircuitBreaker}.
     * @param fi the <code>FailureInterpreter</code> to use when
     *   determining whether a specific failure ought to cause the
     *   breaker to trip
     */
    public CircuitBreaker(String name, FailureInterpreter fi) {
        this.name = name;
        failureInterpreter = fi;
    }

    /** Creates a {@link CircuitBreaker} with a {@link
     * DefaultFailureInterpreter} and using the supplied {@link
     * CircuitBreakerExceptionMapper} when client calls are made
     * while the breaker is tripped.
     * @param name the name for the {@link CircuitBreaker}.
     * @param mapper helper used to translate a {@link
     *   CircuitBreakerException} into an application-specific one
     */
    public CircuitBreaker(String name, CircuitBreakerExceptionMapper<? extends Exception> mapper) {
        this.name = name;
        exceptionMapper = mapper;
    }

    /** Creates a {@link CircuitBreaker} with the provided {@link
     * FailureInterpreter} and using the provided {@link
     * CircuitBreakerExceptionMapper} when client calls are made
     * while the breaker is tripped.
     * @param name the name for the {@link CircuitBreaker}.
     * @param fi the <code>FailureInterpreter</code> to use when
     *   determining whether a specific failure ought to cause the
     *   breaker to trip
     * @param mapper helper used to translate a {@link
     *   CircuitBreakerException} into an application-specific one
     */
    public CircuitBreaker(String name,
                          FailureInterpreter fi,
                          CircuitBreakerExceptionMapper<? extends Exception> mapper) {
        this.name = name;
        failureInterpreter = fi;
        exceptionMapper = mapper;
    }

    /** Wrap the given service call with the {@link CircuitBreaker}
     * protection logic.
     * @param c the {@link Callable} to attempt
     * @return whatever c would return on success
     * @throws CircuitBreakerException if the
     *   breaker was OPEN or HALF_CLOSED and this attempt wasn't the
     *   reset attempt
     * @throws Exception if <code>c</code> throws one during
     *   execution
     */
    public <V> V invoke(Callable<V> c) throws Exception {
        if (!byPass) {
            if (!allowRequest()) {
                throw mapException(new CircuitBreakerException());
            }

            try {
                V result = c.call();
                close();
                return result;
            } catch (Throwable cause) {
                handleFailure(cause);
            }
            throw new IllegalStateException("not possible");
        }
        else {
            return c.call();
        }
    }

    /** Wrap the given service call with the {@link CircuitBreaker}
     * protection logic.
     * @param r the {@link Runnable} to attempt
     * @throws CircuitBreakerException if the
     *   breaker was OPEN or HALF_CLOSED and this attempt wasn't the
     *   reset attempt
     * @throws Exception if <code>r</code> throws one during
     *   execution
     */
    public void invoke(Runnable r) throws Exception {
        if (!byPass) {
            if (!allowRequest()) {
                throw mapException(new CircuitBreakerException());
            }

            try {
                r.run();
                close();
                return;
            } catch (Throwable cause) {
                handleFailure(cause);
            }
            throw new IllegalStateException("not possible");
        }
        else {
            r.run();
        }
    }

    /** Wrap the given service call with the {@link CircuitBreaker}
     * protection logic.
     * @param r the {@link Runnable} to attempt
     * @param result what to return after <code>r</code> succeeds
     * @return result
     * @throws CircuitBreakerException if the
     *   breaker was OPEN or HALF_CLOSED and this attempt wasn't the
     *   reset attempt
     * @throws Exception if <code>r</code> throws one during
     *   execution
     */
    public <V> V invoke(Runnable r, V result) throws Exception {
        if (!byPass) {
            if (!allowRequest()) {
                throw mapException(new CircuitBreakerException());
            }

            try {
                r.run();
                close();
                return result;
            } catch (Throwable cause) {
                handleFailure(cause);
            }
            throw new IllegalStateException("not possible");
        }
        else {
            r.run();
            return result;
        }
    }

    /**
     * When called with true - causes the {@link CircuitBreaker} to byPass
     * its functionality allowing requests to be executed unmolested
     * until the <code>CircuitBreaker</code> is reset or the byPass
     * is manually set to false.
     *
     * @param b Set this breaker into bypass mode
     */
    public void setByPassState(boolean b) {
        byPass = b;
        notifyBreakerStateChange(getStatus());
    }

    /**
     * Get the current state of the {@link CircuitBreaker} byPass
     *
     * @return boolean the byPass flag's current value
     */
    public boolean getByPassState() {
        return byPass;
    }

    /**
     * Causes the {@link CircuitBreaker} to trip and OPEN; no new
     * requests will be allowed until the <code>CircuitBreaker</code>
     * resets.
     */
    public void trip() {
        if (state != BreakerState.OPEN) {
            openCount.getAndIncrement();
        }
        state = BreakerState.OPEN;
        lastFailure.set(System.currentTimeMillis());
        isAttemptLive = false;

        notifyBreakerStateChange(getStatus());
    }

    /**
     * Manually trips the CircuitBreaker until {@link #reset()} is invoked.
     */
    public void tripHard() {
        this.trip();
        isHardTrip = true;
    }

    /**
     * Returns the last time the breaker tripped OPEN, measured in
     * milliseconds since the Epoch.
     * @return long the last failure time
     */
    public long getLastTripTime() {
        return lastFailure.get();
    }

    /**
     * Returns the number of times the breaker has tripped OPEN during
     * its lifetime.
     * @return long the number of times the circuit breaker tripped
     */
    public long getTripCount() {
        return openCount.get();
    }

    /**
     * Manually set the breaker to be reset and ready for use. This
     * is only useful after a manual trip otherwise the breaker will
     * trip automatically again if the service is still unavailable.
     * Just like a real breaker. WOOT!!!
     */
    public void reset() {
        state = BreakerState.CLOSED;
        isHardTrip = false;
        byPass = false;
        isAttemptLive = false;

        notifyBreakerStateChange(getStatus());
    }

    /**
     * Returns the current {@link org.fishwife.jrugged.Status} of the
     * {@link CircuitBreaker}. In this case, it really refers to the
     * status of the client service. If the
     * <code>CircuitBreaker</code> is CLOSED, we report that the
     * client is UP; if it is HALF_CLOSED, we report that the client
     * is DEGRADED; if it is OPEN, we report the client is DOWN.
     *
     * @return Status the current status of the breaker
     */
    public Status getStatus() {
        return getServiceStatus().getStatus();
    }

    /**
     * Get the current {@link org.fishwife.jrugged.ServiceStatus} of the
     * {@link CircuitBreaker}, including the name,
     * {@link org.fishwife.jrugged.Status}, and reason.
     * @return the {@link org.fishwife.jrugged.ServiceStatus}.
     */
    public ServiceStatus getServiceStatus() {
        boolean canSendProbeRequest = !isHardTrip && lastFailure.get() > 0
            && (System.currentTimeMillis() - lastFailure.get() >= resetMillis.get());

        if (byPass) {
            return new ServiceStatus(name, Status.DEGRADED, "Bypassed");
        }

        switch(state) {
            case OPEN:
                return (canSendProbeRequest
                        ? new ServiceStatus(name, Status.DEGRADED, "Send Probe Request")
                        : new ServiceStatus(name, Status.DOWN, "Open"));
            case HALF_CLOSED:
                return new ServiceStatus(name, Status.DEGRADED, "Half Closed");
            case CLOSED:
            default:
                return new ServiceStatus(name, Status.UP);
        }
    }

    /**
     * Returns the cooldown period in milliseconds.
     * @return long
     */
    public long getResetMillis() {
        return resetMillis.get();
    }

    /** Sets the reset period to the given number of milliseconds.
     * The default is 15,000 (make one retry attempt every 15 seconds).
     *
     * @param l number of milliseconds to "cool down" after tripping
     *   before allowing a "test request" through again
     */
    public void setResetMillis(long l) {
        resetMillis.set(l);
    }

    /** Returns a {@link String} representation of the breaker's
     * status; potentially useful for exposing to monitoring software.
     * @return <code>String</code> which is <code>"GREEN"</code> if
     *   the breaker is CLOSED; <code>"YELLOW"</code> if the breaker
     *   is HALF_CLOSED; and <code>"RED"</code> if the breaker is
     *   OPEN (tripped). */
    public String getHealthCheck() {
        return getStatus().getSignal();
    }

    /**
     * Specifies the failure tolerance limit for the {@link
     * DefaultFailureInterpreter} that comes with a {@link
     * CircuitBreaker} by default.
     * @see DefaultFailureInterpreter
     * @param limit the number of tolerated failures in a window
     */
    public void setLimit(int limit) {
        FailureInterpreter fi = getFailureInterpreter();
        if (!(fi instanceof DefaultFailureInterpreter)) {
            throw new IllegalStateException("setLimit() not supported: this CircuitBreaker's FailureInterpreter isn't a DefaultFailureInterpreter.");
        }
        ((DefaultFailureInterpreter)fi).setLimit(limit);
    }

    /**
     * Specifies a set of {@link Throwable} classes that should not
     * be considered failures by the {@link CircuitBreaker}.
     * @see DefaultFailureInterpreter
     * @param ignore a {@link java.util.Collection} of {@link Throwable}
     *   classes
     */
    public void setIgnore(Collection<Class<? extends Throwable>> ignore) {
        FailureInterpreter fi = getFailureInterpreter();
        if (!(fi instanceof DefaultFailureInterpreter)) {
            throw new IllegalStateException("setIgnore() not supported: this CircuitBreaker's FailureInterpreter isn't a DefaultFailureInterpreter.");
        }

        @SuppressWarnings("unchecked")
        Class<? extends Throwable>[] classes = new Class[ignore.size()];
        int i = 0;
        for(Class<? extends Throwable> c : ignore) {
            classes[i] = c;
            i++;
        }
        ((DefaultFailureInterpreter)fi).setIgnore(classes);
    }

    /**
     * Specifies the tolerance window in milliseconds for the {@link
     * DefaultFailureInterpreter} that comes with a {@link
     * CircuitBreaker} by default.
     * @see DefaultFailureInterpreter
     * @param windowMillis length of the window in milliseconds
     */
    public void setWindowMillis(long windowMillis) {
        FailureInterpreter fi = getFailureInterpreter();
        if (!(fi instanceof DefaultFailureInterpreter)) {
            throw new IllegalStateException("setWindowMillis() not supported: this CircuitBreaker's FailureInterpreter isn't a DefaultFailureInterpreter.");
        }
        ((DefaultFailureInterpreter)fi).setWindowMillis(windowMillis);
    }

    /**
     * Specifies a helper that determines whether a given failure will
     * cause the breaker to trip or not.
     *
     * @param failureInterpreter the {@link FailureInterpreter} to use
     */
    public void setFailureInterpreter(FailureInterpreter failureInterpreter) {
        this.failureInterpreter = failureInterpreter;
    }

    /**
     * Get the failure interpreter for this instance. The failure
     * interpreter provides the configuration for determining which
     * exceptions trip the circuit breaker, in what time interval,
     * etc.
     *
     * @return {@link FailureInterpreter} for this instance or null if no
     *   failure interpreter was set.
     */
    public FailureInterpreter getFailureInterpreter() {
        return this.failureInterpreter;
    }

    /**
     * A helper that converts CircuitBreakerExceptions into a known
     * 'application' exception.
     *
     * @param mapper my converter object
     */
    public void setExceptionMapper(CircuitBreakerExceptionMapper<? extends Exception> mapper) {
        this.exceptionMapper = mapper;
    }

    /**
     * Add an interested party for {@link CircuitBreaker} events, like up,
     * down, degraded status state changes.
     *
     * @param listener an interested party for {@link CircuitBreaker} status events.
     */
    public void addListener(CircuitBreakerNotificationCallback listener) {
        cbNotifyList.add(listener);
    }

    /**
     * Set a list of interested parties for {@link CircuitBreaker} events, like up,
     * down, degraded status state changes.
     *
     * @param listeners a list of interested parties for {@link CircuitBreaker} status events.
     */
    public void setListeners(ArrayList<CircuitBreakerNotificationCallback> listeners) {
        cbNotifyList = Collections.synchronizedList(listeners);
    }

    /**
     * Get the helper that converts {@link CircuitBreakerException}s into
     * application-specific exceptions.
     * @return {@link CircuitBreakerExceptionMapper} my converter object, or
     *   <code>null</code> if one is not currently set.
     */
    public CircuitBreakerExceptionMapper<? extends Exception> getExceptionMapper(){
        return this.exceptionMapper;
    }

    private Exception mapException(CircuitBreakerException cbe) {
        if (exceptionMapper == null)
            return cbe;

        return exceptionMapper.map(this, cbe);
    }

    private void handleFailure(Throwable cause) throws Exception {
        if (failureInterpreter == null ||
            failureInterpreter.shouldTrip(cause)) {
            this.tripException = cause;
            trip();
        }

        if (isAttemptLive) {
            close();
        }

        if (cause instanceof Exception) {
            throw (Exception)cause;
        } else if (cause instanceof Error) {
            throw (Error)cause;
        } else {
            throw (RuntimeException)cause;
        }
    }

    /**
     * Reports a successful service call to the {@link CircuitBreaker},
     * putting the <code>CircuitBreaker</code> back into the CLOSED
     * state serving requests.
     */
    private void close() {
        state = BreakerState.CLOSED;
        isAttemptLive = false;
        notifyBreakerStateChange(getStatus());
    }

    private synchronized boolean canAttempt() {
        if (!(BreakerState.HALF_CLOSED == state) || isAttemptLive) {
            return false;
        }
        isAttemptLive = true;
        return true;
    }

    private void notifyBreakerStateChange(Status status) {
        if (cbNotifyList != null && cbNotifyList.size() >= 1) {
            for (CircuitBreakerNotificationCallback notifyObject : cbNotifyList) {
                notifyObject.notify(status);
            }
        }
    }

    /**
     * @return boolean whether the breaker will allow a request
     * through or not.
     */
    private boolean allowRequest() {
        if (this.isHardTrip) {
            return false;
        }
        else if (BreakerState.CLOSED == state) {
            return true;
        }

        if (BreakerState.OPEN == state &&
            System.currentTimeMillis() - lastFailure.get() >= resetMillis.get()) {
            state = BreakerState.HALF_CLOSED;
        }

        return canAttempt();
    }

    private String getFullStackTrace(Throwable t) {
        StringWriter sw = new StringWriter();
        t.printStackTrace(new PrintWriter(sw));
        return sw.toString();
    }
}