本博客逐步分析Akka Streams的源码,当然必须循序渐进,且估计会分很多篇,毕竟Akka Streams还是比较复杂的。
implicit val system = ActorSystem("QuickStart") implicit val materializer = ActorMaterializer()
在使用Streams相关的API时,上面两个对象是必须创建的。ActorSystem不再说了,我们来看ActorMaterializer。
/** * An ActorMaterializer takes a stream blueprint and turns it into a running stream. */ abstract class ActorMaterializer extends Materializer with MaterializerLoggingProvider
ActorMaterializer把一个流式计算的blueprint(大纲、蓝本?)转换成一个运行的流,简单来说这就是用来编译akka提供的流式API的。它继承一个类和一个特质。MaterializerLoggingProvider就不看了,就是提供日志相关的功能的。
/** * Materializer SPI (Service Provider Interface) * * Binary compatibility is NOT guaranteed on materializer internals. * * Custom materializer implementations should be aware that the materializer SPI * is not yet final and may change in patch releases of Akka. Please note that this * does not impact end-users of Akka streams, only implementors of custom materializers, * with whom the Akka team co-ordinates such changes. * * Once the SPI is final this notice will be removed. */ abstract class Materializer { /** * The `namePrefix` shall be used for deriving the names of processing * entities that are created during materialization. This is meant to aid * logging and failure reporting both during materialization and while the * stream is running. */ def withNamePrefix(name: String): Materializer /** * This method interprets the given Flow description and creates the running * stream. The result can be highly implementation specific, ranging from * local actor chains to remote-deployed processing networks. */ def materialize[Mat](runnable: Graph[ClosedShape, Mat]): Mat /** * This method interprets the given Flow description and creates the running * stream using an explicitly provided [[Attributes]] as top level (least specific) attributes that * will be defaults for the materialized stream. * The result can be highly implementation specific, ranging from local actor chains to remote-deployed * processing networks. */ def materialize[Mat]( runnable: Graph[ClosedShape, Mat], @deprecatedName('initialAttributes) defaultAttributes: Attributes): Mat /** * Running a flow graph will require execution resources, as will computations * within Sources, Sinks, etc. This [[scala.concurrent.ExecutionContextExecutor]] * can be used by parts of the flow to submit processing jobs for execution, * run Future callbacks, etc. * * Note that this is not necessarily the same execution context the stream operator itself is running on. */ implicit def executionContext: ExecutionContextExecutor /** * Interface for operators that need timer services for their functionality. Schedules a * single task with the given delay. * * @return A [[akka.actor.Cancellable]] that allows cancelling the timer. Cancelling is best effort, if the event * has been already enqueued it will not have an effect. */ def scheduleOnce(delay: FiniteDuration, task: Runnable): Cancellable /** * Interface for operators that need timer services for their functionality. Schedules a * repeated task with the given interval between invocations. * * @return A [[akka.actor.Cancellable]] that allows cancelling the timer. Cancelling is best effort, if the event * has been already enqueued it will not have an effect. */ def schedulePeriodically(initialDelay: FiniteDuration, interval: FiniteDuration, task: Runnable): Cancellable }
Materializer非常重要,所以这里贴出完整代码。Materializer是一个SPI(服务提供接口),Materializer内部不保证二进制兼容,也就是说版本可能不兼容。但这个定义比较奇怪,既然所有的方法都没有实现,那是不是用trait比较合适呢?为啥是一个抽象类呢?
Materializer一共6个方法,我们一个个看。
withNamePrefix在派生处理实体的时候用到,简单来说就是流中的每个计算实体(Source/Sink/Flow/RunnableGraph)等映射成一个actor的时候,这些actor名称需要一个前缀,withNamePrefix就是用来设置这个前缀的。
materialize方法有两个实现。这个方法用来解释给定的Flow定义,并创建运行的流的。创建的结果会高度依赖具体的实现,可能是本地的actor链也可能是远程部署的加工过程。
executionContext用来提供异步执行环境。
scheduleOnce/schedulePeriodically用来提供定时调度的功能,毕竟还是有一些操作需要时间服务的。
Materializer貌似很简单,就只有上面几个接口,但其核心的接口就是materialize方法,毕竟这是用来编译akka流的。如果你使用过storm的话,就一定知道Storm有一个拓扑编译器。其功能跟这个差不多。
我们再来看ActorMaterializer。
ActorMaterializer还提供了三个比较重要的接口:actorOf/system/supervisor。其中actorOf接收MaterializationContext、Props作为参数,创建一个Actor;system返回与ActorMaterializer关联的ActorSystem;supervisor功能后面再研究。MaterializationContext这个参数还是比较有意思的,可以理解成物理化(编译流)时的上下文。
/** * Context parameter to the `create` methods of sources and sinks. * * INTERNAL API */ @InternalApi private[akka] case class MaterializationContext( materializer: Materializer, effectiveAttributes: Attributes, islandName: String)
MaterializationContext一共三个变量,materializer不再说了,就是当前上下文关联的Materializer。effectiveAttributes就是用来提供参数的,只不过又封装成了Attributes。islandName比较有意思,单纯从命名上来翻译,它是“岛名称”。那什么是“岛”(island)呢?后面分析时,也会遇到这个“island”,大家这里稍微留意一下就行了。
另外在Materializer这个抽象类中materialize方法有一个参数需要我们研究下:runnable: Graph[ClosedShape, Mat]。
/** * Not intended to be directly extended by user classes * * @see [[akka.stream.stage.GraphStage]] */ trait Graph[+S <: Shape, +M]
Graph是一个trait,它有两个类型参数:S/M。能不能起个好点的名字,SM,哈哈。其中S是Shape的子类,从名称来看,好像是graph的形状。简单分析一下Graph的主体代码,就会发现它有两个方法特别重要:shape、traversalBuilder。其中shape返回图的形状,虽然我们还不知道形状是什么,但ClosedShape是它的一个形式;traversalBuilder
/** * INTERNAL API. * * Every materializable element must be backed by a stream layout module */ private[stream] def traversalBuilder: TraversalBuilder
traversalBuilder返回一个TraversalBuilder,注释说每一个可物理化的元素都必须被流布局模块支持。TraversalBuilder从名称来看,是一个可遍历的编译器。估计就是一个拓扑排序。
下面我们来看Shape究竟是什么。
/** * A Shape describes the inlets and outlets of a [[Graph]]. In keeping with the * philosophy that a Graph is a freely reusable blueprint, everything that * matters from the outside are the connections that can be made with it, * otherwise it is just a black box. */ abstract class Shape
一个Shape描述了Graph的入口和出口,按照Graph是自由重用的蓝本这样的哲学,从外部来看Graph是可被链接的就非常重要了,否则它就只是一个黑盒子。我们姑且认为它只是用来定义Graph的输入输出的吧。而且这个trait最重要的两个方法也就是inlets/outlets。
/** * Scala API: get a list of all input ports */ def inlets: immutable.Seq[Inlet[_]] /** * Scala API: get a list of all output ports */ def outlets: immutable.Seq[Outlet[_]]
那Inlet和Outlet又是什么呢?
final class Inlet[T] private (val s: String) extends InPort { def carbonCopy(): Inlet[T] = { val in = Inlet[T](s) in.mappedTo = this in } /** * INTERNAL API. */ def as[U]: Inlet[U] = this.asInstanceOf[Inlet[U]] override def toString: String = s + "(" + this.hashCode + s")" + (if (mappedTo eq this) "" else s" mapped to $mappedTo") } /** * An input port of a StreamLayout.Module. This type logically belongs * into the impl package but must live here due to how `sealed` works. * It is also used in the Java DSL for “untyped Inlets” as a work-around * for otherwise unreasonable existential types. */ sealed abstract class InPort { self: Inlet[_] ⇒ final override def hashCode: Int = super.hashCode final override def equals(that: Any): Boolean = this eq that.asInstanceOf[AnyRef] /** * INTERNAL API */ @volatile private[stream] var id: Int = -1 /** * INTERNAL API */ @volatile private[stream] var mappedTo: InPort = this /** * INTERNAL API */ private[stream] def inlet: Inlet[_] = this }
Inlet好像也没有提供什么方法啊,貌似目前来看比较重要的就只有一个id字段,其他的都是变量返回、复制的。
好像分析到这里,Materializer、Shape、Graph都还比较抽象,还得来看Materializer的具体实现,毕竟它只是一个trait。
def apply(materializerSettings: ActorMaterializerSettings, namePrefix: String)(implicit context: ActorRefFactory): ActorMaterializer = { val haveShutDown = new AtomicBoolean(false) val system = actorSystemOf(context) new PhasedFusingActorMaterializer( system, materializerSettings, system.dispatchers, actorOfStreamSupervisor(materializerSettings, context, haveShutDown), haveShutDown, FlowNames(system).name.copy(namePrefix)) }
通过分析ActorMaterializer的apply方法,我们发现最终调用了上面这个版本的apply,可以看到,最终创建了PhasedFusingActorMaterializer,而且是new出来的,所以这个类一定是一个具体类,也就是说所有的方法和字段都有对应的实现和定义。PhasedFusingActorMaterializer的名称其实还挺有意思的,直译的话就是分段熔断actor物化器。分段好理解,熔断就不知道怎么理解了,或者翻译错了?哈哈,我也不知道。
@InternalApi private[akka] case class PhasedFusingActorMaterializer( system: ActorSystem, override val settings: ActorMaterializerSettings, dispatchers: Dispatchers, supervisor: ActorRef, haveShutDown: AtomicBoolean, flowNames: SeqActorName) extends ExtendedActorMaterializer
PhasedFusingActorMaterializer居然是一个case class,那还new干啥。它继承了ExtendedActorMaterializer。ExtendedActorMaterializer源码不再贴出来,它就是重新覆盖了ActorMaterializer的几个方法,并实现了一个方法actorOf。我们就来看这个actorOf
@InternalApi private[akka] override def actorOf(context: MaterializationContext, props: Props): ActorRef = { val effectiveProps = props.dispatcher match { case Dispatchers.DefaultDispatcherId ⇒ props.withDispatcher(context.effectiveAttributes.mandatoryAttribute[ActorAttributes.Dispatcher].dispatcher) case ActorAttributes.IODispatcher.dispatcher ⇒ // this one is actually not a dispatcher but a relative config key pointing containing the actual dispatcher name props.withDispatcher(settings.blockingIoDispatcher) case _ ⇒ props } actorOf(effectiveProps, context.islandName) }
其实它就是用特定的dispatcher替换了Props中的值,然后调用另一个actorOf创建actor。
@InternalApi private[akka] def actorOf(props: Props, name: String): ActorRef = { supervisor match { case ref: LocalActorRef ⇒ ref.underlying.attachChild(props, name, systemService = false) case ref: RepointableActorRef ⇒ if (ref.isStarted) ref.underlying.asInstanceOf[ActorCell].attachChild(props, name, systemService = false) else { implicit val timeout = ref.system.settings.CreationTimeout val f = (supervisor ? StreamSupervisor.Materialize(props, name)).mapTo[ActorRef] Await.result(f, timeout.duration) } case unknown ⇒ throw new IllegalStateException(s"Stream supervisor must be a local actor, was [${unknown.getClass.getName}]") } }
这个actorOf我们也不深入分析了,反正就是在创建actor。下面我们来看materialize在PhasedFusingActorMaterializer中的实现。
override def materialize[Mat]( graph: Graph[ClosedShape, Mat], defaultAttributes: Attributes, defaultPhase: Phase[Any], phases: Map[IslandTag, Phase[Any]]): Mat = { if (isShutdown) throw new IllegalStateException("Trying to materialize stream after materializer has been shutdown") val islandTracking = new IslandTracking(phases, settings, defaultAttributes, defaultPhase, this, islandNamePrefix = createFlowName() + "-") var current: Traversal = graph.traversalBuilder.traversal val attributesStack = new java.util.ArrayDeque[Attributes](8) attributesStack.addLast(defaultAttributes and graph.traversalBuilder.attributes) val traversalStack = new java.util.ArrayDeque[Traversal](16) traversalStack.addLast(current) val matValueStack = new java.util.ArrayDeque[Any](8) if (Debug) { println(s"--- Materializing layout:") TraversalBuilder.printTraversal(current) println(s"--- Start materialization") } // Due to how Concat works, we need a stack. This probably can be optimized for the most common cases. while (!traversalStack.isEmpty) { current = traversalStack.removeLast() while (current ne EmptyTraversal) { var nextStep: Traversal = EmptyTraversal current match { case MaterializeAtomic(mod, outToSlot) ⇒ if (Debug) println(s"materializing module: $mod") val matAndStage = islandTracking.getCurrentPhase.materializeAtomic(mod, attributesStack.getLast) val logic = matAndStage._1 val matValue = matAndStage._2 if (Debug) println(s" materialized value is $matValue") matValueStack.addLast(matValue) val stageGlobalOffset = islandTracking.getCurrentOffset wireInlets(islandTracking, mod, logic) wireOutlets(islandTracking, mod, logic, stageGlobalOffset, outToSlot) if (Debug) println(s"PUSH: $matValue => $matValueStack") case Concat(first, next) ⇒ if (next ne EmptyTraversal) traversalStack.add(next) nextStep = first case Pop ⇒ val popped = matValueStack.removeLast() if (Debug) println(s"POP: $popped => $matValueStack") case PushNotUsed ⇒ matValueStack.addLast(NotUsed) if (Debug) println(s"PUSH: NotUsed => $matValueStack") case transform: Transform ⇒ val prev = matValueStack.removeLast() val result = transform(prev) matValueStack.addLast(result) if (Debug) println(s"TRFM: $matValueStack") case compose: Compose ⇒ val second = matValueStack.removeLast() val first = matValueStack.removeLast() val result = compose(first, second) matValueStack.addLast(result) if (Debug) println(s"COMP: $matValueStack") case PushAttributes(attr) ⇒ attributesStack.addLast(attributesStack.getLast and attr) if (Debug) println(s"ATTR PUSH: $attr") case PopAttributes ⇒ attributesStack.removeLast() if (Debug) println(s"ATTR POP") case EnterIsland(tag) ⇒ islandTracking.enterIsland(tag, attributesStack.getLast) case ExitIsland ⇒ islandTracking.exitIsland() case _ ⇒ } current = nextStep } } def shutdownWhileMaterializingFailure = new IllegalStateException("Materializer shutdown while materializing stream") try { islandTracking.getCurrentPhase.onIslandReady() islandTracking.allNestedIslandsReady() if (Debug) println("--- Finished materialization") matValueStack.peekLast().asInstanceOf[Mat] } finally { if (isShutdown) throw shutdownWhileMaterializingFailure } }
由于这个方法过于重要,所以我还是贴出了整段代码。我们发现它有两个参数没有见过:defaultPhase: Phase[Any]、phases: Map[IslandTag, Phase[Any]]。这涉及到两个类型IslandTag、Phase。Phase比较好理解就是阶段,那IslandTag是啥呢?“岛”的标签?“岛”是什么?!
@DoNotInherit private[akka] trait Phase[M] { def apply( settings: ActorMaterializerSettings, effectiveAttributes: Attributes, materializer: PhasedFusingActorMaterializer, islandName: String): PhaseIsland[M] }
Phase这个trait就只有一个apply,他就是根据各个参数,返回了PhaseIsland,阶段岛?
@DoNotInherit private[akka] trait PhaseIsland[M] { def name: String def materializeAtomic(mod: AtomicModule[Shape, Any], attributes: Attributes): (M, Any) def assignPort(in: InPort, slot: Int, logic: M): Unit def assignPort(out: OutPort, slot: Int, logic: M): Unit def createPublisher(out: OutPort, logic: M): Publisher[Any] def takePublisher(slot: Int, publisher: Publisher[Any]): Unit def onIslandReady(): Unit }
PhaseIsland有两个方法非常重要:createPublisher、takePublisher。为啥重要?因为它好像在创建Publisher啊,Publisher是啥?翻翻我上一篇博客喽?这是reactivestreams里面的接口,之所以重要因为它涉及到了底层。
还有一个IslandTag,这又是啥呢?
@DoNotInherit private[akka] trait IslandTag
啥元素也没有,难道仅仅是用来做类型匹配的?鬼知道。估计是给Island打标签的。
materialize方法代码很多,逻辑很复杂,但有几个重要的点需要注意。它创建了一个IslandTracking,并通过一个while循环访问了graph.traversalBuilder,对IslandTracking进行了某些操作,最后调用了islandTracking的两个方法。
islandTracking.getCurrentPhase.onIslandReady() islandTracking.allNestedIslandsReady()
所以IslandTracking和这两个方法就非常重要了,因为这个类的这两个方法调用完之后,流就可以正常运行了啊。
@InternalApi private[akka] class IslandTracking( val phases: Map[IslandTag, Phase[Any]], val settings: ActorMaterializerSettings, attributes: Attributes, defaultPhase: Phase[Any], val materializer: PhasedFusingActorMaterializer, islandNamePrefix: String)
IslandTracking岛跟踪,居然一行注释都没有,这让我怎么分析啊,麻蛋!
由于时间关系,今天就先分析到这里吧,很显然Akka Streams的代码很复杂,看源码非常有难度,希望我还能看下去。哈哈。