Dynamic stream handling
Dependency
To use Akka Streams, add the module to your project:
```scala
val AkkaVersion = "2.6.9"
libraryDependencies += "com.typesafe.akka" %% "akka-stream" % AkkaVersion
```
Introduction
Controlling stream completion with KillSwitch
A KillSwitch allows the completion of operators of FlowShape to be controlled from the outside. It consists of a flow element that can be linked to an operator of FlowShape needing completion control. The KillSwitch trait allows you to:
- complete the stream(s) via shutdown()
- fail the stream(s) via abort(Throwable error)

```scala
trait KillSwitch {

  /**
   * After calling [[KillSwitch#shutdown]] the linked [[Graph]]s of [[FlowShape]] are completed normally.
   */
  def shutdown(): Unit

  /**
   * After calling [[KillSwitch#abort]] the linked [[Graph]]s of [[FlowShape]] are failed.
   */
  def abort(ex: Throwable): Unit
}
```
After the first call to either shutdown or abort, all subsequent calls to any of these methods will be ignored. Stream completion is performed by both:
- cancelling its upstream.
- completing (in case of shutdown) or failing (in case of abort) its downstream.
A KillSwitch can control the completion of one or multiple streams, and therefore comes in two different flavours.
UniqueKillSwitch
UniqueKillSwitch allows you to control the completion of one materialized Graph of FlowShape. Refer to the examples below for usage.
- Shutdown
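```scala
import akka.stream.{DelayOverflowStrategy, KillSwitches}
import akka.stream.scaladsl.{Keep, Sink, Source}
import scala.concurrent.Await
import scala.concurrent.duration._
// Assumes an implicit ActorSystem in scope (in Akka 2.6 it provides the Materializer)
// and ScalaTest matchers for `shouldBe`.

val countingSrc = Source(Stream.from(1)).delay(1.second, DelayOverflowStrategy.backpressure)
val lastSnk = Sink.last[Int]

val (killSwitch, last) = countingSrc
  .viaMat(KillSwitches.single)(Keep.right)
  .toMat(lastSnk)(Keep.both)
  .run()

// doSomethingElse() stands for other work done by the caller; the value of `last`
// depends on how many elements got through before shutdown() is called.
doSomethingElse()

killSwitch.shutdown()

Await.result(last, 1.second) shouldBe 2
```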
- Abort
```scala
val countingSrc = Source(Stream.from(1)).delay(1.second, DelayOverflowStrategy.backpressure)
val lastSnk = Sink.last[Int]

val (killSwitch, last) = countingSrc
  .viaMat(KillSwitches.single)(Keep.right)
  .toMat(lastSnk)(Keep.both)
  .run()

val error = new RuntimeException("boom!")
killSwitch.abort(error)

Await.result(last.failed, 1.second) shouldBe error
```
SharedKillSwitch
A SharedKillSwitch allows you to control the completion of an arbitrary number of operators of FlowShape. It can be materialized multiple times via its flow method, and all materialized operators linked to it are controlled by the switch. Refer to the examples below for usage.
- Shutdown
```scala
val countingSrc = Source(Stream.from(1)).delay(1.second, DelayOverflowStrategy.backpressure)
val lastSnk = Sink.last[Int]
val sharedKillSwitch = KillSwitches.shared("my-kill-switch")

val last = countingSrc
  .via(sharedKillSwitch.flow)
  .runWith(lastSnk)

val delayedLast = countingSrc
  .delay(1.second, DelayOverflowStrategy.backpressure)
  .via(sharedKillSwitch.flow)
  .runWith(lastSnk)

doSomethingElse()

sharedKillSwitch.shutdown()

Await.result(last, 1.second) shouldBe 2
Await.result(delayedLast, 1.second) shouldBe 1
```
- Abort
```scala
val countingSrc = Source(Stream.from(1)).delay(1.second)
val lastSnk = Sink.last[Int]
val sharedKillSwitch = KillSwitches.shared("my-kill-switch")

val last1 = countingSrc.via(sharedKillSwitch.flow).runWith(lastSnk)
val last2 = countingSrc.via(sharedKillSwitch.flow).runWith(lastSnk)

val error = new RuntimeException("boom!")
sharedKillSwitch.abort(error)

Await.result(last1.failed, 1.second) shouldBe error
Await.result(last2.failed, 1.second) shouldBe error
```
Note: A UniqueKillSwitch is always the result of a materialization, whereas a SharedKillSwitch needs to be constructed before any materialization takes place.
Dynamic fan-in and fan-out with MergeHub, BroadcastHub and PartitionHub
There are many cases when consumers or producers of a certain service (represented as a Sink, Source, or possibly Flow) are dynamic and not known in advance. The Graph DSL does not allow you to represent this: all connections of the graph must be known in advance and must be connected upfront. To allow dynamic fan-in and fan-out streaming, Hubs should be used. They provide means to construct Sink and Source pairs that are “attached” to each other, but one of them can be materialized multiple times to implement dynamic fan-in or fan-out.
Using the MergeHub
A MergeHub allows you to implement a dynamic fan-in junction point in a graph, where elements coming from different producers are emitted in a first-come, first-served fashion. If the consumer cannot keep up, then all of the producers are backpressured. The hub itself comes as a Source to which the single consumer can be attached. It is not possible to attach any producers until this Source has been materialized (started). This is ensured by the fact that we only get the corresponding Sink as a materialized value. Usage might look like this:
```scala
import akka.NotUsed
import akka.stream.scaladsl.{MergeHub, RunnableGraph, Sink, Source}

// A simple consumer that will print to the console for now
val consumer = Sink.foreach(println)

// Attach a MergeHub Source to the consumer. This will materialize to a
// corresponding Sink.
val runnableGraph: RunnableGraph[Sink[String, NotUsed]] =
  MergeHub.source[String](perProducerBufferSize = 16).to(consumer)

// By running/materializing the consumer we get back a Sink, and hence
// now have access to feed elements into it. This Sink can be materialized
// any number of times, and every element that enters the Sink will
// be consumed by our consumer.
val toConsumer: Sink[String, NotUsed] = runnableGraph.run()

// Feeding two independent sources into the hub.
Source.single("Hello!").runWith(toConsumer)
Source.single("Hub!").runWith(toConsumer)
```
This sequence, while it might look odd at first, ensures proper startup order. Once we get the Sink, we can use it as many times as wanted. Everything that is fed to it will be delivered to the consumer we attached previously until it cancels.
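To see the fan-in end to end, here is a minimal sketch (the object `MergeHubFanInSketch` is hypothetical): three producers share one materialized Sink, and the consumer stops after six elements. The interleaving between producers is not deterministic, so the collected result is only meaningful once sorted.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, MergeHub, Sink, Source}

import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

object MergeHubFanInSketch {
  def collected(): immutable.Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("merge-hub-sketch")
    try {
      // Consumer side: take 6 elements, then complete and collect them.
      val (toConsumer, result) = MergeHub.source[Int](perProducerBufferSize = 16)
        .take(6)
        .toMat(Sink.seq)(Keep.both)
        .run()

      // Three independent producers attached to the same materialized Sink.
      (1 to 3).foreach(_ => Source(1 to 2).runWith(toConsumer))

      Await.result(result, 3.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(collected())
}
```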
Using the BroadcastHub
A BroadcastHub can be used to consume elements from a common producer by a dynamic set of consumers. The rate of the producer will be automatically adapted to the slowest consumer. In this case, the hub is a Sink to which the single producer must be attached first. Consumers can only be attached once the Sink has been materialized (i.e. the producer has been started). One example of using the BroadcastHub:
```scala
import akka.NotUsed
import akka.stream.scaladsl.{BroadcastHub, Keep, RunnableGraph, Source}
import scala.concurrent.duration._

// A simple producer that publishes a new "message" every second
val producer = Source.tick(1.second, 1.second, "New message")

// Attach a BroadcastHub Sink to the producer. This will materialize to a
// corresponding Source.
// (We need to use toMat and Keep.right since by default the materialized
// value to the left is used)
val runnableGraph: RunnableGraph[Source[String, NotUsed]] =
  producer.toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right)

// By running/materializing the producer, we get back a Source, which
// gives us access to the elements published by the producer.
val fromProducer: Source[String, NotUsed] = runnableGraph.run()

// Print out messages from the producer in two independent consumers
fromProducer.runForeach(msg => println("consumer1: " + msg))
fromProducer.runForeach(msg => println("consumer2: " + msg))
```
The resulting Source can be materialized any number of times, each materialization effectively attaching a new subscriber. If there are no subscribers attached to this hub, then it will not drop any elements but instead backpressure the upstream producer until subscribers arrive. This behavior can be tweaked by using an operator such as .buffer with a drop strategy, or by attaching a subscriber that drops all messages. If there are no other subscribers, this will ensure that the producer is kept drained (dropping all elements), and once a new subscriber arrives it will adaptively slow down, ensuring no more messages are dropped.
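A sketch of the .buffer option, assuming a buffer with OverflowStrategy.dropHead placed upstream of the hub is the desired trade-off (the object name `BroadcastHubDropSketch` is hypothetical). The buffer always signals demand upstream and drops its oldest element when full, so the producer is never stalled while nobody is listening; a late consumer simply sees a later part of the stream, still in order.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.OverflowStrategy
import akka.stream.scaladsl.{BroadcastHub, Keep, Sink, Source}

import scala.collection.immutable
import scala.concurrent.Await
import scala.concurrent.duration._

object BroadcastHubDropSketch {
  def sample(): immutable.Seq[Int] = {
    implicit val system: ActorSystem = ActorSystem("broadcast-drop-sketch")
    try {
      // Fast, infinite producer; the dropping buffer absorbs the hub's backpressure.
      val fromProducer: Source[Int, NotUsed] =
        Source.fromIterator(() => Iterator.from(1))
          .buffer(16, OverflowStrategy.dropHead)
          .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.right)
          .run()

      // A consumer attached later: earlier elements may already be gone,
      // but the ones it receives keep their original order.
      Await.result(fromProducer.take(3).runWith(Sink.seq), 3.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(sample())
}
```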
Combining dynamic operators to build a simple Publish-Subscribe service
The features provided by the Hub implementations are limited by default. This is by design, as various combinations can be used to express additional features like unsubscribing producers or consumers externally. Here we show an example that builds a Flow representing a publish-subscribe channel. The input of the Flow is published to all subscribers, while the output streams all the elements published.
First, we connect a MergeHub and a BroadcastHub together to form a publish-subscribe channel. Once we materialize this small stream, we get back a pair of Source and Sink that together define the publish and subscribe sides of our channel.
```scala
import akka.stream.scaladsl.{BroadcastHub, Keep, MergeHub}

// Obtain a Sink and Source which will publish and receive from the "bus" respectively.
val (sink, source) =
  MergeHub.source[String](perProducerBufferSize = 16)
    .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both)
    .run()
```
We now use a few tricks to add more features. First of all, we attach a Sink.ignore at the broadcast side of the channel to keep it drained when there are no subscribers. If this behavior is not desired, this line can be dropped.
```scala
// Ensure that the Broadcast output is dropped if there are no listening parties.
// If this dropping Sink is not attached, then the broadcast hub will not drop any
// elements itself when there are no subscribers, backpressuring the producer instead.
source.runWith(Sink.ignore)
```
We now wrap the Sink and Source in a Flow using Flow.fromSinkAndSource. This bundles up the two sides of the channel into one and forces its users to always define a publisher and a subscriber side (even if the subscriber side is dropping). It also allows us to attach a KillSwitch as a BidiStage, which in turn makes it possible to close both the original Sink and Source at the same time. Finally, we add backpressureTimeout on the consumer side to ensure that subscribers that block the channel for more than 3 seconds are forcefully removed (and their stream failed).
```scala
// We create now a Flow that represents a publish-subscribe channel using the above
// started stream as its "topic". We add two more features, external cancellation of
// the registration and automatic cleanup for very slow subscribers.
val busFlow: Flow[String, String, UniqueKillSwitch] =
  Flow
    .fromSinkAndSource(sink, source)
    .joinMat(KillSwitches.singleBidi[String, String])(Keep.right)
    .backpressureTimeout(3.seconds)
```
The resulting Flow now has the type Flow[String, String, UniqueKillSwitch], representing a publish-subscribe channel which can be used any number of times to attach new producers or consumers. In addition, it materializes to a UniqueKillSwitch (see UniqueKillSwitch) that can be used to deregister a single user externally:
```scala
val switch: UniqueKillSwitch =
  Source.repeat("Hello world!").viaMat(busFlow)(Keep.right).to(Sink.foreach(println)).run()

// Shut down externally
switch.shutdown()
```
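Putting the pieces together, the following sketch (hypothetical object `PubSubSketch`) wires the bus exactly as above and checks that a subscriber receives what a publisher sends. Because the Sink.ignore drain keeps the hub moving, a subscriber may miss messages published before it was attached, so the publisher here repeats its message; Source.maybe acts as a subscriber that never publishes anything itself.

```scala
import akka.actor.ActorSystem
import akka.stream.{KillSwitches, UniqueKillSwitch}
import akka.stream.scaladsl.{BroadcastHub, Flow, Keep, MergeHub, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object PubSubSketch {
  def firstReceived(): String = {
    implicit val system: ActorSystem = ActorSystem("pub-sub-sketch")
    try {
      // Same bus wiring as in the steps above.
      val (sink, source) = MergeHub.source[String](perProducerBufferSize = 16)
        .toMat(BroadcastHub.sink(bufferSize = 256))(Keep.both)
        .run()
      source.runWith(Sink.ignore)
      val busFlow: Flow[String, String, UniqueKillSwitch] =
        Flow.fromSinkAndSource(sink, source)
          .joinMat(KillSwitches.singleBidi[String, String])(Keep.right)
          .backpressureTimeout(3.seconds)

      // Publisher: keeps publishing and ignores what it receives from the bus.
      val publisher = Source.repeat("hello").viaMat(busFlow)(Keep.right).to(Sink.ignore).run()

      // Subscriber: publishes nothing and waits for the first message.
      val received = Source.maybe[String].via(busFlow).runWith(Sink.head)
      val msg = Await.result(received, 5.seconds)
      publisher.shutdown() // deregister the publisher externally
      msg
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(firstReceived())
}
```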
Using the PartitionHub
Note: this is a “may change” feature, so its API might still change in future releases.
A PartitionHub can be used to route elements from a common producer to a dynamic set of consumers. The consumer is selected by a function, and each element is routed to exactly one consumer.
The rate of the producer will be automatically adapted to the slowest consumer. In this case, the hub is a Sink to which the single producer must be attached first. Consumers can only be attached once the Sink has been materialized (i.e. the producer has been started). One example of using the PartitionHub:
```scala
import akka.NotUsed
import akka.stream.scaladsl.{Keep, PartitionHub, RunnableGraph, Source}
import scala.concurrent.duration._

// A simple producer that publishes a new "message-" every second
val producer = Source.tick(1.second, 1.second, "message").zipWith(Source(1 to 100))((a, b) => s"$a-$b")

// Attach a PartitionHub Sink to the producer. This will materialize to a
// corresponding Source.
// (We need to use toMat and Keep.right since by default the materialized
// value to the left is used)
val runnableGraph: RunnableGraph[Source[String, NotUsed]] =
  producer.toMat(
    PartitionHub.sink(
      (size, elem) => math.abs(elem.hashCode % size),
      startAfterNrOfConsumers = 2,
      bufferSize = 256))(Keep.right)

// By running/materializing the producer, we get back a Source, which
// gives us access to the elements published by the producer.
val fromProducer: Source[String, NotUsed] = runnableGraph.run()

// Print out messages from the producer in two independent consumers
fromProducer.runForeach(msg => println("consumer1: " + msg))
fromProducer.runForeach(msg => println("consumer2: " + msg))
```
The partitioner function takes two parameters: the first is the number of active consumers and the second is the stream element. The function should return the index of the selected consumer for the given element, i.e. an Int greater than or equal to 0 and less than the number of consumers.
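The partitioner contract can be checked without running a stream, since it is a plain function. This minimal sketch (hypothetical object `PartitionerContractSketch`) reuses the hash-based partitioner from the example: taking the absolute value of `hashCode % size` always yields an index in `[0, size)`, and a given element maps to the same index as long as the number of consumers does not change.

```scala
object PartitionerContractSketch {
  // Same shape as the `partitioner` argument of PartitionHub.sink:
  // (number of active consumers, element) => index of the chosen consumer.
  val byHash: (Int, String) => Int =
    (size, elem) => math.abs(elem.hashCode % size)

  def main(args: Array[String]): Unit =
    for (size <- 1 to 4)
      println(s"$size consumers -> index ${byHash(size, "message-42")}")
}
```

Note that the index for an element changes when consumers join or leave, because `size` changes.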
The resulting Source can be materialized any number of times, each materialization effectively attaching a new consumer. If there are no consumers attached to this hub, then it will not drop any elements but instead backpressure the upstream producer until consumers arrive. This behavior can be tweaked by using an operator, for example .buffer with a drop strategy, or by attaching a consumer that drops all messages. If there are no other consumers, this will ensure that the producer is kept drained (dropping all elements), and once a new consumer arrives and messages are routed to it, the producer will adaptively slow down, ensuring no more messages are dropped.
It is possible to define how many initial consumers are required before the hub starts emitting any messages to the attached consumers. While not enough consumers have been attached, messages are buffered, and when the buffer is full the upstream producer is backpressured. No messages are dropped.
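A sketch of startAfterNrOfConsumers under these assumptions: ten Int elements, a hypothetical partitioner `elem % size`, and a producer kept open with Source.maybe so the example does not depend on how the hub treats an upstream completion that happens before any consumer arrives (the object name `PartitionHubStartSketch` is also hypothetical). Routing starts only once both consumers are attached, so `size` is 2 for every element: one consumer receives the even numbers, the other the odd ones, and nothing is dropped.

```scala
import akka.actor.ActorSystem
import akka.stream.scaladsl.{Keep, PartitionHub, Sink, Source}

import scala.collection.immutable
import scala.concurrent.duration._
import scala.concurrent.{Await, Future}

object PartitionHubStartSketch {
  def delivered(): (immutable.Seq[Int], immutable.Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("partition-hub-sketch")
    try {
      // The ten elements wait in the hub's buffer (16 > 10) until two consumers exist.
      val fromProducer = Source(1 to 10)
        .concat(Source.maybe[Int]) // keeps the producer open
        .toMat(PartitionHub.sink[Int](
          (size, elem) => elem % size,
          startAfterNrOfConsumers = 2,
          bufferSize = 16))(Keep.right)
        .run()

      // Each consumer gets one parity class, i.e. exactly five elements.
      val first: Future[immutable.Seq[Int]] = fromProducer.take(5).runWith(Sink.seq)
      val second: Future[immutable.Seq[Int]] = fromProducer.take(5).runWith(Sink.seq)
      Await.result(first.zip(second), 5.seconds)
    } finally {
      system.terminate()
    }
  }

  def main(args: Array[String]): Unit = println(delivered())
}
```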
The above example illustrates a stateless partition function. For more advanced stateful routing, the statefulSink can be used. Here is an example of a stateful round-robin function:
```scala
// A simple producer that publishes a new "message-" every second
val producer = Source.tick(1.second, 1.second, "message").zipWith(Source(1 to 100))((a, b) => s"$a-$b")

// New instance of the partitioner function and its state is created
// for each materialization of the PartitionHub.
def roundRobin(): (PartitionHub.ConsumerInfo, String) => Long = {
  var i = -1L

  (info, elem) => {
    i += 1
    info.consumerIdByIdx((i % info.size).toInt)
  }
}

// Attach a PartitionHub Sink to the producer. This will materialize to a
// corresponding Source.
// (We need to use toMat and Keep.right since by default the materialized
// value to the left is used)
val runnableGraph: RunnableGraph[Source[String, NotUsed]] =
  producer.toMat(PartitionHub.statefulSink(() => roundRobin(), startAfterNrOfConsumers = 2, bufferSize = 256))(
    Keep.right)

// By running/materializing the producer, we get back a Source, which
// gives us access to the elements published by the producer.
val fromProducer: Source[String, NotUsed] = runnableGraph.run()

// Print out messages from the producer in two independent consumers
fromProducer.runForeach(msg => println("consumer1: " + msg))
fromProducer.runForeach(msg => println("consumer2: " + msg))
```
Note that statefulSink takes a factory of the function, so that each materialization gets its own instance holding its own stateful variables.
The function takes two parameters: the first is information about active consumers, including an array of consumer identifiers, and the second is the stream element. The function should return the selected consumer identifier for the given element. The function will never be called when there are no active consumers, i.e. there is always at least one element in the array of identifiers.
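Why a factory rather than a single function value? A minimal pure-Scala sketch, deliberately simplified to a hypothetical `Int => Int` (number of consumers to index) instead of the real `(PartitionHub.ConsumerInfo, T) => Long` signature: every call of `roundRobin()` creates a new closure with its own counter, just as each materialization of the hub gets its own partitioner. A `var` defined outside the factory would instead be shared by all materializations.

```scala
object PartitionerFactorySketch {
  // Hypothetical, simplified partitioner: maps the number of consumers to an index.
  // The real statefulSink function receives ConsumerInfo and returns a consumer id.
  def roundRobin(): Int => Int = {
    var i = -1L // private to this closure, i.e. to one "materialization"
    size => {
      i += 1
      (i % size).toInt
    }
  }

  def main(args: Array[String]): Unit = {
    val first = roundRobin()
    val second = roundRobin()
    println(Seq(first(3), first(3), first(3), first(3))) // cycles through 0, 1, 2, 0
    println(second(3))                                   // fresh state: starts at 0
  }
}
```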
Another interesting type of routing is to prefer routing to the fastest consumers. The ConsumerInfo has an accessor queueSize that is the approximate number of buffered elements for a consumer. A larger value than for other consumers could be an indication that the consumer is slow. Note that this is a moving target since the elements are consumed concurrently. Here is an example of a hub that routes to the consumer with the fewest buffered elements:
```scala
val producer = Source(0 until 100)

// ConsumerInfo.queueSize is the approximate number of buffered elements for a consumer.
// Note that this is a moving target since the elements are consumed concurrently.
val runnableGraph: RunnableGraph[Source[Int, NotUsed]] =
  producer.toMat(
    PartitionHub.statefulSink(
      () => (info, elem) => info.consumerIds.minBy(id => info.queueSize(id)),
      startAfterNrOfConsumers = 2,
      bufferSize = 16))(Keep.right)

val fromProducer: Source[Int, NotUsed] = runnableGraph.run()

fromProducer.runForeach(msg => println("consumer1: " + msg))
fromProducer.throttle(10, 100.millis).runForeach(msg => println("consumer2: " + msg))
```