• alpakka-kafka(1)-producer


      The alpakka project is a scala/java open-source project built on the akka-streams stream-processing toolkit. It provides connectors to a variety of data sources so that their data can be processed inside akka-streams, and alpakka-kafka is the kafka connector of the alpakka project. For us this means we can use alpakka-kafka to talk to kafka and use the functionality kafka provides. Put another way: alpakka-kafka is a scala toolkit that implements kafka functionality on top of akka-streams.

    alpakka-kafka covers kafka's two core roles: producer and consumer, which respectively write data from an akka-streams stream into kafka and read data out of kafka into an akka-streams stream. Integrating kafka with akka-streams usually comes up in business-integration scenarios: business A produces operation commands and writes them to kafka; kafka carries those commands to business B, which reads them and performs the corresponding operations. For example, take two modules, goods receiving and inventory management: the receiving module writes receiving records to kafka, while at the other end the inventory module reads those records and updates the corresponding stock quantities. Note that the two businesses operate independently of each other. In alpakka the actual business logic is essentially the data processing (transform) done inside akka-streams. This is a typical CQRS pattern: the writing side and the reading side are decoupled; the writer does not care who the audience is or how the data will be used, and the reader does not care who wrote it. Here the writer and the reader correspond to kafka's producer and consumer.

    In this first installment we introduce alpakka-kafka's producer and how to use it. As mentioned above, alpakka implements the kafka-producer on top of akka-streams, so the producers alpakka provides are themselves akka-streams components and can be combined with other akka-streams components into larger streams. Before building a producer, a few supporting pieces need to be in place:

    1. producer-settings: the akka.kafka.producer section of alpakka-kafka's reference.conf already contains defaults sufficient for basic operation; users can adjust them flexibly with the typesafe config tooling (a sketch of programmatic tuning follows the basic example below)

    2. de/serializer: ready-made String serializer/deserializer classes (org.apache.kafka.common.serialization.StringSerializer and StringDeserializer) can be used directly

    3. bootstrap-server: a comma-separated list of kafka-cluster node addresses

    Below is a concrete example:

      implicit val system = ActorSystem("kafka_sys")
      val bootstrapServers = "localhost:9092"
      val config = system.settings.config.getConfig("akka.kafka.producer")
      val producerSettings =
        ProducerSettings(config, new StringSerializer, new StringSerializer)
          .withBootstrapServers(bootstrapServers)

    Here the ActorSystem is only used to read the configuration from the .conf files; no akka-streams component is involved yet. The akka.kafka.producer section comes with defaults in alpakka-kafka's reference.conf, so it does not need to be redefined in application.conf.
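
    The defaults can also be tuned programmatically instead of via application.conf. The following is a minimal sketch, not from the original post, of adjusting the producerSettings built above; the property names and values are illustrative only:

      import scala.concurrent.duration._

      // illustrative tuning of the settings built above
      val tunedProducerSettings =
        producerSettings
          .withProperty("enable.idempotence", "true") // raw kafka producer property, passed through unchanged
          .withParallelism(100)                       // max in-flight sends per producer stage
          .withCloseTimeout(60.seconds)               // how long close() waits for pending sends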

    alpakka-kafka also provides a most basic producer, SendProducer, which is not an akka-streams component. Below we demonstrate how SendProducer is used and what it does:

    import akka.actor.ActorSystem
    import akka.kafka.scaladsl.SendProducer
    import org.apache.kafka.clients.producer.{ProducerRecord, RecordMetadata}
    import akka.kafka._
    import org.apache.kafka.common.serialization._
    import scala.concurrent.duration._
    import scala.concurrent.{Await, Future}
    
    object SendProducerDemo extends App {
      implicit val system = ActorSystem("kafka_sys")
      implicit val executionContext = system.dispatcher
      val bootstrapServers = "localhost:9092"
      val config = system.settings.config.getConfig("akka.kafka.producer")
      val producerSettings =
        ProducerSettings(config, new StringSerializer, new StringSerializer)
          .withBootstrapServers(bootstrapServers)
      val producer = SendProducer(producerSettings)  // standalone producer, not an akka-streams stage
      val topic = "greatings"
      // build one ProducerRecord per number (200 down to 100) and send it,
      // collecting the Future[RecordMetadata] returned by each send
      val lstfut: Seq[Future[RecordMetadata]] =
        (100 to 200).reverse
          .map(_.toString)
          .map(value => new ProducerRecord[String, String](topic, s"hello-$value"))
          .map(msg => producer.send(msg))

      // wait until all sends have completed
      val futlst = Future.sequence(lstfut)
      Await.result(futlst, 2.seconds)
    
    
      scala.io.StdIn.readLine()
      producer.close()
      system.terminate()
    }

    The demo above uses SendProducer to write a series of hello messages to kafka, iterating over a plain collection rather than an akka-streams Source. To check the result we can use kafka's command-line tools, as shown below:

    w> ./kafka-topics --create --topic greatings --bootstrap-server localhost:9092
    Created topic greatings.
    w> ./kafka-console-consumer --topic greatings  --bootstrap-server localhost:9092
    hello-100
    hello-101
    hello-102
    hello-103
    hello-104
    hello-105
    hello-106
    ...

    Since the producer represents the writing side, in akka-streams it maps to a Sink or a Flow component. The next example demonstrates the producer Sink component plainSink:

    import akka.Done
    import akka.actor.ActorSystem
    import akka.kafka.scaladsl._
    import akka.kafka._
    import akka.stream.scaladsl._
    import org.apache.kafka.clients.producer.ProducerRecord
    import org.apache.kafka.common.serialization._
    
    import scala.concurrent._
    import scala.concurrent.duration._
    
    object plain_sink extends App {
      implicit val system = ActorSystem("kafka_sys")
      val bootstrapServers = "localhost:9092"
      val config = system.settings.config.getConfig("akka.kafka.producer")
      val producerSettings =
        ProducerSettings(config, new StringSerializer, new StringSerializer)
          .withBootstrapServers(bootstrapServers)
    
      implicit val executionContext = system.dispatcher
      val topic = "greatings"
      // stream the numbers 1..100, turn each into a ProducerRecord and write it
      // to kafka through plainSink; the Future completes when the stream finishes
      val done: Future[Done] =
        Source(1 to 100)
          .map(_.toString)
          .map(value => new ProducerRecord[String, String](topic, s"hello-$value"))
          .runWith(Producer.plainSink(producerSettings))

      Await.ready(done, 3.seconds)
    
      scala.io.StdIn.readLine()
      system.terminate()
    }

    This is a typical akka-streams application; Producer.plainSink is simply an akka-streams Sink component.

    Both demos above construct values of type ProducerRecord and write them to kafka. ProducerRecord is kafka's basic message type:

       public ProducerRecord(String topic, K key, V value) {
            this(topic, null, null, key, value, null);
        }
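
    For reference, here is a brief sketch of the ProducerRecord constructors used most often; the topic, key and value literals are placeholders, not taken from the original examples:

      import org.apache.kafka.clients.producer.ProducerRecord

      // value only: the default partitioner chooses the partition
      val valueOnly = new ProducerRecord[String, String]("demo-topic", "some-value")
      // keyed: records with the same key are routed to the same partition
      val keyed = new ProducerRecord[String, String]("demo-topic", "some-key", "some-value")
      // explicit partition; the partition argument is a java.lang.Integer
      val pinned = new ProducerRecord[String, String]("demo-topic", Integer.valueOf(0), "some-key", "some-value")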

    topic is a String, while key and value carry the generic types K and V (String in our examples). On top of ProducerRecord, alpakka-kafka adds a richer message type, ProducerMessage.Envelope:

    sealed trait Envelope[K, V, +PassThrough] {
        def passThrough: PassThrough
        def withPassThrough[PassThrough2](value: PassThrough2): Envelope[K, V, PassThrough2]
      }
    
    
      final case class Message[K, V, +PassThrough](
          record: ProducerRecord[K, V],
          passThrough: PassThrough
      ) extends Envelope[K, V, PassThrough] {
        override def withPassThrough[PassThrough2](value: PassThrough2): Message[K, V, PassThrough2] =
          copy(passThrough = value)
      }
    

    ProducerMessage.Envelope adds a PassThrough type parameter that lets extra metadata travel alongside the message. alpakka-kafka stream components accept this type as stream elements and eventually turn each envelope into one or more ProducerRecords written to kafka, as follows:

    object EventMessages {
    // one Envelope that maps to a single ProducerRecord
       def createMessage[KeyType,ValueType,PassThroughType](
          topic: String,
          key: KeyType,
          value: ValueType,
          passThrough: PassThroughType): ProducerMessage.Envelope[KeyType,ValueType,PassThroughType] = {
         val single = ProducerMessage.single(
           new ProducerRecord[KeyType,ValueType](topic,key,value),
           passThrough
         )
         single
       }
    // one Envelope that maps to multiple ProducerRecords (one per topic)
      def createMultiMessage[KeyType,ValueType,PassThroughType] (
           topics: List[String],
           key: KeyType,
           value: ValueType,
           passThrough: PassThroughType): ProducerMessage.Envelope[KeyType,ValueType,PassThroughType] = {
        import scala.collection.immutable
        val msgs = topics.map { topic =>
          new ProducerRecord(topic,key,value)
        }.toSeq
        val multi = ProducerMessage.multi(
          msgs,
          passThrough
        )
        multi
      }
    // carries only the pass-through metadata; nothing is written to kafka
      def createPassThroughMessage[KeyType,ValueType,PassThroughType](
           topic: String,
           key: KeyType,
           value: ValueType,
           passThrough: PassThroughType): ProducerMessage.Envelope[KeyType,ValueType,PassThroughType] = {
        ProducerMessage.passThrough(passThrough)
      }
    
    }
    

    flexiFlow is an alpakka-kafka Flow component: it consumes ProducerMessage.Envelope elements and emits elements of type Results:

      def flexiFlow[K, V, PassThrough](
          settings: ProducerSettings[K, V]
      ): Flow[Envelope[K, V, PassThrough], Results[K, V, PassThrough], NotUsed] = { ... }
    

    Results is a sealed trait; its single-record case, Result, is defined as follows:

      final case class Result[K, V, PassThrough] private (
          metadata: RecordMetadata,
          message: Message[K, V, PassThrough]
      ) extends Results[K, V, PassThrough] {
        def offset: Long = metadata.offset()
        def passThrough: PassThrough = message.passThrough
      }
    

    In other words, flexiFlow hands back the status data kafka returns after a write. Let's look at a flexiFlow usage example:

    import akka.kafka.ProducerMessage._
    import akka.actor.ActorSystem
    import akka.kafka.scaladsl._
    import akka.kafka.{ProducerMessage, ProducerSettings}
    import akka.stream.scaladsl.{Sink, Source}
    import org.apache.kafka.clients.producer.ProducerRecord
    import org.apache.kafka.common.serialization.StringSerializer
    
    import scala.concurrent._
    import scala.concurrent.duration._
    
    object flexi_flow extends App {
      implicit val system = ActorSystem("kafka_sys")
      val bootstrapServers = "localhost:9092"
      val config = system.settings.config.getConfig("akka.kafka.producer")
      val producerSettings =
        ProducerSettings(config, new StringSerializer, new StringSerializer)
          .withBootstrapServers(bootstrapServers)
    
      // needed for the future flatMap/onComplete in the end
      implicit val executionContext = system.dispatcher
      val topic = "greatings"
    
      // wrap each number in a single-record Envelope, passing the number itself
      // through as the PassThrough metadata
      val done = Source(1 to 100)
        .map { number =>
          val value = number.toString
          EventMessages.createMessage(topic,"key",value,number)
        }
        .via(Producer.flexiFlow(producerSettings))   // write to kafka, emit Results
        .map {
          case ProducerMessage.Result(metadata, ProducerMessage.Message(record, passThrough)) =>
            s"${metadata.topic}/${metadata.partition} ${metadata.offset}: ${record.value}"
    
          case ProducerMessage.MultiResult(parts, passThrough) =>
            parts
              .map {
                case MultiResultPart(metadata, record) =>
                  s"${metadata.topic}/${metadata.partition} ${metadata.offset}: ${record.value}"
              }
              .mkString(", ")
    
          case ProducerMessage.PassThroughResult(passThrough) =>
            s"passed through"
        }
        .runWith(Sink.foreach(println(_)))
    
      Await.ready(done,3.seconds)
    
      scala.io.StdIn.readLine()
      system.terminate()
    }
    
    object EventMessages {
       def createMessage[KeyType,ValueType,PassThroughType](
          topic: String,
          key: KeyType,
          value: ValueType,
          passThrough: PassThroughType): ProducerMessage.Envelope[KeyType,ValueType,PassThroughType] = {
         val single = ProducerMessage.single(
           new ProducerRecord[KeyType,ValueType](topic,key,value),
           passThrough
         )
         single
       }
      def createMultiMessage[KeyType,ValueType,PassThroughType] (
           topics: List[String],
           key: KeyType,
           value: ValueType,
           passThrough: PassThroughType): ProducerMessage.Envelope[KeyType,ValueType,PassThroughType] = {
        import scala.collection.immutable
        val msgs = topics.map { topic =>
          new ProducerRecord(topic,key,value)
        }.toSeq
        val multi = ProducerMessage.multi(
          msgs,
          passThrough
        )
        multi
      }
      def createPassThroughMessage[KeyType,ValueType,PassThroughType](
           topic: String,
           key: KeyType,
           value: ValueType,
           passThrough: PassThroughType): ProducerMessage.Envelope[KeyType,ValueType,PassThroughType] = {
        ProducerMessage.passThrough(passThrough)
      }
    
    }
    

    Besides writing business events or commands to kafka, a producer can also commit the offsets that record how far messages have been consumed. alpakka-kafka producers therefore come in two kinds: plainSink and flexiFlow, demonstrated above, only write business data to kafka, while a second kind, such as committableSink, additionally commits the consumed offsets. For example:

    val control =
      Consumer
        .committableSource(consumerSettings, Subscriptions.topics(topic1, topic2))
        .map { msg =>
          // republish the consumed record to targetTopic, carrying its offset as pass-through
          ProducerMessage.single(
            new ProducerRecord(targetTopic, msg.record.key, msg.record.value),
            msg.committableOffset
          )
        }
        // committableSink writes the records and commits the pass-through offsets
        .toMat(Producer.committableSink(producerSettings, committerSettings))(DrainingControl.apply)
        .run()

    control.drainAndShutdown()
    

    As shown above, committableSource reads the business messages together with their committableOffset from kafka, and Producer.committableSink writes the business messages back to kafka and commits the offsets.
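
    The snippet assumes that consumerSettings and committerSettings (as well as the topic names) are already defined. Below is a minimal sketch of how they might be built; the consumer group id is a placeholder and not from the original post:

    import akka.kafka.{CommitterSettings, ConsumerSettings}
    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.common.serialization.StringDeserializer

    val consumerConfig = system.settings.config.getConfig("akka.kafka.consumer")
    val consumerSettings =
      ConsumerSettings(consumerConfig, new StringDeserializer, new StringDeserializer)
        .withBootstrapServers(bootstrapServers)
        .withGroupId("group1")   // placeholder consumer group id
        .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

    // committer defaults come from the akka.kafka.committer section of reference.conf
    val committerSettings = CommitterSettings(system)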

    In the next installment we will look at the consumer in detail.
