• Flink Feedback Stream Demo


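    Sometimes we need to build an execution graph that contains a cycle, for example to send records that still fail some condition after processing back to the start for another pass.

    Previously I handled this by writing the records that still failed the condition to a separate Topic and consuming them again.

    Today I came across Flink's Iterate operator, which also covers this need.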

    Official documentation: https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/stream/operators/

    Creates a "feedback" loop in the flow, by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. The following code starts with a stream and applies the iteration body continuously. Elements that are greater than 0 are sent back to the feedback channel, and the rest of the elements are forwarded downstream.

    Official demo

    // Create the IterativeStream
    IterativeStream<Long> iteration = initialStream.iterate();
    // Iteration body
    DataStream<Long> iterationBody = iteration.map(/* do something */);
    // Select the elements that should be fed back
    DataStream<Long> feedback = iterationBody.filter(new FilterFunction<Long>() {
        @Override
        public boolean filter(Long value) throws Exception {
            // Elements matching this condition are fed back
            return value > 0;
        }
    });
    // Feed the feedback stream back into the iteration stream
    iteration.closeWith(feedback);
    // Output branch
    DataStream<Long> output = iterationBody.filter(new FilterFunction<Long>() {
        @Override
        public boolean filter(Long value) throws Exception {
            // Elements matching this condition are emitted downstream
            return value <= 0;
        }
    });
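
To make the routing in the official demo concrete, here is a minimal sketch in plain Java that only models the semantics of the feedback loop. It does not use any Flink API: the class `FeedbackLoopSketch`, the method `runLoop`, and the "subtract 2" iteration body are all hypothetical names and choices made for illustration. A queue stands in for the iteration head, results greater than 0 go back into the queue, and everything else goes to the output.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.LongPredicate;
import java.util.function.LongUnaryOperator;

/** Plain-Java model of a feedback loop; it does not use any Flink API. */
public class FeedbackLoopSketch {

    /**
     * Applies body to every pending element. Results that match
     * feedbackCondition re-enter the loop; all other results are emitted.
     */
    public static List<Long> runLoop(List<Long> input,
                                     LongUnaryOperator body,
                                     LongPredicate feedbackCondition) {
        Deque<Long> pending = new ArrayDeque<>(input); // stands in for the iteration head
        List<Long> output = new ArrayList<>();
        while (!pending.isEmpty()) {
            long result = body.applyAsLong(pending.poll());
            if (feedbackCondition.test(result)) {
                pending.add(result);   // feedback branch: value > 0 in the demo
            } else {
                output.add(result);    // output branch: value <= 0 in the demo
            }
        }
        return output;
    }

    public static void main(String[] args) {
        // Hypothetical iteration body: subtract 2 on every pass.
        List<Long> out = runLoop(List.of(5L, 0L, 3L), v -> v - 2, v -> v > 0);
        System.out.println(out); // prints [-2, -1, -1]
    }
}
```

The element 5 passes through the body three times (5, 3, 1, -1) before it leaves the loop, while 0 leaves after a single pass. In a real Flink job the loop never "finishes" like this, because the stream is unbounded and fed-back elements are interleaved with new input.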

    Scala Demo

    Business scenario: a keyed window sum. If a window's result does not meet the condition, it re-enters the window and is summed again.

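    import org.apache.flink.streaming.api.scala._
    import org.apache.flink.streaming.api.scala.function.ProcessWindowFunction
    import org.apache.flink.streaming.api.windowing.assigners.TumblingProcessingTimeWindows
    import org.apache.flink.streaming.api.windowing.time.Time
    import org.apache.flink.streaming.api.windowing.windows.TimeWindow
    import org.apache.flink.util.Collector

    object FeedbackStreamDemo {

      def main(args: Array[String]): Unit = {
        // environment
        val env: StreamExecutionEnvironment = StreamExecutionEnvironment.getExecutionEnvironment
        env.setParallelism(1)

        // SimpleStringSource is the author's own source (not shown here);
        // the map below expects it to emit "key,number" strings
        val source = env.addSource(new SimpleStringSource)

        // Parse each line into a (key, value) tuple
        val mapStream = source.map(str => {
          val arr = str.split(",")
          println("map : " + str)
          (arr(0), arr(1).toLong)
        })
          .disableChaining()

        val itStream = mapStream.iterate(ds => {
          // Iteration body: add 1 to each value, then sum per key in a 10-second window
          val dsMap = ds.map(t => {
            (t._1, t._2 + 1)
          })
            .keyBy(_._1)
            .window(TumblingProcessingTimeWindows.of(Time.seconds(10)))
            .process(new ProcessWindowFunction[(String, Long), (String, Long), String, TimeWindow] {
              override def process(key: String, context: Context, elements: Iterable[(String, Long)], out: Collector[(String, Long)]): Unit = {
                // Simple window sum
                val sum = elements.map(_._2).sum
                out.collect((key, sum))
              }
            })

          // Feedback branch: window results below 500 go back to the iteration head and are summed again
          (dsMap.filter(s => {
            s._2 < 500
          })
            ,
            // Output branch: results of 500 or more are done and are emitted directly
            dsMap.filter(s => {
              s._2 >= 500
            })
          )
        })
          .disableChaining()

        itStream.print("result:")
        env.execute("FeedbackStreamDemo")
      }
    }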
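
The behavior of this job can be hard to picture, so here is a rough plain-Java sketch of the same logic: add 1 to every record, sum per key, feed sums below the threshold back in, and emit the rest. It is only a model under simplifying assumptions, not Flink code: each loop pass stands for one window firing, no new records arrive after the initial input, and the names `WindowFeedbackSketch`, `Result`, and `run` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java model of the keyed window-sum feedback loop; no Flink API involved. */
public class WindowFeedbackSketch {

    /** Final sum per key plus the number of simulated windows that fired. */
    public static final class Result {
        public final Map<String, Long> sums;
        public final int rounds;

        Result(Map<String, Long> sums, int rounds) {
            this.sums = sums;
            this.rounds = rounds;
        }
    }

    public static Result run(List<Map.Entry<String, Long>> input, long threshold) {
        List<Map.Entry<String, Long>> pending = new ArrayList<>(input);
        Map<String, Long> finished = new LinkedHashMap<>();
        int rounds = 0;
        while (!pending.isEmpty()) {
            rounds++;
            // map step (+1 per record) followed by the per-key "window" sum
            Map<String, Long> windowSums = new LinkedHashMap<>();
            for (Map.Entry<String, Long> element : pending) {
                windowSums.merge(element.getKey(), element.getValue() + 1, Long::sum);
            }
            pending = new ArrayList<>();
            for (Map.Entry<String, Long> sum : windowSums.entrySet()) {
                if (sum.getValue() < threshold) {
                    pending.add(Map.entry(sum.getKey(), sum.getValue())); // feedback branch
                } else {
                    finished.put(sum.getKey(), sum.getValue());           // output branch
                }
            }
        }
        return new Result(finished, rounds);
    }

    public static void main(String[] args) {
        Result r = run(List.of(Map.entry("a", 100L), Map.entry("a", 200L), Map.entry("b", 600L)), 500L);
        System.out.println(r.sums + " after " + r.rounds + " rounds");
        // prints {b=601, a=500} after 199 rounds
    }
}
```

Key `b` leaves after the first window, while key `a` starts at 302 and gains only 1 per pass, so it circulates for 199 windows before reaching 500. With the 10-second tumbling window of the Scala demo, and without new input for that key, a record far below the threshold could therefore stay in the loop for a long time.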


  • Original article: https://www.cnblogs.com/Springmoon-venn/p/13857002.html