• [Flink] Implementing a Count Window with Managed Keyed State


    Let's start with the code:

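    import java.util.Arrays;

    import org.apache.flink.api.common.functions.RichFlatMapFunction;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.TypeHint;
    import org.apache.flink.api.common.typeinfo.TypeInformation;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.DataStreamSource;
    import org.apache.flink.streaming.api.datastream.KeyedStream;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.source.RichSourceFunction;
    import org.apache.flink.util.Collector;

    public class WordCountKeyedState {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            // Test source: emits the same line of words once per second
            DataStreamSource<String> lineDS = env.addSource(new RichSourceFunction<String>() {
                // volatile: cancel() is called from a different thread than run()
                private volatile boolean isCanceled = false;

                @Override
                public void run(SourceContext<String> ctx) throws Exception {
                    while (!isCanceled) {
                        ctx.collect("hadoop flink spark");
                        Thread.sleep(1000);
                    }
                }

                @Override
                public void cancel() {
                    isCanceled = true;
                }
            });

            // Split each line into words and map every word to (word, 1)
            SingleOutputStreamOperator<Tuple2<String, Integer>> wordTupleDS = lineDS
                    .flatMap((String line, Collector<Tuple2<String, Integer>> ctx) -> {
                        Arrays.stream(line.split(" ")).forEach(word -> ctx.collect(Tuple2.of(word, 1)));
                    })
                    // The lambda's generic output type is erased, so declare it explicitly
                    .returns(Types.TUPLE(Types.STRING, Types.INT));

            // Group by word; the key (t.f0) is a String
            KeyedStream<Tuple2<String, Integer>, String> keyedWordTupleDS = wordTupleDS.keyBy(t -> t.f0);

            // Count words using keyed ValueState
            keyedWordTupleDS.flatMap(new RichFlatMapFunction<Tuple2<String, Integer>, Tuple2<String, Integer>>() {

                // f0 = records seen since the last emit, f1 = running sum of the values
                private transient ValueState<Tuple2<Integer, Integer>> countSumValueState;

                @Override
                public void open(Configuration parameters) throws Exception {
                    // Register the ValueState without a default value; null is handled in flatMap
                    ValueStateDescriptor<Tuple2<Integer, Integer>> countSumValueStateDesc = new ValueStateDescriptor<>(
                            "countSumValueState",
                            TypeInformation.of(new TypeHint<Tuple2<Integer, Integer>>() {}));
                    countSumValueState = getRuntimeContext().getState(countSumValueStateDesc);
                }

                @Override
                public void flatMap(Tuple2<String, Integer> value, Collector<Tuple2<String, Integer>> out) throws Exception {
                    // State is scoped to the current key and is null the first time this key is seen
                    Tuple2<Integer, Integer> current = countSumValueState.value();
                    if (current == null) {
                        current = Tuple2.of(0, 0);
                    }

                    int count = current.f0 + 1;
                    int valueSum = current.f1 + value.f1;

                    // Every time a key reaches 3 records, emit (word, running sum) downstream
                    if (count >= 3) {
                        out.collect(Tuple2.of(value.f0, valueSum));
                        // Reset only the count; the running sum is kept
                        count = 0;
                    }
                    countSumValueState.update(Tuple2.of(count, valueSum));
                }
            }).print();

            env.execute("KeyedState State");
        }
    }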

    Code walkthrough:

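    1. Build a test data source that emits a line of text once per second. To keep testing simple, it only sends a single line containing three words.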

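    The source's cancel() is called from a different thread than run(). That is why the stop flag should be declared volatile; the example in Flink's SourceFunction Javadoc also uses a volatile flag. Below is a minimal plain-Java sketch of that run/cancel contract with no Flink dependency. The class name CancellableLineSource, the 10 ms interval, and the helper stopsAfterCancel are hypothetical, chosen only for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Plain-Java stand-in (hypothetical, no Flink) for a source's run()/cancel() contract:
// run() loops on a worker thread, cancel() is called from another thread.
class CancellableLineSource implements Runnable {
    // volatile so the write in cancel() is guaranteed to become visible to run()
    private volatile boolean isCanceled = false;
    final AtomicInteger emitted = new AtomicInteger();

    @Override
    public void run() {
        while (!isCanceled) {
            emitted.incrementAndGet();          // stands in for ctx.collect(...)
            try {
                Thread.sleep(10);               // the article sleeps 1000 ms
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    void cancel() {
        isCanceled = true;
    }

    // Runs the source on a worker thread for roughly `millis` ms, cancels it,
    // and reports whether the worker actually exited afterwards.
    static boolean stopsAfterCancel(long millis) {
        CancellableLineSource source = new CancellableLineSource();
        Thread worker = new Thread(source);
        worker.start();
        try {
            Thread.sleep(millis);
            source.cancel();
            worker.join(2000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("stopped after cancel: " + stopsAfterCancel(50));
    }
}
```

    Without volatile, the JIT is allowed to hoist the flag read out of the loop, so a cancelled source might never observe the change.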

    2. Split each line on spaces and convert each word into a tuple, with an initial occurrence count of 1.


    3. Group by word.

    4. Custom FlatMap

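    Initialize the ValueState in open(). Note: ValueState can only be used on a KeyedStream, and each ValueState value is scoped to a single key. Whenever a parallel instance accesses the ValueState, Flink takes the key of the record currently being processed from the context. As a result, the ValueState that each call sees is the one belonging to that key. Flink handles this automatically.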

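    To make the per-key scoping concrete, here is a toy model in plain Java. It is not how Flink's state backends are implemented. It only assumes, as described above, that the runtime sets the current key before invoking the user function, so one state handle reads and writes a different slot for each key. ToyKeyedValueState and ToyKeyedStateDemo are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model (NOT Flink's implementation): one "ValueState" handle whose
// reads and writes are routed to the slot of whichever key is current.
class ToyKeyedValueState<K, V> {
    private final Map<K, V> slots = new HashMap<>();
    private K currentKey;

    // In this model the "runtime" calls this before each user-function call.
    void setCurrentKey(K key) {
        this.currentKey = key;
    }

    // Returns null when the current key has never been updated.
    V value() {
        return slots.get(currentKey);
    }

    void update(V v) {
        slots.put(currentKey, v);
    }
}

class ToyKeyedStateDemo {
    // Counts occurrences per word through a single shared state handle and
    // records the last value written for each key.
    static Map<String, Integer> countWords(String... words) {
        ToyKeyedValueState<String, Integer> state = new ToyKeyedValueState<>();
        Map<String, Integer> lastWritten = new HashMap<>();
        for (String w : words) {
            state.setCurrentKey(w);            // analogous to keyBy(t -> t.f0)
            Integer c = state.value();         // same handle, different slot per key
            int next = (c == null ? 0 : c) + 1;
            state.update(next);
            lastWritten.put(w, next);
        }
        return lastWritten;
    }

    public static void main(String[] args) {
        System.out.println(countWords("hadoop", "flink", "hadoop", "spark", "hadoop"));
    }
}
```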

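    Note:

    The ValueStateDescriptor constructors that take a default value are deprecated. Flink recommends using a constructor without a default value and manually checking whether the state is null during processing, as the Javadoc in the Flink source says: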

    /**
     * Creates a new {@code ValueStateDescriptor} with the given name, default value, and the specific
     * serializer.
     *
     * @deprecated Use {@link #ValueStateDescriptor(String, TypeSerializer)} instead and manually
     * manage the default value by checking whether the contents of the state is {@code null}.
     *
     * @param name The (unique) name for the state.
     * @param typeSerializer The type serializer of the values in the state.
     * @param defaultValue The default value that will be set when requesting state without setting
     * a value before.
     */
    @Deprecated
    public ValueStateDescriptor(String name, TypeSerializer<T> typeSerializer, T defaultValue) {
        super(name, typeSerializer, defaultValue);
    }
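    Dropping the default value usually means reading the state once and falling back when it is null. The sketch below shows that pattern against a hypothetical SimpleValueState interface. The interface only mirrors the value()/update() calls used in this article; it is not Flink's ValueState. A Supplier is used so that each key gets a fresh default object rather than sharing one mutable instance.

```java
import java.util.function.Supplier;

// Hypothetical minimal interface mirroring the two calls used in this article;
// it is NOT org.apache.flink.api.common.state.ValueState.
interface SimpleValueState<T> {
    T value();            // null until update() has been called
    void update(T value);
}

class StateDefaults {
    // Replacement for the deprecated "default value": read once, fall back if null.
    static <T> T valueOrDefault(SimpleValueState<T> state, Supplier<T> defaultValue) {
        T current = state.value();
        return current != null ? current : defaultValue.get();
    }

    // A trivial in-memory state, used only to exercise the helper.
    static <T> SimpleValueState<T> inMemory() {
        return new SimpleValueState<T>() {
            private T v;

            @Override
            public T value() {
                return v;
            }

            @Override
            public void update(T value) {
                v = value;
            }
        };
    }
}
```

    In the Flink code above, the same idea is expressed by reading countSumValueState.value() into a local variable and substituting Tuple2.of(0, 0) when it is null.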

    5. Implementing the logic

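    In flatMap, first check whether the ValueState has been initialized for the current key; if not, supply an initial value manually. Then increment the count, add the value to the sum, and update the state. Each time the count reaches 3, emit the result downstream and reset the count to 0 (the running sum is kept).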
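    The counting rule can be checked without a cluster. The following plain-Java simulation assumes the same input as the article (the line "hadoop flink spark", repeated) and a threshold of 3. A HashMap keyed by word stands in for the keyed ValueState, and an int[] {count, sum} stands in for Tuple2<Integer, Integer>. CountWindowSimulation is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of the flatMap logic above (no Flink dependency).
class CountWindowSimulation {
    static final int THRESHOLD = 3; // emit once per 3 records of the same key

    static List<String> run(List<String> lines) {
        Map<String, int[]> state = new HashMap<>();      // stands in for keyed ValueState
        List<String> emitted = new ArrayList<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {        // flatMap: word -> (word, 1)
                int[] acc = state.get(word);              // value() for this key
                if (acc == null) {
                    acc = new int[] {0, 0};               // manual default, as the article recommends
                }
                int count = acc[0] + 1;
                int sum = acc[1] + 1;                     // every tuple carries f1 == 1
                if (count >= THRESHOLD) {
                    emitted.add("(" + word + "," + sum + ")");
                    count = 0;                            // reset the count, keep the running sum
                }
                state.put(word, new int[] {count, sum});  // update()
            }
        }
        return emitted;
    }

    public static void main(String[] args) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 6; i++) {
            lines.add("hadoop flink spark");
        }
        System.out.println(run(lines));
        // prints [(hadoop,3), (flink,3), (spark,3), (hadoop,6), (flink,6), (spark,6)]
    }
}
```

    The second emission for each word carries 6 rather than 3 because only the count is reset, not the sum. Resetting both fields would turn this into a tumbling count window that always reports 3.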


  • Original article: https://www.cnblogs.com/ilovezihan/p/12247368.html
Copyright © 2020-2023  润新知