• Flink Stream-Merging Operations: CoProcessFunction


    Introduction to CoProcessFunction

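    Processing a connected stream (ConnectedStreams) requires defining the transformation for each of the two input streams separately. The corresponding interfaces therefore declare a pair of analogous methods, distinguished by the suffixes "1" and "2", which are invoked when data arrives on the first or the second stream respectively. Such an interface is called a "co-process function". Just as .map() on a connected stream takes a CoMapFunction, calling .flatMap() requires a CoFlatMapFunction, which implements flatMap1() and flatMap2(), and calling .process() takes a CoProcessFunction. The abstract class CoProcessFunction is defined in the Flink source as follows: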

    @PublicEvolving
    public abstract class CoProcessFunction<IN1, IN2, OUT> extends AbstractRichFunction {
    
        private static final long serialVersionUID = 1L;
    
        /**
         * This method is called for each element in the first of the connected streams.
         *
         * <p>This function can output zero or more elements using the {@link Collector} parameter and
         * also update internal state or set timers using the {@link Context} parameter.
         *
         * @param value The stream element
         * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
         *     {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
         *     timers and querying the time. The context is only valid during the invocation of this
         *     method, do not store it.
         * @param out The collector to emit resulting elements to
         * @throws Exception The function may throw exceptions which cause the streaming program to fail
         *     and go into recovery.
         */
        public abstract void processElement1(IN1 value, Context ctx, Collector<OUT> out)
                throws Exception;
    
        /**
         * This method is called for each element in the second of the connected streams.
         *
         * <p>This function can output zero or more elements using the {@link Collector} parameter and
         * also update internal state or set timers using the {@link Context} parameter.
         *
         * @param value The stream element
         * @param ctx A {@link Context} that allows querying the timestamp of the element, querying the
         *     {@link TimeDomain} of the firing timer and getting a {@link TimerService} for registering
         *     timers and querying the time. The context is only valid during the invocation of this
         *     method, do not store it.
         * @param out The collector to emit resulting elements to
         * @throws Exception The function may throw exceptions which cause the streaming program to fail
         *     and go into recovery.
         */
        public abstract void processElement2(IN2 value, Context ctx, Collector<OUT> out)
                throws Exception;
    
        /**
         * Called when a timer set using {@link TimerService} fires.
         *
         * @param timestamp The timestamp of the firing timer.
         * @param ctx An {@link OnTimerContext} that allows querying the timestamp of the firing timer,
         *     querying the {@link TimeDomain} of the firing timer and getting a {@link TimerService}
         *     for registering timers and querying the time. The context is only valid during the
         *     invocation of this method, do not store it.
         * @param out The collector for returning result values.
         * @throws Exception This method may throw exceptions. Throwing an exception will cause the
         *     operation to fail and may trigger recovery.
         */
        public void onTimer(long timestamp, OnTimerContext ctx, Collector<OUT> out) throws Exception {}
    
        /**
         * Information available in an invocation of {@link #processElement1(Object, Context,
         * Collector)}/ {@link #processElement2(Object, Context, Collector)} or {@link #onTimer(long,
         * OnTimerContext, Collector)}.
         */
        public abstract class Context {
    
            /**
             * Timestamp of the element currently being processed or timestamp of a firing timer.
             *
             * <p>This might be {@code null}, for example if the time characteristic of your program is
             * set to {@link org.apache.flink.streaming.api.TimeCharacteristic#ProcessingTime}.
             */
            public abstract Long timestamp();
    
            /** A {@link TimerService} for querying time and registering timers. */
            public abstract TimerService timerService();
    
            /**
             * Emits a record to the side output identified by the {@link OutputTag}.
             *
             * @param outputTag the {@code OutputTag} that identifies the side output to emit to.
             * @param value The record to emit.
             */
            public abstract <X> void output(OutputTag<X> outputTag, X value);
        }
    
        /**
         * Information available in an invocation of {@link #onTimer(long, OnTimerContext, Collector)}.
         */
        public abstract class OnTimerContext extends Context {
            /** The {@link TimeDomain} of the firing timer. */
            public abstract TimeDomain timeDomain();
        }
    }

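    As the source shows, CoProcessFunction is a member of the "process function" family and is used in much the same way. The methods to implement are processElement1() and processElement2(); each arriving element triggers one of them, depending on which stream it came from. Through the context ctx, CoProcessFunction can read the element's timestamp and, via the TimerService, query the current watermark and register timers. It also provides .onTimer() to define what happens when a timer fires. Timers and keyed state are only available when the connected streams are keyed, which is why the example below calls keyBy() before process().

    The following concrete example implements real-time reconciliation: a two-stream join between payment events from an app and payment events from a third-party payment platform. An event from either side waits 5 seconds for its counterpart; if the matching payment event does not arrive in time, an alert is emitted.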

    Reference code

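    import java.time.Duration;

    import org.apache.flink.api.common.eventtime.SerializableTimestampAssigner;
    import org.apache.flink.api.common.eventtime.WatermarkStrategy;
    import org.apache.flink.api.common.state.ValueState;
    import org.apache.flink.api.common.state.ValueStateDescriptor;
    import org.apache.flink.api.common.typeinfo.Types;
    import org.apache.flink.api.java.tuple.Tuple3;
    import org.apache.flink.api.java.tuple.Tuple4;
    import org.apache.flink.configuration.Configuration;
    import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.co.CoProcessFunction;
    import org.apache.flink.util.Collector;

    /**
     * Real-time reconciliation demo
     */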
    public class BillCheckExample0828 {
        public static void main(String[] args) throws Exception {
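            // 1. Get the execution environment
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
            // 1.1 Parallelism 1 keeps testing simple; in production, set it to the number of Kafka topic partitions
            env.setParallelism(1);
            // 2. Read the data and declare the watermark strategy
            // 2.1 Simulated events from the app: appStream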
            SingleOutputStreamOperator<Tuple3<String, String, Long>> appStream = env.fromElements(
                    Tuple3.of("order-1", "app", 1000L),
                    Tuple3.of("order-2", "app", 2000L),
                    Tuple3.of("order-3", "app", 3500L)
            ).assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple3<String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                    .withTimestampAssigner(new SerializableTimestampAssigner<Tuple3<String, String, Long>>() {
                        @Override
                        public long extractTimestamp(Tuple3<String, String, Long> element, long recordTimestamp) {
                            return element.f2;
                        }
                    }));
    
            // 2.2 Simulated events from the third-party payment platform: thirdPartStream
            SingleOutputStreamOperator<Tuple4<String, String, String, Long>> thirdPartStream = env.fromElements(
                    Tuple4.of("order-1", "third-party", "success", 3000L),
                    Tuple4.of("order-3", "third-party", "success", 4000L)
            ).assignTimestampsAndWatermarks(WatermarkStrategy.<Tuple4<String, String, String, Long>>forBoundedOutOfOrderness(Duration.ZERO)
                    .withTimestampAssigner(new SerializableTimestampAssigner<Tuple4<String, String, String, Long>>() {
                        @Override
                        public long extractTimestamp(Tuple4<String, String, String, Long> element, long recordTimestamp) {
                            return element.f3;
                        }
                    }));
            
            // 3. Key both streams by order id and apply the CoProcessFunction to check whether each order appears in both streams
            appStream.connect(thirdPartStream).keyBy(data -> data.f0, data -> data.f0)
                    .process(new OrderMatchResult0828())
                    .print();
    
    
            env.execute();
        }
    
        /**
         * Custom CoProcessFunction implementation
         */
        public static class OrderMatchResult0828 extends CoProcessFunction<Tuple3<String, String, Long>, Tuple4<String, String, String, Long>, String> {
    
            // State holding the event that has already arrived from each stream
            private ValueState<Tuple3<String, String, Long>> appEventState;
            private ValueState<Tuple4<String, String, String, Long>> thirdPartyEventState;
    
            @Override
            public void open(Configuration parameters) throws Exception {
                appEventState = getRuntimeContext().getState(
                        new ValueStateDescriptor<Tuple3<String, String, Long>>("app-state", Types.TUPLE(Types.STRING, Types.STRING, Types.LONG))
                );
    
                thirdPartyEventState = getRuntimeContext().getState(
                        new ValueStateDescriptor<Tuple4<String, String, String, Long>>("third-party-state", Types.TUPLE(Types.STRING, Types.STRING, Types.STRING, Types.LONG))
                );
            }
    
            @Override
            public void processElement1(Tuple3<String, String, Long> value, Context ctx, Collector<String> out) throws Exception {
                // An app event arrived: check whether the third-party event is already here
                if (thirdPartyEventState.value() != null) {
                    out.collect("Reconciliation succeeded: " + value + " " + thirdPartyEventState.value());
                    // Matched, so the stored state can be cleared
                    thirdPartyEventState.clear();
                } else {
                    // Store the app event
                    appEventState.update(value);
                    // Register an event-time timer and wait for the other stream
                    ctx.timerService().registerEventTimeTimer(value.f2 + 5000L); // wait 5 s
                }
            }
    
            @Override
            public void processElement2(Tuple4<String, String, String, Long> value, Context ctx, Collector<String> out) throws Exception {
                // A third-party event arrived: check whether the app event is already here
                if (appEventState.value() != null) {
                    out.collect("Reconciliation succeeded: " + appEventState.value() + " " + value);
                    // Matched, so the stored state can be cleared
                    appEventState.clear();
                } else {
                    // Store the third-party event
                    thirdPartyEventState.update(value);
                    // Register an event-time timer and wait for the other stream
                    ctx.timerService().registerEventTimeTimer(value.f3 + 5000L); // wait 5 s
                }
            }
    
            // Called when a timer fires
            @Override
            public void onTimer(long timestamp, OnTimerContext ctx, Collector<String> out) throws Exception {
                // A non-empty state means the counterpart from the other stream never arrived
                if (appEventState.value() != null) {
                    out.collect("Reconciliation failed: " + appEventState.value() + " (no third-party event)");
                }
                if (thirdPartyEventState.value() != null) {
                    out.collect("Reconciliation failed: " + thirdPartyEventState.value() + " (no app event)");
                }
    
                // Clear both states
                appEventState.clear();
                thirdPartyEventState.clear();
    
            }
        }
    }

    Output

    Reconciliation succeeded: (order-1,app,1000) (order-1,third-party,success,3000)
    Reconciliation succeeded: (order-3,app,3500) (order-3,third-party,success,4000)
    Reconciliation failed: (order-2,app,2000) (no third-party event)
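    To see why order-2 is the only failure and why it is reported last, the matching rule can be replayed in plain Java. This is only a rough sketch and it does not use any Flink API: the class ReconcileReplay, its Event type, and the replay() method are hypothetical names invented for illustration, a HashMap stands in for the two ValueState fields, and a TreeMap stands in for the timer service. It assumes one particular arrival order of the two streams (Flink does not guarantee one) and that all timers fire only when the bounded input ends. That second assumption holds for the demo data, since the earliest timer is at 6000 ms while the largest event timestamp is 4000 ms.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java replay of the reconciliation rule. Hypothetical names; no Flink API is involved. */
class ReconcileReplay {

    /** One payment event: order id, source ("app" or "third-party"), event time in ms. */
    static final class Event {
        final String orderId;
        final String source;
        final long timestamp;

        Event(String orderId, String source, long timestamp) {
            this.orderId = orderId;
            this.source = source;
            this.timestamp = timestamp;
        }
    }

    static final long WAIT_MS = 5000L;

    /** Replays events in arrival order, then fires all timers as if the input had ended. */
    static List<String> replay(List<Event> arrivals) {
        Map<String, Event> pending = new HashMap<>();          // stands in for the keyed state
        TreeMap<Long, List<String>> timers = new TreeMap<>();  // fire time -> order ids
        List<String> results = new ArrayList<>();

        for (Event e : arrivals) {
            Event waiting = pending.get(e.orderId);
            if (waiting != null && !waiting.source.equals(e.source)) {
                // The counterpart arrived earlier: report the match and clear the stored event.
                pending.remove(e.orderId);
                results.add("matched " + e.orderId);
            } else {
                // First side to arrive: store it and register a timer 5 s later in event time.
                pending.put(e.orderId, e);
                timers.computeIfAbsent(e.timestamp + WAIT_MS, t -> new ArrayList<>())
                      .add(e.orderId);
            }
        }

        // End of input: the watermark jumps to the maximum, so timers fire in timestamp order.
        for (List<String> orderIds : timers.values()) {
            for (String orderId : orderIds) {
                Event leftover = pending.remove(orderId);
                if (leftover != null) {
                    String missing = leftover.source.equals("app") ? "third-party" : "app";
                    results.add("unmatched " + orderId + ", missing " + missing);
                }
                // A null leftover means the order was matched earlier, so the timer does nothing.
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<Event> arrivals = Arrays.asList(
                new Event("order-1", "app", 1000L),
                new Event("order-2", "app", 2000L),
                new Event("order-1", "third-party", 3000L),
                new Event("order-3", "app", 3500L),
                new Event("order-3", "third-party", 4000L));
        for (String line : replay(arrivals)) {
            System.out.println(line);
        }
    }
}
```

    The timers for order-1 and order-3 still fire in this replay, but by then their stored events have been cleared, so they produce nothing. The Flink job behaves the same way, which is why it does not need to delete a timer after a successful match.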
  • Original article: https://www.cnblogs.com/wdh01/p/16646541.html