Flink Basics (103): DataStream Operators and Windows (14), Multi-Stream Transformations (5): Interval Join

Interval Join

KeyedStream, KeyedStream → DataStream

Join two elements e1 and e2 of two keyed streams with a common key over a given time interval, so that e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound.

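Interval Join joins two streams on a common key. Each element of one stream is paired with the elements of the other stream that have the same key and whose timestamps fall within a time interval relative to that element. It is typically used to combine related, grouped data from a given time range into a single wide table. The join condition can be written as: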

key1 == key2 && e1.timestamp + lowerBound <= e2.timestamp <= e1.timestamp + upperBound
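The condition can be checked with a tiny helper. The following is a minimal plain-Java sketch (no Flink dependency); the class and method names `IntervalCondition`, `matches` and `joins` are hypothetical and only illustrate that both bounds are inclusive and that the keys must be equal as well.

```java
// Hypothetical plain-Java illustration of the interval-join condition; this is not Flink code.
final class IntervalCondition {

    /**
     * True when e2 falls inside e1's interval:
     * e1Ts + lowerBound <= e2Ts <= e1Ts + upperBound (both ends inclusive).
     */
    static boolean matches(long e1Ts, long e2Ts, long lowerBound, long upperBound) {
        return e1Ts + lowerBound <= e2Ts && e2Ts <= e1Ts + upperBound;
    }

    /** Full condition: key1 == key2 AND the timestamp condition above. */
    static boolean joins(String key1, long e1Ts, String key2, long e2Ts,
                         long lowerBound, long upperBound) {
        return key1.equals(key2) && matches(e1Ts, e2Ts, lowerBound, upperBound);
    }

    public static void main(String[] args) {
        // With lowerBound = -2 and upperBound = 1, an e1 at t = 10 accepts e2 in [8, 11].
        for (long t = 7; t <= 12; t++) {
            System.out.println("e2 at " + t + " -> " + matches(10, t, -2, 1));
        }
    }
}
```

Running `main` prints `false` for 7 and 12 and `true` for 8 through 11, since both interval ends are inclusive.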

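A typical programming model for joining the two streams looks like this (interval joins only support event time, so both streams need event-time timestamps and watermarks):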

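import org.apache.flink.api.java.functions.KeySelector;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.functions.co.ProcessJoinFunction;
import org.apache.flink.streaming.api.windowing.time.Time;
import org.apache.flink.util.Collector;

...

DataStream<Integer> orangeStream = ...
DataStream<Integer> greenStream = ...

orangeStream
    .keyBy(<KeySelector>)
    .intervalJoin(greenStream.keyBy(<KeySelector>))
    // lowerBound = -2 ms, upperBound = +1 ms: an orange element with timestamp t is joined
    // with the green elements of the same key whose timestamps lie in [t - 2, t + 1]
    .between(Time.milliseconds(-2), Time.milliseconds(1))
    .process(new ProcessJoinFunction<Integer, Integer, String>() {

        @Override
        public void processElement(Integer left, Integer right, Context ctx, Collector<String> out) {
            // called once for every matching (left, right) pair
            out.collect(left + "," + right);
        }
    });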
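To see which pairs the program above produces, the following plain-Java sketch simulates the same semantics with a nested loop over two in-memory lists. It is only an illustration under stated assumptions: the `Event` class and the sample data are hypothetical, and Flink does not work this way internally (it buffers elements in keyed state and discards them once the watermark has passed their interval).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory simulation of intervalJoin semantics (plain Java, no Flink).
final class IntervalJoinSimulation {

    // Hypothetical event type: key, event-time timestamp in ms, payload.
    static final class Event {
        final String key;
        final long timestamp;
        final int value;

        Event(String key, long timestamp, int value) {
            this.key = key;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Emits "left,right" for every pair with equal keys and
     *  left.timestamp + lowerBound <= right.timestamp <= left.timestamp + upperBound. */
    static List<String> intervalJoin(List<Event> left, List<Event> right,
                                     long lowerBound, long upperBound) {
        List<String> out = new ArrayList<>();
        for (Event l : left) {
            for (Event r : right) {
                boolean sameKey = l.key.equals(r.key);
                boolean inRange = l.timestamp + lowerBound <= r.timestamp
                        && r.timestamp <= l.timestamp + upperBound;
                if (sameKey && inRange) {
                    out.add(l.value + "," + r.value); // same format as the ProcessJoinFunction
                }
            }
        }
        return out;
    }

    /** Sample run with the bounds of between(Time.milliseconds(-2), Time.milliseconds(1)). */
    static List<String> demo() {
        // For readability each value equals its timestamp.
        List<Event> orange = List.of(new Event("k", 0, 0), new Event("k", 2, 2), new Event("k", 6, 6));
        List<Event> green = List.of(new Event("k", 0, 0), new Event("k", 1, 1),
                new Event("x", 2, 2), // different key: never joined
                new Event("k", 6, 6), new Event("k", 7, 7));
        return intervalJoin(orange, green, -2, 1);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Running `main` prints `[0,0, 0,1, 2,0, 2,1, 6,6, 6,7]`: the orange element at 2 accepts green timestamps in [0, 3], so it pairs with 0 and 1 but not with the `x` element, whose key differs.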

2. Join on DataSet

Example:

import org.apache.flink.api.common.functions.JoinFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

public class JoinDemo {

    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> data1 = env.fromElements(
                Tuple2.of("class1", 100),
                Tuple2.of("class1", 400),
                Tuple2.of("class2", 200),
                Tuple2.of("class2", 400)
        );

        DataSet<Tuple2<String, Integer>> data2 = env.fromElements(
                Tuple2.of("class1", 300),
                Tuple2.of("class1", 600),
                Tuple2.of("class2", 200),
                Tuple2.of("class3", 200)
        );

        // inner join: field 0 (the class name) of data1 must equal field 0 of data2
        data1.join(data2)
                .where(0).equalTo(0)
                .with(new JoinFunction<Tuple2<String, Integer>, Tuple2<String, Integer>, String>() {

                    @Override
                    public String join(Tuple2<String, Integer> tuple1,
                                       Tuple2<String, Integer> tuple2) throws Exception {
                        // e.g. "class1 : 100 300"
                        return tuple1.f0 + " : " + tuple1.f1 + " " + tuple2.f1;
                    }
                })
                .print(); // print() triggers execution, so env.execute() is not needed
    }
}

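Output (class3 occurs only in data2, so the inner join emits nothing for it; the order of the lines may vary, e.g. with a different parallelism):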

class1 : 100 300
class1 : 400 300
class1 : 100 600
class1 : 400 600
class2 : 200 200
class2 : 400 200
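Which rows appear in this result can be reproduced without Flink. The sketch below is a plain-Java in-memory hash join on the same data, assuming `Map.Entry` as a stand-in for `Tuple2`; the class and method names are hypothetical, and Flink chooses its own join strategy at runtime, so this only illustrates the inner-join semantics of `where(0).equalTo(0)`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch (no Flink) of the inner equi-join on field 0.
final class EquiJoinSketch {

    static List<String> innerJoin(List<Map.Entry<String, Integer>> data1,
                                  List<Map.Entry<String, Integer>> data2) {
        // Build phase: index data2 by its key (field 0).
        Map<String, List<Integer>> index = new HashMap<>();
        for (Map.Entry<String, Integer> e : data2) {
            index.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Probe phase: pair each data1 element with every data2 value sharing its key.
        // Keys without a partner (class3) produce nothing.
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : data1) {
            for (Integer v2 : index.getOrDefault(e.getKey(), List.of())) {
                out.add(e.getKey() + " : " + e.getValue() + " " + v2);
            }
        }
        return out;
    }

    /** Same input data as JoinDemo. */
    static List<String> demo() {
        List<Map.Entry<String, Integer>> data1 = List.of(
                Map.entry("class1", 100), Map.entry("class1", 400),
                Map.entry("class2", 200), Map.entry("class2", 400));
        List<Map.Entry<String, Integer>> data2 = List.of(
                Map.entry("class1", 300), Map.entry("class1", 600),
                Map.entry("class2", 200), Map.entry("class3", 200));
        return innerJoin(data1, data2);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

It prints the same six rows as JoinDemo, in probe order (`class1 : 100 300`, `class1 : 100 600`, ...) rather than Flink's order, which is acceptable because join output order is not guaranteed.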

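Besides this, the DataSet API offers several other join variants, such as outer joins and flat joins (using a FlatJoinFunction). See the official documentation for details: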

https://ci.apache.org/projects/flink/flink-docs-release-1.6/dev/batch/dataset_transformations.html#join
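As an illustration of how an outer join differs from the inner join above, the following plain-Java sketch applies right-outer-join semantics to the same data: every data2 element is kept, and one without a matching key in data1 is paired with null. It does not use Flink; the class and method names are hypothetical. In the DataSet API this roughly corresponds to `data1.rightOuterJoin(data2).where(0).equalTo(0).with(...)`, where the JoinFunction may receive null for the missing left element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java sketch (no Flink) of right-outer-join semantics on field 0.
final class RightOuterJoinSketch {

    static List<String> rightOuterJoin(List<Map.Entry<String, Integer>> left,
                                       List<Map.Entry<String, Integer>> right) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, Integer> r : right) {
            boolean matched = false;
            for (Map.Entry<String, Integer> l : left) {
                if (l.getKey().equals(r.getKey())) {
                    out.add(r.getKey() + " : " + l.getValue() + " " + r.getValue());
                    matched = true;
                }
            }
            if (!matched) {
                // No partner on the left: keep the right element, left value is null.
                out.add(r.getKey() + " : null " + r.getValue());
            }
        }
        return out;
    }

    /** Same input data as JoinDemo. */
    static List<String> demo() {
        List<Map.Entry<String, Integer>> data1 = List.of(
                Map.entry("class1", 100), Map.entry("class1", 400),
                Map.entry("class2", 200), Map.entry("class2", 400));
        List<Map.Entry<String, Integer>> data2 = List.of(
                Map.entry("class1", 300), Map.entry("class1", 600),
                Map.entry("class2", 200), Map.entry("class3", 200));
        return rightOuterJoin(data1, data2);
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

Compared with the inner join, the output gains one extra row, `class3 : null 200`, for the data2 element that has no partner.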

This article is from cnblogs (博客园). Author: 秋华. Please cite the original link when reposting: https://www.cnblogs.com/qiu-hua/p/15172001.html
