Flink WordCount Job Submit


    Original post; please credit the source when reposting: https://www.cnblogs.com/agilestyle/p/15127770.html

    Project Directory

    Maven Dependency

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>org.fool</groupId>
        <artifactId>flink</artifactId>
        <version>1.0-SNAPSHOT</version>
    
        <properties>
            <maven.compiler.source>8</maven.compiler.source>
            <maven.compiler.target>8</maven.compiler.target>
        </properties>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-java</artifactId>
                <version>1.12.5</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-java_2.12</artifactId>
                <version>1.12.5</version>
            </dependency>
    
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-clients_2.12</artifactId>
                <version>1.12.5</version>
            </dependency>
        </dependencies>
    
    </project>

    words.txt

    hello world
    hello spark
    hello flink
    hello hadoop
    hello es
    hello java

    Batch Processing

    BatchWordCount.java

    package org.fool.flink;
    
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.DataSet;
    import org.apache.flink.api.java.ExecutionEnvironment;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.util.Collector;
    
    import java.util.Objects;
    
    public class BatchWordCount {
        public static void main(String[] args) throws Exception {
            // create a batch execution environment
            ExecutionEnvironment environment = ExecutionEnvironment.getExecutionEnvironment();
    
            // resolve words.txt from the classpath (src/main/resources)
            String inputPath = Objects.requireNonNull(ClassLoader.getSystemClassLoader().getResource("words.txt")).getPath();
    
            DataSet<String> inputDataSet = environment.readTextFile(inputPath);
    
            // split each line into words, emit (word, 1), then group by the word and sum the counts
            DataSet<Tuple2<String, Integer>> resultSet = inputDataSet.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String value, Collector<Tuple2<String, Integer>> collector) throws Exception {
                    String[] words = value.split(" ");
                    for (String word : words) {
                        collector.collect(new Tuple2<>(word, 1));
                    }
                }
            }).groupBy(0).sum(1);
    
            resultSet.print();
        }
    }

    Run
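
    The job prints the aggregated counts, along these lines (the print order of the groups is not guaranteed; the counts follow directly from words.txt):

    (hello,6)
    (world,1)
    (spark,1)
    (flink,1)
    (hadoop,1)
    (es,1)
    (java,1)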

    Stream Processing

    Stream processing over bounded data

    StreamWordCount.java

    package org.fool.flink;
    
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;
    
    import java.util.Objects;
    
    public class StreamWordCount {
        public static void main(String[] args) throws Exception {
            // create a streaming execution environment with a job-wide parallelism of 4
            StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
            environment.setParallelism(4);
    
            // the input is a bounded file, but it is processed record by record as a stream
            String inputPath = Objects.requireNonNull(ClassLoader.getSystemClassLoader().getResource("words.txt")).getPath();
    
            DataStream<String> inputDataStream = environment.readTextFile(inputPath);
    
            // emit (word, 1) per word, key the stream by the word, and keep a running sum per key
            DataStream<Tuple2<String, Integer>> resultStream = inputDataStream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String value, Collector<Tuple2<String, Integer>> collector) throws Exception {
                    String[] words = value.split(" ");
                    for (String word : words) {
                        collector.collect(new Tuple2<>(word, 1));
                    }
                }
            }).keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                @Override
                public String getKey(Tuple2<String, Integer> tuple) throws Exception {
                    return tuple.f0;
                }
            }).sum(1);
    
            resultStream.print();
    
            environment.execute();
        }
    }

    Note: the program sets the job-wide parallelism to 4; in Flink, every operator can also be given its own parallelism.
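
    As a sketch of an operator-level override (a hypothetical variation, not part of the original job; WordSplitter stands in for the anonymous FlatMapFunction above):

    environment.setParallelism(4);              // job-wide default

    inputDataStream
            .flatMap(new WordSplitter())        // hypothetical named FlatMapFunction
            .keyBy(t -> t.f0)
            .sum(1).setParallelism(2)           // this operator alone runs with 2 parallel subtasks
            .print();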

    Run
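
    The console output looks roughly like this (illustrative; the interleaving changes between runs, but all hello records share one subtask because the stream is keyed by word):

    2> (hello,1)
    4> (world,1)
    2> (hello,2)
    3> (spark,1)
    2> (hello,3)
    1> (flink,1)
    2> (hello,4)
    3> (hadoop,1)
    2> (hello,5)
    1> (es,1)
    2> (hello,6)
    4> (java,1)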

    Note: the number on the left of each line is the index of the parallel subtask (thread) that processed that record.

    Stream processing over unbounded data

    StreamSocketWordCount.java

    package org.fool.flink;
    
    import org.apache.flink.api.common.functions.FlatMapFunction;
    import org.apache.flink.api.java.functions.KeySelector;
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.api.java.utils.ParameterTool;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.util.Collector;
    
    public class StreamSocketWordCount {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment environment = StreamExecutionEnvironment.getExecutionEnvironment();
            environment.setParallelism(4);
    
            // read --host and --port from the command line, e.g. --host 127.0.0.1 --port 7878
            ParameterTool parameterTool = ParameterTool.fromArgs(args);
            String host = parameterTool.get("host");
            int port = parameterTool.getInt("port");
    
            // an unbounded source: every line typed into the socket becomes a record
            DataStream<String> inputDataStream = environment.socketTextStream(host, port);
    
            DataStream<Tuple2<String, Integer>> resultStream = inputDataStream.flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String value, Collector<Tuple2<String, Integer>> collector) throws Exception {
                    String[] words = value.split(" ");
                    for (String word : words) {
                        collector.collect(new Tuple2<>(word, 1));
                    }
                }
            }).keyBy(new KeySelector<Tuple2<String, Integer>, String>() {
                @Override
                public String getKey(Tuple2<String, Integer> tuple) throws Exception {
                    return tuple.f0;
                }
            }).sum(1);
    
            resultStream.print();
    
            environment.execute();
        }
    }
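
    A small optional hardening, not in the original code: ParameterTool has overloads that take default values, so the job can still start when a flag is omitted.

    // assumption: fall back to localhost:7878 when --host/--port are not passed
    String host = parameterTool.get("host", "127.0.0.1");
    int port = parameterTool.getInt("port", 7878);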

    Run

    Use nc to open a listening socket (-l listens, -k keeps the listener alive across client disconnects):

    nc -lk 7878

    Check the console output
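
    For example, typing these lines into the nc session (illustrative input):

    hello flink
    hello world

    produces console output along these lines, with the count for the hello key increasing on every occurrence:

    1> (hello,1)
    3> (flink,1)
    1> (hello,2)
    2> (world,1)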

    Submitting and Running the Job

    Submitting via the Web UI

    Stop the StreamSocketWordCount program in IDEA and comment out the hard-coded parallelism, so that the parallelism chosen at submission time takes effect:

    // environment.setParallelism(4);

    Package the project from its root directory (a plain jar is enough here: the job only uses Flink APIs that the cluster already provides on its classpath):

    mvn clean package

    Start the Flink cluster

    ./start-cluster.sh
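
    On a standalone setup the script prints something like the following (illustrative), after which the web UI is reachable at http://localhost:8081 by default:

    Starting cluster.
    Starting standalonesession daemon on host localhost.
    Starting taskexecutor daemon on host localhost.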

    Use nc to open a socket again

    nc -lk 7878

    Submit the packaged flink-1.0-SNAPSHOT.jar to the Flink cluster

    Set the parallelism to 3 and click Show Plan

    Close the plan view, then click Submit

    Check the Stdout tab of the TaskManager in the web UI

    Submitting via the Command Line

    cd into Flink's bin directory

    cd ~/app/flink-1.12.5/bin

    Run the job with flink run; -c names the entry class and -p sets the parallelism:

    flink run -c org.fool.flink.StreamSocketWordCount -p 3 ~/IdeaProjects/helloflink/target/flink-1.0-SNAPSHOT.jar --host 127.0.0.1 --port 7878

    List the jobs on the cluster:

    flink list
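
    Typical output (the job ID below is the one cancelled in the next step; the timestamp is illustrative):

    Waiting for response...
    ------------------ Running/Restarting Jobs -------------------
    08.08.2021 12:00:00 : 8ba760464860d28fe4e30109a721be93 : Flink Streaming Job (RUNNING)
    --------------------------------------------------------------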

    Cancel the job by its ID (taken from the flink list output):

    flink cancel 8ba760464860d28fe4e30109a721be93

    flink list

    To include finished and cancelled jobs as well, pass -a (all):

    flink list -a

    Note: as you can see, submitting a job from the command line is more efficient and convenient than clicking through the UI.


    Likes, follows, and bookmarks are welcome.

    The strong save themselves; the saintly deliver others.