• Flink Hello World and Theory (Part 1)


    I am using the standalone version of Flink; the setup process is documented here: https://www.cnblogs.com/wuxiaolong4/p/16548910.html

    Hello world
          import org.apache.flink.api.scala.ExecutionEnvironment
          val env = ExecutionEnvironment.getExecutionEnvironment // batch execution context

          import org.apache.flink.streaming.api.scala.StreamExecutionEnvironment
          val streamenv = StreamExecutionEnvironment.getExecutionEnvironment // streaming execution context

          streamenv.fromElements[String]("1", "2", "3", "4", "5").print()
          streamenv.execute() // a streaming job only actually runs once execute() is called
    Run the bundled WordCount example
        flink run ../examples/batch/WordCount.jar --input /opt/flink/flink/README.txt --output /opt/flink/flink/result
    JobManager: the core component that controls task management and scheduling for the cluster
        JobMaster: one per job. It first receives the application to execute, usually submitted by a client, including the jar, the dataflow graph, and the JobGraph.
            The JobMaster converts the JobGraph into a physical-level dataflow graph called the ExecutionGraph, which contains all tasks that can be executed concurrently.
            The JobMaster requests resources from the ResourceManager and distributes the ExecutionGraph to the TaskManagers.
            The JobMaster also handles central coordination, such as coordinating checkpoints.
        ResourceManager: mainly manages task slots; each task is assigned to one task slot.
        Dispatcher: provides the REST interface and web UI, used to submit applications and start the JobMaster. (Optional; not always required.)
    TaskManager: (there can be several) each TaskManager holds a fixed number of task slots
        After starting, it registers its task slots with the ResourceManager; once it receives
            the ResourceManager's instructions, the TaskManager offers one or more task slots to the JobMaster.
        TaskManagers belonging to the same job can exchange data with each other.
        Submission flow:
        1. The app is submitted to the Dispatcher.
        2. The Dispatcher starts a JobMaster and hands the application to it.
        3. The JobMaster splits the job into tasks and asks the ResourceManager for task slots.
        4. The ResourceManager requests task slots from the TaskManagers.
        5. The TaskManagers offer task slots to the JobMaster.
        6. The JobMaster deploys and executes the tasks.
        A program running on Flink is mapped to a logical dataflow, which consists of three parts: source, transformation, and sink.
        Our programs later will also be written in terms of these three parts: source, transformation, sink.
        They correspond to input, transformation, and output in Spark,
        so readers familiar with Spark should find this natural.
        Each operator in Flink can consist of multiple subtasks; the number of subtasks is called the parallelism (every operator's parallelism can be set individually).
        You can also set a global parallelism: env.setParallelism(2) // global parallelism
        Data exchange patterns: one-to-one (narrow dependency) and redistributing (wide dependency).
        Operator chaining (visible in the web UI): operators connected one-to-one with the same parallelism can be chained together.
        disableChaining() prevents an operator from chaining on either side; startNewChain() starts a new chain at an operator.
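        The parallelism and chaining settings above can be combined as in the following minimal sketch (the source elements, map/filter bodies, and job name are hypothetical; it assumes the Flink streaming Scala dependencies are on the classpath):

        ```scala
        import org.apache.flink.streaming.api.scala._

        object ParallelismDemo {
          def main(args: Array[String]): Unit = {
            val env = StreamExecutionEnvironment.getExecutionEnvironment
            env.setParallelism(2) // global parallelism for all operators

            env
              .fromElements("a", "b", "c")          // hypothetical source
              .map(_.toUpperCase).setParallelism(4) // per-operator parallelism overrides the global one
              .filter(_.nonEmpty).disableChaining() // this operator will not chain with its neighbors
              .map(identity[String]).startNewChain() // a new chain starts at this operator
              .print()

            env.execute("parallelism-demo")
          }
        }
        ```

        The effect of setParallelism, disableChaining, and startNewChain can then be verified visually in the job graph shown in the web UI.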
    Execution graphs: four layers (StreamGraph, JobGraph, ExecutionGraph, physical execution graph)
        StreamGraph: the initial graph generated from the user's DataStream API code, representing the program's topology.
        JobGraph: the StreamGraph after optimization (merging operator chains); it is what gets submitted to the JobManager.
        ExecutionGraph: the JobManager generates the final ExecutionGraph from the JobGraph;
            the ExecutionGraph is the parallelized version of the JobGraph and is the core data structure for scheduling.
        Physical execution graph: formed when the JobManager schedules the job according to the ExecutionGraph and deploys
            tasks on the TaskManagers; it is not a concrete data structure.
        Task slots (similar to YARN containers, except that a single TaskManager's resources are further subdivided into slots)
        Flink allows task slots to be shared between tasks.

        slotSharingGroup("1") sets the slot sharing group; downstream operators default to the current sharing group
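        A slot sharing group can be set on an operator like this (a minimal sketch; the operator bodies and the group name "group-1" are hypothetical, and the Flink streaming Scala dependencies are assumed):

        ```scala
        import org.apache.flink.streaming.api.scala._

        object SlotSharingDemo {
          def main(args: Array[String]): Unit = {
            val env = StreamExecutionEnvironment.getExecutionEnvironment

            env
              .fromElements(1, 2, 3)                  // source stays in the "default" sharing group
              .map(_ * 2).slotSharingGroup("group-1") // this operator is placed in "group-1"
              .filter(_ > 2)                          // inherits "group-1" from the upstream operator
              .print()

            env.execute("slot-sharing-demo")
          }
        }
        ```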
    Dependencies (Maven)
            <flink.version>1.13.0</flink.version> 

            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-cep-scala_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-table-planner-blink_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-table-planner_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-table-api-scala-bridge_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>

            <dependency>
                <groupId>org.apache.parquet</groupId>
                <artifactId>parquet-avro</artifactId>
                <version>1.10.0</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-parquet_2.11</artifactId>
                <version>1.9.0</version>
            </dependency>

            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-scala_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-streaming-scala_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>
            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-clients_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>

            <dependency>
                <groupId>redis.clients</groupId>
                <artifactId>jedis</artifactId>
                <version>3.2.0</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.slf4j</groupId>
                        <artifactId>slf4j-api</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>
            <dependency>
                <groupId>org.apache.bahir</groupId>
                <artifactId>flink-connector-redis_2.11</artifactId>
                <version>1.0</version>
                <exclusions>
                    <exclusion>
                        <groupId>org.apache.flink</groupId>
                        <artifactId>flink-streaming-java_2.11</artifactId>
                    </exclusion>
                </exclusions>
            </dependency>

            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-jdbc_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>

            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-kafka_2.11</artifactId>
                <version>${flink.version}</version>
            </dependency>

            <dependency>
                <groupId>org.apache.flink</groupId>
                <artifactId>flink-connector-base</artifactId>
                <version>1.13.6</version>
            </dependency>

            <dependency>
                <groupId>mysql</groupId>
                <artifactId>mysql-connector-java</artifactId>
                <version>5.1.47</version>
            </dependency>
            <dependency>
                <groupId>org.apache.httpcomponents</groupId>
                <artifactId>httpcore</artifactId>
                <version>4.4.9</version>
            </dependency>

            <dependency>
                <groupId>org.apache.httpcomponents</groupId>
                <artifactId>httpclient</artifactId>
                <version>4.5.5</version>
            </dependency>
            <dependency>
                <groupId>com.alibaba</groupId>
                <artifactId>fastjson</artifactId>
                <version>1.2.62</version>
            </dependency>
            <dependency>
                <groupId>org.scala-lang</groupId>
                <artifactId>scala-library</artifactId>
                <version>${scala.version}</version>
            </dependency>
  • Original post: https://www.cnblogs.com/wuxiaolong4/p/16741390.html