Flink Development: Understanding Flink Concepts


    1.Model level

    ###1. DataStream API
        Use a data source:
          environment.fromSource(
              Source<OUT, ?, ?> source,
              WatermarkStrategy<OUT> timestampsAndWatermarks,
              String sourceName)
          Legacy API: StreamExecutionEnvironment.addSource(sourceFunction)
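A minimal sketch of the `fromSource` entry point (assumes Flink 1.12+ with flink-streaming-java on the classpath; `NumberSequenceSource` is a built-in bounded source used purely for illustration):

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.connector.source.lib.NumberSequenceSource;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FromSourceExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // New unified API: fromSource(source, watermarkStrategy, sourceName)
        DataStream<Long> numbers = env.fromSource(
                new NumberSequenceSource(1L, 100L),   // built-in bounded source
                WatermarkStrategy.noWatermarks(),     // no event-time watermarks needed here
                "number-sequence");

        numbers.print();
        env.execute("from-source-demo");
    }
}
```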
    
    ###2. DataSet API
           DataSet Transformations
    
    ###3. Table API & SQL
        Dependencies for Java development:
           flink-table-common
           flink-table-api-java-bridge
           flink-table-planner-blink
           flink-table-runtime-blink
        Imports:
           org.apache.flink.table.api.Table
           org.apache.flink.table.api.bridge.java.StreamTableEnvironment
           org.apache.flink.table.api.bridge.java.BatchTableEnvironment
        Flink 1.11 introduced new Table Source and Sink interfaces (DynamicTableSource and DynamicTableSink):
           org.apache.flink.table.connector.source
           org.apache.flink.table.connector.sink
        These interfaces unify batch jobs and streaming jobs.
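A small sketch of wiring the Table API into a streaming job (assumes Flink 1.13+ with the planner and flink-table-api-java-bridge on the classpath; the `orders` table and its datagen settings are made up for illustration):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

public class TableApiExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Bridge between the DataStream and Table APIs
        StreamTableEnvironment tableEnv = StreamTableEnvironment.create(env);

        // Register a source table backed by the built-in datagen connector
        tableEnv.executeSql(
            "CREATE TABLE orders (order_id BIGINT, amount DOUBLE) " +
            "WITH ('connector' = 'datagen', 'rows-per-second' = '1')");

        // The same query works for batch and streaming thanks to the unified interfaces
        Table result = tableEnv.sqlQuery("SELECT order_id, amount FROM orders WHERE amount > 0");
        tableEnv.toDataStream(result).print();
        env.execute("table-api-demo");
    }
}
```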
    

    2.Data Types

      Supported Data Types
      Type handling
      Creating a TypeInformation or TypeSerializer
    
     Data Types in the Table API
        Use org.apache.flink.table.types.DataType within the Table API,
           or when defining connectors, catalogs,
           or user-defined functions.
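For instance, a connector or UDF schema can be declared through the DataTypes factory (a sketch assuming flink-table-common on the classpath; the field names are arbitrary):

```java
import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.types.DataType;

public class DataTypeExample {
    public static void main(String[] args) {
        // A row type, as a connector or user-defined function might declare it
        DataType rowType = DataTypes.ROW(
                DataTypes.FIELD("id", DataTypes.BIGINT().notNull()),
                DataTypes.FIELD("name", DataTypes.STRING()),
                DataTypes.FIELD("ts", DataTypes.TIMESTAMP(3)));  // millisecond precision
        System.out.println(rowType);
    }
}
```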
    

    3.Connector

    In terms of data handling, there are three kinds of connectors:
      DataStream Connectors
      DataSet Connectors
      Table & SQL Connectors
    Purpose:
       01.DataStream Connectors
         Predefined Sources and Sinks
    	 Bundled Connectors
    	 Connectors in Apache Bahir
    	Other Ways to Connect to Flink
    	  Data Enrichment via Async I/O
    	  Queryable State
       02.DataSet Connectors
           file systems
    	    other systems using Input/OutputFormat wrappers for Hadoop
       03.Table & SQL Connectors: register table sources and table sinks
          (Flink's table connectors)
          User-defined Sources & Sinks: develop a custom, user-defined connector
             Metadata, Planning, Runtime
             Implemented via:
                Dynamic Table Source    Dynamic Table Factories
                Dynamic Table Sink      Encoding / Decoding Formats
     Predefined Sources and Sinks
       1.pre-defined source connectors
            custom sources; SourceOperators
       flink-core
           org.apache.flink.api.connector.source.SourceSplit
           org.apache.flink.api.connector.source.SourceReader
           org.apache.flink.api.connector.source.SplitEnumerator
           org.apache.flink.api.connector.source.event.NoMoreSplitsEvent
           Use these interfaces to define a new data source, or to understand how Flink's data sources work.
      Sources and sinks are often summarized under the term connector.
    

    4.Refactor Source Interface

    1. Data Source API

      Flink's built-in sources: the Data Source API
     01. A Data Source has three core components:
    	    Splits, the SplitEnumerator, and the SourceReader.
         In the bounded / batch case,
    	     the enumerator generates a fixed set of splits, and each split is necessarily finite.
    		 Once reading completes, NoMoreSplits is returned: a finite set of splits, each of them bounded.
    	  In the unbounded streaming case,
    	      one of the two does not hold (splits are not finite, or the enumerator keeps generating new splits).
    	Examples:
    	  Bounded File Source
    	  Unbounded Streaming File Source
    	    the SplitEnumerator never responds with NoMoreSplits and periodically checks for new content
     02.The Source API is a factory-style interface that creates the following components:
         Split Serializer
            Split Enumerator
            Enumerator Checkpoint Serializer
            Source Reader                  consumes records from the splits
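Paraphrasing the shape of that factory interface (abbreviated from org.apache.flink.api.connector.source.Source in Flink 1.12+; see the actual javadoc for the authoritative signatures):

```java
// Each method is a factory for one of the components listed above.
public interface Source<T, SplitT extends SourceSplit, EnumChkT> extends Serializable {

    Boundedness getBoundedness();  // BOUNDED or CONTINUOUS_UNBOUNDED

    // Source Reader: consumes records from the splits it is assigned
    SourceReader<T, SplitT> createReader(SourceReaderContext readerContext) throws Exception;

    // Split Enumerator: generates splits and assigns them to readers
    SplitEnumerator<SplitT, EnumChkT> createEnumerator(
            SplitEnumeratorContext<SplitT> enumContext) throws Exception;
    SplitEnumerator<SplitT, EnumChkT> restoreEnumerator(
            SplitEnumeratorContext<SplitT> enumContext, EnumChkT checkpoint) throws Exception;

    // Split Serializer / Enumerator Checkpoint Serializer: used for checkpointing
    SimpleVersionedSerializer<SplitT> getSplitSerializer();
    SimpleVersionedSerializer<EnumChkT> getEnumeratorCheckpointSerializer();
}
```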
    03.
      The SplitReader is the high-level API for simple synchronous, reading/polling-based source implementations.
       SourceReaderBase
       SplitFetcherManager
       Event time and watermarks at the source: do not use the legacy assigner functions, because the new sources already assign timestamps and watermarks.
    

    2. Data Source Function

      01.Predefined sources and sinks
           (built into Flink and directly usable, typically for debugging and verification; no external dependencies required)
        pre-implemented source functions:
        File-based
        Socket-based
        Collection-based
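The three predefined kinds map directly onto StreamExecutionEnvironment convenience methods (a sketch assuming flink-streaming-java on the classpath; host, port, and file path are placeholders):

```java
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class PredefinedSourcesExample {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Collection-based: handy for tests and debugging
        DataStream<String> fromCollection = env.fromElements("a", "b", "c");

        // Socket-based: reads newline-separated text from a socket
        DataStream<String> fromSocket = env.socketTextStream("localhost", 9999);

        // File-based: reads a text file line by line
        DataStream<String> fromFile = env.readTextFile("file:///tmp/input.txt");

        fromCollection.print();
        env.execute("predefined-sources-demo");
    }
}
```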
       02.Connectors provide code for interfacing with various third-party systems
        001.Flink ships several bundled connectors (the corresponding connector classes must be packaged into the job jar)
           public abstract class KafkaDynamicSinkBase implements DynamicTableSink
    	   public interface ScanTableSource extends DynamicTableSource
    	   org.apache.flink.table.connector.sink.DynamicTableSink
    	   org.apache.flink.table.connector.source.DynamicTableSource
        002.Connectors in Apache Bahir
       03.Flink provides an async I/O API as another way to connect external systems to Flink, typically used to access external databases
       Async I/O handles multiple requests concurrently, improving throughput and reducing latency
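In Flink this is exposed through AsyncDataStream and AsyncFunction; as a dependency-free illustration of why issuing requests concurrently helps, here is a plain-JDK sketch (the `lookup` method is a hypothetical stand-in for an async database client):

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

public class AsyncLookupDemo {
    // Hypothetical external lookup; a real async database client would return a future the same way
    static CompletableFuture<String> lookup(String key) {
        return CompletableFuture.supplyAsync(() -> key + "-value");
    }

    public static void main(String[] args) {
        List<String> keys = List.of("a", "b", "c", "d");

        // Issue all requests concurrently instead of blocking on each one,
        // which is what raises throughput and lowers latency
        List<CompletableFuture<String>> futures = keys.stream()
                .map(AsyncLookupDemo::lookup)
                .collect(Collectors.toList());

        List<String> results = futures.stream()
                .map(CompletableFuture::join)   // wait for completion; order is preserved
                .collect(Collectors.toList());

        System.out.println(results);  // [a-value, b-value, c-value, d-value]
    }
}
```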
    
    04.Queryable State
Original source: https://www.cnblogs.com/ytwang/p/14081573.html