• Flink Study Notes: DataSet API


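    In Flink, a DataSet job implements transformations on data sets. A data set is typically created from a bounded (fixed) source, such as a readable file or a local collection.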

    Ref

    https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/batch/
    

    To use the DataSet API, obtain the batch execution environment (ExecutionEnvironment):

    ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
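
    As a minimal sketch of how this environment is used end to end, the class below builds a small data set and collects it back to the client. The class name MinimalBatchJob and the sample values are hypothetical; the sketch assumes Flink 1.12 with the flink-java and flink-clients artifacts on the classpath.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.util.List;

// Hypothetical class name, used only for illustration.
public class MinimalBatchJob {

    // Builds a tiny DataSet and returns its elements to the caller.
    public static List<Integer> run() throws Exception {
        // Picks a local or cluster environment depending on where the program runs.
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Integer> numbers = env.fromElements(1, 2, 3);

        // collect() triggers execution and ships the result to the client.
        return numbers.collect();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());
    }
}
```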
    

    Data sources supported by DataSet: file-based, collection-based, and generic.

    1.File-based

    readTextFile(path) / TextInputFormat - Reads files line wise and returns them as Strings.
    
    readTextFileWithValue(path) / TextValueInputFormat - Reads files line wise and returns them as StringValues. StringValues are mutable strings.
    
    readCsvFile(path) / CsvInputFormat - Parses files of comma (or another char) delimited fields. Returns a DataSet of tuples or POJOs. Supports the basic java types and their Value counterparts as field types.
    
    readFileOfPrimitives(path, Class) / PrimitiveInputFormat - Parses files of new-line (or another char sequence) delimited primitive data types such as String or Integer.
    
    readFileOfPrimitives(path, delimiter, Class) / PrimitiveInputFormat - Parses files of new-line (or another char sequence) delimited primitive data types such as String or Integer using the given delimiter.
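
    The file-based sources above can be sketched as follows. This is an illustration under assumptions, not a reference implementation: the class name FileSourceExample, the temporary file, and its two sample rows are all made up for the example.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; the input file is created on the fly.
public class FileSourceExample {

    // Writes two sample rows to a temporary file and returns its URI.
    private static String sampleFile() throws Exception {
        Path file = Files.createTempFile("people", ".csv");
        Files.write(file, Arrays.asList("alice,30", "bob,25"));
        return file.toUri().toString();
    }

    // readTextFile: every line becomes one String element.
    public static List<String> readLines() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> lines = env.readTextFile(sampleFile());
        return lines.collect();
    }

    // readCsvFile: every line is parsed into a typed tuple.
    public static List<Tuple2<String, Integer>> readCsv() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Tuple2<String, Integer>> rows = env
                .readCsvFile(sampleFile())
                .types(String.class, Integer.class);
        return rows.collect();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readLines());
        System.out.println(readCsv());
    }
}
```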
    

    2.Collection-based

    fromCollection(Collection) - Creates a data set from a java.util.Collection. All elements in the collection must be of the same type.
    
    fromCollection(Iterator, Class) - Creates a data set from an iterator. The class specifies the data type of the elements returned by the iterator.
    
    fromElements(T ...) - Creates a data set from the given sequence of objects. All objects must be of the same type.
    
    fromParallelCollection(SplittableIterator, Class) - Creates a data set from an iterator, in parallel. The class specifies the data type of the elements returned by the iterator.
    
    generateSequence(from, to) - Generates the sequence of numbers in the given interval, in parallel.
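
    A short sketch of two of the collection-based sources listed above, fromCollection and generateSequence. The class name CollectionSourceExample and the sample data are hypothetical.

```java
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

import java.util.Arrays;
import java.util.List;

// Hypothetical class name, used only for illustration.
public class CollectionSourceExample {

    // fromCollection: turns an in-memory Java collection into a DataSet.
    public static List<String> fromList() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<String> words = env.fromCollection(Arrays.asList("a", "b", "c"));
        return words.collect();
    }

    // generateSequence: produces the numbers 1..5 in parallel, then sums them.
    public static long sumSequence() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        DataSet<Long> sequence = env.generateSequence(1, 5);
        // reduce() combines all elements into a single value.
        return sequence.reduce((a, b) -> a + b).collect().get(0);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(fromList());
        System.out.println(sumSequence());
    }
}
```

    Collection-based sources are mainly convenient for local tests and small examples, since the data has to fit in the memory of the client program.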
    

    3.Generic

    readFile(inputFormat, path) / FileInputFormat - Accepts a file input format.
    
    createInput(inputFormat) / InputFormat - Accepts a generic input format.
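
    The generic entry points take an input format object instead of a path alone. The sketch below passes Flink's TextInputFormat to both createInput and readFile; the class name GenericSourceExample and the temporary file are assumptions made for the example, and a custom InputFormat could be substituted in the same place.

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.TextInputFormat;
import org.apache.flink.core.fs.Path;

import java.io.File;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.List;

// Hypothetical class name; the input file is created on the fly.
public class GenericSourceExample {

    // Writes two sample lines to a temporary file and returns its URI.
    private static String sampleFile() throws Exception {
        File file = File.createTempFile("lines", ".txt");
        Files.write(file.toPath(), Arrays.asList("first", "second"));
        return file.toURI().toString();
    }

    // createInput: accepts any InputFormat; the element type is given explicitly.
    public static List<String> viaCreateInput() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        TextInputFormat format = new TextInputFormat(new Path(sampleFile()));
        DataSet<String> lines = env.createInput(format, BasicTypeInfo.STRING_TYPE_INFO);
        return lines.collect();
    }

    // readFile: accepts a FileInputFormat plus the path to read.
    public static List<String> viaReadFile() throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        String uri = sampleFile();
        DataSet<String> lines = env.readFile(new TextInputFormat(new Path(uri)), uri);
        return lines.collect();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(viaCreateInput());
        System.out.println(viaReadFile());
    }
}
```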
    

    Transformation operators supported by DataSet:

    https://ci.apache.org/projects/flink/flink-docs-release-1.12/zh/dev/batch/dataset_transformations.html
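
    To give a feel for how transformations chain together, here is a word-count sketch that uses flatMap, groupBy, and sum. The class names WordCountExample and Tokenizer and the input sentence are hypothetical, chosen only for illustration.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.util.Collector;

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical class name, used only for illustration.
public class WordCountExample {

    // Splits a line into lower-case words and emits (word, 1) pairs.
    public static class Tokenizer implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
            for (String word : line.toLowerCase().split("\\s+")) {
                if (!word.isEmpty()) {
                    out.collect(Tuple2.of(word, 1));
                }
            }
        }
    }

    // Counts how often each word occurs in the given text.
    public static Map<String, Integer> count(String text) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Tuple2<String, Integer>> counts = env
                .fromElements(text)
                .flatMap(new Tokenizer())
                .groupBy(0)   // group by the word (tuple field 0)
                .sum(1);      // add up the counts (tuple field 1)

        List<Tuple2<String, Integer>> collected = counts.collect();
        Map<String, Integer> result = new HashMap<>();
        for (Tuple2<String, Integer> entry : collected) {
            result.put(entry.f0, entry.f1);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(count("to be or not to be"));
    }
}
```

    The Tokenizer is written as a named class rather than a lambda so that Flink can infer the Tuple2 output type without extra type hints.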
    

    Data sinks supported by DataSet:

    writeAsText() / TextOutputFormat - Writes elements line-wise as Strings. The Strings are obtained by calling the toString() method of each element.
    writeAsFormattedText() / TextOutputFormat - Writes elements line-wise as Strings. The Strings are obtained by calling a user-defined format() method for each element.
    writeAsCsv(...) / CsvOutputFormat - Writes tuples as comma-separated value files. Row and field delimiters are configurable. The value for each field comes from the toString() method of the objects.
    print() / printToErr() / print(String msg) / printToErr(String msg) - Prints the toString() value of each element on the standard out / standard error stream. Optionally, a prefix (msg) can be provided which is prepended to the output. This can help to distinguish between different calls to print. If the parallelism is greater than 1, the output will also be prepended with the identifier of the task which produced the output.
    write() / FileOutputFormat - Method and base class for custom file outputs. Supports custom object-to-bytes conversion.
    output() / OutputFormat - Most generic output method, for data sinks that are not file based (such as storing the result in a database).
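
    A sketch of two of the file sinks listed above, writeAsText and writeAsCsv. The class name SinkExample, the temporary output directory, and the sample elements are assumptions made for the example. The sink parallelism is set to 1 so that each sink writes a single file rather than a directory of part files.

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.core.fs.FileSystem.WriteMode;

import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name; output goes to a temporary directory.
public class SinkExample {

    // Writes two small data sets to files and returns the written lines.
    public static List<String> writeAndReadBack() throws Exception {
        Path dir = Files.createTempDirectory("flink-sink");
        Path textOut = dir.resolve("words.txt");
        Path csvOut = dir.resolve("pairs.csv");

        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        // writeAsText: one line per element, produced by toString().
        env.fromElements("a", "b")
                .writeAsText(textOut.toUri().toString(), WriteMode.OVERWRITE)
                .setParallelism(1);

        // writeAsCsv: tuples only; row and field delimiters are configurable.
        env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2))
                .writeAsCsv(csvOut.toUri().toString(), "\n", ",", WriteMode.OVERWRITE)
                .setParallelism(1);

        // File sinks are lazy: nothing is written until execute() is called.
        env.execute("sink example");

        List<String> lines = new ArrayList<>(Files.readAllLines(textOut));
        lines.addAll(Files.readAllLines(csvOut));
        return lines;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(writeAndReadBack());
    }
}
```

    Unlike the file sinks, print() and collect() trigger execution by themselves, so a program that ends in one of them does not need a separate env.execute() call.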
    

      

  • Original post: https://www.cnblogs.com/tonglin0325/p/14121353.html