• Flink: From Stream to Batch Insert


    Recently I worked on an ETL task: consume data from Kafka and write it into a DB. Doing a connect -> insert -> close round trip against the DB for every single row is not a good option.

    First, setParallelism(n) helps to cut the total processing time, but the number of individual inserts stays the same.

    Parallelism can be set at the data source, at every operator, and at the data sink:

    DataStream<String> text = env.socketTextStream("127.0.0.1", 9000).setParallelism(1);
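
    The line above sets the source parallelism; as a minimal illustration, the same call also works on operators and on the sink (the map and print stages below are placeholders, not part of the original job):

    text.map(s -> s.toUpperCase()).setParallelism(4)   // operator parallelism
        .print().setParallelism(2);                    // sink parallelism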

    Secondly, use a time window or count window to group records, collect the elements into a list, and then do one batch insert per window. (Open questions: the differences between window types, the role of keys, and process versus apply; see the note after the ProcessWindow class below.)

    text.timeWindowAll(Time.seconds(5)).process(new ProcessWindow()).addSink(new RichSinkFunction<List<String>>() {
        private Connection connection;
        private PreparedStatement stmt;
        private String SQL = ""; // an INSERT statement with a single '?' placeholder

        @Override
        public void open(Configuration parameters) throws Exception {
            String url = "jdbc:mysql://localhost:3306/test";
            String user = "root";
            String password = "root";
            String driverClass = "com.mysql.jdbc.Driver";
            try {
                Class.forName(driverClass);
                Properties info = new Properties();
                info.setProperty("user", user);
                info.setProperty("password", password);
                this.connection = DriverManager.getConnection(url, info);
            } catch (ClassNotFoundException e) {
                log.error("Driver class not found, loading driver failed.", e);
            } catch (SQLException e) {
                log.error("Creating DB connection failed.", e);
            }
        }

        @Override
        public void invoke(List<String> value, Context context) throws Exception {
            this.connection.setAutoCommit(false);
            this.stmt = this.connection.prepareStatement(SQL);
            for (String item : value) {
                stmt.setString(1, item);
                stmt.addBatch();
            }
            int[] result = stmt.executeBatch();
            this.connection.commit();
            log.info("inserted: " + result.length);
        }

        @Override
        public void close() throws Exception {
            super.close();
            if (this.stmt != null) {
                this.stmt.close();
                this.stmt = null;
            }
            if (this.connection != null) {
                this.connection.close();
                this.connection = null;
            }
        }
    }).setParallelism(1);
    // execute program

    env.execute("Java WordCount from SocketTextStream Example");
    }

    private static class ProcessWindow extends ProcessAllWindowFunction<String, List<String>, TimeWindow> {

        @Override
        public void process(Context context, Iterable<String> iterable, Collector<List<String>> collector) throws Exception {
            // Collect all elements of the window into one list; the sink inserts it as a single batch.
            List<String> batch = new ArrayList<String>();
            for (String item : iterable) {
                batch.add(item);
            }
            collector.collect(batch);
        }
    }
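
    On the open questions above: timeWindowAll and countWindowAll create non-keyed windows, so the windowed part of the pipeline runs with parallelism 1, while keyBy(...).timeWindow(...) creates one window per key and can run in parallel. Note that countWindowAll is backed by GlobalWindow, so the ProcessWindow class above (typed on TimeWindow) would need its window type parameter changed to use it. As for process versus apply: apply() takes a plain WindowFunction/AllWindowFunction that only sees the window and its elements (plus the key, in the keyed case), while process() takes a ProcessWindowFunction/ProcessAllWindowFunction that additionally gets a Context with window metadata and per-window state. A minimal sketch of the keyed variant using apply() (the hash-based key is purely illustrative):

    text.keyBy(new KeySelector<String, Integer>() {
            @Override
            public Integer getKey(String s) {
                return Math.floorMod(s.hashCode(), 4);   // illustrative key: spread records over 4 groups
            }
        })
        .timeWindow(Time.seconds(5))
        .apply(new WindowFunction<String, List<String>, Integer, TimeWindow>() {
            @Override
            public void apply(Integer key, TimeWindow window, Iterable<String> input,
                              Collector<List<String>> out) {
                // Same idea as ProcessWindow, but per key and without a Context object.
                List<String> batch = new ArrayList<String>();
                for (String s : input) {
                    batch.add(s);
                }
                out.collect(batch);
            }
        });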

    Thirdly, convert the DataStream into a Table, then use Table/SQL operations with a batching sink such as JDBCOutputFormat or JDBCAppendTableSink (an AppendStreamTableSink), and set a batch size so that the inserts are executed in batches.

    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    StreamTableEnvironment tableEnv = TableEnvironment.getTableEnvironment(env);

    DataStream<String> text = env.socketTextStream("127.0.0.1", 9000).setParallelism(1);

    DataStream<WordCount> counts = text.flatMap(new FlatMapFunction<String, WordCount>() {
        @Override
        public void flatMap(String s, Collector<WordCount> collector) throws Exception {
            String[] words = s.split(" ");
            for (String item : words) {
                collector.collect(new WordCount(item, 1));
            }
        }
    }).setParallelism(1);

    // register the WordCount stream (not the raw text stream) so the 'word' field is queryable
    tableEnv.registerDataStream("firsttable", counts);
    Table table = tableEnv.sqlQuery("select word from firsttable");
    String[] fields = {"word"};
    JDBCAppendTableSink sink = JDBCAppendTableSink.builder()
            .setDrivername("")                  // e.g. the MySQL driver used above
            .setDBUrl("")                       // e.g. the JDBC URL used above
            .setUsername("")
            .setPassword("")
            .setQuery("")                       // an INSERT statement with one '?' placeholder
            .setParameterTypes(Types.STRING)
            .setBatchSize(3)                    // flush an INSERT batch every 3 rows
            .build();
    tableEnv.registerTableSink("test", fields, new TypeInformation[]{Types.STRING}, sink);
    table.insertInto("test");
    // execute program
    env.execute("ss");

    Lastly, FlinkKafkaConsumer may support batch-style consumption, perhaps by adding a time interval.

    To be researched.
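
    As a starting point for that, a hedged sketch of plugging a Kafka source into the windowed batch insert from the second approach; the broker address, consumer group, and topic name are placeholders, and the batching still comes from the window (or the JDBC sink's batch size), not from the consumer itself:

    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "localhost:9092");   // placeholder broker
    props.setProperty("group.id", "etl-group");                 // placeholder consumer group
    FlinkKafkaConsumer<String> consumer =
            new FlinkKafkaConsumer<>("etl-topic", new SimpleStringSchema(), props);   // placeholder topic
    DataStream<String> kafkaStream = env.addSource(consumer).setParallelism(1);

    // Reuse the 5-second window and the batching JDBC sink from the second approach.
    kafkaStream.timeWindowAll(Time.seconds(5))
            .process(new ProcessWindow())
            .addSink(jdbcBatchSink)   // jdbcBatchSink: the RichSinkFunction<List<String>> shown earlier, assumed extracted into a variable
            .setParallelism(1);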

  • Original post: https://www.cnblogs.com/connie313/p/12346317.html