• Importing data into ClickHouse with Waterdrop


    Importing data from a Hive table with Waterdrop

    Copy batch.conf.template to batch.conf (the configuration below is my example; adjust it to your own needs):
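
    A minimal sketch of that copy step, assuming the stock Waterdrop directory layout (bin/ and config/ side by side under the install directory):

    cp config/batch.conf.template config/batch.conf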

    ######
    ###### This config file is a demonstration of batch processing in waterdrop config
    ######
    
    spark {
      # You can set spark configuration here
      # see available properties defined by spark: https://spark.apache.org/docs/latest/configuration.html#available-properties
      spark.app.name = "Waterdrop"
      spark.executor.instances = 2
      spark.executor.cores = 1
      spark.executor.memory = "1g"
    }
    
    input {
      # This is an example input plugin **only for testing and demonstrating the input plugin feature**
      hive {
        pre_sql = "select * from terminal.XX"
        result_table_name = "XX"
      }
    
    
    
      # You can also use other input plugins, such as hdfs
      # hdfs {
      #   result_table_name = "accesslog"
      #   path = "hdfs://hadoop-cluster-01/nginx/accesslog"
      #   format = "json"
      # }
    
      # If you would like to get more information about how to configure waterdrop and see full list of input plugins,
      # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
    }
    
    filter {
    #  # split data by a specific delimiter
    #  split {
    #    fields = ["msg", "name"]
    #    delimiter = " "
    #    result_table_name = "accesslog"
    #  }
    #  # remove is its own filter plugin (not nested inside split); it drops the listed fields
    #  remove {
    #    source_field = ["imei1", "imei2"]
    #  }
    
    
    
      # you can also use other filter plugins, such as sql
      # sql {
      #   sql = "select * from accesslog where request_time > 1000"
      # }
    
      # If you would like to get more information about how to configure waterdrop and see full list of filter plugins,
      # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
    }
    
    output {
      # choose stdout output plugin to output data to console
      #stdout {
      #}
    
      clickhouse {
        host = "127.0.0.1:8123"
        database = "waterdrop"
        table = "access_log"
        fields = ["XX","day"]
        username = "user_richdm"
        password = "richdm"
      }
    
      # you can also use other output plugins, such as hdfs
      # hdfs {
      #   path = "hdfs://hadoop-cluster-01/nginx/accesslog_processed"
      #   save_mode = "append"
      # }
    
      # If you would like to get more information about how to configure waterdrop and see full list of output plugins,
      # please go to https://interestinglab.github.io/waterdrop/#/zh-cn/configuration/base
    }

    Run the job:

    ./start-waterdrop.sh --master yarn --deploy-mode client --config ../config/batch.conf
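
    start-waterdrop.sh passes these options through to spark-submit, so the standard Spark options apply. For example, a sketch of the same job with the driver running on YARN rather than on the local machine (same config path assumed):

    ./start-waterdrop.sh --master yarn --deploy-mode cluster --config ../config/batch.conf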

    The ClickHouse database and table must be created in advance; Waterdrop will not create them for you.
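
    A minimal sketch of that pre-creation step using clickhouse-client. The database, table, and field names come from the output block above; the column types and the MergeTree engine are assumptions to adapt to your real schema:

    # create the target database (name taken from the config's "database" option)
    clickhouse-client --query "CREATE DATABASE IF NOT EXISTS waterdrop"

    # create the target table; columns mirror the "fields" list, types are placeholders
    clickhouse-client --query "
      CREATE TABLE IF NOT EXISTS waterdrop.access_log (
        XX  String,
        day Date
      ) ENGINE = MergeTree() ORDER BY day"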
