• kafka 导入数据到 doris 


    具体见官网:http://doris.apache.org/master/zh-CN/sql-reference/sql-statements/Data%20Manipulation/ROUTINE%20LOAD.html#description

    下面是个人测试例子:

    1.创建DB

    CREATE TABLE IF NOT EXISTS user
    (
        siteid INT DEFAULT '10',
        citycode SMALLINT,
        username VARCHAR(32) DEFAULT '',
        pv BIGINT SUM DEFAULT '0'
    )
    AGGREGATE KEY(siteid, citycode, username)
    DISTRIBUTED BY HASH(siteid) BUCKETS 10
    PROPERTIES("replication_num" = "1");

    2.准备数据 (kafka  topic  --doris)

    pls input topic:doris
    pls input msg:6|12|pp|123
    send over !!!
    pls input msg:7|32|ww|231
    send over !!!
    pls input msg:8|12|ee|213
    send over !!!
    pls input msg:

    3.导入数据到doris         sea.user:任务标示(唯一)     columus:列名

     CREATE ROUTINE LOAD sea.user ON 
     user COLUMNS TERMINATED BY "|",
     COLUMNS(siteid,citycode,username,pv)
     PROPERTIES(
     "desired_concurrent_number"="1",
     "max_batch_interval"="20",
     "max_batch_rows"="300000",
     "max_batch_size"="209715200")
     FROM KAFKA(
     "kafka_broker_list"="192.168.18.129:9092",
     "kafka_topic"="doris",
     "property.group.id"="gid",
     "property.clinet.id"="cid",
     "property.kafka_default_offsets"="OFFSET_BEGINNING");
    1) OFFSET_BEGINNING: 从有数据的位置开始订阅。
    2) OFFSET_END: 从末尾开始订阅

    3.1 查看routine load状态

    SHOW ALL ROUTINE LOAD FOR  sea.user;

    3.1.1 显示 example_db 下,所有的例行导入作业(包括已停止或取消的作业)。结果为一行或多行。

    use example_db;
    SHOW ALL ROUTINE LOAD;
    Ⅴ).查看routine load状态
    SHOW ALL ROUTINE LOAD FOR datasource_name.kafka_load;
    Ⅵ).常用routine load命令
    a).暂停routine load
    PAUSE ROUTINE LOAD FOR datasource_name.kafka_load;
    b).恢复routine load
    RESUME ROUTINE LOAD FOR datasource_name.kafka_load;
    c).停止routine load
    STOP ROUTINE LOAD FOR datasource_name.kafka_load;
    d).查看所有routine load
    SHOW [ALL] ROUTINE LOAD FOR datasource_name.kafka_load;
    e).查看routine load任务
    SHOW ROUTINE LOAD TASK datasource_name.kafka_load;
    Ⅶ).查看数据
    SELECT * FROM datasource_name.table_name LIMIT 10;

    4.查看数据

     

    官网:

    example:

    4. 简单模式导入json
        CREATE ROUTINE LOAD example_db.test_json_label_1 ON table1
        COLUMNS(category,price,author)
        PROPERTIES
        (
        "desired_concurrent_number"="3",
        "max_batch_interval" = "20",
        "max_batch_rows" = "300000",
        "max_batch_size" = "209715200",
        "strict_mode" = "false",
        "format" = "json"
        )
        FROM KAFKA
        (
        "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
        "kafka_topic" = "my_topic",
        "kafka_partitions" = "0,1,2",
        "kafka_offsets" = "0,0,0"
        );
        支持两种json数据格式:
      1){"category":"a9jadhx","author":"test","price":895}
      2)[
                {"category":"a9jadhx","author":"test","price":895},
                {"category":"axdfa1","author":"EvelynWaugh","price":1299}
         ]

    说明: 1)如果json数据是以数组开始,并且数组中每个对象是一条记录,则需要将strip_outer_array设置成true,表示展平数组。

       2)如果json数据是以数组开始,并且数组中每个对象是一条记录,在设置jsonpath时,我们的ROOT节点实际上是数组中对象。

    6. 用户指定根节点json_root
        CREATE ROUTINE LOAD example_db.test1 ON example_tbl
        COLUMNS(category, author, price, timestamp, dt=from_unixtime(timestamp, '%Y%m%d'))
        PROPERTIES
        (
            "desired_concurrent_number"="3",
            "max_batch_interval" = "20",
            "max_batch_rows" = "300000",
            "max_batch_size" = "209715200",
            "strict_mode" = "false",
            "format" = "json",
            "jsonpaths" = "["$.category","$.author","$.price","$.timestamp"]",
            "strip_outer_array" = "true",
            "json_root" = "$.RECORDS"
        )
        FROM KAFKA
        (
            "kafka_broker_list" = "broker1:9092,broker2:9092,broker3:9092",
            "kafka_topic" = "my_topic",
            "kafka_partitions" = "0,1,2",
            "kafka_offsets" = "0,0,0"
        );

    json数据格式:

    {
      "RECORDS": [
        {
          "category": "11",
          "title": "SayingsoftheCentury",
          "price": 895,
          "timestamp": 1589191587
        },
        {
          "category": "22",
          "author": "2avc",
          "price": 895,
          "timestamp": 1589191487
        },
        {
          "category": "33",
          "author": "3avc",
          "title": "SayingsoftheCentury",
          "timestamp": 1589191387
        }
      ]
    }
  • 相关阅读:
    Maybe You Don't Know ! 如何比较两个引用是否指向同一个对象?
    记录一点项目心得...
    SharePoint 站点模版
    ObjectSpaces,See you in 2006...
    CLR如何实现线程同步
    Using 1.1, Waiting 2.0 & EasyThread
    在SharePoint中的Workflow引擎开发完成
    ViewState
    《WalkThrough WebPart 入门指南二》完成
    隐藏在.NET中的IoC?
  • 原文地址:https://www.cnblogs.com/lshan/p/14790587.html
Copyright © 2020-2023  润新知