• Importing data into Hive with mongodump


    Overview: export the MongoDB data with mongodump, upload the dump to HDFS, and then build an external table over it in Hive.

    1.     Export the collection with mongodump

    mongodump --host=localhost:27017  --db=mydb --collection=users  --out=/tmp/root/mongodump0712   

    [root@slave2 root]# mongodump --host=localhost:27017  --db=mydb --collection=users  --out=/tmp/root/mongodump0712 
    2018-07-12T10:07:27.894+0800    writing mydb.users to 
    2018-07-12T10:07:27.896+0800    done dumping mydb.users (2 documents)
    [root@slave2 root]# cd /tmp/root
    [root@slave2 root]# ls
    3604abd2-a359-4c53-a7b4-e4ea84185801  3604abd2-a359-4c53-a7b4-e4ea841858017799130181720133073.pipeout  dump  hive.log  hive.log.2018-07-11  mongodump0712
    [root@slave2 root]# ll
    total 624
    drwx------. 2 root root      6 Jul 12 09:34 3604abd2-a359-4c53-a7b4-e4ea84185801
    -rw-r--r--. 1 root root      0 Jul 12 09:34 3604abd2-a359-4c53-a7b4-e4ea841858017799130181720133073.pipeout
    drwxr-xr-x. 5 root root     44 Jul 12 10:04 dump
    -rw-r--r--. 1 root root  88700 Jul 12 09:39 hive.log
    -rw-r--r--. 1 root root 547126 Jul 11 21:07 hive.log.2018-07-11
    drwxr-xr-x. 3 root root     18 Jul 12 10:07 mongodump0712
    [root@slave2 root]# cd mongodump0712/
    [root@slave2 mongodump0712]# ls
    mydb
    [root@slave2 mongodump0712]# cd mydb
    [root@slave2 mydb]# ls
    users.bson  users.metadata.json    
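
    Before uploading, the exported BSON file can be sanity-checked with bsondump (shipped alongside mongodump), which prints each document as JSON; a quick check, using the same path as above:

    bsondump /tmp/root/mongodump0712/mydb/users.bson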

    2.     Upload the dump file to HDFS

    hdfs dfs -mkdir /user/hive/warehouse/mongo

    hdfs dfs -put /tmp/root/mongodump0712/mydb/users.bson /user/hive/warehouse/mongo/

    [root@slave2 mydb]# hdfs dfs -mkdir /user/hive/warehouse/mongo 
    [root@slave2 mydb]# hdfs dfs -put /tmp/root/mongodump0712/mydb/users.bson /user/hive/warehouse/mongo/  
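
    To confirm that the file landed where Hive will look for it, listing the target directory is a quick check (same path as above):

    hdfs dfs -ls /user/hive/warehouse/mongo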

    3.     Create the external table and test it
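
    Note: the BSONSerDe and the BSON input/output formats used below come from the mongo-hadoop connector, so the connector jars and the MongoDB Java driver need to be on Hive's classpath before the DDL will run. A minimal sketch, assuming the jars were copied to /tmp/root (file names and version numbers are illustrative, not from the original post):

    -- jar paths and versions below are illustrative placeholders
    ADD JAR /tmp/root/mongo-hadoop-core-2.0.2.jar;
    ADD JAR /tmp/root/mongo-hadoop-hive-2.0.2.jar;
    ADD JAR /tmp/root/mongo-java-driver-3.6.3.jar;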

    hive> create EXTERNAL table muser
        > (
        >   id string,
        >   userid string,
        >   age bigint,
        >   status string
        > )
        > row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
        > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
        > stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
        > outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
        > location '/user/hive/warehouse/muser';
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:hdfs://ns1/user/hive/warehouse/muser is not a directory or unable to create one)
    hive> create EXTERNAL table muser
        > (
        >   id string,
        >   userid string,
        >   age bigint,
        >   status string
        > )
        > row format serde 'com.mongodb.hadoop.hive.BSONSerDe'
        > WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","userid":"user_id","age":"age","status":"status"}')
        > stored as inputformat 'com.mongodb.hadoop.mapred.BSONFileInputFormat'
        > outputformat 'com.mongodb.hadoop.hive.output.HiveBSONFileOutputFormat'
        > location '/user/hive/warehouse/mongo';
    OK
    Time taken: 0.123 seconds
    hive> select * from muser;
    OK
    5b456e33a93daf7ae53e6419        abc123  58      D
    5b45705ca93daf7ae53e8b2a        bcd001  45      C
    Time taken: 0.181 seconds, Fetched: 2 row(s)
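
    The first CREATE fails because Hive cannot use /user/hive/warehouse/muser as the table directory; pointing LOCATION at /user/hive/warehouse/mongo, where users.bson was uploaded in step 2, succeeds, and the two dumped documents come back. With the table in place, ordinary HiveQL runs against the BSON file; for example, a simple aggregation (added here as an illustration, not part of the original test):

    select status, count(*) from muser group by status;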

     

