• HIVE之 Sqoop 1.4.6 安装、hive与oracle表互导


    1. sqoop数据迁移

    1.1 概述

    sqoop是apache旗下一款“Hadoop和关系数据库服务器之间传送数据”的工具。
    导入数据:MySQL,Oracle导入数据到Hadoop的HDFS、HIVE、HBASE等数据存储系统;
    导出数据:从Hadoop的文件系统中导出数据到关系数据库

    1.2 工作机制

    将导入或导出命令翻译成mapreduce程序来实现
    在翻译出的mapreduce中主要是对inputformat和outputformat进行定制

    1.3 sqoop实战及原理

    1.3.1 sqoop安装

    安装sqoop的前提是已经具备java和hadoop的环境
    1、下载并解压
    最新版下载地址http://ftp.wayne.edu/apache/sqoop/1.4.6/
    比如:sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz

    [root@hadoop1 sqoop]# tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
    [root@hadoop1 sqoop]# mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop
    [root@hadoop1 sqoop]# ls
    apache-hive-1.2.1-bin hadoop-2.7.4 hdfs pig-0.17.0 pig_1517170893185.log sqoop tmp

    2、修改配置文件
    在/etc/profile中配置sqoop_home,代码如下:

    vim /etc/profile
    export SQOOP_HOME=/usr/local/hadoop/sqoop
    追加path
    export PATH=$PATH:$SQOOP_HOME/bin
    [root@hadoop1 sqoop]# source /etc/profile
    $ cd $SQOOP_HOME/conf
    $ mv sqoop-env-template.sh sqoop-env.sh

    打开sqoop-env.sh并编辑下面几行: ## 去掉前面的##

    export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.7.4/
    export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.7.4/
    export HIVE_HOME=/usr/local/hadoop/apache-hive-1.2.1-bin/

    配置后的界面效果如下:
    这里写图片描述

    3.1 加入oracle的驱动包
    将 ojdbc6.jar 放到 $SQOOP_HOME/lib/ 下。

    3.2 加入mysql的jdbc驱动包
    将mysql-connector-java-5.1.38.jar 放到 $SQOOP_HOME/lib/ 下。

    4、验证启动

    $ cd $SQOOP_HOME/bin
    $ sqoop-version

    预期的输出:


    [root@hadoop1 sqoop]# sqoop-version
    Warning: /usr/local/hadoop/sqoop/../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /usr/local/hadoop/sqoop/../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /usr/local/hadoop/sqoop/../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /usr/local/hadoop/sqoop/../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    18/01/29 19:09:34 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    Sqoop 1.4.6
    git commit id c0c5a81723759fa575844a0a1eae8f510fa32c25
    Compiled by root on Mon Apr 27 14:38:36 CST 2015
    [root@hadoop1 sqoop]#

    到这里,整个Sqoop安装工作完成。


    数据迁移> oracle to hive ## 注意 HIVE 表名需要大写
    sqoop# sqoop import --hive-import --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 --username andy --password andy
    --table ANDY --hive-database oracletohive --hive-table ANDY -m 1
    说明: 迁移的表时,如果 hive中已经存在,则默认会追加在原表中。 如果 hive 中不存在,则自动创建。


    日志输出:
    18/01/29 19:35:46 INFO hive.HiveImport: Loading uploaded data into Hive
    18/01/29 19:35:51 INFO hive.HiveImport:
    18/01/29 19:35:51 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/usr/local/hadoop/apache-hive-1.2.1-bin/lib/hive-common-1.2.1.jar!/hive-log4j.properties
    18/01/29 19:36:02 INFO hive.HiveImport: OK
    18/01/29 19:36:02 INFO hive.HiveImport: Time taken: 2.42 seconds
    18/01/29 19:36:03 INFO hive.HiveImport: Loading data to table oracletohive.andy
    18/01/29 19:36:04 INFO hive.HiveImport: Table oracletohive.andy stats: [numFiles=1, totalSize=1996]
    18/01/29 19:36:04 INFO hive.HiveImport: OK
    18/01/29 19:36:04 INFO hive.HiveImport: Time taken: 1.579 seconds
    18/01/29 19:36:04 INFO hive.HiveImport: Hive import complete.
    18/01/29 19:36:04 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

    > show databases;
    OK
    default
    oracletohive
    Time taken: 0.027 seconds, Fetched: 2 row(s)
    hive>
    > use oracletohive;
    OK
    Time taken: 0.034 seconds
    hive>
    > show tables;
    OK
    andy
    Time taken: 0.037 seconds, Fetched: 1 row(s)
    hive> select count(*) from andy;
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-01-29 19:43:46,040 Stage-1 map = 0%, reduce = 0%
    2018-01-29 19:43:54,738 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.49 sec
    2018-01-29 19:44:03,323 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.39 sec
    MapReduce Total cumulative CPU time: 3 seconds 390 msec
    Ended Job = job_1517222580457_0002
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.39 sec HDFS Read: 16343 HDFS Write: 2 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 390 msec
    OK
    7
    Time taken: 34.543 seconds, Fetched: 1 row(s)

    数据迁移> hive to oracle

    hive迁移oracle,需要提前在oracle中创建迁移的表,否则报 java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist


    sqoop# sqoop export --table ANDY --connect jdbc:oracle:thin:@10.100.25.8:1521:devdb1 --username andy --password andy
    --export-dir /user/hive/warehouse/oracletohive.db/andy --input-fields-terminated-by '01'
    --input-lines-terminated-by ' '

    日志输出:
    18/01/29 20:21:34 INFO mapreduce.Job: Job job_1517222580457_0005 completed successfully
    18/01/29 20:21:34 INFO mapreduce.Job: Counters: 30
    。。。。。 省略输出
    18/01/29 20:21:34 INFO mapreduce.ExportJobBase: Transferred 5.502 KB in 116.7414 seconds (48.2605 bytes/sec)
    18/01/29 20:21:34 INFO mapreduce.ExportJobBase: Exported 7 records.

    -- oracle端查看
    SQL> select count(*) from andy;

    COUNT(*)
    ----------
    14 > 由 7条 变为了 14条 , 说明 hive 导入 oracle 成功!

  • 相关阅读:
    2016/10/18 数据库设计三大范式
    2016/10/13 Oracle COALESCE()
    2016/10/13 oracle中的round()
    2016/10/10 数据、数据元素和数据项
    2016/09/29 Maven简介
    2016/09/29 瀑布模型开发和敏捷开发
    python2和python3中的类
    使用JQuery完成页面定时弹出广告
    JQuery入门+js库文件分享
    使用JavaScript完成控制下拉列表左右选择
  • 原文地址:https://www.cnblogs.com/andy6/p/8378450.html
Copyright © 2020-2023  润新知