• Sqoop 1.4.7: Deployment and Common Use Cases


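    This article assumes the environment set up in two earlier posts: Hadoop2.7.6_01_部署 (Hadoop 2.7.6 deployment) and Hive-1.2.1_01_安装部署 (Hive 1.2.1 installation and deployment).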

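    1. Preface

           In a complete big-data processing system, HDFS, MapReduce, and Hive form the core of the analysis stack, but the system also needs supporting components for data collection, result export, and job scheduling. The Hadoop ecosystem offers convenient open-source frameworks for each of these tasks (the original post illustrates this with a diagram).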

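    1.1. Overview

    Sqoop is an Apache tool for transferring data between Hadoop and relational database servers.

      Import: load data from MySQL, Oracle, and other relational databases into Hadoop storage systems such as HDFS, Hive, and HBase;

      Export: write data from the Hadoop file system out to a relational database.

    1.3. How it works

    Sqoop translates each import or export command into a MapReduce program.

    The generated MapReduce job mainly customizes the InputFormat and OutputFormat.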

    2. Installing and deploying Sqoop

    2.1. Installing the software

        [yun@mini01 software]$ pwd
        /app/software
        [yun@mini01 software]$ tar xf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
        [yun@mini01 software]$ mv sqoop-1.4.7.bin__hadoop-2.6.0 /app/sqoop-1.4.7
        [yun@mini01 software]$ cd /app/
        [yun@mini01 ~]$ ln -s sqoop-1.4.7/ sqoop
        [yun@mini01 ~]$ ll
        total 28
        …………
        lrwxrwxrwx  1 yun yun    12 Aug  2 11:40 sqoop -> sqoop-1.4.7/
        drwxr-xr-x  9 yun yun   318 Dec 19  2017 sqoop-1.4.7

    2.2. Editing the configuration

        [yun@mini01 conf]$ pwd
        /app/sqoop/conf
        [yun@mini01 conf]$ ll
        total 32
        -rw-rw-r-- 1 yun yun 3895 Dec 19  2017 oraoop-site-template.xml
        -rw-rw-r-- 1 yun yun 1404 Dec 19  2017 sqoop-env-template.cmd
        -rwxr-xr-x 1 yun yun 1345 Dec 19  2017 sqoop-env-template.sh
        -rw-rw-r-- 1 yun yun 6044 Dec 19  2017 sqoop-site-template.xml
        -rw-rw-r-- 1 yun yun 6044 Dec 19  2017 sqoop-site.xml
        [yun@mini01 conf]$ cp -a sqoop-env-template.sh sqoop-env.sh
        [yun@mini01 conf]$ cat sqoop-env.sh
        # Licensed to the Apache Software Foundation (ASF) under one or more
        ………………

        # Set Hadoop-specific environment variables here.

        #Set path to where bin/hadoop is available  # modified setting
        export HADOOP_COMMON_HOME=${HADOOP_HOME}

        #Set path to where hadoop-*-core.jar is available  # modified setting
        export HADOOP_MAPRED_HOME=${HADOOP_HOME}

        #set the path to where bin/hbase is available
        #export HBASE_HOME=

        #Set the path to where bin/hive is available  # modified setting
        export HIVE_HOME=${HIVE_HOME}

        #Set the path for where zookeper config dir is
        #export ZOOCFGDIR=
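
    The settings above only reference HADOOP_HOME and HIVE_HOME, so they take effect only if both variables are already defined in the shell that launches Sqoop. A minimal sketch of a pre-flight check follows; the function name check_sqoop_env is hypothetical and not part of Sqoop.

```shell
# Hypothetical helper (not part of Sqoop): report which of the variables
# referenced by sqoop-env.sh are unset or empty in the current shell.
check_sqoop_env() {
  missing=0
  for name in HADOOP_HOME HIVE_HOME; do
    # Look up the value of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing: $name"
      missing=1
    else
      echo "ok: $name=$value"
    fi
  done
  return "$missing"
}

check_sqoop_env || echo "Define the missing variables (for example in ~/.bashrc) before running Sqoop."
```

    The function returns a non-zero status when anything is missing, so it can also guard a wrapper script.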

    2.3. Adding the MySQL JDBC driver

        [yun@mini01 software]$ pwd
        /app/software
        [yun@mini01 software]$ cp -a mysql-connector-java-5.1.46.jar /app/sqoop/lib

    2.4. Adding the Hive exec jar

        [yun@mini01 lib]$ pwd
        /app/hive/lib
        [yun@mini01 lib]$ cp -a hive-exec-1.2.1.jar /app/sqoop/lib/

           Copying this jar avoids an error when importing into Hive.

    2.5. Verifying the installation

        [yun@mini01 bin]$ pwd
        /app/sqoop/bin
        [yun@mini01 bin]$ ./sqoop-version
        18/08/02 14:56:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
        Sqoop 1.4.7
        git commit id 2328971411f57f0cb683dfb79d19d4d19d185dd8
        Compiled by maugli on Thu Dec 21 15:59:58 STD 2017
        [yun@mini01 bin]$ ./sqoop help  # show help
        18/08/02 15:30:00 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
        usage: sqoop COMMAND [ARGS]

        Available commands:
          codegen            Generate code to interact with database records
          create-hive-table  Import a table definition into Hive
          eval               Evaluate a SQL statement and display the results
          export             Export an HDFS directory to a database table
          help               List available commands
          import             Import a table from a database to HDFS
          import-all-tables  Import tables from a database to HDFS
          import-mainframe   Import datasets from a mainframe server to HDFS
          job                Work with saved jobs
          list-databases     List available databases on a server
          list-tables        List available tables in a database
          merge              Merge results of incremental imports
          metastore          Run a standalone Sqoop metastore
          version            Display version information

        See 'sqoop help COMMAND' for information on a specific command.

    3. Database setup

        # create the database
        CREATE DATABASE sqoop_test DEFAULT CHARACTER SET utf8 ;
        # create the account (the database runs on host mini03)
        grant all on sqoop_test.* to sqoop_test@'%' identified by 'sqoop_test';
        grant all on sqoop_test.* to sqoop_test@'mini03' identified by 'sqoop_test';
        # reload the privilege tables
        flush privileges;

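    3.1. Tables

    Table emp:

        +------+----------+--------------+--------+------+
        | id   | name     | deg          | salary | dept |
        +------+----------+--------------+--------+------+
        | 1201 | gopal    | manager      |  50000 | TP   |
        | 1202 | manisha  | Proof reader |  50000 | TP   |
        | 1203 | khalil   | php dev      |  30000 | AC   |
        | 1204 | prasanth | php dev      |  30000 | AC   |
        | 1205 | kranthi  | admin        |  20000 | TP   |
        +------+----------+--------------+--------+------+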

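    Table emp_add (the column name city is the one used by the --where clause in section 4.5.1; hno and street are assumed names for the house-number and street columns):

        +------+------+----------+---------+
        | id   | hno  | street   | city    |
        +------+------+----------+---------+
        | 1201 | 288A | vgiri    | jublee  |
        | 1202 | 108I | aoc      | sec-bad |
        | 1203 | 144Z | pgutta   | hyd     |
        | 1204 | 78B  | old city | sec-bad |
        | 1205 | 720X | hitec    | sec-bad |
        +------+------+----------+---------+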

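    Table emp_conn (phno and email are assumed names for the phone-number and e-mail columns):

        +------+---------+-----------------+
        | id   | phno    | email           |
        +------+---------+-----------------+
        | 1201 | 2356742 | gopal@tp.com    |
        | 1202 | 1661663 | manisha@tp.com  |
        | 1203 | 8887776 | khalil@ac.com   |
        | 1204 | 9988774 | prasanth@ac.com |
        | 1205 | 1231231 | kranthi@tp.com  |
        +------+---------+-----------------+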
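
    The article does not show the DDL for these tables. As a rough sketch, the script below writes a SQL file that would create and fill the emp table. All column types are assumptions; DECIMAL(10,0) for salary is a guess that is consistent with the Hive import log in section 4.3, which warns that salary was cast to a less precise type. The output path is also illustrative.

```shell
# Write an illustrative schema-and-data script for the emp table.
# Column types are assumptions; the original DDL is not shown in the article.
SQL_FILE="${TMPDIR:-/tmp}/sqoop_test_emp.sql"

cat > "$SQL_FILE" <<'SQL'
USE sqoop_test;
CREATE TABLE emp (
  id     INT NOT NULL PRIMARY KEY,
  name   VARCHAR(20),
  deg    VARCHAR(20),
  salary DECIMAL(10,0),
  dept   VARCHAR(10)
);
INSERT INTO emp (id, name, deg, salary, dept) VALUES
  (1201, 'gopal',    'manager',      50000, 'TP'),
  (1202, 'manisha',  'Proof reader', 50000, 'TP'),
  (1203, 'khalil',   'php dev',      30000, 'AC'),
  (1204, 'prasanth', 'php dev',      30000, 'AC'),
  (1205, 'kranthi',  'admin',        20000, 'TP');
SQL

echo "wrote $SQL_FILE"
# Load it with the account created in section 3 (prompts for the password):
#   mysql -h mini03 -u sqoop_test -p < "$SQL_FILE"
```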

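    4. Importing data with Sqoop

           The import tool imports a single table from an RDBMS into HDFS. Each row of the table becomes one record in HDFS. Records are stored either as text in text files or as binary data in Avro or SequenceFile format.

        $ sqoop import (generic-args) (import-args)
        $ sqoop-import (generic-args) (import-args)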

    4.1. Importing a table into HDFS

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop import \
          --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test \
          --password sqoop_test \
          --table emp \
          -m 1
        18/08/02 15:57:55 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
        18/08/02 15:57:55 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
        ………………
        18/08/02 15:58:47 INFO mapreduce.Job:  map 100% reduce 0%
        18/08/02 15:58:47 INFO mapreduce.Job: Job job_1533196573365_0001 completed successfully
        ………………
        18/08/02 15:58:47 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 47.2865 seconds (3.1933 bytes/sec)
        18/08/02 15:58:47 INFO mapreduce.ImportJobBase: Retrieved 5 records.
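
    The warning in the log above flags the plain-text --password argument. One alternative, sketched here rather than taken from the original post, is Sqoop's --password-file option. The file path is an assumption, and --delete-target-dir is added only so the sketch can be re-run against an existing /user/yun/emp directory.

```shell
# Store the password in a file that only the current user can read.
# The path is an illustrative assumption.
PW_FILE="$HOME/.sqoop_test.pw"
rm -f "$PW_FILE"
# printf (not echo) avoids a trailing newline; Sqoop uses the whole file
# content as the password, including any trailing whitespace.
printf '%s' 'sqoop_test' > "$PW_FILE"
chmod 400 "$PW_FILE"

if command -v sqoop >/dev/null 2>&1; then
  # The file:// prefix points Sqoop at the local file system instead of HDFS.
  sqoop import \
    --connect jdbc:mysql://mini03:3306/sqoop_test \
    --username sqoop_test \
    --password-file "file://$PW_FILE" \
    --table emp \
    --delete-target-dir \
    -m 1
else
  echo "sqoop not found on PATH; skipping the import"
fi
```

    For interactive use, -P (prompt for the password) is the simpler choice, as the warning itself suggests.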

    Check the imported data

        [yun@mini02 ~]$ hadoop fs -ls /user/yun/emp
        Found 2 items
        -rw-r--r--   2 yun supergroup          0 2018-08-02 15:58 /user/yun/emp/_SUCCESS
        -rw-r--r--   2 yun supergroup        151 2018-08-02 15:58 /user/yun/emp/part-m-00000
        [yun@mini02 ~]$ hadoop fs -cat /user/yun/emp/part-m-00000
        1201,gopal,manager,50000,TP
        1202,manisha,Proof reader,50000,TP
        1203,khalil,php dev,30000,AC
        1204,prasanth,php dev,30000,AC
        1205,kranthi,admin,20000,TP

    4.2. Importing a table into a specific HDFS directory

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test --password sqoop_test \
          --target-dir /sqoop_test/table_emp/queryresult \
          --table emp --num-mappers 1

      Note: the target directory is created if it does not exist.

    Check the imported data

        [yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult
        Found 2 items
        -rw-r--r--   2 yun supergroup          0 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/_SUCCESS
        -rw-r--r--   2 yun supergroup        151 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/part-m-00000
        [yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult/part-m-00000
        1201,gopal,manager,50000,TP
        1202,manisha,Proof reader,50000,TP
        1203,khalil,php dev,30000,AC
        1204,prasanth,php dev,30000,AC
        1205,kranthi,admin,20000,TP

    4.3. Importing a relational table into Hive

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test --password sqoop_test \
          --table emp --hive-import \
          --num-mappers 1
        18/08/02 17:40:45 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
        18/08/02 17:40:45 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
        ………………
        18/08/02 17:41:09 INFO mapreduce.ImportJobBase: Transferred 151 bytes in 20.6744 seconds (7.3037 bytes/sec)
        18/08/02 17:41:09 INFO mapreduce.ImportJobBase: Retrieved 5 records.
        18/08/02 17:41:09 INFO mapreduce.ImportJobBase: Publishing Hive/Hcat import job data to Listeners for table emp
        18/08/02 17:41:10 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `emp` AS t LIMIT 1
        18/08/02 17:41:10 WARN hive.TableDefWriter: Column salary had to be cast to a less precise type in Hive
        18/08/02 17:41:10 INFO hive.HiveImport: Loading uploaded data into Hive
        18/08/02 17:41:12 INFO hive.HiveImport: 
        18/08/02 17:41:12 INFO hive.HiveImport: Logging initialized using configuration in jar:file:/app/sqoop-1.4.7/lib/hive-exec-1.2.1.jar!/hive-log4j.properties
        18/08/02 17:41:20 INFO hive.HiveImport: OK
        18/08/02 17:41:20 INFO hive.HiveImport: Time taken: 1.677 seconds
        18/08/02 17:41:20 INFO hive.HiveImport: Loading data to table default.emp
        18/08/02 17:41:21 INFO hive.HiveImport: Table default.emp stats: [numFiles=1, totalSize=151]
        18/08/02 17:41:21 INFO hive.HiveImport: OK
        18/08/02 17:41:21 INFO hive.HiveImport: Time taken: 0.629 seconds
        18/08/02 17:41:21 INFO hive.HiveImport: Hive import complete.
        18/08/02 17:41:21 INFO hive.HiveImport: Export directory is contains the _SUCCESS file only, removing the directory.

    Check the imported data

        hive (default)> show tables;
        OK
        emp
        Time taken: 0.031 seconds, Fetched: 1 row(s)
        hive (default)> select * from emp;
        OK
        1201    gopal    manager    50000.0    TP
        1202    manisha    Proof reader    50000.0    TP
        1203    khalil    php dev    30000.0    AC
        1204    prasanth    php dev    30000.0    AC
        1205    kranthi    admin    20000.0    TP
        Time taken: 0.489 seconds, Fetched: 5 row(s)

    4.4. Importing a table into a specific Hive database and table

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test --password sqoop_test \
          --table emp \
          --delete-target-dir \
          --fields-terminated-by '\t' \
          --hive-import \
          --hive-database sqoop_test \
          --hive-table hive_emp \
          --num-mappers 1

           Note: the Hive database sqoop_test must be created beforehand. Otherwise the import fails with: FAILED: SemanticException [Error 10072]: Database does not exist: sqoop_test

    Check the imported data

        0: jdbc:hive2://mini01:10000> use sqoop_test;
        No rows affected (0.049 seconds)
        0: jdbc:hive2://mini01:10000> show tables;
        +-----------+--+
        | tab_name  |
        +-----------+--+
        | hive_emp  |
        +-----------+--+
        1 row selected (0.076 seconds)
        0: jdbc:hive2://mini01:10000> select * from hive_emp;
        +--------------+----------------+---------------+------------------+----------------+--+
        | hive_emp.id  | hive_emp.name  | hive_emp.deg  | hive_emp.salary  | hive_emp.dept  |
        +--------------+----------------+---------------+------------------+----------------+--+
        | 1201         | gopal          | manager       | 50000.0          | TP             |
        | 1202         | manisha        | Proof reader  | 50000.0          | TP             |
        | 1203         | khalil         | php dev       | 30000.0          | AC             |
        | 1204         | prasanth       | php dev       | 30000.0          | AC             |
        | 1205         | kranthi        | admin         | 20000.0          | TP             |
        +--------------+----------------+---------------+------------------+----------------+--+
        5 rows selected (0.162 seconds)

    4.5. Importing a subset of a table

    4.5.1. Importing with a where clause

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test --password sqoop_test \
          --table emp_add \
          --where "city = 'sec-bad'" \
          --target-dir /sqoop_test/table_emp/queryresult2 \
          --num-mappers 1

    Check the imported data

        [yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult2
        Found 2 items
        -rw-r--r--   2 yun supergroup          0 2018-08-02 19:33 /sqoop_test/table_emp/queryresult2/_SUCCESS
        -rw-r--r--   2 yun supergroup         72 2018-08-02 19:33 /sqoop_test/table_emp/queryresult2/part-m-00000
        [yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult2/part-m-00000
        1202,108I,aoc,sec-bad
        1204,78B,old city,sec-bad
        1205,720X,hitec,sec-bad

    4.5.2. Importing the result of a query

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test --password sqoop_test \
          --query 'select id,name,deg from emp WHERE id>1203 and $CONDITIONS' \
          --split-by id \
          --fields-terminated-by '\t' \
          --target-dir /sqoop_test/table_emp/queryresult4 \
          --num-mappers 1

    Check the imported data

        [yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult4/part-m-00000
        1204    prasanth    php dev
        1205    kranthi    admin
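
    A free-form --query must contain the literal token $CONDITIONS, which Sqoop replaces with a per-task condition at run time. The command in this section uses single quotes so that the shell passes the token through unchanged; inside double quotes it has to be escaped. The sketch below needs no Sqoop installation and only shows what the shell would hand over in each case. The variable names are illustrative.

```shell
# An empty CONDITIONS variable stands in for the usual case where it is unset.
CONDITIONS=''

single='select id,name,deg from emp WHERE id>1203 and $CONDITIONS'
escaped="select id,name,deg from emp WHERE id>1203 and \$CONDITIONS"
unescaped="select id,name,deg from emp WHERE id>1203 and $CONDITIONS"

printf 'single quotes    : %s\n' "$single"
printf 'escaped, double  : %s\n' "$escaped"
printf 'unescaped, double: %s\n' "$unescaped"
```

    The first two strings keep the token intact. In the third, the shell expands it before Sqoop ever sees the query, so the token is gone and the statement ends in a dangling "and".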

    4.6. Incremental import

           Required parameters

        --check-column (col)
        --incremental (mode)        # mode is either append or lastmodified
        --last-value (value)

    Example

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop import --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test --password sqoop_test \
          --table emp \
          --incremental append \
          --check-column id \
          --last-value 1202 \
          --fields-terminated-by '\t' \
          --target-dir /sqoop_test/table_emp/queryresult4 \
          --num-mappers 1

    Check the imported data. The incremental rows land in part-m-00001; part-m-00000 is left over from the query import in section 4.5.2, which wrote to the same directory, so the two files have different columns.

        [yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult4/
        Found 3 items
        -rw-r--r--   2 yun supergroup          0 2018-08-02 19:47 /sqoop_test/table_emp/queryresult4/_SUCCESS
        -rw-r--r--   2 yun supergroup         41 2018-08-02 19:47 /sqoop_test/table_emp/queryresult4/part-m-00000
        -rw-r--r--   2 yun supergroup         88 2018-08-02 20:16 /sqoop_test/table_emp/queryresult4/part-m-00001
        [yun@mini02 ~]$ 
        [yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult4/part-m-00000
        1204    prasanth    php dev
        1205    kranthi    admin
        [yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult4/part-m-00001
        1203    khalil    php dev    30000    AC
        1204    prasanth    php dev    30000    AC
        1205    kranthi    admin    20000    TP
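
    In append mode, Sqoop imports only the rows whose check column is greater than --last-value. The sketch below mimics that selection locally with awk on the sample emp rows from section 3.1; it illustrates the rule and is not how Sqoop itself runs the import. The variable names are illustrative.

```shell
LAST_VALUE=1202

# Keep only rows whose first field (id) is greater than LAST_VALUE.
selected="$(awk -F, -v last="$LAST_VALUE" '$1 > last' <<'EOF'
1201,gopal,manager,50000,TP
1202,manisha,Proof reader,50000,TP
1203,khalil,php dev,30000,AC
1204,prasanth,php dev,30000,AC
1205,kranthi,admin,20000,TP
EOF
)"
printf '%s\n' "$selected"

# The largest id just selected is the natural --last-value for the next run.
next_last="$(printf '%s\n' "$selected" | awk -F, '$1 > max { max = $1 } END { print max }')"
echo "next --last-value: $next_last"
```

    The selected ids (1203 to 1205) match the rows in part-m-00001 above.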

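    5. Exporting data with Sqoop

    Export moves data from HDFS into an RDBMS.

    The target table must already exist in the target database before you export.

    • By default, Sqoop reads the files and inserts their rows into the table with INSERT statements.
    • In update mode, Sqoop generates UPDATE statements that modify existing rows.

        $ sqoop export (generic-args) (export-args)
        $ sqoop-export (generic-args) (export-args)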

    5.1. Example

    Source data

        [yun@mini02 ~]$ hadoop fs -ls /sqoop_test/table_emp/queryresult
        Found 2 items
        -rw-r--r--   2 yun supergroup          0 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/_SUCCESS
        -rw-r--r--   2 yun supergroup        151 2018-08-02 17:57 /sqoop_test/table_emp/queryresult/part-m-00000
        [yun@mini02 ~]$ hadoop fs -cat /sqoop_test/table_emp/queryresult/part-m-00000
        1201,gopal,manager,50000,TP
        1202,manisha,Proof reader,50000,TP
        1203,khalil,php dev,30000,AC
        1204,prasanth,php dev,30000,AC
        1205,kranthi,admin,20000,TP

    1. First, create the target table in MySQL by hand

        MariaDB [(none)]> use sqoop_test;
        Database changed

        MariaDB [sqoop_test]> CREATE TABLE employee (
                  id INT NOT NULL PRIMARY KEY,
                  name VARCHAR(20),
                  deg VARCHAR(20),
                  salary INT,
                  dept VARCHAR(10));
        Query OK, 0 rows affected (0.00 sec)

        MariaDB [sqoop_test]> show tables;
        +----------------------+
        | Tables_in_sqoop_test |
        +----------------------+
        | emp                  |
        | emp_add              |
        | emp_conn             |
        | employee             |
        +----------------------+
        4 rows in set (0.00 sec)

    2. Then run the export command

        [yun@mini01 sqoop]$ pwd
        /app/sqoop
        [yun@mini01 sqoop]$ bin/sqoop export \
          --connect jdbc:mysql://mini03:3306/sqoop_test \
          --username sqoop_test --password sqoop_test \
          --table employee \
          --export-dir /sqoop_test/table_emp/queryresult/

    3. Verify the table from the mysql command line

        MariaDB [sqoop_test]> select * from employee;
        +------+----------+--------------+--------+------+
        | id   | name     | deg          | salary | dept |
        +------+----------+--------------+--------+------+
        | 1201 | gopal    | manager      |  50000 | TP   |
        | 1202 | manisha  | Proof reader |  50000 | TP   |
        | 1203 | khalil   | php dev      |  30000 | AC   |
        | 1204 | prasanth | php dev      |  30000 | AC   |
        | 1205 | kranthi  | admin        |  20000 | TP   |
        +------+----------+--------------+--------+------+
        5 rows in set (0.00 sec)
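
    Section 5 mentions an update mode but the example only exercises the default INSERT path. Running the same INSERT export a second time would collide with the primary key on id. A hedged sketch of an update-mode export against the same employee table follows; it uses Sqoop's --update-key and --update-mode options and was not part of the original post. Whether allowinsert is supported depends on the database connector.

```shell
# Update-mode export sketch.
# --update-key names the column used to match existing rows in the generated
# UPDATE statements; --update-mode allowinsert also inserts rows that are
# not present yet. -P prompts for the password instead of passing it inline.
set -- export \
  --connect jdbc:mysql://mini03:3306/sqoop_test \
  --username sqoop_test -P \
  --table employee \
  --export-dir /sqoop_test/table_emp/queryresult/ \
  --update-key id \
  --update-mode allowinsert

if command -v sqoop >/dev/null 2>&1; then
  sqoop "$@"
else
  echo "sqoop not found on PATH; would run: sqoop $*"
fi
```

    With --update-mode updateonly (the default once --update-key is given), rows missing from the table are skipped rather than inserted.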
  • Original article: https://www.cnblogs.com/zhanglianghhh/p/9427570.html