• Sqoop Usage Guide


    import-to-hdfs: moving data from a relational database into HDFS & Hive

    Sqoop is used for ETL between relational databases and the Hadoop family (HDFS, Hive, HBase).

    Relational database to Hadoop family: sqoop import
    Hadoop family to relational database: sqoop export

    Download:
    http://www.apache.org/dyn/closer.lua/sqoop/1.4.7

    Official documentation:
    https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

    Reference: https://www.cnblogs.com/yinzhengjie/p/9183069.html

    Installing Sqoop 1.4.7

    cd /usr/app

    tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
    mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7

    cd sqoop-1.4.7/conf/
    mv sqoop-env-template.sh sqoop-env.sh
    vi sqoop-env.sh

    export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.10.0

    export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.10.0

    export HBASE_HOME=/usr/app/hbase-1.6.0

    export HIVE_HOME=/usr/app/apache-hive-2.3.7-bin

    export ZOOCFGDIR=/usr/app/apache-zookeeper-3.6.1-bin/conf

    cp mysql-connector-java-5.1.49-bin.jar /usr/app/sqoop-1.4.7/lib/

    vi /etc/profile

    export SQOOP_HOME=/usr/app/sqoop-1.4.7
    export PATH=$PATH:$SQOOP_HOME/bin

    source /etc/profile

    Verify the installation:
    sqoop-version

    MySQL must grant access to root from remote hosts (e.g. root@hadoop001):
    GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'mysql';

    In a cluster environment, also try disabling the firewall if the connection fails:

    sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password mysql --table employees -m 2

    Check the result (default path):

    hadoop fs -ls /user/root/employees
    hadoop fs -cat /user/root/employees/part-m-00000

    Commands

    • codegen Generate code to interact with database records
    • create-hive-table Import a table definition into Hive
    • eval Evaluate a SQL statement and display the results (example after this list)
    • export Export an HDFS directory to a database table
    • help List available commands
    • import Import a table from a database to HDFS
    • import-all-tables Import tables from a database to HDFS
    • import-mainframe Import datasets from a mainframe server to HDFS
    • job Work with saved jobs
    • list-databases List available databases on a server
    • list-tables List available tables in a database
    • merge Merge results of incremental imports
    • metastore Run a standalone Sqoop metastore
    • version Display version information
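
    Of these, eval is handy for verifying connectivity before running a real import. A minimal sketch (reusing the connection from above; the query is only an illustration):

    sqoop eval \
    --connect jdbc:mysql://192.168.10.3:3306/mydb \
    --username root \
    --password mysql \
    --query 'SELECT 1'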

    Common arguments

    • --connect <jdbc-uri> Specify JDBC connect string
    • --connection-manager <class-name> Specify connection manager class to use
    • --driver <class-name> Manually specify JDBC driver class to use
    • --hadoop-mapred-home <dir> Override $HADOOP_MAPRED_HOME
    • --help Print usage instructions
    • --password-file Set path for a file containing the authentication password (see the sketch after this list)
    • -P Read password from console
    • --password Set authentication password
    • --username Set authentication username
    • --verbose Print more information while working
    • --connection-param-file Optional properties file that provides connection parameters
    • --relaxed-isolation Set connection transaction isolation to read uncommitted for the mappers.
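
    As an example of the authentication options above, --password-file keeps the password out of the command line and the shell history. A minimal sketch (the file path is an assumption for illustration):

    echo -n 'mysql' > /home/hadoop/.mysql.pwd    # -n: the file must not end with a newline
    chmod 400 /home/hadoop/.mysql.pwd
    sqoop list-tables \
    --connect jdbc:mysql://192.168.10.3:3306/mydb \
    --username root \
    --password-file file:///home/hadoop/.mysql.pwd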

    Database connection

    • mysql

    sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password mysql

    This requires the MySQL JDBC driver:

    cp mysql-connector-java-5.1.49-bin.jar /usr/app/sqoop-1.4.7/lib/

    Listing tables

    sqoop list-tables --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password mysql

    Importing a single table into HDFS

    Default import path: /user/<username>/<table name>

    sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false --username root --password mysql --table employees --target-dir /user/hadoop11/employees -m 1

    Specifying the field delimiter, --split-by partitioning, and the number of parallel map tasks

    sqoop import \
    --connect jdbc:mysql://hadoop1:3306/mysql \
    --username root \
    --password root \
    --table help_keyword \
    --target-dir /user/hadoop11/my_help_keyword1 \
    --fields-terminated-by '\t' \
    -m 2

    --fields-terminated-by '\t' sets the field delimiter
    -m sets the number of map tasks to run in parallel

    sqoop import \
    --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
    --split-by a.id \
    -m 10 \
    --target-dir /user/foo/joinresults

    --split-by controls how the input is partitioned across map tasks.
    Suppose a table test is imported with --split-by 'id' and -m 10. After reading the table's metadata, Sqoop sends the relational database (e.g. MySQL) a boundary query: select max(id), min(id) from test. It then divides the [min, max] range evenly into 10 slices, and 10 map tasks read their slices from the database in parallel.
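
    Roughly, each mapper then issues a range query of the following shape (an illustration of the idea, not Sqoop's literal SQL; the boundaries assume min(id)=0 and max(id)=10000):

    SELECT * FROM test WHERE id >= 0    AND id < 1000    -- mapper 1
    SELECT * FROM test WHERE id >= 1000 AND id < 2000    -- mapper 2
    ...
    SELECT * FROM test WHERE id >= 9000 AND id <= 10000  -- mapper 10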

    Incremental imports

    Append mode (incremental by id)

    sqoop import \
    --connect jdbc:mysql://hadoop1:3306/mysql \
    --username root \
    --password root \
    --table help_keyword \
    --target-dir /user/hadoop/myimport_add \
    --incremental append \
    --check-column help_keyword_id \
    --last-value 500 \
    -m 1
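
    Each append run ends by printing the new --last-value to pass on the next run. A saved Sqoop job stores that value automatically; a minimal sketch (the job name is made up, and -P prompts for the password since saved jobs should not embed it):

    sqoop job --create incr_help_keyword -- import \
    --connect jdbc:mysql://hadoop1:3306/mysql \
    --username root -P \
    --table help_keyword \
    --target-dir /user/hadoop/myimport_add \
    --incremental append \
    --check-column help_keyword_id \
    --last-value 500 \
    -m 1

    sqoop job --exec incr_help_keyword    # each run resumes from the stored last-value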

    Lastmodified mode (incremental by timestamp)

    sqoop import \
    --connect jdbc:mysql://hadoop1:3316/testdb \
    --username root \
    --password transwarp \
    --query "select order_id, name from order_table where \$CONDITIONS" \
    --target-dir /user/root/order_all \
    --split-by id \
    -m 4 \
    --incremental lastmodified \
    --merge-key order_id \
    --check-column time \
    --last-value "2014-11-09 21:00:00"

    Note: inside double quotes the shell would expand $CONDITIONS, so it must be escaped as \$CONDITIONS. With --merge-key, rows whose time is newer than --last-value are merged into the existing data by order_id instead of being appended as duplicates.

    Importing into Hive

    sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false --username root --password mysql --table employees --target-dir /user/sqoop/employees --hive-import -m 2

    The table is created in the default Hive database. Check the result (default path):
    hadoop fs -cat /user/sqoop/employees/part-m-00000

    Hive import with explicit field and line delimiters, --hive-import, overwrite, automatic table creation, an explicit table name, and deletion of the intermediate target directory

    First create a Hive database:
    hive> create database mydb;

    sqoop import \
    --connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false \
    --username root \
    --password mysql \
    --table employees \
    --fields-terminated-by "\t" \
    --lines-terminated-by "\n" \
    --hive-import \
    --hive-overwrite \
    --create-hive-table \
    --delete-target-dir \
    --hive-database mydb \
    --hive-table employees

    Check the result:
    use mydb;
    select * from employees limit 20;

    • --create-hive-table // create the Hive table during the import; the job fails if the table already exists
    • --external-table-dir // create the table at this external location
    • --hive-database <database-name> // target Hive database
    • --hive-import // import the table into Hive
    • --hive-partition-key <partition-key> // partition key (see the sketch after this list)
    • --hive-partition-value <partition-value> // partition value
    • --hive-table <table-name> // target Hive table
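
    The two partition arguments work together: every row of the import lands in one static partition. A minimal sketch (the dt column, the employees_p table name, and the date value are assumptions for illustration):

    sqoop import \
    --connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false \
    --username root \
    --password mysql \
    --table employees \
    --hive-import \
    --hive-database mydb \
    --hive-table employees_p \
    --hive-partition-key dt \
    --hive-partition-value '2020-11-01' \
    -m 1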

    Exporting from HDFS to MySQL

    sqoop export --connect jdbc:mysql://192.168.1.59:3307/bigdata --username root --password mysql --export-dir /sqoop --table student1
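
    sqoop export writes into an existing table; it does not create one. Create the target table in MySQL first, with columns matching the exported files. A sketch (the column list is an assumption for illustration):

    CREATE TABLE student1 (
        id   INT,
        name VARCHAR(64)
    );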

    Exporting from Hive to MySQL

    sqoop export --connect jdbc:mysql://192.168.1.59:3307/bigdata --username root --password mysql --table employee1 --export-dir /user/hive/warehouse/sqoop.db/employee --input-fields-terminated-by '\001' -m 3

    '\001' (Ctrl-A) is Hive's default field delimiter, which is why it is passed as --input-fields-terminated-by here.

    Generating Java code

    sqoop codegen --connect jdbc:mysql://192.168.1.59:3307/bigdata --username root --password mysql --table employee1
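
    codegen produces the .java source, the compiled classes, and a .jar for the table's record class (the same class an import generates internally). The output locations can be pinned with --outdir and --bindir; a minimal sketch (the directories are assumptions for illustration):

    sqoop codegen \
    --connect jdbc:mysql://192.168.1.59:3307/bigdata \
    --username root \
    --password mysql \
    --table employee1 \
    --outdir /tmp/sqoop-codegen/src \
    --bindir /tmp/sqoop-codegen/bin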

  • Original article: https://www.cnblogs.com/starcrm/p/14034304.html