• sqoop-1.4.6 Installation and Usage


    I. Installation
    1. Download sqoop-1.4.6-bin.tar.gz and extract it.
    2. Edit conf/sqoop-env.sh and set the following variables:
        export HADOOP_COMMON_HOME=/usr/local/hadoop-2.6.3
        export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.6.3
        export HBASE_HOME=/usr/local/hbase-1.1.3
        export HIVE_HOME=/usr/local/hive-2.0.0
        #export ZOOCFGDIR=
    Alternatively, set these variables in the user's environment, as in the sketch below.
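    A minimal sketch of the environment-variable alternative, assuming Sqoop is installed under /usr/local/sqoop-1.4.6 (that path is an assumption; the Hadoop, HBase and Hive paths are the ones used above):
        # Append to ~/.bashrc (or the user's shell profile)
        export SQOOP_HOME=/usr/local/sqoop-1.4.6      # assumed install location
        export PATH=$PATH:$SQOOP_HOME/bin
        export HADOOP_COMMON_HOME=/usr/local/hadoop-2.6.3
        export HADOOP_MAPRED_HOME=/usr/local/hadoop-2.6.3
        export HBASE_HOME=/usr/local/hbase-1.1.3
        export HIVE_HOME=/usr/local/hive-2.0.0
        # Reload the profile so the current shell picks up the settings
        source ~/.bashrc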

    II. Using Sqoop
    Sqoop carries out its tasks through the various tools under bin/.
    1. Connecting to a database
    Arguments:
    Argument                              Description
    --connect <jdbc-uri>                  Specify JDBC connect string
    --connection-manager <class-name>     Specify connection manager class to use
    --driver <class-name>                 Manually specify JDBC driver class to use
    --hadoop-mapred-home <dir>            Override $HADOOP_MAPRED_HOME
    --help                                Print usage instructions
    --password-file                       Set path for a file containing the authentication password
    -P                                    Read password from console
    --password <password>                 Set authentication password
    --username <username>                 Set authentication username
    --verbose                             Print more information while working
    --connection-param-file <filename>    Optional properties file that provides connection parameters
    --relaxed-isolation                   Set connection transaction isolation to read uncommitted for the mappers
    $ sqoop import --connect jdbc:mysql://database.example.com/employees
    In the --connect string, do not use localhost as the hostname; otherwise every node in the cluster would query the database on its own machine.
    A more secure way to authenticate is to store the database password in a file under /home/${user} with 400 permissions, for example:
    $ sqoop import --connect jdbc:mysql://database.example.com/employees \
        --username venkatesh --password-file ${user.home}/.password
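    A minimal sketch of preparing such a password file (the password value is a placeholder; echo -n keeps a trailing newline from being read as part of the password):
        # Create the password file with no trailing newline and owner-only read access
        echo -n "my_db_password" > $HOME/.password
        chmod 400 $HOME/.password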

    2. Importing data into HDFS
    Arguments:
    --append                          Append data to an existing dataset in HDFS
    --as-avrodatafile                 Imports data to Avro Data Files
    --as-sequencefile                 Imports data to SequenceFiles
    --as-textfile                     Imports data as plain text (default)
    --as-parquetfile                  Imports data to Parquet Files
    --boundary-query <statement>      Boundary query to use for creating splits (default: select min(<split-by>), max(<split-by>) from <table name>)
    --columns <col,col,col…>          Columns to import from the table, e.g. --columns "name,employee_id,jobtitle"
    --delete-target-dir               Delete the import target directory if it exists
    --direct                          Use direct connector if one exists for the database
    --fetch-size <n>                  Number of entries to read from the database at once
    --inline-lob-limit <n>            Set the maximum size for an inline LOB
    -m,--num-mappers <n>              Use n map tasks to import in parallel
    -e,--query <statement>            Import the results of statement
    --split-by <column-name>          Column of the table used to split work units, balancing the load across mappers on that column; cannot be used with --autoreset-to-one-mapper
    --autoreset-to-one-mapper         Use one mapper if the table has no primary key and no split-by column is provided; cannot be used with --split-by <col>
    --table <table-name>              Table to read
    --target-dir <dir>                HDFS destination dir
    --warehouse-dir <dir>             HDFS parent for table destination
    --where <where clause>            WHERE clause to use during import
    -z,--compress                     Enable compression
    --compression-codec <c>           Use Hadoop codec (default gzip)
    --null-string <null-string>       The string to be written for a null value for string columns
    --null-non-string <null-string>   The string to be written for a null value for non-string columns
    Examples:
    bin/sqoop list-databases --connect jdbc:mysql://yangxw:3306/mysql --username root --password root
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table customers --target-dir /mysql_hadoop
    $ sqoop import \
        --query 'SELECT a.*, b.* FROM a JOIN b on (a.id = b.id) WHERE $CONDITIONS' \
        --split-by a.id --target-dir /user/foo/joinresults
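    As a further illustration, a hedged sketch that combines several of the flags above; it reuses the classicmodels customers table from the examples, and the column names and WHERE filter are assumptions:
        # Hypothetical filtered import: selected columns, 4 mappers split on customerNumber,
        # gzip-compressed output, Hive-style \N written for NULLs, re-runnable thanks to --delete-target-dir
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --table customers \
            --columns "customerNumber,customerName,country" \
            --where "country = 'USA'" \
            --num-mappers 4 --split-by customerNumber \
            --compress \
            --null-string '\\N' --null-non-string '\\N' \
            --delete-target-dir \
            --target-dir /mysql_hadoop/customers_usa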
    Other arguments:
    Controlling column types:
    $ sqoop import ... --map-column-java id=String,value=Integer
    Incremental imports:
    Use the append or lastmodified mode (see http://blog.csdn.net/ryantotti/article/details/14226635 and the sketch below).
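    A hedged sketch of both incremental modes, assuming the orders table used later in this post has an auto-increment orderNumber column and a lastUpdated timestamp column (the column names and --last-value settings are placeholders):
        # append mode: only rows whose check column exceeds --last-value are imported
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --table orders --target-dir /mysql_hadoop/orders_inc \
            --incremental append --check-column orderNumber --last-value 10100

        # lastmodified mode: rows whose timestamp is newer than --last-value are imported
        # and merged with the existing data on --merge-key
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --table orders --target-dir /mysql_hadoop/orders_inc \
            --incremental lastmodified --check-column lastUpdated \
            --last-value "2016-03-24 00:00:00" --merge-key orderNumber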


    Large objects (BLOB, CLOB):
    LOBs smaller than 16 MB are stored inline with the rest of the data; larger objects are written to files in a _lobs subdirectory of the import target, in a format different from the regular data, and each stored record there can hold up to 2^63 bytes.
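    The 16 MB threshold can be adjusted with --inline-lob-limit, which takes a size in bytes. A hedged sketch, assuming a hypothetical documents table containing a BLOB column:
        # Keep LOBs up to 64 MB inline; anything larger spills into the _lobs directory
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --table documents --target-dir /mysql_hadoop/documents \
            --inline-lob-limit 67108864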

    3. Importing into Hive
    A Hive import runs in two steps:
    DBMS -> HDFS -> Hive (LOAD DATA INPATH)
    Arguments:
    --hive-home <dir>              Override $HIVE_HOME
    --hive-import                  Import tables into Hive (uses Hive's default delimiters if none are set)
    --hive-overwrite               Overwrite existing data in the Hive table
    --create-hive-table            If set, the job fails if the target Hive table already exists; false by default
    --hive-table <table-name>      Sets the table name to use when importing into Hive
    --hive-drop-import-delims      Drops \n, \r, and \01 from string fields when importing into Hive
    --hive-delims-replacement      Replaces \n, \r, and \01 in string fields with a user-defined string when importing into Hive
    --hive-partition-key           Name of the Hive field that partitions are sharded on
    --hive-partition-value <v>     String value that serves as the partition key for the data imported into Hive in this job
    --map-column-hive <map>        Override default mapping from SQL type to Hive type for configured columns
    Example:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table products --hive-import --create-hive-table 

    If the source data is compressed, the Hive import may not be able to split the work into parallel tasks; data compressed with the lzop codec, however, can be split and imported in parallel. A fuller import is sketched below.
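    A hedged sketch that combines several of the Hive arguments above, reusing the products table from the example; the Hive table name is an assumption:
        # Re-runnable Hive import: overwrite the target table and strip \n, \r and \01
        # from string columns so they cannot break Hive's row and field delimiters
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --table products \
            --hive-import --hive-table classicmodels_products \
            --hive-overwrite \
            --hive-drop-import-delims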

    4. Importing into HBase
    Related arguments:
    --column-family <family>      Sets the target column family for the import
    --hbase-create-table          If specified, create missing HBase tables
    --hbase-row-key <col>         Specifies which input column to use as the row key; if the input table has a composite key, <col> must be a comma-separated list of the composite key attributes
    --hbase-table <table-name>    Specifies an HBase table to use as the target instead of HDFS
    --hbase-bulkload              Enables bulk loading
        Sqoop writes the data into HBase row by row using HBase Put operations. By default the split key is used as the row key; if no split key is defined, Sqoop tries to use the primary key. If the source table has a composite key, --hbase-row-key must be set to the list of composite key columns. If the target table or column family does not exist in HBase, the job fails; adding --hbase-create-table fixes this. Without --hbase-create-table, the table must already exist and --column-family must be set, and all output columns are placed in that single column family.
        Each value is written to HBase as its UTF-8 string representation, and null values in every column except the row key are skipped. To reduce the load on HBase, bulk loading can be used instead (see the sketch at the end of this section).
    Example:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table orders --target-dir /mysql_hadoop/orders4 --hbase-table orders --column-family orders --hbase-create-table
    This failed with the following error; the HBase table could not be created:
    16/03/24 18:30:23 INFO mapreduce.HBaseImportJob: Creating missing HBase table orders
    Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.hbase.HTableDescriptor.addFamily(Lorg/apache/hadoop/hbase/HColumnDescriptor;)V
        at org.apache.sqoop.mapreduce.HBaseImportJob.jobSetup(HBaseImportJob.java:222)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:264)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:673)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:497)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:605)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:179)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:218)
    This is probably caused by an incompatibility between the installed Hadoop/HBase versions and Sqoop: http://www.aboutyun.com/thread-12236-1-1.html
    The workaround is to create the HBase table first:
    hbase(main):002:0> create 'orders','CF1'
    0 row(s) in 1.6730 seconds
    => Hbase::Table - orders
    Then run the import again:
    bin/sqoop import --connect jdbc:mysql://yangxw:3306/classicmodels --username root --password root --table orders --target-dir /mysql_hadoop/orders5 --hbase-table orders --column-family CF1
    This time it succeeded.
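    As mentioned above, a hedged variant of this import that uses HBase bulk loading instead of individual Put operations (the target directory is a placeholder):
        # Generate HFiles and bulk-load them into the pre-created 'orders' table / 'CF1' family
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --table orders --target-dir /mysql_hadoop/orders_bulk \
            --hbase-table orders --column-family CF1 \
            --hbase-bulkload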


    5. $CONDITIONS when importing from a database into Hadoop
        When the free-form query passed to --query is wrapped in double quotes, $CONDITIONS must be preceded by a backslash (\$CONDITIONS) so the shell does not expand it; inside single quotes it can be written as-is, as sketched below.
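    A hedged sketch of both quoting styles, reusing the orders table (column names and target directories are assumptions):
        # Single quotes: $CONDITIONS can be written literally
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --query 'SELECT orderNumber, status FROM orders WHERE $CONDITIONS' \
            --split-by orderNumber --target-dir /mysql_hadoop/orders_query

        # Double quotes: the shell would expand $CONDITIONS, so escape it
        bin/sqoop import \
            --connect jdbc:mysql://yangxw:3306/classicmodels \
            --username root --password root \
            --query "SELECT orderNumber, status FROM orders WHERE \$CONDITIONS" \
            --split-by orderNumber --target-dir /mysql_hadoop/orders_query2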


















