• [sqoop1.99.7] Getting started with Sqoop: download, installation, running, and common commands


    I. Introduction

    Apache Sqoop is a tool designed for efficiently transferring data between structured, semi-structured, and unstructured data sources. Relational databases are examples of structured data sources with well-defined schemas for the data they store. Cassandra and HBase are examples of semi-structured data sources, and HDFS is an example of an unstructured data source that Sqoop can support.

    Official site:

    http://sqoop.apache.org/

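    Versions:

    Sqoop has two major lines, Sqoop1 and Sqoop2. At the time of writing, the latest Sqoop1 release is 1.4.6 and the latest Sqoop2 release is 1.99.7. Sqoop1 and Sqoop2 are not compatible with each other, and Sqoop2 is not intended for production use; it is primarily a development line. Each release also has specific requirements on the Hadoop version it supports (Hadoop 0.x, Hadoop 1, or Hadoop 2.x), so pay attention to Hadoop compatibility when downloading. (I could not find a specific description of this on the Sqoop website; I inferred the Hadoop compatibility only from the download file names.)

    Sqoop relies on Hadoop MapReduce jobs to transfer data, so Hadoop must be present in the environment and accessible to Sqoop.
    For the download, simply choose the precompiled bin package and use it directly. You can also download the source and build it locally; make sure a Java environment is available, because Sqoop is written in Java.

    1. sqoop1, stable release: sqoop 1.4.6
       http://sqoop.apache.org/docs/1.4.6/index.html
       http://mirror.bit.edu.cn/apache/sqoop/1.4.6/
       Download file names:
       sqoop-1.4.6.bin__hadoop-0.23.tar.gz
       sqoop-1.4.6.bin__hadoop-1.0.0.tar.gz
       sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
       Source: sqoop-1.4.6.tar.gz

    2. sqoop2, latest release: sqoop 1.99.7
       http://sqoop.apache.org/docs/1.99.7/index.html
       http://mirror.bit.edu.cn/apache/sqoop/1.99.7/
       Download file name:
       sqoop-1.99.7-bin-hadoop200.tar.gz
       Source: sqoop-1.99.7.tar.gz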
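The Sqoop2 file name above follows a simple pattern, so the download URL can be composed from the version number. The following is a minimal sketch; the function names are hypothetical, and the default mirror is the one listed in this article, which may no longer be reachable.

```shell
# Hypothetical helper: compose the Sqoop2 binary tarball name from a
# version string, following the file naming shown in this article.
sqoop2_tarball() {
    # $1 = Sqoop2 version, e.g. 1.99.7
    printf 'sqoop-%s-bin-hadoop200.tar.gz\n' "$1"
}

# Hypothetical helper: compose the full download URL.
sqoop2_download_url() {
    # $1 = Sqoop2 version, $2 = mirror base URL (defaults to the mirror listed in this article)
    base=${2:-http://mirror.bit.edu.cn/apache/sqoop}
    printf '%s/%s/%s\n' "${base%/}" "$1" "$(sqoop2_tarball "$1")"
}

# Usage (not executed here; requires network access and wget):
#   wget "$(sqoop2_download_url 1.99.7)"
```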

    II. Installation and Configuration

    Downloaded version:

    sqoop-1.99.7-bin-hadoop200.tar.gz

    Installation: simply extract the archive into any directory.

    tar -zxvf sqoop-1.99.7-bin-hadoop200.tar.gz
    
    mv sqoop-1.99.7-bin-hadoop200 sqoop-1.99.7

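    Sqoop directory layout

    bin: executable scripts; Sqoop is normally invoked through the tools in this directory, which are shell or batch scripts.

    conf: configuration files; currently there are only two: sqoop_bootstrap.properties and sqoop.properties

    docs: I am not sure exactly what this contains; it is probably the help documentation, which is not normally needed when using Sqoop.

    server: contains only a lib directory holding many jar files; this is the Sqoop2 server package.

    shell: contains only a lib directory holding many jar files; this is the Sqoop2 shell package.

    tools: contains only a lib directory holding many jar files; this is the Sqoop2 tools package.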
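To confirm that the archive was extracted completely, the six directories described above can be checked in one pass. This is a minimal sketch; the function name is hypothetical.

```shell
# Hypothetical helper: report which of the expected top-level directories
# are missing from an extracted Sqoop2 tree. Prints nothing if all exist.
missing_sqoop_dirs() {
    # $1 = path to the extracted Sqoop2 directory
    for d in bin conf docs server shell tools; do
        [ -d "$1/$d" ] || printf '%s\n' "$d"
    done
}

# Usage:
#   missing_sqoop_dirs /wdcloud/app/sqoop-1.99.7
```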

    Configuration

    (1) Install the Java JDK

    Version

    [root@hadoop-allinone-200-123 hadoop-2.7.3]# java -version
    java version "1.7.0_67"

    JAVA_HOME

    [root@hadoop-allinone conf]# echo $JAVA_HOME
    /wdcloud/app/jdk1u7

    (2) Hadoop environment

    Version
    [root@hadoop-allinone-200-123 bin]# ./hadoop version
    Hadoop 2.7.3
    
    HADOOP_HOME
    [root@hadoop-allinone-200-123 hadoop-2.7.3]# pwd
    /wdcloud/app/hadoop-2.7.3

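    (3) Configure environment variables

    Add a system environment variable, HADOOP_HOME; in this example it is set to /wdcloud/app/hadoop-2.7.3.

    Any of the following works: editing /etc/profile, creating a script under /etc/profile.d that exports the variable, or adding it to ~/.bashrc:

    Add the Hadoop environment variable to /etc/profile (global environment variables):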
    export HADOOP_HOME=/wdcloud/app/hadoop-2.7.3
    
    [root@hadoop-allinone-200-123 hadoop-2.7.3]# source /etc/profile
    
    [root@hadoop-allinone-200-123 hadoop-2.7.3]# echo $HADOOP_HOME
    /wdcloud/app/hadoop-2.7.3
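    Note: this variable is set mainly so that Sqoop can find the jar files in the following directories, as well as the Hadoop configuration files: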
    $HADOOP_HOME/share/hadoop/common
    $HADOOP_HOME/share/hadoop/hdfs
    $HADOOP_HOME/share/hadoop/mapreduce
    $HADOOP_HOME/share/hadoop/yarn
    
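    The official documentation states that each component can also be configured individually, using the following variables:

    export HADOOP_COMMON_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/common
    export HADOOP_HDFS_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/hdfs
    export HADOOP_MAPRED_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/mapreduce
    export HADOOP_YARN_HOME=/wdcloud/app/hadoop-2.7.3/share/hadoop/yarn

    If $HADOOP_HOME is already set, it is best not to set the four variables above as well, since doing so may cause obscure errors.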
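Before moving on, it can help to confirm that the four share/hadoop directories actually exist under the configured Hadoop home. The check below is a minimal sketch; the function name is hypothetical.

```shell
# Hypothetical pre-flight check: confirm that the four directories Sqoop
# looks for under a Hadoop home actually exist.
check_hadoop_home() {
    # $1 = Hadoop home to check (defaults to $HADOOP_HOME)
    home=${1:-$HADOOP_HOME}
    if [ -z "$home" ]; then
        echo "HADOOP_HOME is not set" >&2
        return 1
    fi
    status=0
    for sub in common hdfs mapreduce yarn; do
        if [ ! -d "$home/share/hadoop/$sub" ]; then
            echo "missing: $home/share/hadoop/$sub" >&2
            status=1
        fi
    done
    return $status
}

# Usage:
#   check_hadoop_home && echo "Hadoop home looks complete"
```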

    Configure the Sqoop home directory and the path for third-party jars

    [root@hadoop-allinone-200-123 hadoop-2.7.3]# vim /etc/profile
    
    export SQOOP_HOME=/wdcloud/app/sqoop-1.99.7
    export SQOOP_SERVER_EXTRA_LIB=/wdcloud/app/sqoop-1.99.7/extra

    [root@hadoop-allinone-200-123 sqoop-1.99.7]# mkdir extra

    Copy the MySQL JDBC driver jar into this directory.
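The copy step can be wrapped so that it fails loudly when the jar or the target directory is missing. This is a minimal sketch; the function name and the jar file name in the usage line are hypothetical and not taken from the article.

```shell
# Hypothetical helper: copy a JDBC driver jar into the directory that
# SQOOP_SERVER_EXTRA_LIB points to, creating the directory if needed.
install_jdbc_driver() {
    # $1 = path to the driver jar, $2 = target dir (defaults to $SQOOP_SERVER_EXTRA_LIB)
    jar=$1
    dest=${2:-$SQOOP_SERVER_EXTRA_LIB}
    [ -f "$jar" ] || { echo "no such jar: $jar" >&2; return 1; }
    [ -n "$dest" ] || { echo "SQOOP_SERVER_EXTRA_LIB is not set" >&2; return 1; }
    mkdir -p "$dest" && cp "$jar" "$dest/"
}

# Usage (the jar file name is an assumption):
#   install_jdbc_driver /path/to/mysql-connector-java.jar
```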

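    (4) Configure Hadoop proxy access

    Sqoop accesses Hadoop MapReduce through a proxy (impersonation) mechanism, so the proxy users and groups that Hadoop accepts must be configured in Hadoop.
    Locate Hadoop's core-site.xml configuration file (in this example, $HADOOP_HOME/etc/hadoop/core-site.xml) and add: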

    <property>
      <name>hadoop.proxyuser.$SERVER_USER.hosts</name>
      <value>*</value>
    </property>
    <property>
      <name>hadoop.proxyuser.$SERVER_USER.groups</name>
      <value>*</value>
    </property>
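    $SERVER_USER is the system user that runs the Sqoop2 server. In this example I run the server as the hadoop user, so I replaced it with hadoop.
    Note: make sure the user's ID is greater than 1000 (check with the id command); otherwise, when the server runs as a system user, additional configuration may be required. See the official documentation.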
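Since the two properties differ only in the user name, they can be generated for whichever user runs the server. This is a minimal sketch; the function name is hypothetical, and the output still has to be pasted inside the <configuration> element of core-site.xml by hand.

```shell
# Hypothetical helper: print the two core-site.xml proxy-user properties
# for the user that runs the Sqoop2 server.
proxyuser_properties() {
    # $1 = user name that runs the Sqoop2 server
    cat <<EOF
<property>
  <name>hadoop.proxyuser.$1.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.$1.groups</name>
  <value>*</value>
</property>
EOF
}

# Usage:
#   proxyuser_properties hadoop
```

Hadoop reads core-site.xml at startup, so the cluster typically needs a restart before the change takes effect.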

    (5) Sqoop core configuration files

    sqoop_bootstrap.properties

    This sets the configuration provider class; the default value is normally fine:
    
    sqoop.config.provider=org.apache.sqoop.core.PropertiesConfigurationProvider  

    sqoop.properties

    org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/wdcloud/app/hadoop-2.7.3/etc/hadoop  
      
    org.apache.sqoop.security.authentication.type=SIMPLE  
    org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler  
    org.apache.sqoop.security.authentication.anonymous=true  

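    Note: the official documentation only mentions the first item above, the path to the MapReduce configuration files. However, an authentication exception occurred at runtime. The security section of the Sqoop documentation shows that Sqoop2 supports two Hadoop authentication mechanisms, simple and Kerberos, so I configured simple authentication, which made the exception go away.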
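The edits to sqoop.properties can also be applied from a script instead of an editor. The helper below is a minimal sketch with a hypothetical name; it only matches lines written exactly as key=value (no spaces around the equals sign) and leaves commented-out lines untouched.

```shell
# Hypothetical helper: set key=value in a Java properties file, replacing
# an existing uncommented "key=..." line or appending a new one.
set_prop() {
    # $1 = properties file, $2 = key, $3 = value
    tmp=$(mktemp) || return 1
    awk -v k="$2" -v v="$3" '
        BEGIN { FS = "=" }
        $1 == k { print k "=" v; found = 1; next }
        { print }
        END { if (!found) print k "=" v }
    ' "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage:
#   set_prop "$SQOOP_HOME/conf/sqoop.properties" \
#       org.apache.sqoop.security.authentication.type SIMPLE
```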

    III. Running

    Verify that the configuration is valid

    bin/sqoop2-tool verify
    [root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-tool verify  
    Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
    Sqoop home directory: /wdcloud/app/sqoop-1.99.7
    Sqoop tool executor:
        Version: 1.99.7
        Revision: 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb
        Compiled on Tue Jul 19 16:08:27 PDT 2016 by abefine
    Running tool: class org.apache.sqoop.tools.tool.VerifyTool
    0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
    20   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
    Verification was successful.
    Tool class org.apache.sqoop.tools.tool.VerifyTool has finished correctly.
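Verification and startup can be chained so that the server is started only after the verify tool reports success. This is a minimal sketch: the function name is hypothetical, and it relies on the "Verification was successful" message shown in the output above.

```shell
# Hypothetical wrapper: run the verify tool and start the server only if
# the output contains the success message shown in the log above.
verify_then_start() {
    # $1 = Sqoop2 home directory (defaults to $SQOOP_HOME)
    home=${1:-$SQOOP_HOME}
    if "$home/bin/sqoop2-tool" verify 2>&1 | grep 'Verification was successful' > /dev/null; then
        "$home/bin/sqoop2-server" start
    else
        echo "verification failed; server not started" >&2
        return 1
    fi
}

# Usage:
#   verify_then_start /wdcloud/app/sqoop-1.99.7
```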

    Start the server

    bin/sqoop2-server start  
    [root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-server start  
    Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
    Sqoop home directory: /wdcloud/app/sqoop-1.99.7
    Starting the Sqoop2 server...
    0    [main] INFO  org.apache.sqoop.core.SqoopServer  - Initializing Sqoop server.
    22   [main] INFO  org.apache.sqoop.core.PropertiesConfigurationProvider  - Starting config file poller thread
    Sqoop2 server started.
    # After the server starts, two directories are created (in whichever directory the command was run from)
    
    [root@hadoop-allinone-200-123 sqoop-1.99.7]# ll | grep @
    drwxr-xr-x 3 root root 23 Dec 18 22:19 @BASEDIR@
    drwxr-xr-x 2 root root 58 Dec 18 22:23 @LOGDIR@
    
    
    # View the Sqoop runtime logs:
    
    [root@hadoop-allinone-200-123 sqoop-1.99.7]# ll @LOGDIR@/
    total 136
    -rw-r--r-- 1 root root   165 Dec 18 22:22 audit.log
    -rw-r--r-- 1 root root   670 Dec 18 22:21 derbyrepo.log
    -rw-r--r-- 1 root root 78957 Dec 18 22:22 sqoop.log
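The literal directory names @BASEDIR@ and @LOGDIR@ look like unsubstituted placeholders. As far as I know they come from conf/sqoop.properties in the 1.99.7 distribution, but that is an assumption worth confirming with grep before changing anything. The sketch below replaces them with real paths; the function name is hypothetical, and the paths must not contain '|', '&', or backslashes.

```shell
# Hypothetical helper: replace the @BASEDIR@ and @LOGDIR@ placeholders in a
# properties file with real paths.
replace_placeholders() {
    # $1 = properties file, $2 = base directory, $3 = log directory
    tmp=$(mktemp) || return 1
    sed -e "s|@BASEDIR@|$2|g" -e "s|@LOGDIR@|$3|g" "$1" > "$tmp" && cat "$tmp" > "$1"
    rc=$?
    rm -f "$tmp"
    return $rc
}

# Usage (check first, then replace; the target paths are examples only):
#   grep -n '@BASEDIR@\|@LOGDIR@' "$SQOOP_HOME/conf/sqoop.properties"
#   replace_placeholders "$SQOOP_HOME/conf/sqoop.properties" \
#       "$SQOOP_HOME/base" "$SQOOP_HOME/logs"
```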

    Stop the server

    bin/sqoop2-server stop
    [root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-server stop
    Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
    Sqoop home directory: /wdcloud/app/sqoop-1.99.7
    Stopping the Sqoop2 server...
    Sqoop2 server stopped.

     

    Start the client

    bin/sqoop2-shell
    [root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-shell  
    Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
    Sqoop home directory: /wdcloud/app/sqoop-1.99.7
    Sqoop Shell: Type 'help' or 'h' for help.
    
    sqoop:000> 

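    On success, the Sqoop shell prompt appears: sqoop:000>

    At this point, the configuration and startup of sqoop1.99.7 are complete.

    IV. Common Sqoop Client Commands

    Before using Sqoop, make sure that both the Hadoop services and the Sqoop2 server are running. For Hadoop, this means not only HDFS (NameNode, DataNode) but also YARN (NodeManager, ResourceManager). There is usually a SecondaryNameNode as well, which periodically checkpoints the NameNode's metadata (it is a checkpointing helper, not a hot standby).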

    [root@hadoop-allinone-200-123 sqoop-1.99.7]# jps
    4352 ResourceManager
    4195 SecondaryNameNode
    2835 QuorumPeerMain
    21167 HMaster
    4451 NodeManager
    2986 QuorumPeerMain
    2803 QuorumPeerMain
    4030 DataNode
    21256 HRegionServer
    3905 NameNode
    5024 SqoopJettyServer
    5186 Jps
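The jps listing can be checked automatically for the processes this setup needs. This is a minimal sketch; the function name is hypothetical, and the list of required names is taken from the output above.

```shell
# Hypothetical helper: read `jps` output on stdin and print the names of
# required processes that are absent.
missing_daemons() {
    listing=$(cat)
    for name in NameNode DataNode ResourceManager NodeManager SqoopJettyServer; do
        # The second column of jps output is the process name; match it
        # exactly so that SecondaryNameNode does not count as NameNode.
        printf '%s\n' "$listing" \
            | awk -v n="$name" '$2 == n { ok = 1 } END { exit !ok }' \
            || printf '%s\n' "$name"
    done
}

# Usage:
#   jps | missing_daemons
```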

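    The Sqoop2 client offers an interactive command-line interface. The client first connects to the Sqoop server and passes the parameters to it; the server then invokes MapReduce to run the data import and export jobs.

    Configure the Sqoop server parameters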

    [root@hadoop-allinone-200-123 sqoop-1.99.7]# bin/sqoop2-shell 
    Setting conf dir: /wdcloud/app/sqoop-1.99.7/bin/../conf
    Sqoop home directory: /wdcloud/app/sqoop-1.99.7
    Sqoop Shell: Type 'help' or 'h' for help.
    
    sqoop:000> set server --host 192.168.200.123 --port 12000 --webapp sqoop
    Server is set successfully

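    Note: when host, port, and webapp are set, --url can be omitted.
    To use --url instead, the syntax is:
    set server --url http://sqoop2.company.net:80/sqoop

    The port shown (12000) is the default. According to the official documentation, the final --webapp option specifies the name of the Sqoop web application on the Jetty server.

    Once configured, verify that the client can connect to the server: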
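The --url form is simply the host, port, and webapp joined into one address. This is a minimal sketch with a hypothetical function name; the curl line in the usage comment assumes the server's REST API exposes a version resource under the webapp path, which should be checked against the official REST documentation.

```shell
# Hypothetical helper: build the value for `set server --url` from the
# host, port, and webapp parts.
sqoop_server_url() {
    # $1 = host, $2 = port (default 12000), $3 = webapp name (default sqoop)
    printf 'http://%s:%s/%s\n' "$1" "${2:-12000}" "${3:-sqoop}"
}

# Usage:
#   sqoop_server_url 192.168.200.123
# Assumed REST check (not executed here; requires a running server):
#   curl "$(sqoop_server_url 192.168.200.123)/version"
```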

    sqoop:000> show version --all 
    client version:
      Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
      Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
    0    [main] WARN  org.apache.hadoop.util.NativeCodeLoader  - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    server version:
      Sqoop 1.99.7 source revision 435d5e61b922a32d7bce567fe5fb1a9c0d9b1bbb 
      Compiled by abefine on Tue Jul 19 16:08:27 PDT 2016
    API versions:
      [v1]

    If the server version information is displayed correctly, the connection is working.

    View help

    Available commands:
      :exit    (:x  ) Exit the shell
      :history (:H  ) Display, manage and recall edit-line history
      help     (h  ) Display this help message
      set      (st ) Configure various client options and settings
      show     (sh ) Display various objects and configuration options
      create   (cr ) Create new object in Sqoop repository
      delete   (d  ) Delete existing object in Sqoop repository
      update   (up ) Update objects in Sqoop repository
      clone    (cl ) Create new object based on existing one
      start    (sta) Start job
      stop     (stp) Stop job
      status   (stu) Display status of a job
      enable   (en ) Enable object in Sqoop repository
      disable  (di ) Disable object in Sqoop repository
      grant    (g  ) Grant access to roles and assign privileges
      revoke   (r  ) Revoke access from roles and remove privileges
    
    For help on a specific command type: help command
    View help for individual commands:

    sqoop:000> st
    Usage: set [server|option|truststore]
    sqoop:000> sh
    Usage: show [server|version|connector|driver|link|job|submission|option|role|principal|privilege]
    sqoop:000> cr
    Usage: create [link|job|role]
    sqoop:000> d
    Usage: delete [link|job|role]
    sqoop:000> up
    Usage: update [link|job]
    sqoop:000> cl
    Usage: clone [link|job]
    sqoop:000> sta
    Usage: start [job]
    sqoop:000> stp
    Usage: stop [job]
    sqoop:000> stu
    Usage: status [job]
    sqoop:000> en
    Usage: enable [link|job]
    sqoop:000> di
    Usage: disable [link|job]
    sqoop:000> g
    Usage: grant [role|privilege]
    sqoop:000> r
    Usage: revoke [role|privilege]

    For example, to exit the interactive shell, enter the :x command:

    sqoop:000> :x
    [root@hadoop-allinone-200-123 sqoop-1.99.7]# 
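The same commands can be collected in a file and run without typing them interactively. This is a minimal sketch: the function and file names are hypothetical, and it assumes that sqoop2-shell accepts a script file as its argument, as I recall from the official shell documentation.

```shell
# Hypothetical helper: write a small Sqoop2 client script that points the
# shell at a server and prints version information.
write_sqoop_script() {
    # $1 = output file, $2 = server host, $3 = port (default 12000)
    cat > "$1" <<EOF
set server --host $2 --port ${3:-12000} --webapp sqoop
show version --all
EOF
}

# Usage (not executed here; requires a running Sqoop2 server):
#   write_sqoop_script /tmp/check.sqoop 192.168.200.123
#   bin/sqoop2-shell /tmp/check.sqoop
```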
  • Original article: https://www.cnblogs.com/avivaye/p/6196485.html