• Installing and Using the Sqoop Component


    I. Installing Sqoop

    Prerequisite: a working Java and Hadoop environment must already be in place.

    1. Download and extract

    Download address for the latest (1.4.6) release: http://ftp.wayne.edu/apache/sqoop/1.4.6/

    2. Edit the configuration file

    $ cd sqoop/conf    (assuming Sqoop was extracted into the current directory)
    
    $ mv sqoop-env-template.sh sqoop-env.sh

    Open sqoop-env.sh and edit the following lines:

    export HADOOP_COMMON_HOME=/home/hadoop/apps/hadoop-2.6.4/    (run "which hadoop" to find the installation path)
    
    export HADOOP_MAPRED_HOME=/home/hadoop/apps/hadoop-2.6.4/
    
    export HIVE_HOME=/home/hadoop/apps/hive

    3. Add the MySQL JDBC driver

    cp ~/apps/hive/lib/mysql-connector-java-5.1.28.jar sqoop/lib/    (the MySQL driver was already brought in during the earlier Hive installation)

    If the driver is not available there, upload the jar to the virtual machine and copy it into sqoop/lib.
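A quick way to confirm the driver is in place (the relative path below assumes Sqoop was extracted into the current directory, as above):

```shell
# List any MySQL connector jars under Sqoop's lib directory;
# an empty result means the driver still needs to be copied in.
ls sqoop/lib/ | grep mysql-connector
```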

    4. Verify the installation

    $ cd sqoop/bin

    $ sqoop-version

    Expected output:

    15/12/17 14:52:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6

    Sqoop 1.4.6 git commit id 5b34accaca7de251fc91161733f906af2eddbe83

    Compiled by abe on Fri Aug 1 11:19:26 PDT 2015

    At this point, the Sqoop installation is complete.

    5. Usage

    cd sqoop/

    Run bin/sqoop help to see the available commands:

    Available commands:
      codegen            Generate code to interact with database records
      create-hive-table  Import a table definition into Hive
      eval               Evaluate a SQL statement and display the results
      export             Export an HDFS directory to a database table
      help               List available commands
      import             Import a table from a database to HDFS
      import-all-tables  Import tables from a database to HDFS
      import-mainframe   Import datasets from a mainframe server to HDFS
      job                Work with saved jobs
      list-databases     List available databases on a server
      list-tables        List available tables in a database
      merge              Merge results of incremental imports
      metastore          Run a standalone Sqoop metastore
      version            Display version information
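    Before running an import, list-databases makes a convenient connectivity check. The hostname min1 and the root credentials below are the same assumed values used in the import example that follows; substitute your own:

```shell
# Verify that Sqoop can reach MySQL through the JDBC driver.
# On success it prints one database name per line.
bin/sqoop list-databases \
  --connect jdbc:mysql://min1:3306/ \
  --username root \
  --password 123456
```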

    Example: import a table from a MySQL database into HDFS. Given the earlier Hadoop configuration, the data ends up in the /usr/hadoop/ directory in HDFS:

    $ bin/sqoop import \
    --connect jdbc:mysql://min1:3306/mysql \
    --username root \
    --password 123456 \
    --table db \
    --m 1

    Here --connect gives the JDBC connection string for the source database, --table selects which table to import, and --m 1 runs the import with a single map task.

    Sqoop runs a MapReduce job; its progress can be watched at min1:8088 (YARN) and min1:50070 (HDFS web UI). On successful execution, output like the following appears:

    19/03/18 16:53:53 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1552575186473_0030
    19/03/18 16:53:54 INFO impl.YarnClientImpl: Submitted application application_1552575186473_0030
    19/03/18 16:53:54 INFO mapreduce.Job: The url to track the job: http://min1:8088/proxy/application_1552575186473_0030/
    19/03/18 16:53:54 INFO mapreduce.Job: Running job: job_1552575186473_0030
    19/03/18 16:54:11 INFO mapreduce.Job: Job job_1552575186473_0030 running in uber mode : false
    19/03/18 16:54:11 INFO mapreduce.Job:  map 0% reduce 0%
    19/03/18 16:54:32 INFO mapreduce.Job:  map 100% reduce 0%
    19/03/18 16:54:33 INFO mapreduce.Job: Job job_1552575186473_0030 completed successfully
    19/03/18 16:54:33 INFO mapreduce.Job: Counters: 30
            File System Counters
                    FILE: Number of bytes read=0
                    FILE: Number of bytes written=124571
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=87
                    HDFS: Number of bytes written=95
                    HDFS: Number of read operations=4
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=1
                    Other local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=16705
                    Total time spent by all reduces in occupied slots (ms)=0
                    Total time spent by all map tasks (ms)=16705
                    Total vcore-milliseconds taken by all map tasks=16705
                    Total megabyte-milliseconds taken by all map tasks=17105920
            Map-Reduce Framework
                    Map input records=2
                    Map output records=2
                    Input split bytes=87
                    Spilled Records=0
                    Failed Shuffles=0
                    Merged Map outputs=0
                    GC time elapsed (ms)=110
                    CPU time spent (ms)=2980
                    Physical memory (bytes) snapshot=103735296
                    Virtual memory (bytes) snapshot=2064986112
                    Total committed heap usage (bytes)=30474240
            File Input Format Counters 
                    Bytes Read=0
            File Output Format Counters 
                    Bytes Written=95
    19/03/18 16:54:33 INFO mapreduce.ImportJobBase: Transferred 95 bytes in 52.9843 seconds (1.793 bytes/sec)
    19/03/18 16:54:33 INFO mapreduce.ImportJobBase: Retrieved 2 records.
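    Once the job completes, the imported records can be inspected directly in HDFS. The path below assumes the /usr/hadoop/ target directory mentioned above and Sqoop's default part-file naming (one part-m-* file per map task, so a single file with --m 1):

```shell
# Show the files Sqoop wrote for the db table, then print the rows.
hdfs dfs -ls /usr/hadoop/db
hdfs dfs -cat /usr/hadoop/db/part-m-00000
```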
  • Original article: https://www.cnblogs.com/fjlcoding/p/sqoop.html