• Hadoop ecosystem: installing Hive and Sqoop, and using Sqoop to move data between HDFS/Hive and MySQL


    Hive normally stores its metadata in a relational database such as MySQL, so install MySQL first.

    $:  yum install mysql-server mysql-client  [install via yum]
    $:  /etc/init.d/mysqld start  [start the MySQL service]
    $:  mysql  [log in with the mysql client]
    mysql> create database hive;
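    The hive-site.xml settings later in this post connect as user hive with password hive; one way to create that account is sketched below, assuming classic MySQL 5.x GRANT syntax. The host values and the password are examples — adjust them to your environment.

```sql
-- Create the metastore database (idempotent) and a dedicated account for Hive
CREATE DATABASE IF NOT EXISTS hive;
-- Allow connections from localhost and from remote hosts
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'localhost' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON hive.* TO 'hive'@'%' IDENTIFIED BY 'hive';
FLUSH PRIVILEGES;
```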
    

    Installing and configuring Hive

    $:  tar -zxvf apache-hive-2.1.1-bin.tar.gz
    
    1. Configure environment variables
    export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin
    export PATH=$PATH:${HIVE_HOME}/bin
    
    2. Set up Hive's basic configuration files
    $: cd /home/hadoop/apache-hive-2.1.1-bin/conf
    $: cp hive-default.xml.template hive-site.xml                              # main configuration
    $: cp hive-env.sh.template hive-env.sh                                     # environment settings
    $: cp hive-exec-log4j2.properties.template hive-exec-log4j2.properties     # exec logging defaults
    $: cp hive-log4j2.properties.template hive-log4j2.properties               # logging defaults
    
    3. Edit hive-env.sh; the simplest approach is to append the following at the end:
    export JAVA_HOME=/home/hadoop/jdk1.8.0_144
    export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
    export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin
    export HIVE_CONF_DIR=/home/hadoop/apache-hive-2.1.1-bin/conf
    

    Configure hive-site.xml. The file is large, but only the properties named below need their values changed; everything else can be left at its defaults.

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://127.0.0.1:3306/hive?createDatabaseIfNotExist=true</value>
      <description>the URL of the MySQL metastore database</description>
    </property>
    
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
      <description>Driver class name for a JDBC metastore</description>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>hive</value>
    </property>
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>hive</value>
    </property>
    

    These properties define the JDBC connection to MySQL (URL, driver class, user, password), just like a JDBC connection in Java code; make sure they match your actual MySQL setup exactly.
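    Beyond the JDBC settings, Hive defaults its warehouse to /user/hive/warehouse on HDFS. If you would rather have managed tables live under the /home/hive/warehouse directory created in step 4 below, the same file can carry a property like this (a sketch — keep the value consistent with whatever path you actually create):

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/home/hive/warehouse</value>
  <description>HDFS directory where managed table data is stored</description>
</property>
```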

    4. Hive stores its data on HDFS, so create the storage directories and grant group write permission:
    hadoop fs -mkdir -p /home/hive/log
    hadoop fs -mkdir -p /home/hive/warehouse
    hadoop fs -mkdir -p /home/hive/tmp
    hadoop fs -chmod g+w /home/hive/log
    hadoop fs -chmod g+w /home/hive/warehouse
    hadoop fs -chmod g+w /home/hive/tmp
    
    5. Copy the JDBC driver mysql-connector-java-xxx.jar into $HIVE_HOME/lib
    6. Initialize the metastore schema
    schematool -initSchema -dbType mysql
    

    This step frequently fails; a typical error looks like this:

    [root@slave1 bin]# schematool -initSchema -dbType mysql
    which: no hbase in (/home/hadoop/sqoop-1.4.6/bin:/home/hadoop/apache-hive-2.1.1-bin/bin:/home/hadoop/hadoop-2.7.3/bin:/home/hadoop/jdk1.8.0_144/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:	 jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
    Metastore Connection Driver :	 com.mysql.jdbc.Driver
    Metastore connection User:	 hive
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.mysql.sql
    Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
    org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    Underlying cause: java.io.IOException : Schema script failed, errorcode 2
    Use --verbose for detailed stacktrace.
    *** schemaTool failed ***
    

    To summarize: if the steps above were followed, a failure here almost always means that the MySQL settings in hive-site.xml do not match the actual database. Another common case: if the schema was already initialized and you have since started the Hive server and client and done some work, re-running the initialization will also fail. The simplest fix then is to drop the metastore database in MySQL and recreate an empty one with the same name.
    If the following message appears instead, the problem is with MySQL itself:

    java.sql.SQLException: Access denied for user 'root'@'****' (using password: YES)
    

    This happens because MySQL authenticates by matching the host column first, then the user column, then the password. With the connection string above, MySQL matched a row whose user column is empty (an anonymous entry matches any username); that row has an empty password, while the connection supplied one, so access was denied.
    Solution: delete the anonymous user entries and reload the grant tables:

    mysql> use mysql;
    Database changed
    mysql> delete from user where user='';
    Query OK, 1 row affected (0.00 sec)
    mysql> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
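    To confirm which accounts remain after removing the anonymous entries, the grant table can be inspected directly (column layout varies by MySQL version; in 5.7+ the password column is named authentication_string):

```sql
-- List remaining accounts; MySQL matches host first, then user
SELECT host, user FROM mysql.user;
```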
    

    If you see the following output instead, the installation and configuration succeeded:

    [root@slave1 bin]# schematool -initSchema -dbType mysql
    which: no hbase in (/home/hadoop/sqoop-1.4.6/bin:/home/hadoop/apache-hive-2.1.1-bin/bin:/home/hadoop/hadoop-2.7.3/bin:/home/hadoop/jdk1.8.0_144/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:	 jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true&characterEncoding=UTF-8&useSSL=false
    Metastore Connection Driver :	 com.mysql.jdbc.Driver
    Metastore connection User:	 hive
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.mysql.sql
    Initialization script completed
    schemaTool completed
    
    7. Start the Hive server and client, then create a database and a table:
    $: hiveserver2
    $: hive
    hive> create database dock;
    hive> use dock;
    hive> create table if not exists dock.dock_tb(
        > id varchar(64) COMMENT 'dock id',
        > md5code varchar(64) COMMENT 'dock md5 code',
        > number varchar(64) COMMENT 'dock number',
        > ip varchar(64) COMMENT 'dock ip',
        > game varchar(64) COMMENT 'dock game',
        > time varchar(64) COMMENT 'dock time',
        > day varchar(64) COMMENT 'dock day',
        > year varchar(64) COMMENT 'dock year',
        > month varchar(64) COMMENT 'dock month',
        > type varchar(64) COMMENT 'dock type')
        > COMMENT 'Description of the table'
        > LOCATION '/data/wscn/dock_test_log/20171101/EtlResult/dockClick';
    

    Note that dock_tb uses the files under /data/wscn/dock_test_log/20171101/EtlResult/dockClick on HDFS as both its data source and its storage location.
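    The files backing this table are delimited text; Hive's default field delimiter is the non-printing character \001 (Ctrl-A). A quick local sketch of how to fabricate and inspect such a record — the file name and field values here are made up for illustration:

```shell
# Write one sample record in Hive's default \001-delimited text format
printf '42\001abc123\001row\n' > sample.txt
# Split on \001 and count the fields
awk -F '\001' '{ print NF }' sample.txt
# prints: 3
```

The same delimiter is what a later sqoop export of this data must be told about via --input-fields-terminated-by.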

    Installing and configuring Sqoop

    1. Unpack the archive and configure environment variables
    $: tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
    $: mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha sqoop-1.4.6
    $: export SQOOP_HOME=/home/hadoop/sqoop-1.4.6
    $: export PATH=$PATH:${SQOOP_HOME}/bin
    
    2. Configure the basics
    $: cd /home/hadoop/sqoop-1.4.6/conf
    $: cp sqoop-env-template.sh sqoop-env.sh
    

    Edit sqoop-env.sh and set the following values (these lines are commented out by default):

    #Set path to where bin/hadoop is available
    export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.3
    
    #Set path to where hadoop-*-core.jar is available
    export HADOOP_MAPRED_HOME=/home/hadoop/hadoop-2.7.3
    
    #set the path to where bin/hbase is available
    #export HBASE_HOME=
    
    #Set the path to where bin/hive is available
    export HIVE_HOME=/home/hadoop/apache-hive-2.1.1-bin
    
    3. Once configured, test it
    $: sqoop help
    

    If output like the following appears, Sqoop is installed and configured correctly (the warnings about HBase, HCatalog, Accumulo, and ZooKeeper are harmless unless you use those features):

    Warning: /home/hadoop/sqoop-1.4.6/bin/../../hbase does not exist! HBase imports will fail.
    Please set $HBASE_HOME to the root of your HBase installation.
    Warning: /home/hadoop/sqoop-1.4.6/bin/../../hcatalog does not exist! HCatalog jobs will fail.
    Please set $HCAT_HOME to the root of your HCatalog installation.
    Warning: /home/hadoop/sqoop-1.4.6/bin/../../accumulo does not exist! Accumulo imports will fail.
    Please set $ACCUMULO_HOME to the root of your Accumulo installation.
    Warning: /home/hadoop/sqoop-1.4.6/bin/../../zookeeper does not exist! Accumulo imports will fail.
    Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
    17/11/11 02:26:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
    usage: sqoop COMMAND [ARGS]
    
    Available commands:
      codegen            Generate code to interact with database records
      create-hive-table  Import a table definition into Hive
      eval               Evaluate a SQL statement and display the results
      export             Export an HDFS directory to a database table
      help               List available commands
      import             Import a table from a database to HDFS
      import-all-tables  Import tables from a database to HDFS
      import-mainframe   Import datasets from a mainframe server to HDFS
      job                Work with saved jobs
      list-databases     List available databases on a server
      list-tables        List available tables in a database
      merge              Merge results of incremental imports
      metastore          Run a standalone Sqoop metastore
      version            Display version information
    
    See 'sqoop help COMMAND' for information on a specific command.
    

    Use Sqoop to export the formatted data on HDFS into MySQL. The corresponding table must already exist in MySQL:

    sqoop export --connect jdbc:mysql://127.0.0.1:3306/hive --username hive --password hive --table dock_tb1 --export-dir hdfs://127.0.0.1:9000/data/wscn/dock_test_log/20171101/EtlResult/dockClick --input-fields-terminated-by '\001'
    

    Use Sqoop to import data from MySQL into HDFS:

    $: sqoop import --connect jdbc:mysql://127.0.0.1:3306/hive --username hive --password hive --table dock_tb --target-dir /data/wscn/dock_test_log/20171101/EtlResult/dockClick1 -m 1
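    For repeated test runs, the import above fails if the target directory already exists. A variant that clears the target first and writes an explicit field delimiter is sketched below — the flags are from Sqoop 1.4.x, and the table and paths are the same ones used above:

```shell
sqoop import \
  --connect jdbc:mysql://127.0.0.1:3306/hive \
  --username hive --password hive \
  --table dock_tb \
  --target-dir /data/wscn/dock_test_log/20171101/EtlResult/dockClick1 \
  --delete-target-dir \
  --fields-terminated-by '\001' \
  -m 1
```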
    

    For a fuller Sqoop command reference, see http://blog.csdn.net/whiteForever/article/details/52609422

  • Original post: https://www.cnblogs.com/jiashengmei/p/7815984.html