• [Hadoop大数据]——Hive部署入门教程


    Hive是为了解决hadoop中mapreduce编写困难,提供给熟悉sql的人使用的。只要你对SQL有一定的了解,就能通过Hive写出mapreduce的程序,而不需要去学习hadoop中的api。

    在部署前需要确认安装jdk以及Hadoop

    如果需要安装jdk以及hadoop可以参考我之前的博客:

    Linux下安装jdk
    Linux下安装hadoop伪分布式

    在安装之前,先了解下Hive都有哪些东西。

    下载并解压缩

    去主页选择镜像地址:

    http://www.apache.org/dyn/closer.cgi/hive/

    在镜像地址中选择下载的版本,我这里使用的是最新的2.1.0

    https://mirrors.tuna.tsinghua.edu.cn/apache/hive/hive-2.1.0/

    下载后,解压缩

    # 解压缩
    tar -zxvf apache-hive-2.1.0-bin.tar.gz
    
    # 拷贝到指定目录
    mv apache-hive-2.1.0-bin /usr
    

    修改环境变量

    # 编辑/etc/profile
    vi /etc/profile
    
    # set hive environment
    export HIVE_HOME=/usr/apache-hive-2.1.0-bin
    export PATH=$PATH:$HIVE_HOME/bin
    
    # 配置文件生效
    source /etc/profile
    

    创建并修改配置文件

    # 进入conf目录
    cd $HIVE_HOME/conf
    
    # 拷贝hive-site.xml
    cp hive-default.xml.template hive-site.xml
    
    # 修改其中的参数
    修改所有的system.io.tmpdir为指定的目录(自己定,我放在/usr/hive/tmp下了)
    修改所有的system.user.name为自己的名字(我直接修改成xingoo了)
    
    

    初始化schema

    由于Hive中所有的原信息都需要存储到关系型数据库里面,因此需要初始化数据库表。我这里直接使用的是默认的derby数据库,关于数据的配置就不用修改了,直接使用默认的就好了。

    # 进入指定的目录
    cd $HIVE_HOME/bin
    
    # 初始化,如果是mysql则derby可以直接替换成mysql
    ./schematool -initSchema -dbType derby
    

    注意这里有可能报错:

    [root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :	 org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:	 APP
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.derby.sql
    Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
    org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    Underlying cause: java.io.IOException : Schema script failed, errorcode 2
    Use --verbose for detailed stacktrace.
    *** schemaTool failed ***
    

    这时需要修改derby初始化脚本

    # 进入指定的目录
    cd $HIVE_HOME/scripts/metastore/upgrade/derby
    
    # 修改错误堆栈中的脚本hive-schema-2.1.0.derby.sql,注释上面两句
    -- ----------------------------------------------
    -- DDL Statements for functions
    -- ----------------------------------------------
    
    -- CREATE FUNCTION "APP"."NUCLEUS_ASCII" (C CHAR(1)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.ascii' ;
    
    -- CREATE FUNCTION "APP"."NUCLEUS_MATCHES" (TEXT VARCHAR(8000),PATTERN VARCHAR(8000)) RETURNS INTEGER LANGUAGE JAVA PARAMETER STYLE JAVA READS SQL DATA CALLED ON NULL INPUT EXTERNAL NAME 'org.datanucleus.store.rdbms.adapter.DerbySQLFunction.matches' ;
    
    

    再次执行,就可以了

    [root@localhost bin]# vim ../scripts/metastore/upgrade/derby/hive-schema-2.1.0.derby.sql 
    [root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :	 org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:	 APP
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.derby.sql
    Initialization script completed
    schemaTool completed
    
    

    启动hive cli测试

    [root@localhost bin]# hive
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> show databases;
    OK
    default
    Time taken: 1.931 seconds, Fetched: 1 row(s)
    
    

    至此,Hive就算安装完啦!

    踩过的坑

    没有初始化数据库,定义Schema

    [root@localhost bin]# hive
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
    Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:578)
    	at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:483)
    	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:226)
    	at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:366)
    	at org.apache.hadoop.hive.ql.metadata.Hive.create(Hive.java:310)
    	at org.apache.hadoop.hive.ql.metadata.Hive.getInternal(Hive.java:290)
    	at org.apache.hadoop.hive.ql.metadata.Hive.get(Hive.java:266)
    	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:545)
    	... 9 more
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3593)
    	at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:236)
    	at org.apache.hadoop.hive.ql.metadata.Hive.registerAllFunctionsOnce(Hive.java:221)
    	... 14 more
    Caused by: MetaException(message:Hive metastore database is not initialized. Please use schematool (e.g. ./schematool -initSchema -dbType ...) to create the schema. If needed, don't forget to include the option to auto-create the underlying database in your JDBC connection string (e.g. ?createDatabaseIfNotExist=true for mysql))
    	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3364)
    	at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3336)
    	at org.apache.hadoop.hive.ql.metadata.Hive.getAllFunctions(Hive.java:3590)
    	... 16 more
    
    

    解决办法:

    执行命令$HIVE_HOME/bin/schematool -initSchema -dbType derby

    derby数据库脚本定义有问题

    [root@localhost bin]# ./schematool -initSchema -dbType derby createDatabaseIfNotExist=true
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Metastore connection URL:	 jdbc:derby:;databaseName=metastore_db;create=true
    Metastore Connection Driver :	 org.apache.derby.jdbc.EmbeddedDriver
    Metastore connection User:	 APP
    Starting metastore schema initialization to 2.1.0
    Initialization script hive-schema-2.1.0.derby.sql
    Error: FUNCTION 'NUCLEUS_ASCII' already exists. (state=X0Y68,code=30000)
    org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
    Underlying cause: java.io.IOException : Schema script failed, errorcode 2
    Use --verbose for detailed stacktrace.
    *** schemaTool failed ***
    

    解决办法:

    注释掉sql脚本中的两句定义..

    配置文件中${system.io.tmpdir}不识别

    [root@localhost bin]# hive
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
    Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    	at org.apache.hadoop.fs.Path.initialize(Path.java:206)
    	at org.apache.hadoop.fs.Path.<init>(Path.java:172)
    	at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:631)
    	at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:550)
    	at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:518)
    	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:705)
    	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:483)
    	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.net.URISyntaxException: Relative path in absolute URI: ${system:java.io.tmpdir%7D/$%7Bsystem:user.name%7D
    	at java.net.URI.checkPath(URI.java:1823)
    	at java.net.URI.<init>(URI.java:745)
    	at org.apache.hadoop.fs.Path.initialize(Path.java:203)
    	... 12 more
    
    

    解决办法:

    替换hive-site.xml中的${system.io.tmpdir}属性

    hive-site.xml中${system.user.name}不识别

    [root@localhost bin]# hive
    which: no hbase in (/usr/lib64/qt-3.3/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/java/jdk1.8.0/bin:/root/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/apache-hive-2.1.0-bin/bin:/usr/java/jdk1.8.0/bin:/usr/hadoop/hadoop-2.6.4/bin:/usr/apache-hive-2.1.0-bin/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/apache-hive-2.1.0-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/hadoop/hadoop-2.6.4/share/hadoop/common/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    
    Logging initialized using configuration in jar:file:/usr/apache-hive-2.1.0-bin/lib/hive-common-2.1.0.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> show databases;
    OK
    Failed with exception java.io.IOException:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: ${system:user.name%7D
    Time taken: 1.781 seconds
    
    

    解决办法:

    替换hive-site.xml中的${system.user.name}属性

    参考

    1 FUNCTION 'NUCLEUS_ASCII' already exists.
    2 如何在hive中定义schema
    3 Hive部署注意事项
    4 Hive配置入门

  • 相关阅读:
    fmri降噪,利用spatial+temporal信息
    matlab中,计算,记录,程序运行,起始,结束 时间,间隔 &matlab中 tic,toc函数的用法
    第十五章 动态规划——矩阵链乘法
    第十五章 动态规划——钢条切割
    第十四章 数据结构的扩张
    第十四章 红黑树——C++代码实现
    第十三章 红黑树
    第十二章 二叉搜索树
    第十一章 散列表
    第十章 基本数据结构——二叉树
  • 原文地址:https://www.cnblogs.com/xing901022/p/5775954.html
Copyright © 2020-2023  润新知