Hive Learning Part 1: Hive Overview


    Environment
      VM: VMware 10
      Linux: CentOS-6.5-x86_64
      SSH client: Xshell 4
      FTP client: Xftp 4
      JDK 8
      hadoop-3.1.1
      apache-hive-3.1.1

    I. Overview

    Hive is a data warehouse tool built on Hadoop. It maps structured data files to tables and provides an SQL-like query language over them.
    Hive uses HQL as the query interface, HDFS for storage, and MapReduce for computation. In essence, Hive translates HQL into MapReduce jobs, so that non-Java programmers can run MapReduce over data in HDFS.
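
    As a minimal sketch of this mapping (the table name, columns, and HDFS path here are hypothetical), a delimited file in HDFS becomes queryable as a table, and an aggregate over it compiles into a MapReduce job:

    hive> CREATE EXTERNAL TABLE page_views(ip STRING, url STRING)
        > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
        > LOCATION '/data/page_views';   -- maps an HDFS directory to a table
    hive> SELECT url, count(*) FROM page_views GROUP BY url;   -- compiled into MapReduce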

    Hive is a data warehouse. How it differs from a relational database:
    ① A database serves online applications; Hive is for offline big-data analysis.
    ② A database is queried with SQL; Hive is queried with HQL.
    ③ Database data lives on the local file system; Hive data lives in HDFS.
    ④ Hive executes queries as MapReduce jobs; MySQL runs them through its own executor.
    ⑤ Hive has no indexes.
    ⑥ Hive has high query latency.
    ⑦ Hive scales out easily.
    ⑧ Hive handles far larger data volumes.

    II. Architecture
    Hive's architecture (diagram omitted):


    (1) There are three main user interfaces: the Hive command line (CLI), the most common; the Hive client (e.g. the Java API), served over RPC; and the Hive Web Interface (HWI), accessed through a browser.
    (2) At runtime Hive stores its metadata in a relational database such as MySQL or Derby. The metadata includes table names, columns, partitions and their properties, table properties (whether a table is external, etc.), and the directory holding each table's data.
    (3) The interpreter, compiler, and optimizer take an HQL statement through lexical analysis, parsing, compilation, optimization, and query-plan generation. The generated plan is stored in HDFS and later executed by MapReduce.
    (4) Hive data is stored in HDFS, and most queries are computed by MapReduce (queries like select * from tbl are the exception: they do not generate a MapReduce job).
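
    For example (illustrative; test01 is the table created in section III), a bare select * is served by a simple fetch over the table's HDFS files, while an aggregation is compiled into a MapReduce job:

    hive> SELECT * FROM test01;          -- fetch task only, no MapReduce job
    hive> SELECT count(*) FROM test01;   -- launches a MapReduce job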

    Operators

    The compiler translates an HQL statement into a tree of operators.
    An operator is Hive's smallest processing unit.
    Each operator represents either an HDFS operation or a MapReduce job.

    HQL is parsed with ANTLR, a lexer/parser generator.
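
    The operator plan a statement compiles into can be inspected with EXPLAIN (the query is illustrative; test01 is created in section III). The output lists the plan's stages and their operators, such as TableScan, Group By Operator, Reduce Output Operator, and File Output Operator:

    hive> EXPLAIN SELECT id, count(*) FROM test01 GROUP BY id;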

    III. Setup

    The setup modes are distinguished by how the metastore's relational database is accessed and managed.
    1. Local (embedded) mode: connects to an in-memory Derby database and is generally used for unit tests. Metadata kept in memory is hard to maintain, so this mode is not recommended.


    2. Single-user mode: connects to a metastore database over the network; the most commonly used mode.

    Step 1: install MySQL.
    Reference: "Setting up a Linux Java web environment, part 2: installing MySQL".

    Step 2: unpack apache-hive-3.1.1-bin.tar.gz and set the environment variables (sketched below).
    [root@PCS102 src]# tar -xf apache-hive-3.1.1-bin.tar.gz -C /usr/local
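
    A sketch of the environment-variable step, assuming a bash login shell: append the exports to /etc/profile, then re-source it.

    [root@PCS102 src]# cat >> /etc/profile <<'EOF'
    export HIVE_HOME=/usr/local/apache-hive-3.1.1-bin
    export PATH=$PATH:$HIVE_HOME/bin
    EOF
    [root@PCS102 src]# source /etc/profile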

    Step 3: edit the configuration.

    [root@PCS102 conf]# cd /usr/local/apache-hive-3.1.1-bin/conf && ll
    total 332
    -rw-r--r--. 1 root root 1596 Apr 4 2018 beeline-log4j2.properties.template
    -rw-r--r--. 1 root root 299970 Oct 24 08:19 hive-default.xml.template
    -rw-r--r--. 1 root root 2365 Apr 4 2018 hive-env.sh.template
    -rw-r--r--. 1 root root 2274 Apr 4 2018 hive-exec-log4j2.properties.template
    -rw-r--r--. 1 root root 3086 Oct 24 07:49 hive-log4j2.properties.template
    -rw-r--r--. 1 root root 2060 Apr 4 2018 ivysettings.xml
    -rw-r--r--. 1 root root 3558 Oct 24 07:49 llap-cli-log4j2.properties.template
    -rw-r--r--. 1 root root 7163 Oct 24 07:49 llap-daemon-log4j2.properties.template
    -rw-r--r--. 1 root root 2662 Apr 4 2018 parquet-logging.properties

    # Copy the template configuration

    [root@PCS102 conf]# cp hive-default.xml.template hive-site.xml
    # Edit hive-site.xml
    <!-- Directory in HDFS where Hive stores table data -->
    <property>
      <name>hive.metastore.warehouse.dir</name>
      <value>/root/hive_remote/warehouse</value>
    </property>
    <!-- Whether the metastore runs in local (embedded) mode -->
    <property>
      <name>hive.metastore.local</name>
      <value>false</value>
    </property>
    <!-- JDBC URL of the MySQL metastore database -->
    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:mysql://PCS101/hive_remote?createDatabaseIfNotExist=true</value>
    </property>
    <!-- JDBC driver class for MySQL -->
    <property>
      <name>javax.jdo.option.ConnectionDriverName</name>
      <value>com.mysql.jdbc.Driver</value>
    </property>
    <!-- MySQL user name -->
    <property>
      <name>javax.jdo.option.ConnectionUserName</name>
      <value>root</value>
    </property>
    <!-- MySQL password -->
    <property>
      <name>javax.jdo.option.ConnectionPassword</name>
      <value>123456</value>
    </property>
    
    
    Tip: in vi, :.,$-1d deletes everything from the current line through the second-to-last line (handy for clearing the thousands of default properties copied from the template before pasting in the block above).

    Step 4: copy the MySQL driver jar mysql-connector-java-5.1.32-bin.jar into
    /usr/local/apache-hive-3.1.1-bin/lib (sketched below).
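
    A sketch of the copy, assuming the connector jar was downloaded to /usr/local/src:

    [root@PCS102 src]# cp /usr/local/src/mysql-connector-java-5.1.32-bin.jar /usr/local/apache-hive-3.1.1-bin/lib/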

    Step 5: initialize the metastore database. The Hive metastore schema must be initialized before the first start:

    [root@PCS102 bin]# /usr/local/apache-hive-3.1.1-bin/bin/schematool -dbType mysql -initSchema

    Without initialization, any DDL fails with an error like:

    hive> create table test01(id int,age int);
    FAILED: HiveException java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
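
    After a successful initialization, the recorded schema version can be checked with schematool's -info option (a sketch):

    [root@PCS102 bin]# /usr/local/apache-hive-3.1.1-bin/bin/schematool -dbType mysql -info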

    Step 6: start Hive; by default this opens the CLI.

    [root@PCS102 bin]# /usr/local/apache-hive-3.1.1-bin/bin/hive 
    which: no hbase in (/usr/local/jdk1.8.0_65/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/hadoop-3.1.1/bin:/usr/local/hadoop-3.1.1/sbin:/usr/local/apache-hive-3.1.1-bin/bin:/usr/local/jdk1.7.0_80/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/hadoop-3.1.1/bin:/usr/local/hadoop-3.1.1/sbin:/usr/local/jdk1.7.0_80/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/hadoop-3.1.1/bin:/usr/local/hadoop-3.1.1/sbin:/usr/local/jdk1.7.0_80/bin:/home/cluster/subversion-1.10.3/bin:/home/cluster/apache-storm-0.9.2/bin:/usr/local/sbin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/openssh/bin:/root/bin)
    SLF4J: Class path contains multiple SLF4J bindings.
    SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.1-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: Found binding in [jar:file:/usr/local/hadoop-3.1.1/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
    SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
    SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
    Hive Session ID = e08b6258-e2eb-40af-ba98-87abcb2d1728
    
    Logging initialized using configuration in jar:file:/usr/local/apache-hive-3.1.1-bin/lib/hive-common-3.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> create table test01(id int,age int);
    OK
    Time taken: 1.168 seconds
    hive> desc test01;
    OK
    id int 
    age int 
    Time taken: 0.181 seconds, Fetched: 2 row(s)
    hive> insert into test01 values(1,23);
    Query ID = root_20190125164516_aa852f47-a9b0-4c59-9043-efb557965a5b
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
    set mapreduce.job.reduces=<number>
    Starting Job = job_1548397153910_0001, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0001/
    Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job -kill job_1548397153910_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2019-01-25 16:45:26,923 Stage-1 map = 0%, reduce = 0%
    2019-01-25 16:45:32,107 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.96 sec
    2019-01-25 16:45:38,271 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.01 sec
    MapReduce Total cumulative CPU time: 8 seconds 10 msec
    Ended Job = job_1548397153910_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://PCS102:9820/root/hive_remote/warehouse/test01/.hive-staging_hive_2019-01-25_16-45-16_011_1396999443961154869-1/-ext-10000
    Loading data to table default.test01
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 8.01 sec HDFS Read: 14187 HDFS Write: 236 SUCCESS
    Total MapReduce CPU Time Spent: 8 seconds 10 msec
    OK
    Time taken: 23.714 seconds
    hive>

    Check the table's data in HDFS:

    [root@PCS102 bin]# hdfs dfs -cat /root/hive_remote/warehouse/test01/*
    123
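
    The row prints as 123 because Hive's default field delimiter is the non-printing \001 (Ctrl-A) character. It can be made visible with cat -A (a sketch; ^A marks the delimiter, $ the line end):

    [root@PCS102 bin]# hdfs dfs -cat /root/hive_remote/warehouse/test01/* | cat -A
    1^A23$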

    (Screenshots omitted: the MapReduce job launched by the insert, and the metadata now recorded in the MySQL hive_remote database, including its tables and columns.)

    Step 7: exit
    hive> exit;
    or
    hive> quit;


    3. Remote-server (multi-user) mode: used when non-Java clients need the metastore; a MetaStoreServer is started on the server side, and clients reach the metastore through it over the Thrift protocol.

    The server side must start the metastore service first:

    [root@PCS102 conf]# nohup hive --service metastore &
    [root@PCS102 conf]# jps
    24657 RunJar
    29075 Jps
    18534 NameNode
    20743 NodeManager
    18712 DataNode
    23609 JobHistoryServer
    28842 RunJar
    20523 ResourceManager
    19020 SecondaryNameNode

    Case 1: server and client on the same node:
    PCS101: MySQL server
    PCS102: Hive server and client

    PCS102 configuration file hive-site.xml:

    <configuration>
      <!-- Directory in HDFS where Hive stores table data -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/root/hive_remote/warehouse</value>
      </property>
      <!-- Whether the metastore runs in local (embedded) mode -->
      <property>
        <name>hive.metastore.local</name>
        <value>false</value>
      </property>
      <!-- JDBC URL of the MySQL metastore database -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://PCS101/hive_remote?createDatabaseIfNotExist=true</value>
      </property>
      <!-- JDBC driver class for MySQL -->
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <!-- MySQL user name -->
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
      </property>
      <!-- MySQL password -->
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
      </property>
      <!-- Thrift URI through which clients reach the metastore -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://PCS102:9083</value>
      </property>
    </configuration>
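
    To confirm the metastore service is listening on the Thrift port referenced above, a quick check (a sketch using netstat, present on CentOS 6.5):

    [root@PCS102 conf]# netstat -tlnp | grep 9083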

    Case 2: server and client on different nodes (both the client and the server need a Hadoop installation):
    PCS101: MySQL server
    PCS102: Hive server
    PCS103: Hive client

    PCS102 (Hive server) configuration file hive-site.xml:

    <configuration>
      <!-- Directory in HDFS where Hive stores table data -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/root/hive_remote/warehouse</value>
      </property>
      <!-- JDBC URL of the MySQL metastore database -->
      <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://PCS101/hive_remote?createDatabaseIfNotExist=true</value>
      </property>
      <!-- JDBC driver class for MySQL -->
      <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
      </property>
      <!-- MySQL user name -->
      <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
      </property>
      <!-- MySQL password -->
      <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
      </property>
    </configuration>

    PCS103 (Hive client) configuration file hive-site.xml:

    <configuration>
      <!-- Directory in HDFS where Hive stores table data -->
      <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>/root/hive_remote/warehouse</value>
      </property>
      <!-- Whether the metastore runs in local (embedded) mode -->
      <property>
        <name>hive.metastore.local</name>
        <value>false</value>
      </property>
      <!-- Thrift URI through which clients reach the metastore -->
      <property>
        <name>hive.metastore.uris</name>
        <value>thrift://PCS102:9083</value>
      </property>
    </configuration>
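
    With the metastore service running on PCS102, the client on PCS103 needs only the Hive binaries and the configuration above (a sketch; startup logging omitted):

    [root@PCS103 bin]# /usr/local/apache-hive-3.1.1-bin/bin/hive
    hive> show tables;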

    Problem: the jline-0.9.94.jar under /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib is too old and triggers the error below (this note comes from an older Hive 1.2 / Hadoop 2.6.5 setup). Copying Hive's newer jline jar into that Hadoop directory fixes it; see the sketch after the stack trace.

    [root@node101 bin]# schematool -dbType mysql -initSchema
    19/07/02 16:21:32 WARN conf.HiveConf: HiveConf of name hive.metastore.local does not exist
    Metastore connection URL:     jdbc:mysql://node102/hive_remote?createDatabaseIfNotExist=true
    Metastore Connection Driver :     com.mysql.jdbc.Driver
    Metastore connection User:     root
    Starting metastore schema initialization to 1.2.0
    Initialization script hive-schema-1.2.0.mysql.sql
    [ERROR] Terminal initialization failed; falling back to unsupported
    java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
        at jline.TerminalFactory.create(TerminalFactory.java:101)
        at jline.TerminalFactory.get(TerminalFactory.java:158)
        at org.apache.hive.beeline.BeeLineOpts.<init>(BeeLineOpts.java:74)
        at org.apache.hive.beeline.BeeLine.<init>(BeeLine.java:117)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:346)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:266)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:243)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:473)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    
    Exception in thread "main" java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected
        at org.apache.hive.beeline.BeeLineOpts.<init>(BeeLineOpts.java:102)
        at org.apache.hive.beeline.BeeLine.<init>(BeeLine.java:117)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:346)
        at org.apache.hive.beeline.HiveSchemaTool.runBeeLine(HiveSchemaTool.java:326)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:266)
        at org.apache.hive.beeline.HiveSchemaTool.doInit(HiveSchemaTool.java:243)
        at org.apache.hive.beeline.HiveSchemaTool.main(HiveSchemaTool.java:473)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:497)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
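
    A sketch of the fix described above, assuming HIVE_HOME points at a Hive 1.2.x install (which ships jline-2.12.jar) and using the Hadoop path from the error message:

    [root@node101 bin]# mv /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/jline-0.9.94.jar /tmp/
    [root@node101 bin]# cp $HIVE_HOME/lib/jline-2.12.jar /usr/local/hadoop-2.6.5/share/hadoop/yarn/lib/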