• 《hadoop权威指南》关于hive的第一个小例子的演示


    本文是《hadoop权威指南》关于hive的小例子,通过这个例子可以很好地看出来hive是个什么东西。

    前提是已经配置好hive的远程连接版本的环境,我是用了MYSQL数据库保存元数据。

    环境要求:

    -配置好了Hadoop的HDFS文件系统,启动hdfs和yarn

    -配置好了hive的远程连接模式

    -配置好了MySQL用于metadata的储存

    输入文件下载: https://github.com/tomwhite/hadoop-book/blob/master/input/ncdc/micro-tab/sample.txt

    第一步,创建一个表格records,表格名字和数据源的字段,年份,温度和quality 。

    Logging initialized using configuration in file:/usr/local/hive/conf/hive-log4j.properties
    hive> Create table records(year String,temperature INT,quality INT)
        > ROW FORMAT DELIMITED
        > FIELDS TERMINATED BY '	'
        > ;
    OK

     第二部,把保存在linux上的数据上传到刚才创建的表格中。

    注意:数据是没有固定格式的,因为目前input是用分隔符“ ”分割的。所以上一步中使用了(FIELDS TERMINATED BY ' ')来

    HIVE没有专门数据格式,用户只要创建表的时候告诉Hive数据中的列分隔符和行分隔符,Hive就可以解析数据
    hive> LOAD DATA LOCAL INPATH 'sample.txt'
        > OVERWRITE INTO TABLE records;
    Loading data to table default.records
    Table default.records stats: [numFiles=1, numRows=0, totalSize=51, rawDataSize=0]
    OK
    Time taken: 6.03 seconds

    执行HiveQL语句,从刚才数据中抽取每年的温度最高值

    整个过程和MapReduce一致,一共耗费30秒。

    hive> SELECT year,MAX(temperature)
        >  FROM records
        > WHERE temperature !=999 AND quality IN (0,1,4,5,9)
        > GROUP BY year;
    Query ID = root_20171107090403_c61a6f9a-05d4-4d0f-a97b-d37fb83ef65d
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1510015112691_0001, Tracking URL = http://server71:8088/proxy/application_1510015112691_0001/
    Kill Command = /usr/local/hadoop/bin/hadoop job  -kill job_1510015112691_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2017-11-07 09:05:58,529 Stage-1 map = 0%,  reduce = 0%
    2017-11-07 09:06:59,061 Stage-1 map = 0%,  reduce = 0%
    2017-11-07 09:07:11,068 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 16.88 sec
    2017-11-07 09:07:53,824 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 20.75 sec
    2017-11-07 09:08:03,489 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 28.83 sec
    MapReduce Total cumulative CPU time: 28 seconds 830 msec
    Ended Job = job_1510015112691_0001
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 28.83 sec   HDFS Read: 8355 HDFS Write: 17 SUCCESS
    Total MapReduce CPU Time Spent: 28 seconds 830 msec
    OK
    1949    111
    1950    22
    Time taken: 243.092 seconds, Fetched: 2 row(s)

     我们可以看到整个过程和查询结果1949年和1950年的最高温度。

  • 相关阅读:
    SqlServer2008启动不了的问题
    .exe 不包含适合入口点的静态“Main”方法
    两种常用的序列化
    异或运算^
    SqlServer数据库实现C#中的split功能
    遍历电脑下面所有文件--递归
    ExecuteNonQuery引发了System.ArgumentException类型异常
    JQuery属性过滤(转)
    SQL SERVER 执行远端数据库的SQL命令
    马云:做一个静静的观察者 能学到更多
  • 原文地址:https://www.cnblogs.com/kouryoushine/p/7798434.html
Copyright © 2020-2023  润新知