• Hive查询


    Hive提供Hive Query language(HQL)用来进行数据操作

    Hive查询提供下面的特性

    • 数据模型,例如数据库和表的创建
    • ETL方法,例如提取,转变,加载数据到表里
    • join来合并不同的数据表
    • 用户自定义的脚本
    • 基于Hadoop快速查询工具

    首先创建表

    hive> 
        > create table employee(id int, name string, price int, area string)
        > row format delimited
        > fields terminated by '|'
        > location '/yandufeng/employee';
    
    hive> local data local inpath '/home/hive/test.txt' into table employee;
    
    hive> 
        > 
        > select * from employee;
    OK
    16    john    4000    USA
    17    robert    2000    USA
    18    andrew    4000    USA
    19    katty    2000    USA
    27    edward    4000    UK
    29    alan    3000    UK
    31    kerry    4000    UK
    34    tom    3000    UK
    35    zack    2000    UK
    Time taken: 0.106 seconds, Fetched: 9 row(s)

    排序查询

    hive> select * from employee order by id desc;
    Query ID = hive_20170223061813_9d7b31aa-7375-4773-b324-0214e6f55778
    Total jobs = 1
    Launching Job 1 out of 1
    
    
    Status: Running (Executing on YARN cluster with App id application_1479783640101_0035)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 1.02 s     
    --------------------------------------------------------------------------------
    OK
    35    zack    2000    UK
    34    tom    3000    UK
    31    kerry    4000    UK
    29    alan    3000    UK
    27    edward    4000    UK
    19    katty    2000    USA
    18    andrew    4000    USA
    17    robert    2000    USA
    16    john    4000    USA
    Time taken: 1.541 seconds, Fetched: 9 row(s)

     分组查询

    hive> 
        > select count(*), area from employee group by area;
    Query ID = hive_20170223062041_5e496181-62d3-40e5-9202-4183b2facab7
    Total jobs = 1
    Launching Job 1 out of 1
    
    
    Status: Running (Executing on YARN cluster with App id application_1479783640101_0035)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.04 s     
    --------------------------------------------------------------------------------
    OK
    5    UK
    4    USA
    Time taken: 5.331 seconds, Fetched: 2 row(s)

    Sort by:

    hive> select * from employee sort by id desc;
    Query ID = hive_20170223062352_c02cbd86-9e49-4473-a951-5bdad33ca4b9
    Total jobs = 1
    Launching Job 1 out of 1
    
    
    Status: Running (Executing on YARN cluster with App id application_1479783640101_0035)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.87 s     
    --------------------------------------------------------------------------------
    OK
    35    zack    2000    UK
    34    tom    3000    UK
    31    kerry    4000    UK
    29    alan    3000    UK
    27    edward    4000    UK
    19    katty    2000    USA
    18    andrew    4000    USA
    17    robert    2000    USA
    16    john    4000    USA
    Time taken: 5.445 seconds, Fetched: 9 row(s)

    Cluster By:

    hive>
        > select * from employee cluster by id;
    Query ID = hive_20170223062557_c6c1b261-645d-4785-b017-0f4ceda9bb0b
    Total jobs = 1
    Launching Job 1 out of 1
    
    
    Status: Running (Executing on YARN cluster with App id application_1479783640101_0035)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 3.85 s     
    --------------------------------------------------------------------------------
    OK
    16    john    4000    USA
    17    robert    2000    USA
    18    andrew    4000    USA
    19    katty    2000    USA
    27    edward    4000    UK
    29    alan    3000    UK
    31    kerry    4000    UK
    34    tom    3000    UK
    35    zack    2000    UK
    Time taken: 4.496 seconds, Fetched: 9 row(s)

     Distribute By:

    hive> select * from employee distribute by id;
    Query ID = hive_20170223062709_3c81e53e-fbaf-4360-85d5-8573a2c17a7d
    Total jobs = 1
    Launching Job 1 out of 1
    
    
    Status: Running (Executing on YARN cluster with App id application_1479783640101_0035)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.68 s     
    --------------------------------------------------------------------------------
    OK
    16    john    4000    USA
    18    andrew    4000    USA
    34    tom    3000    UK
    17    robert    2000    USA
    19    katty    2000    USA
    27    edward    4000    UK
    29    alan    3000    UK
    31    kerry    4000    UK
    35    zack    2000    UK
    Time taken: 5.211 seconds, Fetched: 9 row(s)

     Join查询

    hive
        > select * from employee e join employee p on(e.id=p.id);
    Query ID = hive_20170223062925_ac84adc4-1daf-417c-8eee-f13ec1a1657e
    Total jobs = 1
    Launching Job 1 out of 1
    
    
    Status: Running (Executing on YARN cluster with App id application_1479783640101_0035)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Map 2 ..........   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 4.64 s     
    --------------------------------------------------------------------------------
    OK
    16    john    4000    USA    16    john    4000    USA
    17    robert    2000    USA    17    robert    2000    USA
    18    andrew    4000    USA    18    andrew    4000    USA
    19    katty    2000    USA    19    katty    2000    USA
    27    edward    4000    UK    27    edward    4000    UK
    29    alan    3000    UK    29    alan    3000    UK
    31    kerry    4000    UK    31    kerry    4000    UK
    34    tom    3000    UK    34    tom    3000    UK
    35    zack    2000    UK    35    zack    2000    UK
    Time taken: 5.907 seconds, Fetched: 9 row(s)

     子查询

    hive> 
        > select * from employee e where id not in (select id from employee p);
    Warning: Map Join MAPJOIN[28][bigTable=e] in task 'Map 1' is a cross product
    Query ID = hive_20170223063132_141d729e-eb03-40ec-a290-4885e79060ec
    Total jobs = 1
    Launching Job 1 out of 1
    
    
    Status: Running (Executing on YARN cluster with App id application_1479783640101_0035)
    
    --------------------------------------------------------------------------------
            VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
    --------------------------------------------------------------------------------
    Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
    Map 2 ..........   SUCCEEDED      1          1        0        0       0       0
    Map 3 ..........   SUCCEEDED      1          1        0        0       0       0
    Reducer 4 ......   SUCCEEDED      1          1        0        0       0       0
    --------------------------------------------------------------------------------
    VERTICES: 04/04  [==========================>>] 100%  ELAPSED TIME: 7.29 s     
    --------------------------------------------------------------------------------
    OK
    Time taken: 8.419 seconds
  • 相关阅读:
    行动- 一桌菜,十几盘,有荤有素,有凉有热,怎么吃呢?你可以找一盘看起来好吃的,也可以找一盘离自己近的,都行。但是,得动筷子。不管怎么吃,最重要的是得动筷子。学技术也是一样。 有的人死活不动筷子,还不断念叨:“我要开始吃了。我马上就要开始吃了。我只要开始吃就能吃饱。我吃饱了就不饿了。你能不能告诉我该先吃哪一盘?先冷盘后热菜再喝汤这样是不是最好?但是我听说广东人都是先...
    行动
    互相牵制的能力
    我的信仰
    my life
    PDCA循环
    命运
    健身 赚钱 ; 旅行 用心爱一个人就行了 其他的都会开挂来临~
    灵魂
    陪伴的意义
  • 原文地址:https://www.cnblogs.com/yandufeng/p/6433345.html
Copyright © 2020-2023  润新知