• [Hive] Data Types in Practice


    Data types in Hive fall into two groups, primitive data types and complex data types. This post illustrates them with a worked example:


    1 Sample data
    [hadoop@strong ~]$ vim employee 
    Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer^DLead
    Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
    Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
    Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
    2 Create the table
    0: jdbc:hive2://localhost:10000/hive> create table employee(
    . . . . . . . . . . . . . . . . . . > name string,
    . . . . . . . . . . . . . . . . . . > work_place array<string>,
    . . . . . . . . . . . . . . . . . . > sex_age struct<sex:string,age:int>,
    . . . . . . . . . . . . . . . . . . > skills_score map<string,int>,
    . . . . . . . . . . . . . . . . . . > depart_title map<string,array<string>>)
    . . . . . . . . . . . . . . . . . . > row format delimited
    . . . . . . . . . . . . . . . . . . > fields terminated by '|'
    . . . . . . . . . . . . . . . . . . > collection items terminated by ','
    . . . . . . . . . . . . . . . . . . > map keys terminated by ':';
    No rows affected (3.454 seconds)
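    The row format clauses above map straight onto the sample file: '|' separates the five columns, ',' separates array elements and struct members, and ':' separates map keys from values. The literal Ctrl-D (^D) between Developer and Lead is not declared here; it is Hive's next-level default separator and delimits the array nested inside depart_title (see the note on default delimiters at the end of this post). Once the data has been loaded in step 4, a single record can be pulled back to see every complex column parsed at once, for example:
    select name, work_place, sex_age, skills_score, depart_title
    from employee
    where name = 'Michael';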
    3 Verify the table was created
    0: jdbc:hive2://localhost:10000/hive> !table employee
    +------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
    | TABLE_CAT  | TABLE_SCHEM  | TABLE_NAME  | TABLE_TYPE  | REMARKS  | TYPE_CAT  | TYPE_SCHEM  | TYPE_NAME  | SELF_REFERENCING_COL_NAME  | REF_GENERATION  |
    +------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
    |            | hive         | employee    | TABLE       | NULL     | NULL      | NULL        | NULL       | NULL                       | NULL            |
    +------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
    0: jdbc:hive2://localhost:10000/hive> 
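    !table only confirms that the table is registered in the metastore; to check the column names and types themselves, a plain describe can be run as well (output omitted here):
    describe employee;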
    4 Load the data
    0: jdbc:hive2://localhost:10000/hive> load data local inpath '/home/hadoop/employee' overwrite into table employee;
    No rows affected (3.957 seconds)
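    Because overwrite is specified, any previous contents of the table are replaced by the file. A quick sanity check, which should report the four records from the sample file:
    select count(*) from employee;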
    5 Query the data

    1) Query the whole array and individual array elements
    0: jdbc:hive2://localhost:10000/hive> select work_place from employee;
    +-------------------------+
    |       work_place        |
    +-------------------------+
    | ["Montreal","Toronto"]  |
    | ["Montreal"]            |
    | ["New York"]            |
    | ["Vancouver"]           |
    +-------------------------+
    4 rows selected (0.431 seconds)
    0: jdbc:hive2://localhost:10000/hive> select work_place[0] as c1,work_place[1] as c2,work_place[2] from employee; 
    +------------+----------+-------+
    |     c1     |    c2    |  _c2  |
    +------------+----------+-------+
    | Montreal   | Toronto  | NULL  |
    | Montreal   | NULL     | NULL  |
    | New York   | NULL     | NULL  |
    | Vancouver  | NULL     | NULL  |
    +------------+----------+-------+
    4 rows selected (0.775 seconds)
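    Besides indexing individual elements, Hive's built-in collection functions work on the whole array; for instance, size() returns the element count and array_contains() tests membership. A small sketch along those lines (not part of the original session):
    select name,
           size(work_place)                      as num_places,
           array_contains(work_place, 'Toronto') as in_toronto
    from employee;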
    2) Query the whole struct and individual struct fields
    0: jdbc:hive2://localhost:10000/hive> select sex_age from employee;
    +----------------------------+
    |          sex_age           |
    +----------------------------+
    | {"sex":"Male","age":30}    |
    | {"sex":"Male","age":35}    |
    | {"sex":"Female","age":27}  |
    | {"sex":"Female","age":57}  |
    +----------------------------+
    4 rows selected (0.895 seconds)
    0: jdbc:hive2://localhost:10000/hive> select sex_age.sex ,sex_age.age from employee;
    +---------+------+
    |   sex   | age  |
    +---------+------+
    | Male    | 30   |
    | Male    | 35   |
    | Female  | 27   |
    | Female  | 57   |
    +---------+------+
    4 rows selected (0.825 seconds)
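    Struct fields can also be referenced outside the select list, so rows can be filtered on them directly; a sketch:
    select name, sex_age.age
    from employee
    where sex_age.age > 30;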
    3) Query the whole map and individual map keys
    0: jdbc:hive2://localhost:10000/hive> select skills_score from employee;
    +-----------------------+
    |     skills_score      |
    +-----------------------+
    | {"DB":80}             |
    | {"Perl":85}           |
    | {"Python":80}         |
    | {"Sales":89,"HR":94}  |
    +-----------------------+
    4 rows selected (1.011 seconds)
    0: jdbc:hive2://localhost:10000/hive> select name,skills_score['DB'] as db,skills_score['Perl'] as perl,
    . . . . . . . . . . . . . . . . . . > skills_score['Python'] as python,skills_score['Sales'] as sales,skills_score['HR'] as hr
    . . . . . . . . . . . . . . . . . . > from employee;
    +----------+-------+-------+---------+--------+-------+
    |   name   |  db   | perl  | python  | sales  |  hr   |
    +----------+-------+-------+---------+--------+-------+
    | Michael  | 80    | NULL  | NULL    | NULL   | NULL  |
    | Will     | NULL  | 85    | NULL    | NULL   | NULL  |
    | Shelley  | NULL  | NULL  | 80      | NULL   | NULL  |
    | Lucy     | NULL  | NULL  | NULL    | 89     | 94    |
    +----------+-------+-------+---------+--------+-------+
    4 rows selected (0.576 seconds)
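    When the keys are not known in advance, the built-ins map_keys(), map_values(), and size() expose the whole map instead of one key at a time; a sketch in the same spirit:
    select name,
           map_keys(skills_score)   as skills,
           map_values(skills_score) as scores,
           size(skills_score)       as num_skills
    from employee;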
    4) Query the nested (composite) data type
    0: jdbc:hive2://localhost:10000/hive> select depart_title from employee;
    +----------------------------------------+
    |              depart_title              |
    +----------------------------------------+
    | {"Product":["Developer","Lead"]}       |
    | {"Product":["Lead"],"Test":["Lead"]}   |
    | {"Test":["Lead"],"COE":["Architect"]}  |
    | {"Sales":["Lead"]}                     |
    +----------------------------------------+
    4 rows selected (0.511 seconds)
    0: jdbc:hive2://localhost:10000/hive> select name,
    . . . . . . . . . . . . . . . . . . > depart_title['Product'] as product,
    . . . . . . . . . . . . . . . . . . > depart_title['Test'] as test,
    . . . . . . . . . . . . . . . . . . > depart_title['COE'] as coe,
    . . . . . . . . . . . . . . . . . . > depart_title['Sales'] as sales 
    . . . . . . . . . . . . . . . . . . > from employee;
    +----------+-----------------------+-----------+----------------+-----------+
    |   name   |        product        |   test    |      coe       |   sales   |
    +----------+-----------------------+-----------+----------------+-----------+
    | Michael  | ["Developer","Lead"]  | NULL      | NULL           | NULL      |
    | Will     | ["Lead"]              | ["Lead"]  | NULL           | NULL      |
    | Shelley  | NULL                  | ["Lead"]  | ["Architect"]  | NULL      |
    | Lucy     | NULL                  | NULL      | NULL           | ["Lead"]  |
    +----------+-----------------------+-----------+----------------+-----------+
    4 rows selected (0.524 seconds)
    0: jdbc:hive2://localhost:10000/hive> select name,depart_title['Product'][0] as prd_c1,
    . . . . . . . . . . . . . . . . . . > depart_title['Test'][0] as tst_c1
    . . . . . . . . . . . . . . . . . . > from employee;
    +----------+------------+---------+
    |   name   |   prd_c1   | tst_c1  |
    +----------+------------+---------+
    | Michael  | Developer  | NULL    |
    | Will     | Lead       | Lead    |
    | Shelley  | NULL       | Lead    |
    | Lucy     | NULL       | NULL    |
    +----------+------------+---------+
    4 rows selected (0.469 seconds)
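    To flatten the nested map<string,array<string>> into one row per (name, department, title), explode() can be applied twice with lateral view, first to the map and then to the inner array. A sketch (the aliases d, t, dept, titles and title are arbitrary names chosen here):
    select name, dept, title
    from employee
    lateral view explode(depart_title) d as dept, titles
    lateral view explode(titles) t as title;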
    Note: Hive's default delimiters are listed below; the sketch after this list shows how they can be written out explicitly when creating a table:
    • Field (column) delimiter: Ctrl+A, i.e. ^A ('\001' in a create table statement);
    • Collection item delimiter: Ctrl+B, i.e. ^B ('\002');
    • Map key delimiter: Ctrl+C, i.e. ^C ('\003');
    • Elements of an array nested inside a map: Ctrl+D, i.e. ^D ('\004'), which is why the sample data separates Developer and Lead with ^D.
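    A rough sketch of such a create statement (employee_default is just an illustrative table name, not one used above; the octal escapes correspond to the control characters listed in the note):
    create table employee_default(
    name string,
    work_place array<string>,
    sex_age struct<sex:string,age:int>,
    skills_score map<string,int>,
    depart_title map<string,array<string>>)
    row format delimited
    fields terminated by '\001'
    collection items terminated by '\002'
    map keys terminated by '\003';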


  • Original post: https://www.cnblogs.com/alen-liu-sz/p/12975624.html