Hive中的数据类型可以分为两种,分别是基本数据类型和复杂数据类型,本篇将通过一个例子来说明:
1 数据样本
[hadoop@strong ~]$ vim employee
Michael|Montreal,Toronto|Male,30|DB:80|Product:Developer^DLead
Will|Montreal|Male,35|Perl:85|Product:Lead,Test:Lead
Shelley|New York|Female,27|Python:80|Test:Lead,COE:Architect
Lucy|Vancouver|Female,57|Sales:89,HR:94|Sales:Lead
2 创建表
0: jdbc:hive2://localhost:10000/hive> create table employee(
. . . . . . . . . . . . . . . . . . > name string,
. . . . . . . . . . . . . . . . . . > work_place array<string>,
. . . . . . . . . . . . . . . . . . > sex_age struct<sex:string,age:int>,
. . . . . . . . . . . . . . . . . . > skills_score map<string,int>,
. . . . . . . . . . . . . . . . . . > depart_title map<string,array<string>>)
. . . . . . . . . . . . . . . . . . > row format delimited
. . . . . . . . . . . . . . . . . . > fields terminated by '|'
. . . . . . . . . . . . . . . . . . > collection items terminated by ','
. . . . . . . . . . . . . . . . . . > map keys terminated by ':';
No rows affected (3.454 seconds)
3 查看表创建
0: jdbc:hive2://localhost:10000/hive> !table employee
+------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| TABLE_CAT | TABLE_SCHEM | TABLE_NAME | TABLE_TYPE | REMARKS | TYPE_CAT | TYPE_SCHEM | TYPE_NAME | SELF_REFERENCING_COL_NAME | REF_GENERATION |
+------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
| | hive | employee | TABLE | NULL | NULL | NULL | NULL | NULL | NULL |
+------------+--------------+-------------+-------------+----------+-----------+-------------+------------+----------------------------+-----------------+
0: jdbc:hive2://localhost:10000/hive>
4 加载数据
0: jdbc:hive2://localhost:10000/hive> load data local inpath '/home/hadoop/employee' overwrite into table employee;
No rows affected (3.957 seconds)
5 查询数据
1)查询整个数组与单个数组列
0: jdbc:hive2://localhost:10000/hive> select work_place from employee;
+-------------------------+
| work_place |
+-------------------------+
| ["Montreal","Toronto"] |
| ["Montreal"] |
| ["New York"] |
| ["Vancouver"] |
+-------------------------+
4 rows selected (0.431 seconds)
0: jdbc:hive2://localhost:10000/hive> select work_place[0] as c1,work_place[1] as c2,work_place[2] from employee;
+------------+----------+-------+
| c1 | c2 | _c2 |
+------------+----------+-------+
| Montreal | Toronto | NULL |
| Montreal | NULL | NULL |
| New York | NULL | NULL |
| Vancouver | NULL | NULL |
+------------+----------+-------+
4 rows selected (0.775 seconds)
2)查询整个结构体和单个结构体列
0: jdbc:hive2://localhost:10000/hive> select sex_age from employee;
+----------------------------+
| sex_age |
+----------------------------+
| {"sex":"Male","age":30} |
| {"sex":"Male","age":35} |
| {"sex":"Female","age":27} |
| {"sex":"Female","age":57} |
+----------------------------+
4 rows selected (0.895 seconds)
0: jdbc:hive2://localhost:10000/hive> select sex_age.sex ,sex_age.age from employee;
+---------+------+
| sex | age |
+---------+------+
| Male | 30 |
| Male | 35 |
| Female | 27 |
| Female | 57 |
+---------+------+
4 rows selected (0.825 seconds)
3)查询整个map和单个map列
0: jdbc:hive2://localhost:10000/hive> select skills_score from employee;
+-----------------------+
| skills_score |
+-----------------------+
| {"DB":80} |
| {"Perl":85} |
| {"Python":80} |
| {"Sales":89,"HR":94} |
+-----------------------+
4 rows selected (1.011 seconds)
0: jdbc:hive2://localhost:10000/hive> select name,skills_score['DB'] as db,skills_score['Perl'] as perl,
. . . . . . . . . . . . . . . . . . > skills_score['Python'] as python,skills_score['Sales'] as sales,skills_score['HR'] as hr
. . . . . . . . . . . . . . . . . . > from employee;
+----------+-------+-------+---------+--------+-------+
| name | db | perl | python | sales | hr |
+----------+-------+-------+---------+--------+-------+
| michael | 80 | NULL | NULL | NULL | NULL |
| Will | NULL | 85 | NULL | NULL | NULL |
| Shelley | NULL | NULL | 80 | NULL | NULL |
| Lucy | NULL | NULL | NULL | 89 | 94 |
+----------+-------+-------+---------+--------+-------+
4 rows selected (0.576 seconds)
4)查询组合数据类型
0: jdbc:hive2://localhost:10000/hive> select depart_title from employee;
+----------------------------------------+
| depart_title |
+----------------------------------------+
| {"Product":["Developer","Lead"]} |
| {"Product":["Lead"],"Test":["Lead"]} |
| {"Test":["Lead"],"COE":["Architect"]} |
| {"Sales":["Lead"]} |
+----------------------------------------+
4 rows selected (0.511 seconds)
0: jdbc:hive2://localhost:10000/hive> select name,
. . . . . . . . . . . . . . . . . . > depart_title['Product'] as product,
. . . . . . . . . . . . . . . . . . > depart_title['Test'] as test,
. . . . . . . . . . . . . . . . . . > depart_title['COE'] as coe,
. . . . . . . . . . . . . . . . . . > depart_title['Sales'] as sales
. . . . . . . . . . . . . . . . . . > from employee;
+----------+-----------------------+-----------+----------------+-----------+
| name | product | test | coe | sales |
+----------+-----------------------+-----------+----------------+-----------+
| Michael | ["Developer","Lead"] | NULL | NULL | NULL |
| Will | ["Lead"] | ["Lead"] | NULL | NULL |
| Shelley | NULL | ["Lead"] | ["Architect"] | NULL |
| Lucy | NULL | NULL | NULL | ["Lead"] |
+----------+-----------------------+-----------+----------------+-----------+
4 rows selected (0.524 seconds)
0: jdbc:hive2://localhost:10000/hive> select name,depart_title['Product'][0] as prd_c1,
. . . . . . . . . . . . . . . . . . > depart_title['Test'][0] as tst_c1
. . . . . . . . . . . . . . . . . . > from employee;
+----------+------------+---------+
| name | prd_c1 | tst_c1 |
+----------+------------+---------+
| Michael | Developer | NULL |
| Will | Lead | Lead |
| Shelley | NULL | Lead |
| Lucy | NULL | NULL |
+----------+------------+---------+
4 rows selected (0.469 seconds)
注:Hive中默认的分隔符如下:
- Row delimiter:Ctrl + A 或 ^A(创建表时用 01);
- Collection item delimiter:Ctrl + B 或 ^B(创建表时用 02);
- Map key delimiter:Ctrl + C 或 ^C(创建表时用 03);
- 嵌套Map的Array:Ctrl + D 或 ^D(创建表时用 04);