• Hive static and dynamic partitions, and the corresponding metadata in MySQL


    Static partitions:

    Manually specifying the partition when loading data is the typical use of static partitioning. In day-to-day work, however, dynamic partitioning is used far more often.

    Create the table:

    hive> create table order_mulit_partition(
        > order_number string,
        > event_time string
        > )
        > PARTITIONED BY(event_month string, step string)
        > row format delimited fields terminated by '\t';
    

    Load data:

    hive> load data local inpath '/opt/data/order_created.txt' overwrite into table order_mulit_partition PARTITION(event_month='201405', step='1');
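
    Besides LOAD DATA with an explicit PARTITION clause, static partitions can also be pre-created or dropped with ALTER TABLE. A minimal sketch; the partition values here (event_month='201406') are only illustrative:

    hive> alter table order_mulit_partition add if not exists partition(event_month='201406', step='1');
    hive> alter table order_mulit_partition drop if exists partition(event_month='201406', step='1');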
    

    View the data:

    hive> select * from order_mulit_partition;
    OK
    10703007267488  2014-05-01 06:01:12.334+01  NULL    201405  1
    10101043505096  2014-05-01 07:28:12.342+01  NULL    201405  1
    10103043509747  2014-05-01 07:50:12.33+01   NULL    201405  1
    10103043501575  2014-05-01 09:27:12.33+01   NULL    201405  1
    10104043514061  2014-05-01 09:03:12.324+01  NULL    201405  1
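
    Because event_month and step are partition columns, they can be used directly in a WHERE clause and Hive only reads the matching partition directory (partition pruning). A quick sketch against the partition loaded above:

    hive> select * from order_mulit_partition where event_month='201405' and step='1';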
    

    Dynamic partitions:

    Requirement: load data into a target table, using the department as the partition key.

    With static partitions you would have to handle every department by hand, which quickly becomes unmanageable as the number of departments and the data volume grow. That is why this kind of requirement is usually implemented with dynamic partitions.
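
    For contrast, a static-partition version of this load needs one statement per department. A rough sketch, using the emp source table and emp_dynamic_partition target table that are created in the steps below:

    hive> insert overwrite table emp_dynamic_partition partition(deptno=10)
    select empno, ename, job, mgr, hiredate, sal, comm from emp where deptno=10;

    ...and the same statement repeated for deptno=20, deptno=30, and every other department, which is exactly the repetition that dynamic partitioning avoids.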

    1. Creating the emp table and its contents

    Create the table:

    hive> create table emp(
        > empno int,
        > ename string,
        > job string,
        > mgr int,
        > hiredate string,
        > sal double,
        > comm double,
        > deptno int
        > )row format delimited fields terminated by '\t';
    

    Load data:

    hive> load data local inpath '/home/hadoop/datas/emp.txt' overwrite into table emp;
    

    Table contents:

    hive> select * from emp;
    OK
    7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
    7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
    7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
    7566    JONES   MANAGER 7839    1981-4-2    2975.0  NULL    20
    7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0  30
    7698    BLAKE   MANAGER 7839    1981-5-1    2850.0  NULL    30
    7782    CLARK   MANAGER 7839    1981-6-9    2450.0  NULL    10
    7788    SCOTT   ANALYST 7566    1987-4-19   3000.0  NULL    20
    7839    KING    PRESIDENT   NULL    1981-11-17  5000.0  NULL    10
    7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0 30
    7876    ADAMS   CLERK   7788    1987-5-23   1100.0  NULL    20
    7900    JAMES   CLERK   7698    1981-12-3   950.0   NULL    30
    7902    FORD    ANALYST 7566    1981-12-3   3000.0  NULL    20
    7934    MILLER  CLERK   7782    1982-1-23   1300.0  NULL    10
    8888    HIVE    PROGRAM 7839    1988-1-23   10300.0 NULL    NULL
    

    Show partitions:

    hive> show partitions order_mulit_partition;
    OK
    event_month=201405/step=1
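
    To see where a single partition lives on HDFS, along with other per-partition details that reappear in the metastore tables later in this post, DESCRIBE FORMATTED can be pointed at one partition. A sketch (output omitted):

    hive> desc formatted order_mulit_partition partition(event_month='201405', step='1');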
    

    2. Create the target table

    hive> create table emp_dynamic_partition(
                  > empno int, 
                  > ename string, 
                  > job string, 
                  > mgr int, 
                  > hiredate string, 
                  > sal double, 
                  > comm double)
                  > PARTITIONED BY(deptno int)
                  > row format delimited fields terminated by '\t';
    

    3. Load data into the target table using dynamic partitioning

    Before loading, set the parameter below (switch dynamic partition mode to non-strict):

    hive> set hive.exec.dynamic.partition.mode=nonstrict;
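
    A few related parameters are often set alongside it: hive.exec.dynamic.partition turns the feature on (usually already the default), and the two max settings cap how many dynamic partitions one statement may create in total and per node. The values below are the common defaults, shown only for reference:

    hive> set hive.exec.dynamic.partition=true;
    hive> set hive.exec.max.dynamic.partitions=1000;
    hive> set hive.exec.max.dynamic.partitions.pernode=100;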
    

    Load the data:

    hive> insert into table emp_dynamic_partition partition(deptno)
    select empno , ename , job , mgr , hiredate , sal , comm, deptno from emp;
    

    Note: the INSERT above does not specify a concrete partition value; it only names the partition column. The last column of the SELECT must be the partition column, so Hive determines each row's partition automatically from the value of deptno.
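
    Static and dynamic partition keys can also be mixed in one statement, as long as the static keys come first and each dynamic key matches, in order, one of the trailing SELECT columns. A hypothetical sketch against the two-level order_mulit_partition table; order_staging is an assumed staging table with order_number, event_time and step columns:

    hive> insert into table order_mulit_partition partition(event_month='201405', step)
    select order_number, event_time, step from order_staging;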

    4. Verify

    The rows are all there:

    hive> select * from emp_dynamic_partition;
    OK
    7782    CLARK   MANAGER 7839    1981-6-9    2450.0  NULL    10
    7839    KING    PRESIDENT   NULL    1981-11-17  5000.0  NULL    10
    7934    MILLER  CLERK   7782    1982-1-23   1300.0  NULL    10
    7369    SMITH   CLERK   7902    1980-12-17  800.0   NULL    20
    7566    JONES   MANAGER 7839    1981-4-2    2975.0  NULL    20
    7788    SCOTT   ANALYST 7566    1987-4-19   3000.0  NULL    20
    7876    ADAMS   CLERK   7788    1987-5-23   1100.0  NULL    20
    7902    FORD    ANALYST 7566    1981-12-3   3000.0  NULL    20
    7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0   30
    7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0   30
    7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0  30
    7698    BLAKE   MANAGER 7839    1981-5-1    2850.0  NULL    30
    7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0 30
    7900    JAMES   CLERK   7698    1981-12-3   950.0   NULL    30
    8888    HIVE    PROGRAM 7839    1988-1-23   10300.0 NULL    NULL
    

    The partitions were created automatically:

    hive> show partitions emp_dynamic_partition;
    OK
    deptno=10
    deptno=20
    deptno=30
    deptno=__HIVE_DEFAULT_PARTITION__
    Time taken: 0.375 seconds, Fetched: 4 row(s)
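
    Row counts per partition give a quick cross-check against the source table. A sketch (output omitted):

    hive> select deptno, count(*) from emp_dynamic_partition group by deptno;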
    

    5. Contents of each partition

    deptno=10

    7782    CLARK   MANAGER 7839    1981-6-9    2450.0  \N
    7839    KING    PRESIDENT   \N  1981-11-17  5000.0  \N
    7934    MILLER  CLERK   7782    1982-1-23   1300.0  \N
    

    deptno=20

    7369    SMITH   CLERK   7902    1980-12-17  800.0   \N
    7566    JONES   MANAGER 7839    1981-4-2    2975.0  \N
    7788    SCOTT   ANALYST 7566    1987-4-19   3000.0  \N
    7876    ADAMS   CLERK   7788    1987-5-23   1100.0  \N
    7902    FORD    ANALYST 7566    1981-12-3   3000.0  \N
    

    deptno=30

    7499    ALLEN   SALESMAN    7698    1981-2-20   1600.0  300.0
    7521    WARD    SALESMAN    7698    1981-2-22   1250.0  500.0
    7654    MARTIN  SALESMAN    7698    1981-9-28   1250.0  1400.0
    7698    BLAKE   MANAGER 7839    1981-5-1    2850.0  \N
    7844    TURNER  SALESMAN    7698    1981-9-8    1500.0  0.0
    7900    JAMES   CLERK   7698    1981-12-3   950.0   \N
    

    deptno=__HIVE_DEFAULT_PARTITION__

    8888    HIVE    PROGRAM 7839    1988-1-23   10300.0 \N
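
    Rows whose deptno was NULL are written to the __HIVE_DEFAULT_PARTITION__ partition, and when read back the partition value comes out as NULL (as in the select * output above), so they can be located with an IS NULL filter. A sketch:

    hive> select * from emp_dynamic_partition where deptno is null;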
    

    6. The corresponding metadata in MySQL:

    mysql> select * from TBLS \G;
    *************************** 1. row ***************************
                TBL_ID: 1
           CREATE_TIME: 1561364642
                 DB_ID: 6
      LAST_ACCESS_TIME: 0
                 OWNER: hadoop
             RETENTION: 0
                 SD_ID: 1
              TBL_NAME: order_mulit_partition
              TBL_TYPE: MANAGED_TABLE
    VIEW_EXPANDED_TEXT: NULL
    VIEW_ORIGINAL_TEXT: NULL
    *************************** 2. row ***************************
                TBL_ID: 2
           CREATE_TIME: 1561365034
                 DB_ID: 6
      LAST_ACCESS_TIME: 0
                 OWNER: hadoop
             RETENTION: 0
                 SD_ID: 3
              TBL_NAME: emp_dynamic_partition
              TBL_TYPE: MANAGED_TABLE
    VIEW_EXPANDED_TEXT: NULL
    VIEW_ORIGINAL_TEXT: NULL
    *************************** 3. row ***************************
                TBL_ID: 6
           CREATE_TIME: 1561366290
                 DB_ID: 6
      LAST_ACCESS_TIME: 0
                 OWNER: hadoop
             RETENTION: 0
                 SD_ID: 6
              TBL_NAME: emp
              TBL_TYPE: MANAGED_TABLE
    VIEW_EXPANDED_TEXT: NULL
    VIEW_ORIGINAL_TEXT: NULL
    3 rows in set (0.00 sec)
    
    ERROR:
    No query specified
    
    mysql> select * from PARTITIONS;
    +---------+-------------+------------------+-----------------------------------+-------+--------+
    | PART_ID | CREATE_TIME | LAST_ACCESS_TIME | PART_NAME                         | SD_ID | TBL_ID |
    +---------+-------------+------------------+-----------------------------------+-------+--------+
    |       1 |  1561364780 |                0 | event_month=201405/step=1         |     2 |      1 |
    |       6 |  1561371156 |                0 | deptno=10                         |    11 |      2 |
    |       7 |  1561371157 |                0 | deptno=__HIVE_DEFAULT_PARTITION__ |    12 |      2 |
    |       8 |  1561371158 |                0 | deptno=30                         |    13 |      2 |
    |       9 |  1561371158 |                0 | deptno=20                         |    14 |      2 |
    +---------+-------------+------------------+-----------------------------------+-------+--------+
    5 rows in set (0.00 sec)
    
    mysql> select * from  PARTITION_KEYS;
    +--------+--------------+-------------+-----------+-------------+
    | TBL_ID | PKEY_COMMENT | PKEY_NAME   | PKEY_TYPE | INTEGER_IDX |
    +--------+--------------+-------------+-----------+-------------+
    |      1 | NULL         | event_month | string    |           0 |
    |      1 | NULL         | step        | string    |           1 |
    |      2 | NULL         | deptno      | int       |           0 |
    +--------+--------------+-------------+-----------+-------------+
    3 rows in set (0.00 sec)
    
    mysql> select * from PARTITION_KEY_VALS;
    +---------+----------------------------+-------------+
    | PART_ID | PART_KEY_VAL               | INTEGER_IDX |
    +---------+----------------------------+-------------+
    |       1 | 201405                     |           0 |
    |       1 | 1                          |           1 |
    |       6 | 10                         |           0 |
    |       7 | __HIVE_DEFAULT_PARTITION__ |           0 |
    |       8 | 30                         |           0 |
    |       9 | 20                         |           0 |
    +---------+----------------------------+-------------+
    6 rows in set (0.00 sec)
    
    mysql> select * from DBS;
    +-------+-----------------------+-------------------------------------------------+---------+------------+------------+
    | DB_ID | DESC                  | DB_LOCATION_URI                                 | NAME    | OWNER_NAME | OWNER_TYPE |
    +-------+-----------------------+-------------------------------------------------+---------+------------+------------+
    |     1 | Default Hive database | file:/hive/warehouse                            | default | public     | ROLE       |
    |     6 |                       | hdfs://hadoop000:8020/hive/warehouse/g6_hive.db | g6_hive | NULL       | USER       |
    +-------+-----------------------+-------------------------------------------------+---------+------------+------------+
    2 rows in set (0.00 sec)
    
    mysql> select * from COLUMNS_V2;
    +-------+---------+--------------+-----------+-------------+
    | CD_ID | COMMENT | COLUMN_NAME  | TYPE_NAME | INTEGER_IDX |
    +-------+---------+--------------+-----------+-------------+
    |     1 | NULL    | event_time   | string    |           1 |
    |     1 | NULL    | order_number | string    |           0 |
    |     2 | NULL    | comm         | double    |           6 |
    |     2 | NULL    | empno        | int       |           0 |
    |     2 | NULL    | ename        | string    |           1 |
    |     2 | NULL    | hiredate     | string    |           4 |
    |     2 | NULL    | job          | string    |           2 |
    |     2 | NULL    | mgr          | int       |           3 |
    |     2 | NULL    | sal          | double    |           5 |
    |     6 | NULL    | comm         | double    |           6 |
    |     6 | NULL    | deptno       | int       |           7 |
    |     6 | NULL    | empno        | int       |           0 |
    |     6 | NULL    | ename        | string    |           1 |
    |     6 | NULL    | hiredate     | string    |           4 |
    |     6 | NULL    | job          | string    |           2 |
    |     6 | NULL    | mgr          | int       |           3 |
    |     6 | NULL    | sal          | double    |           5 |
    +-------+---------+--------------+-----------+-------------+
    17 rows in set (0.00 sec)
    
    mysql> select * from VERSION;
    +--------+----------------+-------------------------------------+
    | VER_ID | SCHEMA_VERSION | VERSION_COMMENT                     |
    +--------+----------------+-------------------------------------+
    |      1 | 1.1.0          | Set by MetaStore hadoop@192.168.0.3 |
    +--------+----------------+-------------------------------------+
    1 row in set (0.00 sec)
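
    Tying a few of these metastore tables together: TBLS joins to PARTITIONS on TBL_ID (and PARTITIONS to PARTITION_KEY_VALS on PART_ID), so one query can list every partition of every table. A sketch based only on the columns shown above; output omitted:

    mysql> select t.TBL_NAME, p.PART_NAME from TBLS t join PARTITIONS p on p.TBL_ID = t.TBL_ID order by t.TBL_NAME, p.PART_NAME;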
    