• Hive:动静态分区


     http://hugh-wangp.iteye.com/blog/1612268

    http://blog.csdn.net/opensure/article/details/46537969

    • 使用静态分区,创建分区表p_test:

    在创建分区表之前需要去人参数是否如下:

    0: jdbc:hive2://10.8.1.2:6611/> set hive.exec.dynamic.partition.mode;
    +------------------------------------------+--+
    |                   set                    |
    +------------------------------------------+--+
    | hive.exec.dynamic.partition.mode=strict  |
    +------------------------------------------+--+
    1 row selected (0.013 seconds)
    0: jdbc:hive2://10.8.1.2:6611/> set hive.exec.dynamic.partition;
    +-----------------------------------+--+
    |                set                |
    +-----------------------------------+--+
    | hive.exec.dynamic.partition=true  |
    +-----------------------------------+--+

    如果不是需要修改参数,创建分区表

    create table p_test(id string,name string)
    partitioned by (p_day string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

    添加静态分区

    alter table p_test add partition (p_day='20170619');

    查看分区表是否创建好,及存储目录

    0: jdbc:hive2://10.78.152.62:21066/> show partitions p_test;
    +-----------------+--+
    |    partition    |
    +-----------------+--+
    | p_day=20170619  |
    +-----------------+--+
    1 row selected (0.143 seconds)
    0: jdbc:hive2://10.78.152.62:21066/> 
    
    [tt@jh01 service]$ hadoop fs -ls /jrc/jrc_hive_db/p_test
    17/06/20 21:27:44 INFO hdfs.PeerCache: SocketCache disabled.
    Found 1 items
    drwxrwx---+  - jrc jrc_group          0 2017-06-20 21:24 /jrc/jrc_hive_db/p_test/p_day=20170619

    指定分区插入已经存在的分区中数据:

    insert into p_test partition(p_day='20170619')
    select '1' as id,'name1' as name;

    指定不存在的分区,并插入数据:

    0: jdbc:hive2://10.8.1.2:6611/> insert into p_test partition(p_day='20170620')
    0: jdbc:hive2://10.8.1.2:6611/> select '1' as id,'name1' as name;
    0: jdbc:hive2://10.8.1.2:6611/> select * from p_test;
    +------------+--------------+---------------+--+
    | p_test.id  | p_test.name  | p_test.p_day  |
    +------------+--------------+---------------+--+
    | 1          | name1        | 20170619      |
    | 1          | name1        | 20170620      |
    +------------+--------------+---------------+--+
    2 rows selected (1.8 seconds)
    0: jdbc:hive2://10.8.1.2:6611/> show partitions p_test;
    +-----------------+--+
    |    partition    |
    +-----------------+--+
    | p_day=20170619  |
    | p_day=20170620  |
    +-----------------+--+
    2 rows selected (0.281 seconds)
    0: jdbc:hive2://10.8.1.2:6611/> 

    动态分区插入:

    0: jdbc:hive2://10.8.1.2:6611/> insert into p_test partition(p_day)
    0: jdbc:hive2://10.8.1.2:6611/> select '3' as id,'name3' as name,'20170621' as p_day;
    Error: Error while compiling statement: FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column. 
    To turn this off set hive.exec.dynamic.partition.mode=nonstrict (state=42000,code=10096)

    插入失败,根据错误提示需要修改参数:set hive.exec.dynamic.partition.mode=nonstrict

    set hive.exec.dynamic.partition.mode=nonstrict;
    insert into p_test partition(p_day)
    select '3' as id,'name3' as name,'20170621' as p_day;
    
    0: jdbc:hive2://10.8.1.2:6611/> select * from p_test;
    +------------+--------------+---------------+--+
    | p_test.id  | p_test.name  | p_test.p_day  |
    +------------+--------------+---------------+--+
    | 1          | name1        | 20170619      |
    | 1          | name1        | 20170620      |
    | 3          | name3        | 20170621      |
    +------------+--------------+---------------+--+

     示例二:

    已知表mitem表包含了p_ciy,p_hour字段,我们新建一张分区表mitem_partition(分区字段包括:p_city,p_hour),将mitem数据导入mitem_partition.

    create table mitem_partition 
    (
     id int,
     ueid string,
     time timestamp,
     srp double,
     srq double,
     sul double,
     nid int,
     nrp double
    ) partitioned by (p_city string,p_hour string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    
    set hive.exec.dynamic.partition.mode=nonstrict;
    insert into mitem_partition partition(p_city,p_hour) select (cast(split(s_cgi,'-')[2] as int)*256+cast(split(s_cgi,'-')[3] as int)) as id,ueid,time,srp-141 as srp,srq*0.5-20 as srq, sul-11 as sul,(cast(split(n_cgi,'-')[2] as int)*256+cast(split(n_cgi,'-')[3] as int)) as nid,nrp-141 as nrp, p_city,p_hour from mitem where p_city='wuhan' and p_hour>='2017101200' and p_hour<'2017101300'

     备注:

    1)如果分区表中想删除某个字段就比较费事。

  • 相关阅读:
    [转]lftp的致命错误:证书验证:不信任
    github每次push都需要密码以及用户名的解决办法
    Fedora最小化安装后没有ifconfig命令
    [转载]MySql常用命令总结
    chrome浏览器强制采用https加密链接
    红帽系列linux自行配置本地yum源
    linux 下dd命令直接清除分区表(不用再fdisk一个一个的删除啦)
    linux分区工具fdisk的使用
    Java多线程实现......(1,继承Thread类)
    第一篇文章--我为什么要写博客?
  • 原文地址:https://www.cnblogs.com/yy3b2007com/p/6603159.html
Copyright © 2020-2023  润新知