• Hive静态分区表&动态分区表


    静态分区表:

    一级分区表:

    CREATE TABLE order_created_partition (
        orderNumber STRING
      , event_time  STRING
    )
    PARTITIONED BY (event_month string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '	';

    加载数据方式一:从本地/HDFS目录加载

    load data local inpath '/home/spark/software/data/order_created.txt' overwrite into table order_created_partition PARTITION(event_month='2014-05');
    select * from order_created_partition where event_month='2014-05';
    +-----------------+-----------------------------+--------------+
    |   ordernumber   |         event_time          | event_month  |
    +-----------------+-----------------------------+--------------+
    | 10703007267488  | 2014-05-01 06:01:12.334+01  | 2014-05      |
    | 10101043505096  | 2014-05-01 07:28:12.342+01  | 2014-05      |
    | 10103043509747  | 2014-05-01 07:50:12.33+01   | 2014-05      |
    | 10103043501575  | 2014-05-01 09:27:12.33+01   | 2014-05      |
    | 10104043514061  | 2014-05-01 09:03:12.324+01  | 2014-05      |
    +-----------------+-----------------------------+--------------+

    加载数据方式二:手工上传文件到hdfs上,然后将数据添加到分区表指定的分区:

    1) 创建hdfs目录:在hdfs目录:/user/hive/warehouse/order_created_partition目录下创建event_month=2014-06

    hadoop fs -mkdir /user/hive/warehouse/order_created_partition/event_month=2014-06

    2)拷贝数据到新创建的目录下:

    hadoop fs -put /home/spark/software/data/order_created.txt /user/hive/warehouse/order_created_partition/event_month=2014-06

    select * from order_created_partition where event_month='2014-06'; #发现查询结果是空的

    3)添加新分区数据到元数据信息中:

    msck repair table order_created_partition;

    输出日志信息:

    Partitions not in metastore: order_created_partition:event_month=2014-06
    Repair: Added partition to metastore order_created_partition:event_month=2014-06

    或者: alter table order_created_partition add partition(dt='2014-06');

    select * from order_created_partition where event_month='2014-06'; 
    +-----------------+-----------------------------+--------------+
    |   ordernumber   |         event_time          | event_month  |
    +-----------------+-----------------------------+--------------+
    | 10703007267488  | 2014-05-01 06:01:12.334+01  | 2014-06      |
    | 10101043505096  | 2014-05-01 07:28:12.342+01  | 2014-06      |
    | 10103043509747  | 2014-05-01 07:50:12.33+01   | 2014-06      |
    | 10103043501575  | 2014-05-01 09:27:12.33+01   | 2014-06      |
    | 10104043514061  | 2014-05-01 09:03:12.324+01  | 2014-06      |
    +-----------------+-----------------------------+--------------+

    加载数据方式三:select查询方式insert/overwrite

    CREATE TABLE order_created_4_partition (
        orderNumber STRING
      , event_time  STRING
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '	';
    load data local inpath '/home/spark/software/data/order_created.txt' overwrite into table order_created_4_partition;
    
    insert into table order_created_partition partition(event_month='2014-07') select * from order_created_4_partition;
    insert overwrite table order_created_partition partition(event_month='2014-07') select * from order_created_4_partition;

    对比:

    insert overwrite table order_created_partition partition(event_month='2014-07') select ordernumber,event_time from order_created_4_partition;
    insert overwrite table order_created_partition partition(event_month='2014-07') select event_time,ordernumber from order_created_4_partition;

    发现字段值错位,在使用时一定要注意:字段值顺序要与表中字段顺序一致,名称可以不一致;

    查看分区表已有的所有分区:

    show partitions order_created_partition;

    查看分区表已有的指定分区:

    SHOW PARTITIONS order_created_partition PARTITION(event_month='2014-06');

    查看表字段信息:

    desc order_created_partition;
    desc extended order_created_partition;
    desc formatted order_created_partition;
    desc formatted order_created_partition partition(event_month='2014-05');

    二级分区表:

    CREATE TABLE order_created_partition2 (
        orderNumber STRING
      , event_time  STRING
    )
    PARTITIONED BY (event_month string, step string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '	';
    show partitions order_created_partition2;

    显示结果空

    load data local inpath '/home/spark/software/data/order_created.txt' into table order_created_partition2 partition(event_month='2014-09',step='1');  
    show partitions order_created_partition2;
    +-----------------------------+
    |           result            |
    +-----------------------------+
    | event_month=2014-09/step=1  |
    +-----------------------------+
    insert overwrite table order_created_partition2 partition(event_month='2014-09',step='2') select * from order_created_4_partition;
    show partitions order_created_partition2;
    +-----------------------------+
    |           result            |
    +-----------------------------+
    | event_month=2014-09/step=1  |
    | event_month=2014-09/step=2  |
    +-----------------------------+

    动态分区表

    CREATE TABLE order_created_dynamic_partition (
        orderNumber STRING
      , event_time  STRING
    )
    PARTITIONED BY (event_month string)
    ;
    insert into table order_created_dynamic_partition PARTITION (event_month)
    select orderNumber, event_time, substr(event_time, 1, 7) as event_month from order_created;

    报错:

    FAILED: SemanticException [Error 10096]: Dynamic partition strict mode requires at least one static partition column.
    To turn this off set hive.exec.dynamic.partition.mode=nonstrict

    解决方案:

    set hive.exec.dynamic.partition.mode=nonstrict;

    重新执行:

    insert into table order_created_dynamic_partition PARTITION (event_month)
    select orderNumber, event_time, substr(event_time, 1, 7) as event_month from order_created;
    select * from order_created_dynamic_partition;
    +-----------------+-----------------------------+--------------+
    |   ordernumber   |         event_time          | event_month  |
    +-----------------+-----------------------------+--------------+
    | 10703007267488  | 2014-05-01 06:01:12.334+01  | 2014-05      |
    | 10101043505096  | 2014-05-01 07:28:12.342+01  | 2014-05      |
    | 10103043509747  | 2014-05-01 07:50:12.33+01   | 2014-05      |
    | 10103043501575  | 2014-05-01 09:27:12.33+01   | 2014-05      |
    | 10104043514061  | 2014-05-01 09:03:12.324+01  | 2014-05      |
    +-----------------+-----------------------------+--------------+
  • 相关阅读:
    相对路径与绝对路径问题
    javaee自定义servlet的步骤
    Struts1.X与Spring集成——另外一种方案
    菜鸟也能学cocos2dx3.0 浅析刀塔传奇(下)
    JAVA之了解类载入器Classloader
    IOS 编程中引用第三方的方类库的方法及常见问题
    通过eclipse的egit插件提交提示Auth fail
    定时器0的方式1 定时器1的方式1 数码管和led
    MongoDB入门学习(四):MongoDB的索引
    J2EE--JDBC
  • 原文地址:https://www.cnblogs.com/luogankun/p/4111145.html
Copyright © 2020-2023  润新知