• Hive Study Notes: Data Types


    Data Types

    Hive primitive data types:

     

     

    Hive collection data types:

    There is also one more complex type, which can combine the types above into a single column:

    ·  union: UNIONTYPE<data_type, data_type, ...>

     

     

    Type Notes

    Timestamps

     

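    Hive supports traditional UNIX timestamps with optional nanosecond precision.

    Supported conversions:

    ·  Integer numeric types: interpreted as a UNIX timestamp in seconds.

    ·  Floating-point numeric types: interpreted as a UNIX timestamp in seconds with decimal precision.

    ·  Strings: the JDBC-compliant java.sql.Timestamp format "YYYY-MM-DD HH:MM:SS.fffffffff" (9 decimal places of precision).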
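
    The three conversion rules above can be mimicked outside Hive. The following is a minimal Python sketch, not Hive's implementation: the helper names from_seconds and from_string are hypothetical, and because Python's datetime only holds microseconds, the nine-digit fraction of the string form is returned separately as nanoseconds.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal

# The UNIX epoch, used as the reference point for numeric timestamps.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_seconds(value):
    """Interpret an integer or fractional number of seconds since the epoch."""
    # Going through Decimal(str(...)) keeps the fractional part exact
    # instead of inheriting binary floating-point rounding.
    micros = int(Decimal(str(value)) * 1_000_000)
    return EPOCH + timedelta(microseconds=micros)


def from_string(text):
    """Parse 'YYYY-MM-DD HH:MM:SS[.fffffffff]' into (datetime, nanoseconds)."""
    base, _, fraction = text.partition(".")
    moment = datetime.strptime(base, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or cut the fraction to exactly nine digits (nanoseconds).
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return moment, nanos


whole = from_seconds(1400000000)        # integer: whole seconds
fractional = from_seconds(1.5)          # floating point: seconds plus fraction
parsed, nanos = from_string("2014-05-13 16:53:20.123456789")
```

    Keeping the nanoseconds as a separate integer is only a workaround for this sketch; it is not how Hive stores the value.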

     

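    Timestamps are interpreted as timezone-less and stored as an offset from the UNIX epoch. Convenience UDFs are provided for converting to and from time zones (to_utc_timestamp, from_utc_timestamp).

    All existing datetime UDFs (month, day, year, hour, and so on) work with the TIMESTAMP data type.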
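
    The direction of the two conversion UDFs is easy to mix up, so here is a rough Python sketch of the idea. It is a simplification under stated assumptions: the real UDFs take a time zone name, while the hypothetical helpers shift_from_utc and shift_to_utc below take a fixed UTC offset in hours and ignore daylight saving time.

```python
from datetime import datetime, timedelta


def shift_from_utc(ts, utc_offset_hours):
    """Treat a zone-less timestamp as UTC and return the wall-clock time
    in a zone with the given fixed offset (the from_utc_timestamp direction)."""
    return ts + timedelta(hours=utc_offset_hours)


def shift_to_utc(ts, utc_offset_hours):
    """Treat a zone-less timestamp as wall-clock time in a zone with the given
    fixed offset and return the UTC time (the to_utc_timestamp direction)."""
    return ts - timedelta(hours=utc_offset_hours)


utc_value = datetime(2014, 5, 13, 16, 53, 20)
local_value = shift_from_utc(utc_value, 8)      # a zone eight hours ahead of UTC
round_trip = shift_to_utc(local_value, 8)       # back to the original UTC value
```

    Both helpers take and return zone-less values, which matches the statement above that the timestamp itself carries no time zone.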

    Delimiters

    Hive's default delimiters:

    The following two CREATE TABLE statements are equivalent.

    Statement with implicit (default) delimiters:

    CREATE TABLE employees (
      name          STRING,
      salary        FLOAT,
      subordinates  ARRAY<STRING>,
      deductions    MAP<STRING, FLOAT>,
      address       STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    );

     

    Statement with explicit delimiters:

    CREATE TABLE employees (
      name          STRING,
      salary        FLOAT,
      subordinates  ARRAY<STRING>,
      deductions    MAP<STRING, FLOAT>,
      address       STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\001'
    COLLECTION ITEMS TERMINATED BY '\002'
    MAP KEYS TERMINATED BY '\003'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE;

     

    Format of the file to load (^A, ^B, and ^C stand for the control characters '\001', '\002', and '\003'):

    John Doe^A100000.0^AMary Smith^BTodd Jones^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A1 Michigan Ave.^BChicago^BIL^B60600

    Mary Smith^A80000.0^ABill King^AFederal Taxes^C.2^BState Taxes^C.05^BInsurance^C.1^A100 Ontario St.^BChicago^BIL^B60601

    Todd Jones^A70000.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A200 Chicago Ave.^BOak Park^BIL^B60700

    Bill King^A60000.0^A^AFederal Taxes^C.15^BState Taxes^C.03^BInsurance^C.1^A300 Obscure Dr.^BObscuria^BIL^B60100

     

    Once loaded, the first record has the following structure:

    {
      "name": "John Doe",
      "salary": 100000.0,
      "subordinates": ["Mary Smith", "Todd Jones"],
      "deductions": {
        "Federal Taxes": 0.2,
        "State Taxes": 0.05,
        "Insurance": 0.1
      },
      "address": {
        "street": "1 Michigan Ave.",
        "city": "Chicago",
        "state": "IL",
        "zip": 60600
      }
    }
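
    To make the mapping from a raw line to this structure concrete, here is a minimal Python sketch that splits one line on the three default delimiters. It is an illustration only, not how Hive's SerDe is implemented; the function names parse_employee and split_items are hypothetical, and an empty field is assumed to become an empty collection.

```python
FIELD_SEP = "\x01"  # ^A, written '\001' in the DDL: separates columns
ITEM_SEP = "\x02"   # ^B, written '\002': separates array items, struct fields, map entries
KEY_SEP = "\x03"    # ^C, written '\003': separates a map key from its value


def split_items(text):
    """Split a collection field; an empty field yields an empty list."""
    return text.split(ITEM_SEP) if text else []


def parse_employee(line):
    """Turn one default-delimited line into a nested dict."""
    name, salary, subordinates, deductions, address = line.rstrip("\n").split(FIELD_SEP)
    street, city, state, zip_code = address.split(ITEM_SEP)
    deduction_map = {}
    for pair in split_items(deductions):
        key, value = pair.split(KEY_SEP)
        deduction_map[key] = float(value)
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": split_items(subordinates),
        "deductions": deduction_map,
        "address": {"street": street, "city": city, "state": state, "zip": int(zip_code)},
    }


# The first record of the sample file, rebuilt with real control characters.
sample = FIELD_SEP.join([
    "John Doe",
    "100000.0",
    ITEM_SEP.join(["Mary Smith", "Todd Jones"]),
    ITEM_SEP.join(["Federal Taxes" + KEY_SEP + ".2",
                   "State Taxes" + KEY_SEP + ".05",
                   "Insurance" + KEY_SEP + ".1"]),
    ITEM_SEP.join(["1 Michigan Ave.", "Chicago", "IL", "60600"]),
])
record = parse_employee(sample)
```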

     

    Of course, we can also define our own delimiters, as follows:

    CREATE TABLE employees (
      name          STRING,
      salary        FLOAT,
      subordinates  ARRAY<STRING>,
      deductions    MAP<STRING, FLOAT>,
      address       STRUCT<street:STRING, city:STRING, state:STRING, zip:INT>
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    COLLECTION ITEMS TERMINATED BY '|'
    MAP KEYS TERMINATED BY ':';

     

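    Notes:

    1.      TEXTFILE is the default storage format for a table with delimited fields, collection items, and map keys, so the STORED AS TEXTFILE clause can be omitted above.

    2.      Hive currently supports only \n (newline) as the line terminator, so the LINES TERMINATED BY '\n' clause can be dropped as well.

    3.      For how to create a data file that uses Hive's default delimiters, see: http://www.myexception.cn/software-architecture-design/1351552.html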
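
    One possible way to produce such a file, sketched in Python under the assumption that the records are already available as nested dicts. The function names to_hive_line and write_employees and the record layout are hypothetical and simply mirror the employees table; this is not taken from the linked article.

```python
FIELD_SEP = "\x01"  # ^A: between columns
ITEM_SEP = "\x02"   # ^B: between array items, struct fields, map entries
KEY_SEP = "\x03"    # ^C: between a map key and its value


def to_hive_line(record):
    """Serialize one employee dict into a line using the default delimiters."""
    subordinates = ITEM_SEP.join(record["subordinates"])
    deductions = ITEM_SEP.join(
        key + KEY_SEP + str(value) for key, value in record["deductions"].items()
    )
    # Struct fields are written in declaration order: street, city, state, zip.
    address = ITEM_SEP.join(
        str(record["address"][field]) for field in ("street", "city", "state", "zip")
    )
    return FIELD_SEP.join(
        [record["name"], str(record["salary"]), subordinates, deductions, address]
    )


def write_employees(handle, records):
    """Write one record per line to an open text file handle."""
    for record in records:
        handle.write(to_hive_line(record) + "\n")


todd = {
    "name": "Todd Jones",
    "salary": 70000.0,
    "subordinates": [],  # an empty array becomes an empty field
    "deductions": {"Federal Taxes": 0.15, "State Taxes": 0.03, "Insurance": 0.1},
    "address": {"street": "200 Chicago Ave.", "city": "Oak Park", "state": "IL", "zip": 60700},
}
line = to_hive_line(todd)
```

    When writing to disk, opening the file with newline="\n" keeps the line terminator as a plain \n on every platform, in line with note 2.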

     

    Data file in the format this table definition expects:

    John Doe,100000.0,Mary Smith|Todd Jones,Federal Taxes:.2|State Taxes:.05|Insurance:.1,1 Michigan Ave.|Chicago|IL|60600

    Mary Smith,80000.0,Bill King,Federal Taxes:.2|State Taxes:.05|Insurance:.1,100 Ontario St.|Chicago|IL|60601

    Todd Jones,70000.0,,Federal Taxes:.15|State Taxes:.03|Insurance:.1,200 Chicago Ave.|Oak Park|IL|60700

    Bill King,60000.0,,Federal Taxes:.15|State Taxes:.03|Insurance:.1,300 Obscure Dr.|Obscuria|IL|60100

     

    Load the data:

    load data local inpath '/app/hadoop/data/employees2' overwrite into table employees;

     

    View the data:

    hive (default)> select * from employees;

    OK

    John Doe        100000.0        ["Mary Smith","Todd Jones"]     {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}        {"street":"1 Michigan Ave.","city":"Chicago","state":"IL","zip":60600}

    Mary Smith      80000.0 ["Bill King"]   {"Federal Taxes":0.2,"State Taxes":0.05,"Insurance":0.1}        {"street":"100 Ontario St.","city":"Chicago","state":"IL","zip":60601}

    Todd Jones      70000.0 []      {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}       {"street":"200 Chicago Ave.","city":"Oak Park","state":"IL","zip":60700}

    Bill King       60000.0 []      {"Federal Taxes":0.15,"State Taxes":0.03,"Insurance":0.1}       {"street":"300 Obscure Dr.","city":"Obscuria","state":"IL","zip":60100}

    Time taken: 0.085 seconds, Fetched: 4 row(s)

     

     

    References:

    1. Programming Hive (Chinese edition: Hive编程指南)

    2.https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Types


  • Original post: https://www.cnblogs.com/charlist/p/7122137.html