• HIVE Common Functions


    Functions
    1. Hive function categories
        Standard functions -- a user-defined standard function is called a UDF: takes one row of input and returns one result

        Aggregate functions -- a user-defined aggregate function is called a UDAF: takes multiple rows of input and returns one result; used with GROUP BY (e.g. sum, count)

        Table-generating functions -- a user-defined table-generating function is called a UDTF: takes one row of input and returns multiple rows (e.g. explode)
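        As a quick illustration, the three categories map onto built-ins like this (a hedged sketch against the student/student1 tables used later in these notes, where city is an array column):

        ```sql
        -- UDF (standard): one row in, one value out
        SELECT length(name) FROM student;

        -- UDAF (aggregate): many rows in, one value out per group
        SELECT age, count(*), sum(id) FROM student GROUP BY age;

        -- UDTF (table-generating): one row in, many rows out
        SELECT explode(city) FROM student1;
        ```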
    
    
    2. Function help documentation
        SHOW FUNCTIONS;
        Lists all functions available in the current environment.

        DESC FUNCTION length;
        Shows the help text for a function.

        DESC FUNCTION EXTENDED length;
        Shows extended information about a function.
    
    
    3. Math functions
        round -- rounds a number (half up)

        rand -- returns a random number

        conv -- base (radix) conversion
        conv(num, from_base, to_base)  num: the value to convert; from_base: the source base; to_base: the target base
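        A few hedged examples of these math functions (results follow standard Hive semantics; verify on your version):

        ```sql
        SELECT round(3.567);        -- 4
        SELECT round(3.567, 2);     -- 3.57
        SELECT rand();              -- a random double in [0, 1)
        SELECT conv(255, 10, 16);   -- 'FF': decimal 255 in hexadecimal
        SELECT conv('FF', 16, 2);   -- '11111111': hex FF in binary
        ```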
    
        cast -- type conversion:  select cast(id as STRING) from student;  converts the INT column to STRING
    
    
    4. Date functions
        from_unixtime -- converts a Unix timestamp to a formatted string
        select from_unixtime(1567995900, 'yyyyMMdd HHmmss');

        unix_timestamp -- converts a date string to a Unix timestamp
        select unix_timestamp('2011-12-10 13:10:12');
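        A common idiom, sketched here, is to reformat a date string by round-tripping through a Unix timestamp (unix_timestamp with a single argument assumes the default 'yyyy-MM-dd HH:mm:ss' pattern):

        ```sql
        -- Parse with the default pattern, then re-render with a new pattern
        SELECT from_unixtime(unix_timestamp('2011-12-10 13:10:12'), 'yyyy/MM/dd');  -- '2011/12/10'
        ```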
    
    5. Conditional functions
        IF(expr1, expr2, expr3) !!!
        If expr1 evaluates to true, returns expr2; otherwise returns expr3.
        SELECT name, age, if(age > 20, 'A', 'B') as type FROM student;
    
    
        coalesce -- returns the first non-NULL argument #
        SELECT coalesce(null, 12, 3, 14, null);
    
    
        CASE !!!
        Buckets data into groups.
    
        SELECT id, CASE   
                   WHEN id <=2 THEN 'a' 
                   WHEN id > 2 AND id <=10 THEN 'b' 
                   ELSE 'c'  
                   END  FROM student;
    
    
        SELECT mycol, sum(id) FROM (SELECT id, (CASE WHEN id <= 2 THEN 'a' WHEN id > 2 AND id <= 10 THEN 'b' ELSE 'c' END) as mycol FROM student) t GROUP BY mycol;
                   
    
        SELECT CASE id  WHEN 2 THEN 'a'   ELSE 'b' END from student;
    
    //////////////////////////////////////////////////////////////////////
    6. String functions
        String concatenation
        concat -- concatenates values of any type; no separator can be specified
        select id, concat(name, age) from student;

        concat_ws -- concatenates string types only; a separator can be specified
        select concat_ws('_', 'asdfasd', '18');
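        A practical difference worth noting (standard Hive behavior, but verify on your version): concat returns NULL as soon as any argument is NULL, while concat_ws silently skips NULL arguments:

        ```sql
        SELECT concat('a', NULL, 'b');          -- NULL
        SELECT concat_ws('_', 'a', NULL, 'b');  -- 'a_b'
        ```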
    
    
        instr(string str, string substr) -- returns the 1-based position of substr within str (0 if not found)
        select instr('candle', 'dle');   -- returns 4
    
        length -- string length
        lower / lcase -- convert to lowercase
        upper / ucase -- convert to uppercase
    
    
        Regular expressions #
        regexp_extract -- extracts a substring using a regular expression
        regexp_extract(str, regexp[, idx])

        Parameters:
        str -- the string to process
        regexp -- the regular expression used for matching
        idx -- capture-group index:
              0 returns the entire match
              1 returns the first parenthesized group
              2 returns the second parenthesized group
    
         select regexp_extract('x=asdfasdf12345asdf', 'x=([a-z]+)([0-9]+)([a-z]+)', 3);
    
    
    
        regexp_replace -- replaces substrings matched by a regular expression
         select regexp_replace('x=asdfasdf12345asdf3456345', '([0-9]+)','xxxx' );
    
    
        URL #
        parse_url -- parses components out of a URL
        http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere
    
        select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere',  'REF');
    
        [HOST: domain; PATH: path; QUERY: query string; REF: anchor (fragment); PROTOCOL: protocol; FILE: path plus query; AUTHORITY: host plus port; USERINFO: user info]
    
        Sample output:
            hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere',  'HOST');
            OK
            _c0
            www.candle.com
            Time taken: 0.07 seconds, Fetched: 1 row(s)
    
            hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere',  'PATH');
            OK
            _c0
            /dir1/dir2/dir3/file.html
            Time taken: 0.093 seconds, Fetched: 1 row(s)
    
            hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere',  'QUERY');
            OK
            _c0
            key1=value1&key2=value2
            Time taken: 0.051 seconds, Fetched: 1 row(s)
    
            hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere',  'QUERY', 'key1');
            OK
            _c0
            value1
            Time taken: 0.105 seconds, Fetched: 1 row(s)
    
            hive (hadoop)> select parse_url('http://www.candle.com:50070/dir1/dir2/dir3/file.html?key1=value1&key2=value2#imhere',  'REF');
            OK
            _c0
            imhere
    
    
    
        JSON  #
    
        '{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}'
    
        
        {
            "store":
                {
                    "fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],
                    "bicycle":{"price":19.95,"color":"red"}
                } ,
    
    
            "email":"amy@only_for_json_udf_test.net",
            "owner":"amy"
        }
    
    
        hive (hadoop)> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}', '$.store.bicycle.price');
        OK
        _c0
        19.95
    
    
        hive (hadoop)> select get_json_object('{"store":{"fruit":[{"weight":8,"type":"apple"},{"weight":9,"type":"pear"}],"bicycle":{"price":19.95,"color":"red"} }, "email":"amy@only_for_json_udf_test.net","owner":"amy"}', '$.store.fruit[0].type');
        OK
        _c0
        apple
    
    
    
    
    7. Converting between wide and long tables !!!
        explode -- a table-generating function
    
        hive (hadoop)> select explode(city) from student1;
    
        Error case:
        hive (hadoop)> select id, name, explode(city) from student1;
        FAILED: SemanticException [Error 10081]: UDTF's are not supported outside the SELECT clause, nor nested in expressions
    
        Arrays
            In a Hive query, a table-generating function cannot be selected alongside ordinary columns directly;
            you first generate a temporary (lateral) view and join it back to each row:
            select id, name, mycity from student1 lateral view explode(city) tmp_view AS mycity;

            Syntax: SELECT id, name, mycity FROM student1 LATERAL VIEW explode(city) <view_name> AS <column_name>
    
    
        Maps (key-value pairs)
            For a map, explode generates two columns: key and value.
    
            > select id, name , mykey, myvalue from student3 LATERAL VIEW explode(pairs) tmp_view AS mykey, myvalue;
    
            OK
            id    name    mykey    myvalue
            1    candle    k1    v1
            1    candle    k2    v2
            2    jack    k1    v1
            2    jack    k2    v2
            3    tom    k1    v1
    
        Structs:
            (multiple members, i.e. multiple columns)
            hive (hadoop)> select explode(addr) from student2;
            FAILED: UDFArgumentException explode() takes an array or a map as a parameter
    
            Note:
            Struct members cannot be exploded into rows directly; explode only accepts array and map types.
            In practice, turning each struct member into its own row is rarely meaningful anyway.

            Structs are usually handled with dot access instead:
            hive (hadoop)> select id,name, addr.city, addr.area, addr.streetid from student2;
            OK
            id    name    city    area    streetid
            1    candle    shanghai    minhang    35
            2    jack    beijing    haidian    100
            3    tom    hefei    shushan    103
    
    
    
            The following form is generally avoided:
            hive (hadoop)> select id, name, info from student2 lateral view explode(array(addr.city, addr.area, addr.streetid)) tmp_view as info;
            OK
            id    name    info
            1    candle    shanghai
            1    candle    minhang
            1    candle    35
            2    jack    beijing
            2    jack    haidian
            2    jack    100
            3    tom    hefei
            3    tom    shushan
            3    tom    103
    
    
        Long to wide -- aggregate functions with GROUP BY
          collect_list -- merges values from multiple rows into one array (duplicates allowed)
          select age, collect_list(name),  collect_list(id) from student group by age;
    
          age    _c1    _c2
          18    ["candle","candle"]    [1,1]
          19    ["aa","jack"]    [10,2]
          20    ["c2","tom"]    [1,4]
          100    ["cc"]    [12]
          200    ["dd"]    [13]
    
      
          collect_set -- like collect_list, but duplicates are removed
    
          select age, collect_set(name),  collect_set(id) from student group by age;
    
          age    _c1    _c2
            18    ["candle"]    [1]
            19    ["aa","jack"]    [10,2]
            20    ["c2","tom"]    [1,4]
            100    ["cc"]    [12]
            200    ["dd"]    [13]
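
          To flatten the collected array into a single delimited string (a common long-to-wide finishing step), concat_ws can be combined with collect_list; a hedged sketch against the same student table:

          ```sql
          -- One comma-separated string of names per age group
          SELECT age, concat_ws(',', collect_list(name)) AS names
          FROM student
          GROUP BY age;
          ```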
    
    
            select map(id, name) from student;           -- builds a map (key-value pairs)
            select struct(id, name, age) from student;   -- builds a struct
    
    
    8. Window functions !!!
        Ranking within groups
    
        1) Create the table
           CREATE TABLE windows
           (
           id INT,
           class STRING,
           score INT
           )
           ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
           LINES TERMINATED BY '\n'
           STORED AS TEXTFILE;
    
        2) row_number
            partition by -- specifies the grouping column
            order by -- specifies the ordering column
            SELECT class, score, row_number() over(partition by class order by score desc)  from windows;
    
    
            Ranking: rows with equal scores still get distinct, strictly sequential ranks.
            class    score    paiming
            class1    78    1
            class1    78    2
            class1    66    3
            class1    66    4
            class1    66    5
            class1    56    6
            class1    56    7
            class1    56    8
            class1    56    9
            class1    56    10
    
    
        3) rank
    
            Ranking: tied scores share the same rank, and the next rank skips over the tied positions.
    
            SELECT class, score, rank() over(partition by class order by score desc)  from windows;
            class    score    _wcol0
            class1    78    1
            class1    78    1
            class1    66    3
            class1    66    3
            class1    66    3
            class1    56    6
            class1    56    6
            class1    56    6
            class1    56    6
            class1    56    6
    
    
        4) dense_rank
    
            Tied scores share the same rank, and the next rank does not skip over the tied positions.
            SELECT class, score, dense_rank() over(partition by class order by score desc)  from windows;
    
            class    score    _wcol0
            class1    78    1
            class1    78    1
            class1    66    2
            class1    66    2
            class1    66    2
            class1    56    3
            class1    56    3
            class1    56    3
            class1    56    3
            class1    56    3
    
        5) Grouped top-N pattern
        You cannot filter on the window alias directly:
    
        hive (hadoop)> SELECT class, score, row_number() over(partition by class order by score desc) as paiming  from windows WHERE paiming <= 5;
    
    
        FAILED: SemanticException [Error 10004]: Line 1:110 Invalid table alias or column reference 'paiming': (possible column names are: id, class, score)
    
    
        Use a subquery instead:
        SELECT class, score from 
            (SELECT class, score, row_number() over(partition by class order by score desc) as paiming  from windows)  t WHERE paiming <=5;
    
    
    
    
    
        
  • Original article: https://www.cnblogs.com/alpha-cat/p/13184139.html