• [Hive_10] Hive 的分析函数



    0. 说明

      Hive 的分析函数 窗口函数  | 排名函数 | 最大值 | 分层次 | lead && lag 统计活跃用户 | cume_dist


    1. 窗口函数(开窗函数) over

      1.1 说明

      1 preceding  //前一个
      1 following  //后一个
      current row  //当前行
      unbounded preceding  //无上限
      unbounded following  //无下限

      1.2 测试

    # 以行定义窗口界限
    select id, name, age , sum(age)over(order by id rows between current row and 2 following) from user_par;
    
    # 以值定义窗口界限
    select id, name, age , sum(age)over(order by age range between current row and 10 following) from user_par;

    2. 排名函数

      2.1 并列跳跃

      113
      rank

    select id, name, province, age , rank()over(partition by province order by age desc) from user_par;

      2.2 不跳跃

      112
      dense_rank

    select id, name, province, age , dense_rank()over(partition by province order by age desc) from user_par;

      2.3 顺序

      123
      row_number

    select id, name, province, age , row_number()over(partition by province order by age desc) from user_par;

    3. 最大值

      first_value()

    select id, name, province, age , first_value()over(partition by province order by age desc) from user_par;

    4. 分层次

      按照三六九等进行平均分层

      ntile()

    select id, name, age , ntile(3)over(order by age desc) from user_par;


    5. lead && lag

      5.1 lead()

      将列向上提

    select id, name, province, age , lead(age)over(partition by province order by age asc) from user_par;

      5.2 lag()

      将列向下沉

    select id, name, province, age , lag(age)over(partition by province order by age asc) from user_par;

      5.3 统计连续活跃

      1. 准备数据

      

      2. 建表

    create table active(id string, month int) 
    row format delimited
    fields terminated by '	';

      3. 加载数据

    load data local inpath '/home/centos/files/active.txt' into  table active;

      4. 统计连续两月活跃用户

    select id from (select id, month, lead(month)over(partition by id order by month desc) as month2 from active)a where month=month2+1;

    6. cume_dist()

      指定值占总数的百分比

      Demo

    select id,name,age, cume_dist()over(order by age desc) from user_nopar;

       


  • 相关阅读:
    Html.Partial和Html.RenderPartial, Html.Action和Html.RenderAction的区别
    cygwin下git出现cabundle.crt相关错误的解决办法
    Orchard CMS前台页面为什么没有Edit链接?
    Entity Framework练习题
    分析Autofac如何实现Controller的Ioc(Inversion of Control)
    在Winform,Silvelight,WPF等程序中访问Asp.net MVC web api
    适合.net程序员的.gitignore文件
    如何处理Entity Framework中的DbUpdateConcurrencyException异常
    Asp.net MVC中repository和service的区别
    smplayer中使用srt字幕乱码问题
  • 原文地址:https://www.cnblogs.com/share23/p/10298373.html
Copyright © 2020-2023  润新知