• MySQL 与 Hive 逻辑相关


    MYSQL:

    1. 每一个逗号, 相当于一个中间表, 而且hive的groupby 更为严格 ;

    -> 计算占比
    SELECT branch_company, ROUND(SUM(pass_flag) / (SELECT SUM(pass_flag) FROM wr_test),2) rate FROM wr_test GROUP BY branch_company

    -> 计算排名
    SELECT FIELD, t1.field2, (SELECT DISTINCT(COUNT(field2)) FROM `test` t2 WHERE t2.field2 >= t1.field2 ) rank FROM `test` t1


    Hive:

    -> 不能在逗号后面直接使用() 而必须使用left join 出总值:
    select round( t.s / t2.total , 3) , "" order_city_name, "" dealer_shortname , t.* from
    (
    select branch_company, time_date, time_unit, sum(pass_flag) s from bi_middle.wr_middle_1
    group by branch_company, time_date, time_unit
    ) t
    left join
    (select sum(pass_flag) total from bi_middle.wr_middle_1) t2


    -> 使用grouping sets 代替多个union;

    select t1.*,t1.pass_flag/t2.total as ratio from (
    select nvl(branch_company,'全国') branch_company,order_city_name,dealer_shortname,a.time_date,a.time_unit,sum(pass_flag) as pass_flag from bi_middle.wr_middle_1 a
    group by branch_company,order_city_name,dealer_shortname,a.time_date,a.time_unit
    grouping sets((a.time_date,a.time_unit),(branch_company,a.time_date,a.time_unit),(branch_company,order_city_name,a.time_date,a.time_unit),(branch_company,order_city_name,dealer_shortname,a.time_date,a.time_unit))
    ) t1
    left join (
    select sum(pass_flag) as total from bi_middle.wr_middle_1
    ) t2

    # 注意事项:

    -> Spark-SQL使用JDBC读取表, 默认使用一个executor去读, 如果转成并发, 需要设置分区索引去读;

    1. 创建parquet存储表

    CREATE TABLE IF NOT EXISTS bi_orginal.`param_code_dcs` (
    `CODE_ID` string COMMENT 'CODE ID',
    `TYPE` string COMMENT 'CODE类型',
    `TYPE_NAME` string COMMENT 'CODE类型中文说明',
    `CODE_DESC` string COMMENT 'CODE ID中文说明',
    `NUM` string ,
    `update_time` string
    )
    comment 'param_code_dcs'
    STORED AS parquet

    2.  向表中插入数据


    insert overwrite table bi_orginal.`param_code_dcs`
    SELECT CODE_ID, TYPE, TYPE_NAME, CODE_DESC, NUM, from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') FROM artemisdcs.d_f_artemisdcs_tc_code_dcs

    3.  Hive中的union

    SELECT * from (
    SELECT sso_id from bi_orginal.som_so_user
    UNION ALL
    SELECT sso_id from bi_orginal.som_so_user
    )t LIMIT 5

    4. Hive删表:

    truncate table table_name         先删除表中数据

    drop table table_name               后删除HDFS上数据表

  • 相关阅读:
    .NET 环境中使用RabbitMQ
    WPF窗口模板——Style样式
    C#获取当前日期时间
    C#解析JSON字符串总结
    c#简单加密和对称加密
    List<T>转换为二维数组
    java后台导出pdf
    C# 创建 读取 更新 XML文件
    python 第三天
    编写登录接口
  • 原文地址:https://www.cnblogs.com/ruili07/p/10825767.html
Copyright © 2020-2023  润新知