• MySQL 与 Hive 逻辑相关


    MYSQL:

    1. 每一个逗号, 相当于一个中间表, 而且hive的groupby 更为严格 ;

    -> 计算占比
    SELECT branch_company, ROUND(SUM(pass_flag) / (SELECT SUM(pass_flag) FROM wr_test),2) rate FROM wr_test GROUP BY branch_company

    -> 计算排名
    SELECT FIELD, t1.field2, (SELECT DISTINCT(COUNT(field2)) FROM `test` t2 WHERE t2.field2 >= t1.field2 ) rank FROM `test` t1


    Hive:

    -> 不能在逗号后面直接使用() 而必须使用left join 出总值:
    select round( t.s / t2.total , 3) , "" order_city_name, "" dealer_shortname , t.* from
    (
    select branch_company, time_date, time_unit, sum(pass_flag) s from bi_middle.wr_middle_1
    group by branch_company, time_date, time_unit
    ) t
    left join
    (select sum(pass_flag) total from bi_middle.wr_middle_1) t2


    -> 使用grouping sets 代替多个union;

    select t1.*,t1.pass_flag/t2.total as ratio from (
    select nvl(branch_company,'全国') branch_company,order_city_name,dealer_shortname,a.time_date,a.time_unit,sum(pass_flag) as pass_flag from bi_middle.wr_middle_1 a
    group by branch_company,order_city_name,dealer_shortname,a.time_date,a.time_unit
    grouping sets((a.time_date,a.time_unit),(branch_company,a.time_date,a.time_unit),(branch_company,order_city_name,a.time_date,a.time_unit),(branch_company,order_city_name,dealer_shortname,a.time_date,a.time_unit))
    ) t1
    left join (
    select sum(pass_flag) as total from bi_middle.wr_middle_1
    ) t2

    # 注意事项:

    -> Spark-SQL使用JDBC读取表, 默认使用一个executor去读, 如果转成并发, 需要设置分区索引去读;

    1. 创建parquet存储表

    CREATE TABLE IF NOT EXISTS bi_orginal.`param_code_dcs` (
    `CODE_ID` string COMMENT 'CODE ID',
    `TYPE` string COMMENT 'CODE类型',
    `TYPE_NAME` string COMMENT 'CODE类型中文说明',
    `CODE_DESC` string COMMENT 'CODE ID中文说明',
    `NUM` string ,
    `update_time` string
    )
    comment 'param_code_dcs'
    STORED AS parquet

    2.  向表中插入数据


    insert overwrite table bi_orginal.`param_code_dcs`
    SELECT CODE_ID, TYPE, TYPE_NAME, CODE_DESC, NUM, from_unixtime(unix_timestamp(),'yyyy-MM-dd HH:mm:ss') FROM artemisdcs.d_f_artemisdcs_tc_code_dcs

    3.  Hive中的union

    SELECT * from (
    SELECT sso_id from bi_orginal.som_so_user
    UNION ALL
    SELECT sso_id from bi_orginal.som_so_user
    )t LIMIT 5

    4. Hive删表:

    truncate table table_name         先删除表中数据

    drop table table_name               后删除HDFS上数据表

  • 相关阅读:
    java 之 对象与垃圾回收
    Java 之 内部类
    java 接口(interface)
    抽象类
    Java之fianl修饰符
    类的继承
    上传图片
    一般处理程序 给图片添加文字水印
    socket
    初识linq,lambda表达式
  • 原文地址:https://www.cnblogs.com/ruili07/p/10825767.html
Copyright © 2020-2023  润新知