• Hive作业


      最近学习Hive。实现了其中几个作业,如下。Hql语句没怎么调整格式,见谅。

    Hive练习二中的题目

    (1). 2017 年4 月1 日各个商品品牌的交易笔数,按照销售交易从多到少排序

    1 select 
    2 brand,
    3 count(*) as totalCount 
    4 from 
    5 record 
    6 join brand_dimension on record.bid = brand_dimension.bid 
    7 where record.trancation_date= '2017-04-01' 
    8 group by brand_dimension.brand
    9 order by totalCount desc;

    输出

    +------------+-------------+--+
    |   brand    | totalcount  |
    +------------+-------------+--+
    | SAMSUNG    | 2           |
    | WULIANGYE  | 1           |
    | PUMA       | 1           |
    | OPPO       | 1           |
    | DELL       | 1           |
    +------------+-------------+--+

    (2). 不同性别消费的商品类别情况(不同性别消费不同商品类别的总价)

     1 select 
     2 gender,
     3 category, 
     4 sum(price) as totalPrice
     5 from record 
     6 join user_dimension on record.uid = user_dimension.uid
     7 join brand_dimension on record.bid = brand_dimension.bid
     8 group by gender, 
     9 category
    10 order by 
    11 gender,
    12 category,
    13 totalPrice;

    输出

    +---------+------------+-------------+--+
    | gender  |  category  | totalprice  |
    +---------+------------+-------------+--+
    | M       | computer   | 252         |
    | M       | food       | 429         |
    | M       | sports     | 120         |
    | M       | telephone  | 1669        |
    +---------+------------+-------------+--+

    Hive练习三中的题目

    (1). 谁不是经理

    1 select 
    2 name 
    3 from 
    4 employees 
    5 where 
    6 size(subordinates)<=0;

    输出

    +-------------------+---------------+--+
    |       name        | subordinates  |
    +-------------------+---------------+--+
    | Todd Jones        | []            |
    | Bill King         | []            |
    | Stacy Accountant  | []            |
    +-------------------+---------------+--+
    3 rows selected (0.092 seconds)

    (2). 谁住在邮编比60500 大的地区

    1 select name,address.zip from employees where address.zip> 60500;

    输出

    +-------------------+--------+--+
    |       name        |  zip   |
    +-------------------+--------+--+
    | John Doe          | 60600  |
    | Mary Smith        | 60601  |
    | Todd Jones        | 60700  |
    | Stacy Accountant  | 60563  |
    +-------------------+--------+--+
    4 rows selected (0.123 seconds)

    (3). 联邦税超过0.15 的雇员

    1 select 
    2 name, 
    3 deductions['Federal Taxes'] 
    4 from 
    5 employees 
    6 where deductions['Federal Taxes'] > 0.15+1e-5;

    输出

    +---------------+---------------+--+
    |     name      | federaltaxes  |
    +---------------+---------------+--+
    | John Doe      | 0.2           |
    | Mary Smith    | 0.2           |
    | Boss Man      | 0.3           |
    | Fred Finance  | 0.3           |
    +---------------+---------------+--+
    4 rows selected (0.126 seconds)

    (4). 谁住在Drive 或Par 街道

    1 select 
    2 name,
    3 address 
    4 from 
    5 employees 
    6 where address.street rlike '^.*(Drive|Par).*$';

    输出

    ---------------+------------------------------------------------------------------------------+--+
    |     name      |                                   address                                    |
    +---------------+------------------------------------------------------------------------------+--+
    | Boss Man      | {"street":"1 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}  |
    | Fred Finance  | {"street":"2 Pretentious Drive.","city":"Chicago","state":"IL","zip":60500}  |
    +---------------+------------------------------------------------------------------------------+--+
    2 rows selected (0.218 seconds)

    Hive练习十中的题目

    1. 建表

    创建并 使用 database tmall

    1 create database tmall;
    2 use tmall;

    建表 product,格式 text

     1 create table if not exists product(
     2 item_id string,
     3 pic_url string,
     4 category string,
     5 brand_id string,
     6 seller_id string
     7 )
     8 row format delimited
     9 fields terminated by ','
    10 lines terminated by '
    '
    11 stored as textfile;

    建表 review,格式 text

     1 create table if not exists review(
     2 item_id string,
     3 user_id string,
     4 feedback int,
     5 feedback_time timestamp,
     6 feedback_pic_url string
     7 )
     8 row format delimited
     9 fields terminated by ','
    10 lines terminated by '
    '
    11 stored as textfile;

    建表 log_orc,格式 orc

     1 # log as textfile
     2 create table if not exists log(
     3 item_id string,
     4 user_id string,
     5 action string,
     6 action_time timestamp
     7 )
     8 row format delimited
     9 fields terminated by ','
    10 lines terminated by '
    '
    11 stored as textfile;
    12 
    13 # log as orc
    14 create table if not exists log_orc
    15 like log
    16 stored as orc;

    2. 载入数据

     1 #product
     2 load data local inpath '/home/zkpk/test/tmall/tmall_product.csv' overwrite into table product;
     3 #验证:select * from product limit 100;
     4 #review
     5 load data local inpath '/home/zkpk/test/tmall/tmall_review.csv' overwrite into table review;
     6 #验证:select * from review limit 100;
     7 #log
     8 load data local inpath '/home/zkpk/test/tmall/tmall_log.csv' overwrite into table log;
     9 #验证:select * from log limit 100;
    10 insert into table log_orc select * from log;

    3. 载入数据

    • 热度前十名的商家(商品被浏览的最多)
      1 select 
      2 p.seller_id, 
      3 count(*) as click_count 
      4 from log_orc l 
      5 inner join product p on l.item_id = p.item_id 
      6 where l.action='click' 
      7 group by p.seller_id 
      8 order by click_count desc 
      9 limit 10;

      输出

      +--------------+--------------+--+
      | p.seller_id | click_count |
      +--------------+--------------+--+
      | s403 | 143 |
      | s190 | 116 |
      | s284 | 86 |
      | s161 | 78 |
      | s227 | 59 |
      | s61 | 59 |
      | s29 | 57 |
      | s82 | 45 |
      | s464 | 42 |
      | s261 | 42 |
      +--------------+--------------+--+
    • 好评率(feedback=5为好评)低于60%的商品
      1 select 
      2 item_id, 
      3 count(case when feedback =5 then 1 else null end) as best, 
      4 count(1) as total, 
      5 count(case when feedback =5 then 1 else null end)/ count(1) as best_rate 
      6 from review 
      7 group by item_id 
      8 having count(case when feedback =5 then 1 else null end) < count(1) *0.6 ;

      输出

      +----------+-------+--------+----------------------+--+
      | item_id  | best  | total  |      best_rate       |
      +----------+-------+--------+----------------------+--+
      | 221      | 0     | 1      | 0.0                  |
      | 256      | 0     | 4      | 0.0                  |
      | 287      | 1     | 3      | 0.3333333333333333   |
      | 288      | 2     | 6      | 0.3333333333333333   |
      | 378      | 0     | 1      | 0.0                  |
      | 379      | 1     | 6      | 0.16666666666666666  |
      | 397      | 0     | 2      | 0.0                  |
      | 398      | 0     | 1      | 0.0                  |
      | 423      | 2     | 7      | 0.2857142857142857   |
      | 449      | 1     | 5      | 0.2                  |
      | 45       | 0     | 6      | 0.0                  |
      | 450      | 0     | 1      | 0.0                  |
      | 451      | 0     | 3      | 0.0                  |
      | 453      | 1     | 2      | 0.5                  |
      | 505      | 1     | 2      | 0.5                  |
      +----------+-------+--------+----------------------+--+
    • 找出潜在购买用户(收藏了商品,但是没有购买)
       1 select distinct 
       2 m.item_id, 
       3 m.user_id 
       4 from log_orc m 
       5 left outer join log_orc s 
       6 on (m.item_id = s.item_id 
       7 and m.user_id = s.user_id 
       8 and s.action='alipay') 
       9 where m.action='collect' 
      10 and s.item_id is null;

      输出

      +------------+------------+--+
      | m.item_id  | m.user_id  |
      +------------+------------+--+
      | 152        | 3286       |
      | 24         | 3389       |
      | 242        | 39         |
      | 422        | 3423       |
      | 468        | 2727       |
      +------------+------------+--+
    技术服务业务
  • 相关阅读:
    webservice的cxf的客户端
    webservice用cxf发布SOAP
    webservice声明发布SOAP1.2
    webservice使用注解修改WSDL内容
    webservice获取天气信息
    结巴分词原理介绍
    Pytorch学习记录-torchtext和Pytorch的实例( 使用神经网络训练Seq2Seq代码)
    【Pandas】Pandas求某列字符串的长度,总结经验教训
    UTF-8与UTF-8 BOM
    logging.basicConfig函数
  • 原文地址:https://www.cnblogs.com/cauwt/p/homework_hive.html
Copyright © 2020-2023  润新知