• yhd日志分析(二)


    yhd日志分析(二)

    继续yhd日志分析,统计数据

    日期uvpv登录人数游客人数平均访问时长二跳率独立ip数

    1 分析

    登录人数

    count(distinct endUserId)

    游客人数

    count(distinct guid) - count(distinct endUserId)

    平均访问时长

    先把tracktime转换为unix timestamp, 相同sessionId的tracktime中,max(tracktime)-min(tracktime),得到用户停留时间。所有用户的停留时间相加,得到总停留时间。总停留时间和总访问次数的比例,就是平均访问时长

    select sum(stay_time) as total_stay_time
    from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time from yhd_log group by sessionId) stay
    

    用户访问总数

    count(distinct sessionId)

    二跳率

    sessionViewNo=2的用户,即为二跳用户。统计出二跳用户和uv的比例

    select count(distinct guid) from yhd_log where sessionViewNo=2

    独立ip数

    count(distinct ip)

    实现

    1. 借助中间表,分别存放停留时间和二次跳用户总数

       // 存放总停留时间
       
       create table if not exists yhd_log_total_stay_time(
       	date string,
       	total_stay_time bigint
       )
       row format delimited fields terminated by '	'
       stored as textfile;
       
       // 存放二次跳用户总数
       
       create table if not exists yhd_log_total_second_jump(
       	date string,
       	total_second_jump bigint
       )
       row format delimited fields terminated by '	'
       stored as textfile;
      
    2. 计算总停留时间,存放在yhd_log_total_stay_time, 按日期分组

       insert overwrite table yhd_log_total_stay_time
       select date, sum(stay_time) as total_stay_time
       from (select max(to_unix_timestamp(trackTime)) - min(to_unix_timestamp(trackTime)) as stay_time, date from yhd_log group by date, sessionId) stay
       group by date
      
    3. 计算二次跳用户总数,存放在yhd_log_total_second_jump, 按日期分组

       insert overwrite table yhd_log_total_second_jump
       select date, count(distinct guid)
       from yhd_log
       where sessionViewNo=2 
       group by date
      
    4. 统计

       把yhd_log_total_stay_time,yhd_log_total_second_jump和yhd_log按照 date连接查询
      
       select date, pv, uv, user_count, guest_count, 
       	total_stay_time/total_visit as average_stay_time,
       	total_second_jump/ uv as second_jump_rate, indepent_ip
       from (
       select log.date, 
       	count(url) as pv, 
       	count(distinct guid) as uv,
       	count(distinct endUserId) as user_count,
       	count(distinct guid) - count(distinct endUserId) as guest_count,
       	count(distinct sessionId) as total_visit,
       	min(stay.total_stay_time) as total_stay_time,
       	min(second.total_second_jump) as total_second_jump,
       	count(distinct ip) as indepent_ip
       from yhd_log log inner join yhd_log_total_stay_time stay on stay.date=log.date inner join yhd_log_total_second_jump second on second.date=log.date
       group by log.date
       ) stat
      

    结果

    datepvuvuser_countguest_countaverage_stay_timesecond_jump_rateindepent_ip
    20150828126134390071768721320745.97973932447510.1311815827928320730462
  • 相关阅读:
    管理这门技术和艺术
    主流双核处理器对比
    20111226自然醒
    message_t
    安装 SQL Server 2008 Management Studio Express
    Linux Ext2/Ext3/Ext4 文件系统分区
    多版本火狐共存方案及火狐配置文档高级管理技巧[转自火狐社区]
    心理学,慢慢学
    system pause in C#
    play with js
  • 原文地址:https://www.cnblogs.com/ivanny/p/yhd_log_statatis_2.html
Copyright © 2020-2023  润新知