• Exercise 1: find the most-requested resources on a website with sed and awk. Hidden lesson: awk array subscripts are unique


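    1.1.1 Get the top 10 entries from the log, sorted

    First pass: extract two columns into a new file

    Use sed to capture the middle of each line. sed has no non-greedy matching, so every .* matches as much as it can. Write the requested file path and the file size to test1.txt.

    The exercise must output three metrics: [request count] [request count * size of a single file] [file name (may include the URL)]

    Test data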

    59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "http://oldboy.blog.51cto.com/static/web/column/17/index.shtml?courseId=43" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

    59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "http://oldboy.blog.51cto.com/static/web/column/17/index.shtml?courseId=43" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

    59.33.26.105 - - [08/Dec/2010:15:44:02 +0800] "GET /static/flex/vedioLoading.swf HTTP/1.1" 200 3583 "http://oldboy.blog.51cto.com/static/flex/AdobeVideoPlayer.swf?width=590&height=328&url=/[[DYNAMIC]]/2" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"

    124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232 "-" "-"

    124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/web_js.js HTTP/1.1" 200 4460 "-" "-"

    124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/jquery.lazyload.js HTTP/1.1" 200 1627 "-" "-"

    ============================ Solution steps

    Use sed to capture the path and size from the middle of each line and write them to test1.txt. Every .* in the pattern is greedy.

    sed -rn 's/.*GET (.*) HTTP.*200 (.*) /\1 \2/gp' bb.txt >test1.txt

    /static/images/photos/2.jpg      11299

    /static/images/photos/2.jpg      11299

    /static/flex/vedioLoading.swf    3583

    /?=      46232 "-" "-"

    /static/js/web_js.js     4460 "-" "-"

    /static/js/jquery.lazyload.js    1627 "-" "-"
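
    The greedy matching noted above is why extra text such as "-" "-" can trail the size column. A minimal sketch of a stricter pattern, assuming a sed that accepts -E and using two abbreviated log lines made up for illustration (the variable name pairs is not from the original):

    ```shell
    # Two shortened sample lines in the same layout as the test data (assumption).
    # [^ ]+ stops at the first space, [0-9]+ keeps only the digits of the size,
    # and the trailing .* consumes the rest of the line so nothing leaks through.
    pairs=$(printf '%s\n' \
        '59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "http://oldboy.blog.51cto.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"' \
        '124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232 "-" "-"' |
        sed -En 's/.*"GET ([^ ]+) HTTP[^"]*" 200 ([0-9]+).*/\1 \2/p')
    printf '%s\n' "$pairs"
    ```

    With this pattern each output line holds exactly two fields, so the later awk step does not have to discard leftovers.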

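    awk -F " " '{print $2}' test1.txt |sort -nr  # sort the numeric column in descending order

    uniq -c prefixes each line with the number of times it occurs; it only merges adjacent duplicates, so the input must be sorted first

    awk -F " " '{print $1" "$2}' test1.txt |sort -n|uniq -c>test2.txt # second pass

          1 /?=     46232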

          1 /static/flex/vedioLoading.swf   3583

          2 /static/images/photos/2.jpg     11299

          1 /static/js/jquery.lazyload.js   1627

          1 /static/js/web_js.js    4460

    [root@ob data]# awk -F " " '{print (($1*$3))" " $2}' test2.txt|sort -nr     # final step: multiply count by size and sort in descending order

    46232   /?=

    22598   /static/images/photos/2.jpg

    4460    /static/js/web_js.js

    3583    /static/flex/vedioLoading.swf

    1627    /static/js/jquery.lazyload.js
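
    The steps above can be chained into a single pipeline that prints all three metrics the exercise asks for. This is only a sketch: the four path/size pairs are typed in by hand rather than read from test1.txt, and the variable name report is made up for illustration.

    ```shell
    # Input: "path size" pairs, as produced by the sed step (hand-typed sample).
    # sort groups identical lines, uniq -c counts them, awk prints
    # count, count*size, path, and the last sort orders by the total, largest first.
    report=$(printf '%s\n' \
        '/static/images/photos/2.jpg 11299' \
        '/static/images/photos/2.jpg 11299' \
        '/static/flex/vedioLoading.swf 3583' \
        '/?= 46232' |
        sort | uniq -c |
        awk '{print $1, $1*$3, $2}' |
        sort -k2,2nr)
    printf '%s\n' "$report"
    ```

    Sorting on the second field only (-k2,2nr) keeps the count column from influencing the order.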

    ====================================================== awk approach

    awk -F"GET |HTTP/1.1|200 " '{print $2,$4}' /data/bb.txt    

    /static/images/photos/2.jpg  11299

    /static/images/photos/2.jpg  11299

    /static/flex/vedioLoading.swf  3583

    /?=  46232 "-" "-"

    /static/js/web_js.js  4460 "-" "-"

    /static/js/jquery.lazyload.js  1627 "-" "-"

    [root@ob1 mytmp]# awk -F"GET |HTTP/1.1|200 " '{TP[$2]++}END{for (i in TP) print i,TP[i]}' /data/bb.txt    

     7

    /static/js/jquery.lazyload.js  1

    /static/flex/vedioLoading.swf  1

    /?=  1

    /static/images/photos/2.jpg  2

    /static/js/web_js.js  1
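
    The row with an empty name and a count of 7 presumably comes from lines in bb.txt that contain no GET request, such as blank lines: for those, $2 is empty and they all land under the same empty subscript. A minimal sketch of filtering them out with a /GET / pattern, using shortened sample lines (an assumption) and a hypothetical variable name counts:

    ```shell
    # The empty strings stand in for blank lines in the log.
    # The /GET / pattern runs the counting action only on request lines.
    # A leading space in " HTTP/1.1" keeps the path free of a trailing space.
    counts=$(printf '%s\n' \
        '' \
        '59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "-" "-"' \
        '' \
        '59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "-" "-"' \
        '' \
        '124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232 "-" "-"' |
        awk -F'GET | HTTP/1.1|200 ' '/GET /{TP[$2]++} END{for (i in TP) print i, TP[i]}' |
        LC_ALL=C sort)
    printf '%s\n' "$counts"
    ```

    The final sort is there because for (i in TP) visits subscripts in no guaranteed order.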

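    # Pitfall (logic error): awk reads one record at a time and each new record replaces the fields of the previous one, so in END $4 holds only the value from the last record. Every row below is therefore multiplied by the same 1627.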

    [root@ob1 mytmp]# awk -F"GET |HTTP/1.1|200 " '{TP[$2]++}END{for (i in TP) print i,TP[i],$4}' /data/bb.txt|awk -F ' '  '{print $2*$3,$1}'|sort -nrk1

    3254 /static/images/photos/2.jpg

    1627 /static/js/web_js.js

    1627 /static/js/jquery.lazyload.js

    1627 /static/flex/vedioLoading.swf

    1627 /?=

    0 7
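
    One way around this pitfall is to copy the field into an array while its record is still current, then read the array in END. A minimal sketch on a made-up three-line input (key in column 1, size in column 2); the names n, size and fixed are illustrative only:

    ```shell
    # Pitfall: '{n[$1]++} END{for (k in n) print k, $2}' would print the same $2
    # (whatever the last record left behind) next to every key.
    # Fix: store the field per key during the main loop and print it in END.
    fixed=$(printf '%s\n' 'a 10' 'b 20' 'c 30' |
        awk '{n[$1]++; size[$1] = $2} END{for (k in n) print k, n[k], size[k]}' |
        LC_ALL=C sort)
    printf '%s\n' "$fixed"
    ```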

    ===================================================

    Step 1: inspect the second field

    [root@ob1 data]# awk -F"GET | HTTP"  '{print $2}' bb.txt

    /static/images/photos/2.jpg

    /static/images/photos/2.jpg

    /static/flex/vedioLoading.swf

    /?=

    /static/js/web_js.js

    /static/js/jquery.lazyload.js

    Step 2: count requests per path with an array

    [root@ob1 data]# awk -F'GET | HTTP|200 | "-"' '{tt[$2]++}END{for (i in tt)print i,tt[i]}' bb.txt

     7

    /?= 1

    /static/js/web_js.js 1

    /static/images/photos/2.jpg 2

    /static/flex/vedioLoading.swf 1

    /static/js/jquery.lazyload.js 1

    Step 3: define a second array for the third field, size[$2]+=$3. Think of it as buying a second basket for the sizes. Both arrays can share the same subscript because $2 identifies the same file in each.

    [root@ob1 data]# awk -F'GET | HTTP/1.1" 200 | "-"' '{aa[$2]++;size[$2]+=$3}END{for (i in aa)print aa[i],i,size[i]}' bb.txt

    7  0

    1 /?= 46232

    1 /static/js/web_js.js 4460

    2 /static/images/photos/2.jpg 22598

    1 /static/flex/vedioLoading.swf 3583

    1 /static/js/jquery.lazyload.js 1627

    Step 4: compute the totals and sort. size[i] is already the sum over every request for that file, so it equals count * size; multiplying by aa[i] again would count repeated files twice.

    [root@ob1 data]# awk -F'GET | HTTP/1.1" 200 | "-"' '{aa[$2]++;size[$2]+=$3}END{for (i in aa)print i,size[i]}' bb.txt|sort -nk 2

     0

    /static/js/jquery.lazyload.js 1627

    /static/flex/vedioLoading.swf 3583

    /static/js/web_js.js 4460

    /static/images/photos/2.jpg 22598

    /?= 46232
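
    Heading 1.1.1 asks for a top 10, which the steps above stop short of. A minimal sketch that prints all three metrics, largest total first, and keeps at most ten rows with head -n 10. The log lines are shortened samples and the names hits, bytes and top are assumptions for illustration:

    ```shell
    # One awk pass: hits[] counts requests, bytes[] sums sizes, both keyed by path.
    top=$(printf '%s\n' \
        '59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "-" "-"' \
        '' \
        '59.33.26.105 - - [08/Dec/2010:15:43:56 +0800] "GET /static/images/photos/2.jpg HTTP/1.1" 200 11299 "-" "-"' \
        '124.115.4.18 - - [08/Dec/2010:15:44:15 +0800] "GET /?= HTTP/1.1" 200 46232 "-" "-"' \
        '124.115.4.18 - - [08/Dec/2010:15:44:25 +0800] "GET /static/js/web_js.js HTTP/1.1" 200 4460 "-" "-"' |
        awk -F'GET | HTTP/1.1" 200 ' '
            # $3 begins with the size; awk uses the leading digits as the number.
            /GET / { hits[$2]++; bytes[$2] += $3 }
            END { for (p in hits) print hits[p], bytes[p], p }' |
        sort -k2,2nr | head -n 10)
    printf '%s\n' "$top"
    ```

    Each row reads: request count, total bytes, path.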

    ================================

  • Original article: https://www.cnblogs.com/gaoyuechen/p/7521207.html