• Counting with Linux command-line tools


    Scenario:

    Count how many times each category appears in the category field of the data below.

    Data source:

    es_ip10000.json

    {"_index":"order","_type":"service","_id":"107.151.83.180:22","_score":1,"_source":{"ip":"107.151.83.180","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
    {"_index":"order","_type":"service","_id":"107.151.84.167:22","_score":1,"_source":{"ip":"107.151.84.167","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
    {"_index":"order","_type":"service","_id":"107.151.84.177:22","_score":1,"_source":{"ip":"107.151.84.177","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
    {"_index":"order","_type":"service","_id":"107.152.188.252:1723","_score":1,"_source":{"ip":"107.152.188.252","parent_category":["网络产品"],"category":["路由器"]}}
    {"_index":"order","_type":"service","_id":"107.151.89.125:1025","_score":1,"_source":{"ip":"107.151.89.125"}}
    {"_index":"order","_type":"service","_id":"107.152.58.217:22","_score":1,"_source":{"ip":"107.152.58.217","parent_category":["支撑系统"],"category":["服务"]}}
    {"_index":"order","_type":"subdomain","_id":"107.15.221.83:443","_score":1,"_source":{"ip":"107.15.221.83","parent_category":["办公外设","系统软件"],"category":["打印机","操作系统"]}}
    

    Extract the category field under _source:

    jq '._source.category' es_ip10000.json > category.txt

    Output:

    [
      "其他支撑系统"
    ]
    [
      "其他支撑系统"
    ]
    [
      "其他支撑系统"
    ]
    [
      "路由器"
    ]
    null
    [
      "服务"
    ]
    [
      "打印机",
      "操作系统"
    ]
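If all you need is one category per line, jq can flatten the arrays itself, so the manual editor step is not needed. This is a minimal sketch that assumes jq is installed. sample.json and category_lines.txt are hypothetical file names, and the sample holds the seven records shown above.

```shell
#!/usr/bin/env bash
# Sketch: flatten the category arrays with jq instead of editing by hand.
# sample.json is a hypothetical file holding the example records above.
cat > sample.json <<'EOF'
{"_index":"order","_type":"service","_id":"107.151.83.180:22","_score":1,"_source":{"ip":"107.151.83.180","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.167:22","_score":1,"_source":{"ip":"107.151.84.167","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.177:22","_score":1,"_source":{"ip":"107.151.84.177","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.152.188.252:1723","_score":1,"_source":{"ip":"107.152.188.252","parent_category":["网络产品"],"category":["路由器"]}}
{"_index":"order","_type":"service","_id":"107.151.89.125:1025","_score":1,"_source":{"ip":"107.151.89.125"}}
{"_index":"order","_type":"service","_id":"107.152.58.217:22","_score":1,"_source":{"ip":"107.152.58.217","parent_category":["支撑系统"],"category":["服务"]}}
{"_index":"order","_type":"subdomain","_id":"107.15.221.83:443","_score":1,"_source":{"ip":"107.15.221.83","parent_category":["办公外设","系统软件"],"category":["打印机","操作系统"]}}
EOF

# -r prints raw strings (no surrounding quotes).
# .category[] emits each array element on its own line.
# The trailing ? suppresses the error jq raises when category is missing
# (null), so that record simply produces no output.
jq -r '._source.category[]?' sample.json > category_lines.txt
cat category_lines.txt
```

On the sample this should print seven lines: 其他支撑系统 three times, then 路由器, 服务, 打印机 and 操作系统. The output contains no quotes, brackets or null. On real data, substitute es_ip10000.json for sample.json.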
    
    

    Use an editor to remove the commas and the [ and ] brackets.

    Result after cleanup:

    
      "其他支撑系统"
    
    
      "其他支撑系统"
    
    
      "其他支撑系统"
    
    
      "路由器"
    
    null
    
      "服务"
    
    
      "打印机"
      "操作系统"
    
    

    Sort -> deduplicate and count -> sort again

    cat category.txt | sort | uniq -c | sort -n >category_count.txt

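    Notes:

    uniq -c # merge adjacent identical lines and prefix each with its count (uniq only merges adjacent lines, which is why the input is sorted first)

    sort -n # sort numerically in ascending order

    sort -r # reverse the order (descending); sort -nr sorts numerically in descending order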

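    Output (the null line comes from the record that has no category field; the last line, 12 with nothing after it, counts the empty lines left where the [ and ] brackets were removed):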

          1 null
          1   "操作系统"
          1   "打印机"
          1   "服务"
          1   "路由器"
          3   "其他支撑系统"
         12 
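jq can also group and count on its own, as an alternative to the sort | uniq -c pipeline. This is a swapped-in technique, not the author's method. It is a minimal sketch that assumes jq is installed. sample.json and category_count_jq.txt are hypothetical file names, and the sample holds the seven records shown above.

```shell
#!/usr/bin/env bash
# Sketch: count categories entirely in jq instead of sort | uniq -c.
# sample.json is a hypothetical file holding the example records above.
cat > sample.json <<'EOF'
{"_index":"order","_type":"service","_id":"107.151.83.180:22","_score":1,"_source":{"ip":"107.151.83.180","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.167:22","_score":1,"_source":{"ip":"107.151.84.167","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.151.84.177:22","_score":1,"_source":{"ip":"107.151.84.177","parent_category":["支撑系统"],"category":["其他支撑系统"]}}
{"_index":"order","_type":"service","_id":"107.152.188.252:1723","_score":1,"_source":{"ip":"107.152.188.252","parent_category":["网络产品"],"category":["路由器"]}}
{"_index":"order","_type":"service","_id":"107.151.89.125:1025","_score":1,"_source":{"ip":"107.151.89.125"}}
{"_index":"order","_type":"service","_id":"107.152.58.217:22","_score":1,"_source":{"ip":"107.152.58.217","parent_category":["支撑系统"],"category":["服务"]}}
{"_index":"order","_type":"subdomain","_id":"107.15.221.83:443","_score":1,"_source":{"ip":"107.15.221.83","parent_category":["办公外设","系统软件"],"category":["打印机","操作系统"]}}
EOF

# -s (slurp) reads all records into one array.
# [ .[]._source.category[]? ] collects every category string, skipping nulls.
# group_by(.)                 groups identical strings together.
# sort_by(length) | reverse   puts the largest groups first (descending count).
# "\(length) \(.[0])"         prints "<count> <category>", like uniq -c.
jq -r -s '[ .[]._source.category[]? ]
          | group_by(.)
          | sort_by(length) | reverse
          | .[]
          | "\(length) \(.[0])"' sample.json > category_count_jq.txt
cat category_count_jq.txt
```

The filter skips missing categories and never produces bracket lines. As a result, the null and blank-line entries seen in the output above do not appear. On the sample, the first line should be 3 其他支撑系统.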
    
    [Haima's blog] http://www.cnblogs.com/haima/
  • Original article: https://www.cnblogs.com/haima/p/15118877.html