Grouping and counting file or screen output with the Linux commands sort and uniq

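In day-to-day Linux work you often need to group and count repeated fields in a file or in screen output. Grouped counting is also a common interview question.
The approach is simple. The core pipeline is: sort | uniq -c | sort -rn.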

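sort: sorts the input (optionally by a chosen column) so that identical values end up next to each other
uniq -c: uniq detects and removes repeated adjacent lines; uniq -c (or uniq --count in GNU uniq) prefixes each line with the number of times it occurred
sort -rn: -n compares the leading numeric string by its numeric value, and -r reverses the order so the largest counts come first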
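
Because uniq only merges adjacent lines, the first sort is what makes the counts correct. A minimal sketch of the difference, using made-up sample words (the variable names and the awk step that strips the count padding are only for illustration, and LC_ALL=C is set just to keep the sort order predictable):

```shell
# Hypothetical sample input: "a" appears twice, but not on adjacent lines.
input=$(printf '%s\n' a b a)

# Without sorting, uniq -c only merges adjacent duplicates,
# so "a" is reported as two separate groups.
unsorted=$(printf '%s\n' "$input" | uniq -c | awk '{print $1, $2}')

# After sorting, identical lines are adjacent and are counted together.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq -c | awk '{print $1, $2}')

printf 'without sort:\n%s\n' "$unsorted"
printf 'with sort:\n%s\n' "$sorted"
```

Without the sort, "a" shows up twice with a count of 1 each; with it, "a" is counted once with a count of 2.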

Exercise 1. A text file demo.txt contains one word per line. Find the 3 most frequent words.

hello
hi
hello
world
world
my
word
hi
hello

Reference answer

sort demo.txt  | uniq -c | sort -rn | head -3

The output is as follows

   3 hello
   2 world
   2 hi
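
Both world and hi occur twice, so their relative order depends on how sort breaks ties. If a fixed order matters, one option is to sort on two keys: the count in descending order, then the word in ascending order. The sketch below recreates demo.txt in a temporary directory (the path and the variable names are assumptions for this example only):

```shell
# Recreate the sample file from the exercise in a temporary directory
# (the temporary path is only for this sketch).
tmpdir=$(mktemp -d)
printf '%s\n' hello hi hello world world my word hi hello > "$tmpdir/demo.txt"

# -k1,1nr: first key is the count, numeric, descending.
# -k2,2:   second key is the word, ascending, used only to break ties.
# The final awk only strips the padding that uniq -c puts before the count.
top3=$(LC_ALL=C sort "$tmpdir/demo.txt" | uniq -c |
    LC_ALL=C sort -k1,1nr -k2,2 | head -3 | awk '{print $1, $2}')

printf '%s\n' "$top3"
rm -rf "$tmpdir"
```

With this tie-break, hi is listed before world, since ties are resolved alphabetically.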

Exercise 2. Find the 10 most requested URLs in the Nginx log access.log

201.158.69.116 - - [03/Jan/2013:21:17:20 -0600] fwf[-] tip[-] 127.0.0.1:9000 0.007 0.007 MX pythontab.com GET /html/test.html HTTP/1.1 "200" 2426 "http://a.com" "es-ES,es;q=0.8" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11"
187.171.69.177 - - [03/Jan/2013:21:17:20 -0600] fwf[-] tip[-] 127.0.0.1:9000 0.006 0.006 MX pythontab.com GET /html/test2.html HTTP/1.1 "200" 2426 "http://a.com" "es-ES,es;q=0.8" "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.97 Safari/537.11"

Reference answer

awk '{print $14}' access.log | sort | uniq -c | sort -rn | head -10

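Or

sort -k 14,14 access.log | awk '{print $14}' | uniq -c | sort -rn | head -10

sort -k selects the sort key by column, and -t sets the field separator. Because uniq -c compares whole lines, the URL column is extracted with awk before counting; otherwise lines that share a URL but differ in IP address or timestamp would be counted separately.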

The output is as follows

   1 /html/test2.html
   1 /html/test.html
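
The grouping can also be done in a single awk pass with an associative array, leaving sort only the job of ranking. This is an alternative sketch, not the method used above. The log lines are shortened copies of the excerpt, plus one made-up third line (client 10.0.0.1) so that one URL repeats; the temporary path and variable names are assumptions:

```shell
# Sample log in a temporary directory. Field 14 is the URL.
# The third line is a made-up repeat of the first URL from another client.
tmpdir=$(mktemp -d)
cat > "$tmpdir/access.log" <<'EOF'
201.158.69.116 - - [03/Jan/2013:21:17:20 -0600] fwf[-] tip[-] 127.0.0.1:9000 0.007 0.007 MX pythontab.com GET /html/test.html HTTP/1.1 "200" 2426
187.171.69.177 - - [03/Jan/2013:21:17:20 -0600] fwf[-] tip[-] 127.0.0.1:9000 0.006 0.006 MX pythontab.com GET /html/test2.html HTTP/1.1 "200" 2426
10.0.0.1 - - [03/Jan/2013:21:17:21 -0600] fwf[-] tip[-] 127.0.0.1:9000 0.005 0.005 MX pythontab.com GET /html/test.html HTTP/1.1 "200" 2426
EOF

# Count each URL in an awk associative array and print "count url",
# then rank by count (ties broken by URL) and keep the top 10.
top_urls=$(awk '{ count[$14]++ } END { for (url in count) print count[url], url }' \
    "$tmpdir/access.log" | LC_ALL=C sort -k1,1nr -k2,2 | head -10)

printf '%s\n' "$top_urls"
rm -rf "$tmpdir"
```

Since awk keeps one counter per distinct URL, the input does not need to be sorted first, which can help on large logs.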

Exercise 3. Count the TCP connection states on port 80, sorted by count from highest to lowest

netstat -nat | grep 80 prints the following:

tcp4       0      0  192.168.0.101.57581    80.254.145.118.80      SYN_SENT
tcp4       0      0  192.168.0.101.57572    111.161.64.23.80       ESTABLISHED
tcp4       0      0  192.168.0.101.57565    60.29.242.162.80       ESTABLISHED
tcp4       0      0  192.168.0.101.57513    175.174.56.212.80      CLOSE_WAIT
tcp6       0      0  fe80::18e3:52d8:.56850 fe80::1cc0:75be:.62835 ESTABLISHED
tcp4       0      0  192.168.0.101.56178    175.174.56.212.80      CLOSE_WAIT

Reference answer

netstat -nat | grep 80 | awk '{print $6}' | sort | uniq -c | sort -rn

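The output is as follows. grep 80 matches "80" anywhere in a line, so lines unrelated to port 80 are counted as well, which is most likely where the two hexadecimal entries and the empty entry come from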

  27 ESTABLISHED
  10 LISTEN
   2 CLOSE_WAIT
   1 ce382f50fea83507
   1 ce382f50fea80df7
   1 SYN_SENT
   1
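
A stricter filter can restrict the count to TCP lines whose local or foreign address really ends in port 80. The sketch below is one possible way to do that, not the original answer: port80_states is a hypothetical helper name, and the pattern assumes the port is the last component of the address, separated by "." (as in the output above) or ":" (as on Linux).

```shell
# Read netstat-style lines on stdin and count TCP states for port 80.
# $4 is the local address, $5 the foreign address, $6 the state.
# The pattern accepts both ".80" and ":80" address suffixes.
port80_states() {
    awk '$1 ~ /^tcp/ && ($4 ~ /[.:]80$/ || $5 ~ /[.:]80$/) { print $6 }' |
        LC_ALL=C sort | uniq -c |
        LC_ALL=C sort -k1,1nr -k2,2 | awk '{print $1, $2}'
}

# Sample lines copied from the netstat output shown above.
sample='tcp4       0      0  192.168.0.101.57581    80.254.145.118.80      SYN_SENT
tcp4       0      0  192.168.0.101.57572    111.161.64.23.80       ESTABLISHED
tcp4       0      0  192.168.0.101.57565    60.29.242.162.80       ESTABLISHED
tcp4       0      0  192.168.0.101.57513    175.174.56.212.80      CLOSE_WAIT
tcp6       0      0  fe80::18e3:52d8:.56850 fe80::1cc0:75be:.62835 ESTABLISHED
tcp4       0      0  192.168.0.101.56178    175.174.56.212.80      CLOSE_WAIT'

states=$(printf '%s\n' "$sample" | port80_states)
printf '%s\n' "$states"

# On a live system the helper would be used as: netstat -nat | port80_states
```

On the sample lines, the tcp6 entry is excluded: it contains "80" only inside fe80, and neither of its ports is 80.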

Original article: https://www.cnblogs.com/superhin/p/15141853.html