Incident background:
The website's pages were loading very slowly.
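Troubleshooting process:
1. I started with vmstat. The r value under procs held steady at around 5 to 6, far below the 24 logical CPUs of this server (12 cores, 24 threads), and the CPU wa value was essentially 0 while id stayed high. In other words, the system was under light load
and would certainly not trigger an alert. So why were the pages loading so slowly?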
[root@app_sz nginx]# vmstat 2
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
4 0 141816 4319000 513996 17482000 0 0 0 18 0 0 12 0 88 0 0
2 0 141816 4317912 513996 17482064 0 0 0 460 2118 1581 3 0 97 0 0
0 0 141816 4300932 513996 17482100 0 0 0 1848 1966 1482 3 0 97 0 0
4 0 141816 4308788 513996 17482056 0 0 0 358 1653 1247 2 0 97 0 0
1 0 141816 4314516 513996 17482080 0 0 0 1588 2311 1613 3 0 96 0 0
0 0 141816 4312296 514004 17482020 0 0 0 82952 2122 1353 2 0 98 0 0
3 0 141816 4301508 514004 17482080 0 0 0 408 1880 1424 3 0 97 0 0
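The reasoning in step 1 (run queue well below the number of logical CPUs, wa near zero, id high) can be sketched as a small check. This is only an illustration: the rows are copied from the vmstat output above, the 24 logical CPUs come from the text, and the `summarize` helper and its "saturated" rule are assumptions made for this example, not part of any tool.

```python
# Data rows copied from the vmstat output above (the two header lines are omitted).
VMSTAT_ROWS = """\
4 0 141816 4319000 513996 17482000 0 0 0 18 0 0 12 0 88 0 0
2 0 141816 4317912 513996 17482064 0 0 0 460 2118 1581 3 0 97 0 0
0 0 141816 4300932 513996 17482100 0 0 0 1848 1966 1482 3 0 97 0 0
4 0 141816 4308788 513996 17482056 0 0 0 358 1653 1247 2 0 97 0 0
1 0 141816 4314516 513996 17482080 0 0 0 1588 2311 1613 3 0 96 0 0
0 0 141816 4312296 514004 17482020 0 0 0 82952 2122 1353 2 0 98 0 0
3 0 141816 4301508 514004 17482080 0 0 0 408 1880 1424 3 0 97 0 0
"""

# Column order as printed by vmstat in the header line above.
COLUMNS = "r b swpd free buff cache si so bi bo in cs us sy id wa st".split()

LOGICAL_CPUS = 24  # 12 cores / 24 threads, as stated in the text


def summarize(rows, logical_cpus):
    """Hypothetical helper: reduce vmstat samples to a few load indicators."""
    samples = [
        dict(zip(COLUMNS, map(int, line.split())))
        for line in rows.strip().splitlines()
    ]
    r_values = [s["r"] for s in samples]
    return {
        "max_r": max(r_values),
        "avg_r": sum(r_values) / len(r_values),
        "max_wa": max(s["wa"] for s in samples),
        "min_id": min(s["id"] for s in samples),
        # Assumed rule of thumb: the CPU is saturated only when more
        # runnable tasks are queued than there are logical CPUs.
        "cpu_saturated": max(r_values) > logical_cpus,
    }


result = summarize(VMSTAT_ROWS, LOGICAL_CPUS)
print(result)
```

With these samples the run queue never approaches 24 and wa stays at 0, which matches the conclusion that neither CPU contention nor I/O wait explains the slow pages.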
[root@app_sz nginx]# top
top - 16:57:00 up 52 days, 10:10, 8 users, load average: 4.44, 4.62, 4.61
Tasks: 613 total, 2 running, 611 sleeping, 0 stopped, 0 zombie
Cpu(s): 21.4%us, 3.6%sy, 0.0%ni, 75.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24729804k total, 20219216k used, 4510588k free, 513968k buffers
Swap: 31457272k total, 141816k used, 31315456k free, 17306056k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6571 root 20 0 2920m 384m 348m S 407.1 1.6 83:28.58 mongod
8514 nginx 20 0 219m 18m 5328 R 100.0 0.1 0:23.94 php-fpm
9479 root 20 0 15432 1672 948 R 81.4 0.0 0:00.08 top
1 root 20 0 19364 1364 1152 S 0.0 0.0 0:02.07 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.18 kthreadd
3 root RT 0 0 0 0 S 0.0 0.0 1:00.97 migration/0
4 root 20 0 0 0 0 S 0.0 0.0 0:29.55 ksoftirqd/0
5 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/0
6 root RT 0 0 0 0 S 0.0 0.0 0:03.90 watchdog/0
7 root RT 0 0 0 0 S 0.0 0.0 0:20.17 migration/1
8 root RT 0 0 0 0 S 0.0 0.0 0:00.00 migration/1
9 root 20 0 0 0 0 S 0.0 0.0 0:12.43 ksoftirqd/1
10 root RT 0 0 0 0 S 0.0 0.0 0:02.43 watchdog/1
11 root RT 0 0 0 0 S 0.0 0.0 0:06.51 migration/2
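The %CPU column in the top output above is easier to judge once it is converted to a share of the whole machine. The sketch below assumes top is running in its default mode, where 100% corresponds to one fully used logical CPU; the `share_of_machine` function is a hypothetical helper written for this example.

```python
LOGICAL_CPUS = 24  # 12 cores / 24 threads, as stated in the text


def share_of_machine(top_cpu_percent, logical_cpus=LOGICAL_CPUS):
    """Convert top's per-process %CPU (100% = one logical CPU)
    into a percentage of the machine's total CPU capacity."""
    return top_cpu_percent / logical_cpus


# %CPU values copied from the top output above.
for name, pct in [("mongod", 407.1), ("php-fpm", 100.0)]:
    share = share_of_machine(pct)
    print(f"{name}: {pct:.1f}% in top = {share:.1f}% of {LOGICAL_CPUS} logical CPUs")
```

Seen this way, mongod's 407.1% is roughly four busy logical CPUs out of 24, which is noticeable but far from exhausting the machine.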
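2. In top, mongod sits at the top of the list at 400+% CPU, so it is using a fair amount of CPU. Because top on Linux sums a process's usage across all CPUs (100% equals one fully used logical CPU), 400+% on a 24-thread machine
is not especially high. Still, mongod deserved a closer look, and sure enough, that is where the fault was: the development side had enabled some debug feature over the past couple of days
that wrote logs continuously. After running for several days it slowed mongod down, which in turn slowed the whole service. A small tip: after starting top,
press 1 to see the load on each individual CPU.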
[root@app_sz nginx]# top
top - 17:16:40 up 52 days, 10:29, 8 users, load average: 1.39, 0.87, 1.97
Tasks: 613 total, 2 running, 611 sleeping, 0 stopped, 0 zombie
Cpu0 : 36.2%us, 0.7%sy, 0.0%ni, 63.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 46.8%us, 0.3%sy, 0.0%ni, 52.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 23.6%us, 0.3%sy, 0.0%ni, 76.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 2.3%us, 0.3%sy, 0.0%ni, 97.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 65.4%us, 0.0%sy, 0.0%ni, 34.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 44.7%us, 0.3%sy, 0.0%ni, 55.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 53.6%us, 0.7%sy, 0.0%ni, 45.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu8 : 58.3%us, 0.3%sy, 0.0%ni, 41.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu9 : 4.0%us, 0.3%sy, 0.0%ni, 95.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu10 : 10.6%us, 0.0%sy, 0.0%ni, 89.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu11 : 35.4%us, 0.0%sy, 0.0%ni, 64.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu12 : 5.0%us, 0.3%sy, 0.0%ni, 94.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu13 : 0.3%us, 0.0%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu14 : 3.3%us, 0.0%sy, 0.0%ni, 96.4%id, 0.3%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu15 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu17 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu18 : 2.3%us, 0.3%sy, 0.0%ni, 97.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu19 : 1.7%us, 0.0%sy, 0.0%ni, 98.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu20 : 3.0%us, 0.0%sy, 0.0%ni, 97.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu21 : 0.0%us, 0.3%sy, 0.0%ni, 99.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu22 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu23 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 24729804k total, 20473464k used, 4256340k free, 514012k buffers
Swap: 31457272k total, 141816k used, 31315456k free, 17502960k cached
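The per-CPU view above can also be read programmatically. The following is a minimal sketch that parses a few of the `CpuN` lines copied from that output and lists which CPUs are busy. The regular expression, the `parse_cpu_lines` helper, and the 20% us threshold are all assumptions chosen for illustration; the line layout differs between top versions.

```python
import re

# An excerpt of the per-CPU lines from the top output above.
TOP_CPU_LINES = """\
Cpu0 : 36.2%us, 0.7%sy, 0.0%ni, 63.1%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu4 : 0.3%us, 0.3%sy, 0.0%ni, 99.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu5 : 65.4%us, 0.0%sy, 0.0%ni, 34.6%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu16 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
"""

# Captures the CPU index, the user-space percentage, and the idle percentage.
CPU_LINE = re.compile(r"Cpu(\d+)\s*:\s*([\d.]+)%us.*?([\d.]+)%id")

BUSY_US_THRESHOLD = 20.0  # arbitrary cut-off, chosen only for this example


def parse_cpu_lines(text):
    """Hypothetical helper: map CPU index -> (us%, id%)."""
    parsed = {}
    for line in text.splitlines():
        match = CPU_LINE.search(line)
        if match:
            cpu, us, idle = match.groups()
            parsed[int(cpu)] = (float(us), float(idle))
    return parsed


parsed = parse_cpu_lines(TOP_CPU_LINES)
busy = sorted(cpu for cpu, (us, _idle) in parsed.items() if us >= BUSY_US_THRESHOLD)
print("busy CPUs:", busy)
```

Running the same parser over all 24 lines would show the load concentrated on a handful of CPUs while the rest sit almost idle.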