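Isn't a single top command enough? At most, pipe its output through a few filters. That is what I thought at first too, but there is more to understand.
The first question: do we want the CPU load at a single point in time, or over a period of time?
For a line chart in a report, the x-axis is usually a point in time, so we would like the CPU load at that exact instant, but that is hard to obtain. The easier approach is to measure the CPU load between two points in time, that is, over an interval. For a benchmark, make the interval small: 1 second or even less. For routine monitoring, stretch it to 1 minute or more.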
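The second question: what do we use to judge the CPU load over an interval?
CPU time is accounted in a basic unit called a jiffy. It is a very short span of time, and its exact length depends on the platform. That matters little here: for working out what percentage the load has reached, it is good enough.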
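This article, http://www.linuxhowtos.org/System/procstat.htm, describes the /proc/stat file. The points worth noting are:
1. The values on the first line, cpu, are the totals of the per-CPU lines below it.
2. The 7 numbers on each line are explained as follows: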
The meanings of the columns are as follows, from left to right:
    user: normal processes executing in user mode
    nice: niced processes executing in user mode
    system: processes executing in kernel mode
    idle: twiddling thumbs
    iowait: waiting for I/O to complete
    irq: servicing interrupts
    softirq: servicing softirqs
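To make the column order concrete, here is a minimal Python sketch that maps the numbers on one cpu line to the seven names quoted above. The function name `parse_cpu_line` and the `FIELDS` tuple are my own, chosen for illustration; they are not part of any library.

```python
from typing import Dict

# Column names in the order given in the quoted description (first seven columns).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Map the leading numeric columns of one 'cpu...' line to their names.

    Hypothetical helper: any columns beyond the seven named ones are ignored,
    because zip() stops at the shorter sequence.
    """
    parts = line.split()
    if not parts or not parts[0].startswith("cpu"):
        raise ValueError("not a cpu line: %r" % line)
    return dict(zip(FIELDS, (int(v) for v in parts[1:])))


if __name__ == "__main__":
    print(parse_cpu_line("cpu 4698 591 262 8953 916 449 531"))
```

Returning a dict keeps later arithmetic readable (`stats["idle"]` instead of `values[3]`), at the cost of dropping any columns the tuple does not name.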
Then this discussion thread gives the formula:
http://stackoverflow.com/questions/3017162/how-to-get-total-cpu-usage-in-linux-c
e.g. Suppose at 14:00:00 you have
    cpu 4698 591 262 8953 916 449 531
    total_jiffies_1 = (sum of all values) = 16400
    work_jiffies_1 = (sum of user,nice,system = the first 3 values) = 5551
and at 14:00:05 you have
    cpu 4739 591 289 9961 936 449 541
    total_jiffies_2 = 17506
    work_jiffies_2 = 5619
So the %cpu usage over this period is:
    work_over_period = work_jiffies_2 - work_jiffies_1 = 68
    total_over_period = total_jiffies_2 - total_jiffies_1 = 1106
    %cpu = work_over_period / total_over_period * 100 = 6.1%
This is easy to follow. The final fraction, multiplied by 100, is the percentage.
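The formula from the thread can be sketched in Python as below. This is only an illustration of the quoted calculation: "work" is the sum of the first three columns and "total" is the sum of all columns. The function names are my own, and `measure()` assumes a Linux system where the first line of /proc/stat is the aggregate cpu line.

```python
import time
from typing import Tuple


def work_and_total(cpu_line: str) -> Tuple[int, int]:
    """Return (work, total) jiffies for one cpu line, as in the quoted answer."""
    values = [int(v) for v in cpu_line.split()[1:]]
    return sum(values[:3]), sum(values)  # work = user + nice + system


def cpu_percent(before: str, after: str) -> float:
    """CPU usage in percent between two samples of the same cpu line."""
    work1, total1 = work_and_total(before)
    work2, total2 = work_and_total(after)
    total_delta = total2 - total1
    if total_delta <= 0:  # identical samples: avoid dividing by zero
        return 0.0
    return (work2 - work1) / total_delta * 100


def measure(interval: float = 1.0, path: str = "/proc/stat") -> float:
    """Sample the aggregate cpu line twice, `interval` seconds apart (Linux only)."""
    with open(path) as f:
        before = f.readline()
    time.sleep(interval)
    with open(path) as f:
        after = f.readline()
    return cpu_percent(before, after)


if __name__ == "__main__":
    # The two samples from the worked example above.
    first = "cpu 4698 591 262 8953 916 449 531"
    second = "cpu 4739 591 289 9961 936 449 541"
    print(round(cpu_percent(first, second), 1))  # 6.1
```

For a benchmark you would call `measure(1.0)` or smaller; for routine monitoring something like `measure(60.0)` matches the intervals discussed earlier.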
On my machine there are 10 columns in total.
cat /proc/stat
    cpu 2065552 1692 636745 10842974 59979 16 6860 0 0 0
    cpu0 524690 552 158305 2701823 8912 7 4808 0 0 0
    cpu1 511203 670 157274 2703792 31404 1 1179 0 0 0
    cpu2 519169 441 155591 2720326 11179 0 438 0 0 0
    cpu3 510489 27 165574 2717032 8482 7 435 0 0 0
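A small Python sketch for reading all the cpu lines at once, using the output above as sample input. `parse_proc_stat` is a hypothetical helper name. One thing the sample shows: the aggregate user value is 2065552 while the four per-core values add up to 2065551, so the first line is best treated as approximately, not exactly, the sum of the others.

```python
from typing import Dict, List

# The output shown above, used here as sample input.
SAMPLE = """\
cpu 2065552 1692 636745 10842974 59979 16 6860 0 0 0
cpu0 524690 552 158305 2701823 8912 7 4808 0 0 0
cpu1 511203 670 157274 2703792 31404 1 1179 0 0 0
cpu2 519169 441 155591 2720326 11179 0 438 0 0 0
cpu3 510489 27 165574 2717032 8482 7 435 0 0 0
"""


def parse_proc_stat(text: str) -> Dict[str, List[int]]:
    """Return {label: [column values]} for every line whose label starts with 'cpu'."""
    cpus: Dict[str, List[int]] = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            cpus[parts[0]] = [int(v) for v in parts[1:]]
    return cpus


if __name__ == "__main__":
    stats = parse_proc_stat(SAMPLE)
    per_core_user = sum(v[0] for label, v in stats.items() if label != "cpu")
    print(stats["cpu"][0], per_core_user)  # aggregate vs. sum of cores
```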
Run man 5 proc, then type /proc/stat and press Enter to search, and you will find:
/proc/stat
    kernel/system statistics. Varies with architecture. Common entries include:

    cpu 3357 0 4313 1362393
        The amount of time, measured in units of USER_HZ (1/100ths of a second on most architectures, use sysconf(_SC_CLK_TCK) to obtain the right value), that the system spent in user mode, user mode with low priority (nice), system mode, and the idle task, respectively. The last value should be USER_HZ times the second entry in the uptime pseudo-file.

        In Linux 2.6 this line includes three additional columns: iowait - time waiting for I/O to complete (since 2.5.41); irq - time servicing interrupts (since 2.6.0-test4); softirq - time servicing softirqs (since 2.6.0-test4).

        Since Linux 2.6.11, there is an eighth column, steal - stolen time, which is the time spent in other operating systems when running in a virtualized environment.

        Since Linux 2.6.24, there is a ninth column, guest, which is the time spent running a virtual CPU for guest operating systems under the control of the Linux kernel.
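The man page mentions sysconf(_SC_CLK_TCK) for finding the real value of USER_HZ. A minimal Python sketch of that lookup follows, using `os.sysconf` from the standard library (available on Unix-like systems). The helper names are my own. The percentage calculation does not need this conversion, since the unit cancels out; it only matters if you want absolute seconds.

```python
import os
from typing import Optional


def user_hz() -> int:
    """Ticks per second as reported by sysconf(_SC_CLK_TCK); Unix-like systems only."""
    return os.sysconf("SC_CLK_TCK")


def ticks_to_seconds(ticks: int, hz: Optional[int] = None) -> float:
    """Convert a /proc/stat counter to seconds.

    If hz is not given, it is looked up with user_hz().
    """
    if hz is None:
        hz = user_hz()
    return ticks / hz


if __name__ == "__main__":
    # Idle value from the man page example, assuming USER_HZ is 100.
    print(ticks_to_seconds(1362393, hz=100))
```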
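This explains that:
The 8th column is the time stolen by other operating systems when running in a virtualized environment.
The 9th column is, when the machine is a host, the time spent running guest VMs.
The 10th column on my machine is not covered by this excerpt; newer versions of the man page list it as guest_nice, the time spent running a niced guest.
This information is useful too. After all, a lot of servers these days are really just VMs.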
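On a VM, the steal column can be turned into a percentage in the same way as the overall usage. The sketch below is one possible approach, not an established recipe. It assumes the 10-column layout from my machine, and it leaves guest and guest_nice out of the denominator on the assumption that the kernel already counts them inside user and nice; please verify that for your kernel. The column names beyond the seventh and both sample lines are made up for illustration.

```python
from typing import Dict

# Assumed names for the 10-column layout.
COLUMNS = ("user", "nice", "system", "idle", "iowait",
           "irq", "softirq", "steal", "guest", "guest_nice")


def breakdown_percent(before: str, after: str) -> Dict[str, float]:
    """Per-column share of elapsed CPU time between two samples, in percent."""
    first = [int(v) for v in before.split()[1:]]
    second = [int(v) for v in after.split()[1:]]
    deltas = [b - a for a, b in zip(first, second)]
    # Assumption: guest time is already included in user/nice, so only the
    # first eight columns are summed to avoid counting it twice.
    total = sum(deltas[:8])
    if total <= 0:
        return {name: 0.0 for name, _ in zip(COLUMNS, deltas)}
    return {name: d / total * 100 for name, d in zip(COLUMNS, deltas)}


if __name__ == "__main__":
    # Hypothetical samples, not taken from a real machine.
    t1 = "cpu 100 0 50 800 10 0 0 40 0 0"
    t2 = "cpu 150 0 70 1500 20 0 0 60 0 0"
    for name, pct in breakdown_percent(t1, t2).items():
        print(name, round(pct, 2))
```

A steal share that stays high suggests the hypervisor is giving CPU time to other guests, which top alone would not make obvious.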