• Shell in Practice: Linux Host System Monitoring


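    1. System monitoring overview

    The script collects the following metrics: memory usage, CPU usage, currently logged-in users, mounted filesystems and their disk space usage, average inbound and outbound network traffic per second, and disk I/O (the average rate at which data is read from disk into memory and written from memory to disk).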

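    2. How the monitoring works

    2.1. CPU usage

    How it works:

    CPU statistics are recorded in /proc/stat. The first line, cpu, aggregates all cores; its columns are cumulative times spent in user, nice, system, idle, iowait, irq, softirq, steal, guest and guest_nice mode. Because the counters only ever grow, usage over an interval is the increase in busy time divided by the increase in total time between two samples. For details, see this blog post: https://blog.csdn.net/ustclu/article/details/1721673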

    stephen@stephen-K55VD:~/shell$ cat  /proc/stat
    cpu  348229 906 98356 7304276 81726 0 2821 0 0 0
    cpu0 95033 273 22980 1803962 33023 0 1721 0 0 0
    cpu1 79735 255 24756 1836717 17035 0 454 0 0 0
    cpu2 84045 211 25742 1831963 16753 0 582 0 0 0
    cpu3 89415 166 24876 1831633 14913 0 62 0 0 0
    intr 10306028 7 28486 0 0 0 0 0 0 1 825 0 0 50130 0 0 0 76 284421 0 213811 0 0 0 29 795993 19 0 81 766580 15 648 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    ctxt 51268973
    btime 1554444493
    processes 14526
    procs_running 1
    procs_blocked 0
    softirq 9059312 7 2712077 5 5478 204089 0 1245879 2780432 0 2111345

    Implementation:

    # Sample the total and busy CPU time from the aggregate "cpu" line
    # ($i is the value of column i; busy = user + nice + system + irq + softirq)
    cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
    cpuUsedStart=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{print used}' /proc/stat)
    # Sample again after 30 s and take the differences
    sleep 30
    cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
    cpuUsedEnd=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{print used}' /proc/stat)
    usedCPU=$((cpuUsedEnd - cpuUsedStart))
    totalCPU=$((cpuTotalEnd - cpuTotalStart))
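
    The two-sample calculation can also be packaged as a pair of small helpers. This is only a sketch, not the script's own code: the names cpuSample and cpuUsagePercent are hypothetical, and busy time is assumed to be the same five columns summed above.

```shell
#!/bin/bash
# Hypothetical helper: print "<busy> <total>" (in clock ticks) taken from the
# aggregate "cpu" line of a file in /proc/stat format.
cpuSample() {
    awk '/^cpu /{
        total = 0
        for (i = 2; i <= NF; i++) total += $i   # add up every time column
        busy = $2 + $3 + $4 + $7 + $8           # user + nice + system + irq + softirq
        printf "%.0f %.0f\n", busy, total
    }' "${1:-/proc/stat}"
}

# Hypothetical helper: CPU usage in percent between two samples.
# Arguments: busyStart totalStart busyEnd totalEnd
cpuUsagePercent() {
    awk -v b1="$1" -v t1="$2" -v b2="$3" -v t2="$4" 'BEGIN {
        dt = t2 - t1
        if (dt <= 0) { print "0.00"; exit }     # no ticks elapsed: avoid dividing by zero
        printf "%.2f\n", (b2 - b1) * 100 / dt
    }'
}

# Example: measure over one second on a Linux host.
if [ -r /proc/stat ]; then
    start=$(cpuSample)
    sleep 1
    end=$(cpuSample)
    # $start and $end are left unquoted on purpose so each splits into two arguments
    echo "CPU usage: $(cpuUsagePercent $start $end) %"
fi
```

    Keeping the parsing and the arithmetic in separate functions makes it possible to check each one against a saved copy of /proc/stat instead of the live file.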

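    2.2. Memory usage

    How it works:

    Memory statistics are recorded in /proc/meminfo. MemTotal is the total amount of memory in kB and MemFree is the amount of completely unused memory. Memory usage = (total memory - free memory) / total memory. Because MemFree does not include memory held by buffers and the page cache, which the kernel can reclaim, this formula reports a higher figure than the memory that is really unavailable to applications.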

    stephen@stephen-K55VD:~/shell$ cat /proc/meminfo
    MemTotal:        3922884 kB
    MemFree:          139108 kB
    MemAvailable:     317700 kB
    Buffers:           31792 kB
    Cached:           538160 kB
    SwapCached:        10012 kB
    Active:          2615652 kB

    Implementation:

    # Get memory usage
    function memUsage(){
        logInfo "Begin to get mem usage of Host [${ip}]"
        # Total memory in kB
        totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
        # Free memory in kB
        freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
        usedMem=$((totalMem - freeMem))
        logInfo "Host [${ip}] total mem is :  $(kbToGb ${totalMem}) GB"
        # Compute the usage percentage and write it to the log
        logInfo "Host [${ip}] mem usage is :  $(usagePercent ${usedMem} ${totalMem}) %"
        logInfo "End to get mem usage of Host [${ip}]"
    }
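
    The /proc/meminfo sample above also contains a MemAvailable line, the kernel's own estimate of how much memory can be handed to new workloads. As a variation on the formula (not what the script does), usage can be based on that value instead of MemFree. The sketch below assumes a hypothetical function name, memUsagePercent, and falls back to MemFree when MemAvailable is missing.

```shell
#!/bin/bash
# Hypothetical helper: memory usage in percent, computed from a file in
# /proc/meminfo format (defaults to /proc/meminfo itself).
memUsagePercent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2; hasAvail = 1 }
        END {
            if (total <= 0) exit 1           # nothing usable was parsed
            if (!hasAvail) avail = free      # older kernels: fall back to MemFree
            printf "%.2f\n", (total - avail) * 100 / total
        }' "${1:-/proc/meminfo}"
}

# Example: report the current value on a Linux host.
if [ -r /proc/meminfo ]; then
    echo "mem usage: $(memUsagePercent) %"
fi
```

    With the sample values shown earlier (MemTotal 3922884 kB, MemAvailable 317700 kB) this yields 91.90, a little below the MemFree-based figure.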

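    2.3. Network traffic

    How it works:

    On Linux, per-interface traffic counters are recorded in /proc/net/dev. The rate is obtained by taking the number of bytes received and sent at two points in time and dividing the difference by the interval. Column 1 is the interface name, column 2 is the number of bytes received, and column 10 is the number of bytes sent.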

    stephen@stephen-K55VD:~/shell/sysMonitor$ cat /proc/net/dev
    Inter-|   Receive                                                |  Transmit
     face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    wlp3s0: 19595253   41163    0    0    0     0          0         0 34741446   49185    0    0    0     0       0          0
    enp4s0f2:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
    docker0:       0       0    0    0    0     0          0         0        0       0    0    0    0     0       0          0
        lo:  907275    5032    0    0    0     0          0         0   907275    5032    0    0    0     0       0          0

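    Implementation:

    # ethName is the name of the network interface; anchoring the pattern
    # keeps it from matching other interfaces whose names contain ethName
    receiveByteStart=$(grep -E "^ *${ethName}:" /proc/net/dev | awk '{print $2}')
    sendByteStart=$(grep -E "^ *${ethName}:" /proc/net/dev | awk '{print $10}')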
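
    The snippet above takes only the first sample. A complete measurement, sketched below, reads both counters twice and converts the difference into KB per second. The names netBytes and rateKBps are hypothetical, and the loopback interface lo is used only as an example.

```shell
#!/bin/bash
# Hypothetical helper: print "<rxBytes> <txBytes>" for interface $1, read from
# a file in /proc/net/dev format (defaults to /proc/net/dev).
netBytes() {
    awk -v dev="$1" '
        { sub(/:/, " ") }               # detach the interface name from the counters
        $1 == dev { print $2, $10 }     # column 2 = bytes received, column 10 = bytes sent
    ' "${2:-/proc/net/dev}"
}

# Hypothetical helper: average rate in KB/s.
# Arguments: startBytes endBytes seconds
rateKBps() {
    awk -v s="$1" -v e="$2" -v t="$3" 'BEGIN { printf "%.2f\n", (e - s) / t / 1024 }'
}

# Example: measure the loopback interface for one second on a Linux host.
if [ -r /proc/net/dev ]; then
    iface=lo    # assumption: replace with the interface to watch
    set -- $(netBytes "$iface"); rxStart=$1; txStart=$2
    sleep 1
    set -- $(netBytes "$iface"); rxEnd=$1; txEnd=$2
    echo "in:  $(rateKBps "$rxStart" "$rxEnd" 1) KB/s"
    echo "out: $(rateKBps "$txStart" "$txEnd" 1) KB/s"
fi
```

    Comparing the name exactly, rather than with a loose grep pattern, avoids picking up several interfaces at once when one name is a substring of another.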

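    2.4. Disk I/O

    How it works:

    Disk I/O counters are recorded in /proc/vmstat: pgpgin is the cumulative amount of data read in from disk and pgpgout is the cumulative amount written out to disk. Sample a counter at the start and end of an interval and divide the difference by the elapsed time to get the rate. The code below treats the counters as kB, so dividing by 1024 gives MB.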

    Implementation:

    # disk IO in
    function diskIOIn(){
        # Sample the inbound disk I/O counter twice, 30 s apart
        inIoStart=$(awk '/pgpgin/{print $2}' /proc/vmstat)
        sleep 30
        inIoEnd=$(awk '/pgpgin/{print $2}' /proc/vmstat)
        # kB over 30 s -> MB/s (integer arithmetic, so the result is truncated)
        inIo=$(((inIoEnd-inIoStart)/(30*1024)))
        logInfo "Host [${ip}] in IO is :  ${inIo} MB / s"
    }
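
    Because $(( )) only does integer arithmetic, any rate below 1 MB/s is reported as 0. One way to keep two decimal places is to let awk do the division, as in this sketch. The names vmstatCounter and ioRateMBps are hypothetical, and the counters are assumed to be in kB, as in the code above.

```shell
#!/bin/bash
# Hypothetical helper: print the value of counter $1 from a file in
# /proc/vmstat format (defaults to /proc/vmstat).
vmstatCounter() {
    awk -v key="$1" '$1 == key { print $2 }' "${2:-/proc/vmstat}"
}

# Hypothetical helper: average rate in MB/s with two decimals.
# Arguments: startKB endKB seconds
ioRateMBps() {
    awk -v s="$1" -v e="$2" -v t="$3" 'BEGIN { printf "%.2f\n", (e - s) / (t * 1024) }'
}

# Example: measure read and write rates over one second on a Linux host.
if [ -r /proc/vmstat ]; then
    inStart=$(vmstatCounter pgpgin);  outStart=$(vmstatCounter pgpgout)
    sleep 1
    inEnd=$(vmstatCounter pgpgin);    outEnd=$(vmstatCounter pgpgout)
    echo "in IO:  $(ioRateMBps "$inStart" "$inEnd" 1) MB/s"
    echo "out IO: $(ioRateMBps "$outStart" "$outEnd" 1) MB/s"
fi
```

    Sampling both counters within the same interval also avoids waiting once for reads and again for writes.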

    3. Script code

    • hostLists: the list of IP addresses of the hosts to monitor.
    • sysMonitor.sh*: the script that collects the monitoring data.
      #!/bin/bash
      # Monitor system information of Linux hosts
      # Load the helper functions
      source utils

      # Get CPU usage
      function cpuUsage()
      {
          # Number of physical CPUs
          phyCPUNums=$(grep "physical id" /proc/cpuinfo | sort -u | wc -l)
          # Number of logical CPUs
          lgCPUNums=$(grep -c "^processor" /proc/cpuinfo)
          # Cores per physical CPU
          cores=$(grep "cpu cores" /proc/cpuinfo | uniq | awk '{print $4}')
          logInfo "Host [${ip}] physical CPU nums is :  ${phyCPUNums}"
          logInfo "Host [${ip}] logic CPU nums is :  ${lgCPUNums}"
          logInfo "Host [${ip}] core nums is :  ${cores}"
          # CPU usage: sample the total and busy CPU time from the aggregate "cpu" line
          # ($i is the value of column i; busy = user + nice + system + irq + softirq)
          cpuTotalStart=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
          cpuUsedStart=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{print used}' /proc/stat)
          # Sample again after 30 s and take the differences
          sleep 30
          cpuTotalEnd=$(awk '/^cpu /{for(i=2;i<=NF;i++) total+=$i} END{print total}' /proc/stat)
          cpuUsedEnd=$(awk '/^cpu /{used=$2+$3+$4+$7+$8} END{print used}' /proc/stat)
          usedCPU=$((cpuUsedEnd - cpuUsedStart))
          totalCPU=$((cpuTotalEnd - cpuTotalStart))
          logInfo "Host [${ip}] CPU usage is :  $(usagePercent ${usedCPU} ${totalCPU}) %"
      }

      # Get memory usage
      function memUsage(){
          logInfo "Begin to get mem usage of Host [${ip}]"
          # Total memory in kB
          totalMem=$(awk '/^MemTotal:/{print $2}' /proc/meminfo)
          # Free memory in kB
          freeMem=$(awk '/^MemFree:/{print $2}' /proc/meminfo)
          usedMem=$((totalMem - freeMem))
          logInfo "Host [${ip}] total mem is :  $(kbToGb ${totalMem}) GB"
          # Compute the usage percentage and write it to the log
          logInfo "Host [${ip}] mem usage is :  $(usagePercent ${usedMem} ${totalMem}) %"
          logInfo "End to get mem usage of Host [${ip}]"
      }

      # Average network traffic per second for one interface
      function netData(){
          logInfo "Begin to get  net data of Host [${ip}]"
          ethName=$1
          # Anchor the pattern so that only the interface named ethName matches
          receiveByteStart=$(grep -E "^ *${ethName}:" /proc/net/dev | awk '{print $2}')
          sendByteStart=$(grep -E "^ *${ethName}:" /proc/net/dev | awk '{print $10}')
          sleep 10
          receiveByteEnd=$(grep -E "^ *${ethName}:" /proc/net/dev | awk '{print $2}')
          sendByteEnd=$(grep -E "^ *${ethName}:" /proc/net/dev | awk '{print $10}')
          # Bytes over 10 s -> bytes/s, then divide by 1024 to get KB/s
          inDataRate=$(echo "scale=2;(${receiveByteEnd}-${receiveByteStart})/10/1024" | bc)
          outDataRate=$(echo "scale=2;(${sendByteEnd}-${sendByteStart})/10/1024" | bc)
          logInfo "Host [${ip}] in data is :  ${inDataRate} KB / s"
          logInfo "Host [${ip}] out data is :  ${outDataRate} KB / s"
          logInfo "End to get  net data of Host [${ip}]"
      }

      # Disk space usage
      function diskUsage(){
          logInfo "Begin to get disk usage of Host [${ip}]"
          noTimeLogInfo "$(df -h)"
          logInfo "End to get disk usage of Host [${ip}]"
      }

      # disk IO in
      function diskIOIn(){
          # Sample the inbound disk I/O counter twice, 30 s apart
          inIoStart=$(awk '/pgpgin/{print $2}' /proc/vmstat)
          sleep 30
          inIoEnd=$(awk '/pgpgin/{print $2}' /proc/vmstat)
          # kB over 30 s -> MB/s (integer arithmetic, so the result is truncated)
          inIo=$(((inIoEnd-inIoStart)/(30*1024)))
          logInfo "Host [${ip}] in IO is :  ${inIo} MB / s"
      }

      # disk IO out
      function diskIOout(){
          # Sample the outbound disk I/O counter twice, 60 s apart
          outIoStart=$(awk '/pgpgout/{print $2}' /proc/vmstat)
          sleep 60
          outIoEnd=$(awk '/pgpgout/{print $2}' /proc/vmstat)
          outIo=$(((outIoEnd-outIoStart)/(60*1024)))
          logInfo "Host [${ip}] out IO is :  ${outIo} MB / s"
      }

      # Currently logged-in users
      function onlineUser(){
          # NR>2 skips both the uptime line and the column header printed by w
          user=$(w | awk 'NR>2{print $1 "\t\t" $4}')
          userCount=$(w | awk 'NR>2' | wc -l)
          logInfo "There are [${userCount}] users online now."
          noTimeLogInfo "UserName        loginAt"
          noTimeLogInfo "${user}"
      }

      # Check whether each host is reachable, then collect the metrics
      function isAlive(){
          for ip in $(cat hostLists)
          do
              ping ${ip} -c 3 >/dev/null
              if [ $? -eq 0 ];then
                  logInfo "${ip} is reachable"
                  # Note: the collectors below read the /proc files of the machine the script runs on
                  # Logged-in users
                  onlineUser
                  # CPU information
                  cpuUsage
                  # Memory information
                  memUsage
                  # Disk I/O
                  diskIOIn
                  diskIOout
                  # Disk space usage
                  diskUsage
                  # Average inbound and outbound traffic per second
                  netData wlp3s0
              else
                  logInfo "ERROR ${ip} is unreachable,try login in see more details.."
              fi
          done
      }

      while true
      do
          isAlive
          sleep 60
      done
    • utils: helper functions such as log printing.
      #!/bin/bash
      # Logging helpers
      curr_path=$(pwd)
      # Defined at top level so that both logInfo and noTimeLogInfo can use it
      log_file=${curr_path}/system_status.log

      function logInfo()
      {
          local curr_time=$(date "+%Y-%m-%d %H:%M:%S")
          # Check whether the log file exists
          if [ -e "${log_file}" ]
          then
              # If the file is not writable, grant permission with chmod
              if [ ! -w "${log_file}" ]
              then
                  chmod 770 "${log_file}"
              fi
          else
              # Create the log file if it does not exist
              touch "${log_file}"
          fi
          # Write the log entry
          local info=$1
          echo "${curr_time}  $(whoami) [Info] ${info}" >> "${log_file}"
      }

      # Write a message to the log without a timestamp prefix
      function noTimeLogInfo(){
          msg=$1
          echo "${msg}" >> "${log_file}"
      }

      # Convert kB to GB with 3 decimal places; expr only supports integer arithmetic, so bc is used
      function kbToGb(){
          kbVal=$1
          gbVal=$(echo "scale=3;${kbVal}/1024/1024" | bc)
          echo ${gbVal}
      }

      # Usage as a percentage
      # First argument: amount used; second argument: total
      function usagePercent(){
          used=$1
          total=$2
          usedPercent=$(echo "scale=2;${used}*100/${total}" | bc)
          echo ${usedPercent}
      }
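
    The diskUsage function in sysMonitor.sh writes the whole df -h table to the log. If only the nearly full filesystems are of interest, the table can be filtered first. The following is a sketch under a few assumptions: the function name disksOverThreshold and the 90 % limit are made up for illustration, df -P is used so that every filesystem stays on one line, and mount points are assumed not to contain spaces.

```shell
#!/bin/bash
# Hypothetical helper: read `df -P` output on stdin and print
# "<mountpoint> <percent>" for every filesystem at or above the limit in $1.
disksOverThreshold() {
    awk -v limit="$1" 'NR > 1 {          # skip the header line
        pct = $5                         # capacity column, e.g. "51%"
        sub(/%/, "", pct)
        if (pct + 0 >= limit) print $6, pct
    }'
}

# Example: list filesystems that are at least 90 % full.
df -P | disksOverThreshold 90
```

    The result could then be passed to noTimeLogInfo in place of the raw df output.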

    Script layout:

    -rw-r--r-- 1 stephen stephen   30 Apr  5 18:33 hostLists
    -rwxrwxr-x 1 stephen stephen 4164 Apr  5 18:50 sysMonitor.sh*
    -rw-r--r-- 1 stephen stephen  951 Apr  5 15:23 utils

    4. Results

    The monitoring data is written to the log file system_status.log. A sample run produces the following:

    2019-04-05 19:44:42  stephen [Info] 192.168.1.109 is reachable
    2019-04-05 19:44:42  stephen [Info] There are [2] users online now.
    UserName        loginAt
    USER        LOGIN@
    stephen        14:09
    2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] physical CPU nums is :  1
    2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] logic CPU nums is :  4
    2019-04-05 19:44:42  stephen [Info] Host [192.168.1.109] core nums is :  2
    2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] CPU usage is :  10.12 %
    2019-04-05 19:45:12  stephen [Info] Begin to get mem usage of Host [192.168.1.109]
    2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] total mem is :  3.741 GB
    2019-04-05 19:45:12  stephen [Info] Host [192.168.1.109] mem usage is :  95.83 %
    2019-04-05 19:45:12  stephen [Info] End to get mem usage of Host [192.168.1.109]
    2019-04-05 19:45:42  stephen [Info] Host [192.168.1.109] in IO is :  0 MB / s
    2019-04-05 19:46:42  stephen [Info] Host [192.168.1.109] out IO is :  0 MB / s
    2019-04-05 19:46:42  stephen [Info] Begin to get disk usage of Host [192.168.1.109]
    Filesystem      Size  Used Avail Use% Mounted on
    udev            1.9G     0  1.9G    0% /dev
    tmpfs           384M  2.0M  382M    1% /run
    /dev/sda10       42G   20G   20G   51% /
    tmpfs           1.9G   20M  1.9G    2% /dev/shm
    tmpfs           5.0M  4.0K  5.0M    1% /run/lock
    tmpfs           1.9G     0  1.9G    0% /sys/fs/cgroup
    /dev/loop0      3.8M  3.8M     0  100% /snap/notepad-plus-plus/202
    /dev/loop2       54M   54M     0  100% /snap/core18/782
    /dev/loop4      441M  441M     0  100% /snap/wine-platform/111
    /dev/loop5      441M  441M     0  100% /snap/wine-platform/105
    /dev/loop7      3.8M  3.8M     0  100% /snap/notepad-plus-plus/199
    /dev/loop3       90M   90M     0  100% /snap/core/6673
    /dev/loop1      274M  274M     0  100% /snap/wps-office-multilang/1
    /dev/loop6       91M   91M     0  100% /snap/core/6405
    /dev/loop8       92M   92M     0  100% /snap/core/6531
    /dev/loop9       36M   36M     0  100% /snap/gtk-common-themes/1198
    /dev/loop10     3.8M  3.8M     0  100% /snap/notepad-plus-plus/195
    /dev/loop11     441M  441M     0  100% /snap/wine-platform/103
    tmpfs           384M   16K  384M    1% /run/user/125
    tmpfs           384M   52K  384M    1% /run/user/1000
    2019-04-05 19:46:42  stephen [Info] End to get disk usage of Host [192.168.1.109]
    2019-04-05 19:46:42  stephen [Info] Begin to get  net data of Host [192.168.1.109]
    2019-04-05 19:46:52  stephen [Info] Host [192.168.1.109] in data is :  42.90 kb / s
    2019-04-05 19:46:52  stephen [Info] Host [192.168.1.109] out data is :  7.00 kb / s
    2019-04-05 19:46:52  stephen [Info] End to get  net data of Host [192.168.1.109]
    2019-04-05 19:47:04  stephen [Info] ERROR 255.255.255.254 is unreachable,try login in see more details..


  • Original article: https://www.cnblogs.com/webDepOfQWS/p/10659653.html