How to Handle a High Load Average on Linux

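The top or uptime command shows the system's load average. In the output below, the three values are the average load over the past 1, 5, and 15 minutes (three windows are reported so that the overall load trend is easier to see).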

[root@k8s-master ~]# uptime
 10:54:36 up 8 days, 12:31,  1 user,  load average: 0.25, 0.51, 1.19
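To make the "trend" reading of the three values concrete, here is a minimal Python sketch that extracts them from an uptime line and compares the 1-minute value with the 15-minute value. The function names parse_load_average and load_trend are illustrative and not part of any tool; the comparison rule is a simplifying assumption.

```python
import re


def parse_load_average(uptime_line):
    """Return the (1-min, 5-min, 15-min) load averages from an uptime line."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def load_trend(one, fifteen):
    """Compare the short window with the long one (simplified rule)."""
    if one > fifteen:
        return "rising"
    if one < fifteen:
        return "falling"
    return "steady"


line = " 10:54:36 up 8 days, 12:31,  1 user,  load average: 0.25, 0.51, 1.19"
one, five, fifteen = parse_load_average(line)
print(one, five, fifteen, load_trend(one, fifteen))  # 0.25 0.51 1.19 falling
```

For the sample above, the 1-minute value is well below the 15-minute value, so the load has been dropping over the last quarter of an hour.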

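What load average means: the average number of processes in the runnable state or the uninterruptible state, that is, the average number of active processes. "Average" here means an exponentially damped moving average. The two states map to process states as follows:

Runnable state (Running or Runnable)

Uninterruptible state (Uninterruptible Sleep, also called Disk Sleep)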
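The "exponentially damped average" in the definition can be illustrated with a short floating-point sketch. It assumes the count of active tasks is sampled every 5 seconds and folded into the running value with a decay factor of exp(-5/window); the kernel does the equivalent in fixed-point arithmetic, so treat this as a model of the idea and not as the kernel's code. The names update_load and simulate are made up for this example.

```python
import math

SAMPLE_INTERVAL = 5.0  # seconds between samples (assumption modeled on the kernel)


def update_load(load, active, window_seconds):
    """Fold one sample of the active-task count into the running average."""
    decay = math.exp(-SAMPLE_INTERVAL / window_seconds)
    return load * decay + active * (1.0 - decay)


def simulate(active_counts, window_seconds=60.0, initial=0.0):
    """Apply update_load to a sequence of samples and return the final value."""
    load = initial
    for active in active_counts:
        load = update_load(load, active, window_seconds)
    return load


# One minute (12 samples) with 2 active tasks, starting from an idle system
print(round(simulate([2] * 12), 2))  # 1.26
```

After one full minute with two active tasks, the 1-minute value has only reached about 1.26, not 2. That lag is why the three windows react to a load change at different speeds.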

# Find processes in the R or D state
ps aux | awk '{if($8 ~ /R|D/) print $0 }'

[root@k8s-master ~]# ps aux | awk '{if($8 ~ /R|D/) print $0 }' 
root         9  0.0  0.0      0     0 ?        R    Jun30   6:08 [rcu_sched]
root     30474  0.0  0.0 157456  1912 pts/0    R+   11:10   0:00 ps aux
root     30475  0.0  0.0 113548  1232 pts/0    R+   11:10   0:00 awk {if($8 ~ /R|D/) print $0 }
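The same filter can be sketched in Python for cases where the ps output has already been captured as text. This assumes the standard 11-column ps aux layout with STAT in the eighth column; the function name active_processes is illustrative, and the systemd row in the sample is a made-up sleeping process added to show that it gets filtered out.

```python
def active_processes(ps_output):
    """Return (pid, stat, command) for rows whose STAT starts with R or D."""
    rows = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces
        fields = line.split(None, 10)
        if len(fields) < 11 or fields[0] == "USER":
            continue  # skip blank lines, short lines and the header row
        stat = fields[7]
        if stat[0] in ("R", "D"):
            rows.append((int(fields[1]), stat, fields[10]))
    return rows


SAMPLE = """\
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.1 125344  3752 ?        Ss   Jun30   0:46 /usr/lib/systemd/systemd
root         9  0.0  0.0      0     0 ?        R    Jun30   6:08 [rcu_sched]
root     30474  0.0  0.0 157456  1912 pts/0    R+   11:10   0:00 ps aux
"""

for pid, stat, command in active_processes(SAMPLE):
    print(pid, stat, command)
```

Checking only the first character of STAT avoids matching modifier characters that follow the state code.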

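Given the definition above, the scenarios that can raise the load average are:

1. Processes in the Running state that consume a lot of CPU (CPU-bound processes)

2. A large number of Runnable processes, which force the CPU to context-switch frequently (saving and restoring the CPU registers and the program counter)

The tasks managed by the operating system include processes (and threads). In addition, hardware raises signals that cause interrupt handlers to be invoked.

Context switches include: process context switches (user-space resources such as virtual memory, the stack, and global variables, plus kernel-space state such as the kernel stack and registers), thread context switches, and interrupt context switches.

Privilege-mode switch: the switch from user mode to kernel mode (for example, on a system call)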

View system-wide context switch activity
vmstat 5

[root@k8s-master sysstat]# vmstat 5
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 8  0      0 131084 203200 1801028    0    0  1724   166    2    8 12  7 79  2  0
 0  0      0 130480 203216 1801272    0    0    42    81 1829 5047  8  5 86  1  0
 0  0      0 129676 203232 1802092    0    0   161   110 1814 4840 16  6 76  1  0
 0  0      0 129344 203236 1802496    0    0    78    70 1939 5258 14 12 73  1  0
 0  0      0 128464 203248 1803156    0    0   120   106 1836 5195  9  6 84  1  0
 1  0      0 128000 203256 1803612    0    0    91    98 1807 4726 16  6 77  1  0

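cs (context switch) is the number of context switches per second.
in (interrupt) is the number of interrupts per second.
r (Running or Runnable) is the length of the run queue, that is, the number of processes that are running or waiting for a CPU.
b (Blocked) is the number of processes in uninterruptible sleep.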
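As a rough illustration of reading these columns programmatically, the sketch below parses captured vmstat output into dictionaries keyed by column name and averages r, b, in, and cs. It skips the first data row, because vmstat's first report contains averages since boot and not a 5-second interval (which is why in and cs look so small in that row). The function names are illustrative.

```python
def parse_vmstat(output):
    """Turn vmstat text into a list of {column: int} dicts, one per sample."""
    header = None
    samples = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "r":  # the column-name line
            header = fields
        elif header and len(fields) == len(header) and fields[0].isdigit():
            samples.append(dict(zip(header, (int(f) for f in fields))))
    return samples


def interval_means(samples, columns=("r", "b", "in", "cs")):
    """Average the given columns, ignoring the first (since-boot) sample."""
    interval = samples[1:]
    if not interval:
        raise ValueError("need at least two samples")
    return {c: sum(s[c] for s in interval) / len(interval) for c in columns}


SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 8  0      0 131084 203200 1801028    0    0  1724   166    2    8 12  7 79  2  0
 0  0      0 130480 203216 1801272    0    0    42    81 1829 5047  8  5 86  1  0
 0  0      0 129676 203232 1802092    0    0   161   110 1814 4840 16  6 76  1  0
"""

print(interval_means(parse_vmstat(SAMPLE)))
# {'r': 0.0, 'b': 0.0, 'in': 1821.5, 'cs': 4943.5}
```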

Interrupt counts by type
watch -d cat /proc/interrupts
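watch -d highlights the counters that change between refreshes. The same idea can be sketched in Python by diffing two snapshots of /proc/interrupts. The snapshot text below is made up for illustration (single CPU, invented counts), and parse_interrupts and busiest are hypothetical helper names; the parser assumes one count column per CPU listed in the header line.

```python
def parse_interrupts(text):
    """Return {interrupt name: total count across CPUs} for one snapshot."""
    lines = text.splitlines()
    ncpu = len(lines[0].split())  # header line lists CPU0, CPU1, ...
    totals = {}
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        counts = [int(f) for f in rest.split()[:ncpu] if f.isdigit()]
        if counts:
            totals[name.strip()] = sum(counts)
    return totals


def busiest(before, after, top=3):
    """Return (delta, name) pairs for the counters that grew the most."""
    deltas = ((after[name] - before.get(name, 0), name) for name in after)
    return sorted((d for d in deltas if d[0] > 0), reverse=True)[:top]


# Hypothetical snapshots taken a few seconds apart
BEFORE = """\
           CPU0
  0:        141   IO-APIC-edge      timer
 27:      90210   PCI-MSI-edge      virtio0-input.0
LOC:    5000000   Local timer interrupts
RES:          0   Rescheduling interrupts
"""
AFTER = """\
           CPU0
  0:        141   IO-APIC-edge      timer
 27:      90710   PCI-MSI-edge      virtio0-input.0
LOC:    5001200   Local timer interrupts
RES:          0   Rescheduling interrupts
"""

print(busiest(parse_interrupts(BEFORE), parse_interrupts(AFTER)))
# [(1200, 'LOC'), (500, '27')]
```

On a live system the two snapshots would come from reading /proc/interrupts twice with a short sleep in between.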


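View context switches per process
pidstat -w -t 1 (-t additionally reports per-thread context switch statistics; the sample below shows process-level rows only)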
05:17:54 PM   UID       PID   cswch/s nvcswch/s  Command
05:17:55 PM     0         1      1.03      0.00  systemd
05:17:55 PM     0         3     21.65      0.00  ksoftirqd/0
05:17:55 PM     0         9    100.00      0.00  rcu_sched
05:17:55 PM     0       296     12.37      0.00  kworker/0:1H
05:17:55 PM     0       320      6.19      0.00  jbd2/vda1-8
05:17:55 PM     0      1139      1.03      0.00  iscsid
05:17:55 PM  1337      5766     13.40      0.00  envoy
05:17:55 PM  1337      6061     16.49      1.03  envoy
05:17:55 PM  1337      6065     16.49      0.00  envoy
05:17:55 PM     1      6141      1.03      2.06  python
05:17:55 PM  1337      6331     13.40      0.00  envoy
05:17:55 PM     0     10774     14.43      0.00  envoy
05:17:55 PM     0     10805      2.06      0.00  YDLive
05:17:55 PM     0     10844     15.46      0.00  envoy
05:17:55 PM     0     11532      3.09      1.03  coredns
05:17:55 PM     0     12767     40.21     15.46  etcd
05:17:55 PM     0     13170     10.31      0.00  kube-proxy
05:17:55 PM     0     18892      1.03      0.00  sshd
05:17:55 PM     0     19758      1.03      0.00  kworker/u2:2
05:17:55 PM     0     27112      8.25      0.00  kworker/0:2
05:17:55 PM     0     28294      1.03      1.03  pidstat
05:17:55 PM     0     28365      2.06      0.00  YDService

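cswch/s: voluntary context switches per second. These happen when a process cannot get a resource it needs and gives up the CPU, for example when it blocks on I/O or waits for memory.
nvcswch/s: involuntary context switches per second. These happen when the system forcibly reschedules the process, for example because its time slice has expired.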
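To pick out the processes with the most voluntary or involuntary switches from a long pidstat -w listing, something like the following sketch could be used on captured output. It maps each row to the column names in the header line, so it works with or without the AM/PM field. parse_pidstat_w and top_switchers are illustrative names, and the sketch assumes the Command column contains no spaces.

```python
def parse_pidstat_w(output):
    """Parse `pidstat -w` text into a list of per-process dicts."""
    rows, header = [], None
    for line in output.splitlines():
        fields = line.split()
        if "cswch/s" in fields:  # the column-name line
            header = fields
            continue
        if header is None or len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        rows.append({
            "pid": int(row["PID"]),
            "command": row["Command"],
            "cswch": float(row["cswch/s"]),
            "nvcswch": float(row["nvcswch/s"]),
        })
    return rows


def top_switchers(rows, key="nvcswch", n=3):
    """Return (command, rate) for the n rows with the highest rate."""
    ranked = sorted(rows, key=lambda r: r[key], reverse=True)
    return [(r["command"], r[key]) for r in ranked[:n]]


SAMPLE = """\
05:17:54 PM   UID       PID   cswch/s nvcswch/s  Command
05:17:55 PM     0         9    100.00      0.00  rcu_sched
05:17:55 PM     1      6141      1.03      2.06  python
05:17:55 PM     0     12767     40.21     15.46  etcd
"""

rows = parse_pidstat_w(SAMPLE)
print(top_switchers(rows, key="nvcswch", n=2))  # [('etcd', 15.46), ('python', 2.06)]
print(top_switchers(rows, key="cswch", n=2))    # [('rcu_sched', 100.0), ('etcd', 40.21)]
```

A high cswch/s points toward processes that wait on resources, while a high nvcswch/s points toward processes competing for CPU time.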

3. Processes in the D state

Install the pidstat and mpstat tools on CentOS: yum install -y sysstat

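# Simulate a CPU-bound process (--cpu spawns workers that spin on sqrt(); -i would spawn sync() workers instead)
stress --cpu 1 --timeout 600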

# Check per-CPU usage (this machine has a single CPU)
# (one set of averages computed over a 5-second sample)
[root@k8s-master ~]# date;mpstat -P ALL 5 1

Thu Jul  9 14:32:06 CST 2020
Linux 3.10.0-957.27.2.el7.x86_64 (k8s-master.com)     07/09/2020     _x86_64_    (1 CPU)

02:32:07 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:32:12 PM  all   96.79    0.00    3.21    0.00    0.00    0.00    0.00    0.00    0.00    0.00
02:32:12 PM    0   96.79    0.00    3.21    0.00    0.00    0.00    0.00    0.00    0.00    0.00

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all   96.79    0.00    3.21    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       0   96.79    0.00    3.21    0.00    0.00    0.00    0.00    0.00    0.00    0.00




# Identify the processes using the most CPU; pidstat's -u option reports CPU metrics
[root@k8s-master ~]# pidstat -u 5 1
Linux 3.10.0-957.27.2.el7.x86_64 (k8s-master.com)     07/09/2020     _x86_64_    (1 CPU)

02:37:27 PM   UID       PID    %usr %system  %guest    %CPU   CPU  Command
02:37:33 PM     0     25315   80.90    0.00    0.00   80.90     0  stress
02:37:33 PM     0     25539    0.00    0.39    0.00    0.39     0  pidstat

Average:      UID       PID    %usr %system  %guest    %CPU   CPU  Command
Average:        0     25315   80.90    0.00    0.00   80.90     -  stress
Average:        0     25539    0.00    0.39    0.00    0.39     -  pidstat

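# Simulate I/O pressure (-i spawns workers that call sync())
stress -i 1 --timeout 600

mpstat shows that cpu0 spends 60% of its time waiting for I/O, and this wait is uninterruptible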
[root@k8s-master ~]# mpstat -P ALL 5 1
Linux 3.10.0-957.27.2.el7.x86_64 (k8s-master.com)     07/09/2020     _x86_64_    (1 CPU)

03:07:38 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
03:07:43 PM  all    9.58    0.00   30.00   60.00    0.00    0.42    0.00    0.00    0.00    0.00
03:07:43 PM    0    9.58    0.00   30.00   60.00    0.00    0.42    0.00    0.00    0.00    0.00

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
Average:     all    9.58    0.00   30.00   60.00    0.00    0.42    0.00    0.00    0.00    0.00
Average:       0    9.58    0.00   30.00   60.00    0.00    0.42    0.00    0.00    0.00    0.00
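The two mpstat samples in this article differ mainly in %usr versus %iowait. A small sketch of how that distinction might be automated: parse the "all" row and label the sample. The 50% threshold and the names parse_mpstat_all and classify are arbitrary choices for illustration, not a recommended rule.

```python
THRESHOLD = 50.0  # percent; arbitrary cut-off chosen for this example


def parse_mpstat_all(output):
    """Return the %-columns of the first 'all' row as a dict of floats."""
    header = None
    for line in output.splitlines():
        fields = line.split()
        if "%idle" in fields:
            header = fields  # the column-name line
        elif header and len(fields) == len(header):
            row = dict(zip(header, fields))
            if row["CPU"] == "all":
                return {k: float(v) for k, v in row.items() if k.startswith("%")}
    raise ValueError("no 'all' row found")


def classify(stats):
    """Label a sample as io-bound, cpu-bound, or neither."""
    if stats["%iowait"] >= THRESHOLD:
        return "io-bound"
    if stats["%usr"] + stats["%sys"] >= THRESHOLD:
        return "cpu-bound"
    return "neither"


HEADER = "  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle\n"
CPU_SAMPLE = (
    "02:32:07 PM" + HEADER +
    "02:32:12 PM  all   96.79    0.00    3.21    0.00    0.00    0.00    0.00    0.00    0.00    0.00\n"
)
IO_SAMPLE = (
    "03:07:38 PM" + HEADER +
    "03:07:43 PM  all    9.58    0.00   30.00   60.00    0.00    0.42    0.00    0.00    0.00    0.00\n"
)

print(classify(parse_mpstat_all(CPU_SAMPLE)))  # cpu-bound
print(classify(parse_mpstat_all(IO_SAMPLE)))   # io-bound
```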

The earlier ps aux command also shows the stress process in the D state

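Why do D-state processes exist?

A process in the uninterruptible state is in the middle of a critical kernel code path that must not be interrupted. The most common case is waiting for a hardware device to answer an I/O request, which ps reports as the D state (Uninterruptible Sleep, also called Disk Sleep). For example, while a process is reading from or writing to disk, it does not respond to signals until the disk replies, so other processes cannot interrupt it and the data stays consistent; during that time the process is in the uninterruptible state. If it were interrupted at that point, the data on disk and the data held by the process could easily become inconsistent. The uninterruptible state is therefore a protection mechanism that the system applies to processes and hardware devices.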
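Besides ps, the state of a process can be read from /proc/<pid>/stat, where the third field is the state code. The sketch below parses such a line, taking care that the command name in parentheses may itself contain spaces or parentheses, and offers a helper that scans /proc for D-state processes on a Linux host. The sample line is made up, and the function names are illustrative.

```python
import os


def parse_stat_line(stat_line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")  # comm may contain ')' itself
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def uninterruptible_processes(proc_root="/proc"):
    """Return (pid, comm) for every process currently in the D state."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat_line(handle.read())
        except (OSError, ValueError):
            continue  # the process exited while we were scanning
        if state == "D":
            found.append((pid, comm))
    return found


# Hypothetical, truncated stat line for illustration
sample = "25315 (stress) D 25314 25314 18892 34816 25314 4194304"
print(parse_stat_line(sample))  # (25315, 'stress', 'D')
```

Calling uninterruptible_processes() repeatedly on an affected host shows whether the same processes stay in D state, which usually matters more than a single brief observation.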

Process state codes shown by ps aux (from man ps)

PROCESS STATE CODES
       Here are the different values that the s, stat and state output specifiers (header "STAT" or "S") will display to describe the state of a process:

               D    uninterruptible sleep (usually IO)
               R    running or runnable (on run queue)
               S    interruptible sleep (waiting for an event to complete)
               T    stopped by job control signal
               t    stopped by debugger during the tracing
               W    paging (not valid since the 2.6.xx kernel)
               X    dead (should never be seen)
               Z    defunct ("zombie") process, terminated but not reaped by its parent

       For BSD formats and when the stat keyword is used, additional characters may be displayed:

               <    high-priority (not nice to other users)
               N    low-priority (nice to other users)
               L    has pages locked into memory (for real-time and custom IO)
               s    is a session leader
               l    is multi-threaded (using CLONE_THREAD, like NPTL pthreads do)
               +    is in the foreground process group
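The STAT column combines one state code with optional modifier characters, as listed in the man page excerpt above. A small lookup sketch that expands a value such as R+ or Ssl into words; describe_stat is an illustrative name and the descriptions are shortened from the man page text.

```python
STATE_CODES = {
    "D": "uninterruptible sleep",
    "R": "running or runnable",
    "S": "interruptible sleep",
    "T": "stopped by job control signal",
    "t": "stopped by debugger during tracing",
    "W": "paging",
    "X": "dead",
    "Z": "defunct (zombie)",
}

MODIFIERS = {
    "<": "high-priority",
    "N": "low-priority",
    "L": "has pages locked into memory",
    "s": "session leader",
    "l": "multi-threaded",
    "+": "in the foreground process group",
}


def describe_stat(stat):
    """Expand a ps STAT value into a comma-separated description."""
    if not stat:
        raise ValueError("empty STAT value")
    parts = [STATE_CODES.get(stat[0], "unknown state %r" % stat[0])]
    parts += [MODIFIERS.get(ch, "unknown flag %r" % ch) for ch in stat[1:]]
    return ", ".join(parts)


print(describe_stat("R+"))   # running or runnable, in the foreground process group
print(describe_stat("Ssl"))  # interruptible sleep, session leader, multi-threaded
```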

Original article: https://www.cnblogs.com/orchidzjl/p/13272534.html