• Linux load average: concept, calculation, and caveats


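            A module I have been developing recently schedules tasks according to the load on each node in the data center (metrics such as NIC I/O and load average). At first I did not really understand the load average metric on Linux machines. After some investigation I worked out how it is calculated and what influences it, and I am recording my notes here.
    1. Load average
            When you type the top command in a shell, the first line of the output shows the load average by default, as below: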

    top - 19:10:32 up 626 days,  4:58,  1 user,  load average: 7.74, 5.62, 6.51
    Tasks: 181 total,   8 running, 173 sleeping,   0 stopped,   0 zombie
    Cpu(s):  4.0% us,  0.5% sy,  0.0% ni, 95.4% id,  0.0% wa,  0.0% hi,  0.0% si

            Likewise, the uptime command also prints the load average:

    19:15:10 up 129 days,  5:12, 15 users,  load average: 0.01, 0.09, 0.05    

            According to man uptime, the three load average values are the system load averages for the past 1, 5, and 15 minutes.
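            The same three numbers can also be read by a program. The following is a minimal sketch, assuming the machine exposes them as the first three fields of /proc/loadavg (the usual location on Linux); the helper names parse_loadavg_line and read_loadavg are my own and are not part of any library.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse the first three fields of a
 * /proc/loadavg-style line into out[0..2] (1, 5 and 15 minutes).
 * Returns 0 on success, -1 if the line is malformed. */
int parse_loadavg_line(const char *line, double out[3])
{
	assert(line != NULL && out != NULL);
	if (sscanf(line, "%lf %lf %lf", &out[0], &out[1], &out[2]) != 3)
		return -1;
	return 0;
}

/* Hypothetical helper: read the current averages from the given path
 * (assumed to be "/proc/loadavg" on a typical Linux system).
 * Returns 0 on success, -1 on any failure. */
int read_loadavg(const char *path, double out[3])
{
	char buf[128];
	int rc = -1;
	FILE *fp;

	assert(path != NULL && out != NULL);
	fp = fopen(path, "r");
	if (fp == NULL)
		return -1;
	if (fgets(buf, sizeof buf, fp) != NULL)
		rc = parse_loadavg_line(buf, out);
	fclose(fp);
	return rc;
}
```

            Calling read_loadavg("/proc/loadavg", v) should fill v with the same values that top and uptime print, provided the file exists on the machine.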
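            So how are these three values calculated? The answer is in the Linux source code.
    2. How Linux calculates the load average
            Wikipedia's article on load (see reference 1) describes how Linux computes it. To verify this myself, I checked the relevant code in the Linux source (kernel 2.6.9). The top-down walk-through follows.
            In kernel/timer.c in the source tree, the function that computes the system load is: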

    // Source tree path: kernel/timer.c
    /*
     * Hmm.. Changed this, as the GNU make sources (load.c) seems to
     * imply that avenrun[] is the standard name for this kind of thing.
     * Nothing else seems to be standardized: the fractional size etc
     * all seem to differ on different machines.
     *
     * Requires xtime_lock to access.
     */
    unsigned long avenrun[3];
    
    /*
     * calc_load - given tick count, update the avenrun load estimates.
     * This is called while holding a write_lock on xtime_lock.
     */
    static inline void calc_load(unsigned long ticks)
    {
    	unsigned long active_tasks; /* fixed-point */
    	static int count = LOAD_FREQ;
    
    	count -= ticks;
    	if (count < 0) {
    		count += LOAD_FREQ;
    		active_tasks = count_active_tasks();
    		CALC_LOAD(avenrun[0], EXP_1, active_tasks);
    		CALC_LOAD(avenrun[1], EXP_5, active_tasks);
    		CALC_LOAD(avenrun[2], EXP_15, active_tasks);
    	}
    }

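            From the code above, the array avenrun[] has three elements, which hold the load averages for the past 1, 5, and 15 minutes. calc_load() does the actual calculation. Its parameter ticks is the number of timer ticks that have elapsed since the previous call; the function counts them down and takes a new sample once every LOAD_FREQ ticks (5 seconds). When a sample is due, it gets the current number of active tasks and passes that number to CALC_LOAD to update each of the three load averages.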
            Following the call chain, count_active_tasks() is defined as follows (also in kernel/timer.c):

    /*  
     * Nr of active tasks - counted in fixed-point numbers
     */
    static unsigned long count_active_tasks(void)
    {
    	return (nr_running() + nr_uninterruptible()) * FIXED_1;
    }

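            As the source shows, count_active_tasks() returns the current number of active tasks as a fixed-point number. Active tasks are: 1) runnable tasks (nr_running), that is, tasks that are on a CPU or waiting in a run queue; 2) tasks in uninterruptible sleep (nr_uninterruptible), such as tasks blocked while waiting for I/O.
            The code that computes nr_running and nr_uninterruptible is in kernel/sched.c in the source tree: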

    // Source tree path: kernel/sched.c
    /*
     * nr_running, nr_uninterruptible and nr_context_switches:
     *
     * externally visible scheduler statistics: current number of runnable
     * threads, current number of uninterruptible-sleeping threads, total
     * number of context switches performed since bootup.
     */
    unsigned long nr_running(void)
    {
    	unsigned long i, sum = 0;
    
    	for (i = 0; i < NR_CPUS; i++)
    		sum += cpu_rq(i)->nr_running;
    
    	return sum;
    }
    
    unsigned long nr_uninterruptible(void)
    {
    	unsigned long i, sum = 0;
    
    	for_each_cpu(i)
    		sum += cpu_rq(i)->nr_uninterruptible;
    
    	return sum;
    }
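            From user space, the two categories summed above roughly correspond to the process states R (runnable) and D (uninterruptible sleep). The sketch below assumes the usual layout of a /proc/<pid>/stat line, "pid (comm) state ...", and shows how the state character could be extracted and classified. The helper names task_state and counts_toward_load are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return the state character from one line of
 * /proc/<pid>/stat. The command name may itself contain spaces or
 * parentheses, so the search starts from the last ')'.
 * Returns '\0' if the line does not have the expected shape. */
char task_state(const char *stat_line)
{
	const char *close_paren;

	assert(stat_line != NULL);
	close_paren = strrchr(stat_line, ')');
	if (close_paren == NULL || close_paren[1] != ' ' || close_paren[2] == '\0')
		return '\0';
	return close_paren[2];
}

/* A task is counted as active if it is runnable ('R') or in
 * uninterruptible sleep ('D'); every other state is ignored. */
int counts_toward_load(char state)
{
	return state == 'R' || state == 'D';
}
```

            Applying these two helpers to every /proc/<pid>/stat file would give an instantaneous count comparable to nr_running() + nr_uninterruptible(), although it is taken at a different moment than the kernel's own sample.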

            Continuing along the call chain, the definition of CALC_LOAD is in include/linux/sched.h:

    // Source tree path: include/linux/sched.h
    /*
     * These are the constant used to fake the fixed-point load-average
     * counting. Some notes:
     *  - 11 bit fractions expand to 22 bits by the multiplies: this gives
     *    a load-average precision of 10 bits integer + 11 bits fractional
     *  - if you want to count load-averages more often, you need more
     *    precision, or rounding will get you. With 2-second counting freq,
     *    the EXP_n values would be 1981, 2034 and 2043 if still using only
     *    11 bit fractions.
     */
    extern unsigned long avenrun[];		/* Load averages */
    
    #define FSHIFT		11		/* nr of bits of precision */
    #define FIXED_1		(1<<FSHIFT)	/* 1.0 as fixed-point */
    #define LOAD_FREQ	(5*HZ)		/* 5 sec intervals */
    #define EXP_1		1884		/* 1/exp(5sec/1min) as fixed-point */
    #define EXP_5		2014		/* 1/exp(5sec/5min) */
    #define EXP_15		2037		/* 1/exp(5sec/15min) */
    
    #define CALC_LOAD(load,exp,n) \
    	load *= exp; \
    	load += n*(FIXED_1-exp); \
    	load >>= FSHIFT;
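            Written as a formula, the macro is an exponentially weighted moving average. The notation below is my own (Delta for the 5-second sampling interval, T for the averaging window, n_t for the number of active tasks at sample t); the constants are the ones from the listing above.

```latex
\begin{aligned}
\mathrm{load}_t &= \mathrm{load}_{t-1}\, e^{-\Delta/T} + n_t \left(1 - e^{-\Delta/T}\right),
  \qquad \Delta = 5\,\mathrm{s},\quad T \in \{60, 300, 900\}\,\mathrm{s} \\
\mathrm{EXP\_1}  &= \operatorname{round}\!\left(2^{11} e^{-5/60}\right)  = 1884 \\
\mathrm{EXP\_5}  &= \operatorname{round}\!\left(2^{11} e^{-5/300}\right) = 2014 \\
\mathrm{EXP\_15} &= \operatorname{round}\!\left(2^{11} e^{-5/900}\right) = 2037
\end{aligned}
```

            In the fixed-point code, each EXP_n plays the role of the decay factor scaled by FIXED_1 = 2^11, and the final right shift by FSHIFT removes the extra scale introduced by the multiplications.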

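            As shown, CALC_LOAD is a macro, and the resulting load average depends on its three arguments. For any single value (say, the 5-minute load average), however, exp is a constant and load is simply the previous result, so the only external input is the current number of active tasks. That number covers two kinds of tasks: runnable tasks and tasks in uninterruptible sleep.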
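            To see how a constant number of active tasks drives the value, the macro can be replayed in user space. This is a sketch under stated assumptions: it copies FSHIFT, FIXED_1 and EXP_1 from the listing above, while calc_load_step, simulate_load and load_hundredths are hypothetical helpers of my own, and the truncating conversion to hundredths is not necessarily how the kernel formats the value for display.

```c
#include <assert.h>

#define FSHIFT   11                 /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)    /* 1.0 in fixed point */
#define EXP_1    1884UL             /* decay factor for the 1-minute average */

/* One CALC_LOAD step written as a function: takes the previous
 * fixed-point load and returns the new one. */
unsigned long calc_load_step(unsigned long load, unsigned long exp_factor,
			     unsigned long active_fixed)
{
	assert(exp_factor <= FIXED_1);
	load *= exp_factor;
	load += active_fixed * (FIXED_1 - exp_factor);
	return load >> FSHIFT;
}

/* Start from a load of 0 and apply `steps` samples (one per 5 seconds)
 * with a constant number of active tasks. */
unsigned long simulate_load(unsigned long tasks, unsigned long exp_factor,
			    int steps)
{
	unsigned long load = 0;
	int i;

	for (i = 0; i < steps; i++)
		load = calc_load_step(load, exp_factor, tasks * FIXED_1);
	return load;
}

/* Convert a fixed-point load to hundredths, truncating the remainder. */
unsigned long load_hundredths(unsigned long load)
{
	return (load * 100) >> FSHIFT;
}
```

            With one task that stays active, the first sample gives 164 in fixed point (about 0.08), and after 12 samples, that is one minute, the 1-minute value has risen to roughly 0.63. With no active tasks the value stays at 0.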
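            This matches what I observed on three Linux machines with identical hardware (8 CPUs, 16 GB memory, 1.8 TB disk) and a similar total number of processes (a little over 170 each). On the first machine, one ordinary process was running ("ordinary" here means neither CPU-bound nor I/O-bound) and all the others were sleeping. On the second machine, five CPU-bound processes were each using 99% CPU and the rest were sleeping. On the third machine, two processes were reading and writing the disk and the rest were sleeping. The result was clear: all three load average values were highest on the third machine, the second machine came next, and all three values on the first machine were close to 0.
            From this observation, one might also infer that uninterruptible tasks (such as tasks performing I/O) affect the system load more than running tasks do. (Note: I have no data or code to support this inference. In fact, count_active_tasks() above adds the two counts with equal weight, so any difference would have to come from how many tasks are in each state. If the inference is wrong, corrections are welcome.)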

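    3. Understanding what the load average means
            The sections above covered the concept of the load average and how Linux calculates it. So how should the value be interpreted? The article listed as reference 3 gives a detailed and vivid explanation, so I will not repeat it here.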

    [References]
    1. Wikipedia: Load (computing)
    2. Linux source code (kernel version 2.6.9)
    3. Understanding Linux CPU Load - when should you be worried?

    ================== EOF ===================


  • Original article: https://www.cnblogs.com/jiangu66/p/3162865.html