• Scheduler 26: Time and Frequency Statistics in the Linux Kernel


    I. Viewing the statistics

    1. Per-CPU time by usage type

    # ls -l /proc/stat
    -r--r--r-- 1 root root 0 2021-01-01 19:46 /proc/stat
    # cat /proc/stat
    cpu  203632 46353 386930 31815547 3869 274339 68486 0 0 0
    cpu0 26704 7709 39012 3916272 49 87626 23620 0 0 0
    cpu1 14682 9898 25125 4055433 68 8755 3338 0 0 0
    cpu2 5588 8202 7818 4098854 47 2215 901 0 0 0
    cpu3 21765 10971 40654 4014299 341 19606 3900 0 0 0
    cpu4 28157 1362 52559 3983416 725 25697 6661 0 0 0
    cpu5 58390 2212 140189 3718682 1273 96146 17063 0 0 0
    cpu6 42753 1587 70162 3930832 1008 32193 11836 0 0 0
    cpu7 5588 4407 11408 4097755 355 2097 1164 0 0 0
    intr 71408793 0 32194638 9259224 0 0 56084 91247 0 0 0 0 0 0 0 0 0 0 0 0 0 23940117 0 0 0 0 1022833 0 0 0 0 0 0 0 0 739 1176966 83 213 253 2243389 758 207033 6503 1916 0 0 9173 0 12210 0 0 0 0 0 140 0 0 10 2058 554 0 0 0 18070 0 0 5083 0 0 0 0 224 0 48 0 0 0 2984 0 0 0 29162 0 49591 0 9466 0 0 0 0 0 0 0 0 159 159 0 0 374 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8365 0 0 0 0 25095 0 0 0 3686 0 0 7767 0 0 0 0 0 0 0 0 0 16034 0 0 0 0 0 231848 0 0 0 25090 0 0 0 3558 0 0 8736 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3144 0 3036 181465 0 0 1400 2 1403 1 504929 32592 637 0 0 12 15 0 0 3 0 3 30 0 0 2 0 6653 9 0 279 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 168 0 0 0 0 96 0 8 0 0 0 0 0 0 0 0 0 0 520 40 0 0 0 0 131 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 98 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 133 0 1 7 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 24 8 0 0 0 2 67 98 126 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    ctxt 61029826
    btime 1609501574
    processes 27212
    procs_running 1
    procs_blocked 0
    softirq 8564148 1172 1338008 1 3 243852 0 1611 5229125 0 1750376

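    The time categories are defined in the kernel header include/linux/kernel_stat.h. Note that the columns after cpu[0...7] in the output above do not follow the enum order: show_stat() prints them as user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice. The enum itself is: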

    /*
     * 'kernel_stat.h' contains the definitions needed for doing
     * some kernel statistics (CPU usage, context switches ...),
     * used by rstatd/perfmeter
     */
    enum cpu_usage_stat {
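        CPUTIME_USER, // CPU time spent in user space
        CPUTIME_NICE, // user-space time of niced (low-priority, nice > 0) tasks
        CPUTIME_SYSTEM, // CPU time spent in kernel mode
        CPUTIME_SOFTIRQ, // CPU time spent handling softirqs
        CPUTIME_IRQ, // CPU time spent handling hardirqs
        CPUTIME_IDLE, // CPU idle time
        CPUTIME_IOWAIT, // CPU idle time while waiting for I/O
        CPUTIME_STEAL, // time a guest OS spent waiting for a real CPU
        CPUTIME_GUEST, // time consumed running a guest OS
        CPUTIME_GUEST_NICE, // time consumed running a niced (low-priority) guest OS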
        NR_STATS,
    };

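    The values are printed by show_stat() in fs/proc/stat.c. In recent kernels the counters are accumulated in nanoseconds and show_stat() converts them with nsec_to_clock_t(), so the unit seen in user space is USER_HZ ticks (often loosely called jiffies). The cputime module matters on Linux because it records how much time every CPU in the system has spent in each state. The familiar top tool derives CPU utilization from these cputime counters.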
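    As a rough illustration of how a tool like top can turn these counters into a utilization figure, here is a minimal sketch. The helper names (parse_cpu_line, utilization, read_aggregate) are made up for this example, and treating idle plus iowait as "not busy" is one common convention, not the only one.

```python
from typing import Dict

# Column order printed by show_stat() for each "cpu" line.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest", "guest_nice")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Map one 'cpu...' line of /proc/stat to named counters."""
    parts = line.split()
    return {name: int(value) for name, value in zip(FIELDS, parts[1:])}


def utilization(prev: Dict[str, int], curr: Dict[str, int]) -> float:
    """Busy fraction between two samples of the same CPU line."""
    # guest and guest_nice are already included in user and nice,
    # so they are left out to avoid counting them twice.
    busy_keys = ("user", "nice", "system", "irq", "softirq", "steal")
    idle_keys = ("idle", "iowait")
    busy = sum(curr.get(k, 0) - prev.get(k, 0) for k in busy_keys)
    idle = sum(curr.get(k, 0) - prev.get(k, 0) for k in idle_keys)
    total = busy + idle
    return busy / total if total > 0 else 0.0


def read_aggregate(path: str = "/proc/stat") -> Dict[str, int]:
    """Read the first line of /proc/stat (the all-CPU aggregate)."""
    with open(path) as f:
        return parse_cpu_line(f.readline())
```

    Calling read_aggregate() twice with a short sleep in between and passing both samples to utilization() gives the busy fraction over that interval.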

    2. Per-cluster time spent at each frequency

    The cpufreq_stats module is enabled by the CONFIG_CPU_FREQ_STAT option. When it is enabled, the cpufreq driver creates a stats directory in sysfs:

    /sys/devices/system/cpu/cpufreq/policy0/stats # ls -l
    total 0
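    --w-------    reset // write to reset the statistics
    -r--r--r--    time_in_state // time this cluster has spent at each frequency (accumulated in jiffies, printed in clock_t units)
    -r--r--r--    total_trans // total number of frequency transitions
    -r--r--r--    trans_table // frequency transition table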
    
    # cat /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state
    1800000 5647
    1700000 7
    ...
    200000 4221664

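    Each line gives the time this cpufreq policy has spent at one frequency (in kHz). The kernel accumulates the time in jiffies and exports it through jiffies_64_to_clock_t(), that is, in USER_HZ units. With this data we can see which frequencies each cluster runs at most, and then tune system power and performance accordingly.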
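    To find the dominant frequencies of a cluster, the file can be parsed and ranked by share of total time. The sketch below is only an illustration; parse_time_in_state and residency_share are hypothetical helper names, and lines that are not two numeric columns are simply skipped.

```python
from typing import Dict, List, Tuple


def parse_time_in_state(text: str) -> Dict[int, int]:
    """Parse '<freq_khz> <time>' lines into {freq_khz: time}."""
    result: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2 or not all(f.isdigit() for f in fields):
            continue  # skip blank, elided, or unexpected lines
        result[int(fields[0])] = int(fields[1])
    return result


def residency_share(times: Dict[int, int]) -> List[Tuple[int, float]]:
    """Return (freq_khz, fraction_of_total) pairs, largest share first."""
    total = sum(times.values())
    if total == 0:
        return []
    shares = [(freq, t / total) for freq, t in times.items()]
    return sorted(shares, key=lambda item: item[1], reverse=True)
```

    Feeding it the contents of /sys/devices/system/cpu/cpufreq/policy0/stats/time_in_state and looking at the first few entries shows where the cluster spends most of its time.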

    3. Per-thread time spent at each frequency

    # cat /proc/913/time_in_state
    cpu0
    1800000 0
    ...
    1250000 2638
    ...
    200000 0
    cpu4
    2850000 0
    ...
    200000 0
    cpu7
    3050000 0
    ...
    1300000 9

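    This node records how long the thread has run at each frequency of each cpufreq policy, in units of clock_t. The length of one clock_t unit is set by USER_HZ: with USER_HZ = 250, as reported for this system, one unit is 4 ms. (USER_HZ is 100 on most architectures, which would make one unit 10 ms; getconf CLK_TCK shows the value in use.)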
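    The sectioned format above (a cpuN header followed by frequency/time pairs) is easy to parse. The following is a minimal sketch under that assumption; parse_task_time_in_state and ticks_to_ms are names invented for this example, and the USER_HZ value is passed in explicitly so that either 250 or the value from os.sysconf can be used.

```python
import os
from typing import Dict


def parse_task_time_in_state(text: str) -> Dict[str, Dict[int, int]]:
    """Return {policy_label: {freq_khz: ticks}} from /proc/<pid>/time_in_state."""
    result: Dict[str, Dict[int, int]] = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 1 and fields[0].startswith("cpu"):
            # A header such as "cpu0" starts a new policy section.
            current = result.setdefault(fields[0], {})
        elif (len(fields) == 2 and current is not None
              and fields[0].isdigit() and fields[1].isdigit()):
            current[int(fields[0])] = int(fields[1])
        # Anything else (blank or elided lines) is ignored.
    return result


def ticks_to_ms(ticks: int, user_hz: int) -> float:
    """Convert clock_t ticks to milliseconds for a given USER_HZ."""
    return ticks * 1000.0 / user_hz


def system_user_hz() -> int:
    """USER_HZ as seen by user space (same value as `getconf CLK_TCK`)."""
    return os.sysconf("SC_CLK_TCK")
```

    For example, ticks_to_ms(2638, 250) converts the 1250000 kHz entry above into milliseconds under the USER_HZ = 250 assumption; passing system_user_hz() instead uses whatever the running system reports.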

    4. Per-CPU cpuidle time

    # ls -l /sys/devices/system/cpu/cpu0/cpuidle
    drwxr-xr-x    driver
    drwxr-xr-x    state0
    drwxr-xr-x    state1
    drwxr-xr-x    state2
    drwxr-xr-x    state3
    drwxr-xr-x    state4
    drwxr-xr-x    state5
    drwxr-xr-x    state6
    
    # ls -l /sys/devices/system/cpu/cpu0/cpuidle/state0
    ...
    -r--r--r-- 1 root root 4096 2021-01-02 19:51 time
    
    # cat /sys/devices/system/cpu/cpu0/cpuidle/state*/time
    2675541339
    13746613328
    0
    0
    460
    24621035515
    0

    The cpuidle time accounting records how long each CPU has slept at each idle depth, that is, how much time each core has spent in each C-state since boot, in microseconds.
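    A small script can collect these per-state values for one CPU. This is a sketch only: read_idle_times and idle_seconds are hypothetical helpers, and the directory is passed in as a parameter (on a real system it would be something like /sys/devices/system/cpu/cpu0/cpuidle). States are sorted numerically so that state10 does not sort before state2.

```python
import glob
import os
import re
from typing import List, Tuple


def read_idle_times(cpuidle_dir: str) -> List[Tuple[int, int]]:
    """Return [(state_index, time_us)] sorted by state index."""
    out: List[Tuple[int, int]] = []
    for state_dir in glob.glob(os.path.join(cpuidle_dir, "state*")):
        match = re.fullmatch(r"state(\d+)", os.path.basename(state_dir))
        if not match:
            continue  # not a stateN directory
        with open(os.path.join(state_dir, "time")) as f:
            out.append((int(match.group(1)), int(f.read().strip())))
    return sorted(out)


def idle_seconds(times: List[Tuple[int, int]]) -> float:
    """Total idle time across all states, in seconds."""
    return sum(t for _, t in times) / 1_000_000
```

    Comparing idle_seconds() across CPUs, or the per-state entries of a single CPU, gives a quick view of which cores idle most and how deeply.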


    II. How the statistics are collected

    1. Per-CPU time by usage type

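    The cputime code lives in kernel/sched/cputime.c. As the /proc/stat output above suggests, the accounting granularity is one tick. On every timer interrupt, the interrupt handling path eventually calls irqtime_account_process_tick() (this requires CONFIG_IRQ_TIME_ACCOUNTING, which brings irq/softirq time into the statistics). The function checks whether the current task is ksoftirqd, was interrupted in user mode (a user tick), is the idle task, is running a guest (PF_VCPU), or is otherwise executing kernel code, and charges the elapsed CPU time (normally one tick) to the matching category.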

    /*
     * Account a tick to a process and cpustat
     * @p: the process that the CPU time gets accounted to
     * @user_tick: is the tick from userspace
     * @rq: the pointer to rq
     *
     * Tick demultiplexing follows the order
     * - pending hardirq update
     * - pending softirq update
     * - user_time
     * - idle_time
     * - system time
     *   - check for guest_time
     *   - else account as system_time
     *
     * Check for hardirq is done both for system and user time as there is
     * no timer going off while we are on hardirq and hence we may never get an
     * opportunity to update it solely in system time.
     * p->stime and friends are only updated on system time and not on irq
     * softirq as those do not count in task exec_runtime any more.
     */
    static void irqtime_account_process_tick(struct task_struct *p, int user_tick, int ticks)
    {
        u64 other, cputime = TICK_NSEC * ticks;
    
        /*
         * When returning from idle, many ticks can get accounted at
         * once, including some ticks of steal, irq, and softirq time.
         * Subtract those ticks from the amount of time accounted to
         * idle, or potentially user or system time. Due to rounding,
         * other time can exceed ticks occasionally.
         */
        other = account_other_time(ULONG_MAX);
        if (other >= cputime)
            return;
    
        cputime -= other;
    
        if (this_cpu_ksoftirqd() == p) {
            /*
             * ksoftirqd time do not get accounted in cpu_softirq_time.
             * So, we have to handle it separately here.
             * Also, p->stime needs to be updated for ksoftirqd.
             */
            account_system_index_time(p, cputime, CPUTIME_SOFTIRQ);
        } else if (user_tick) {
            account_user_time(p, cputime);
        } else if (p == this_rq()->idle) {
            account_idle_time(cputime);
        } else if (p->flags & PF_VCPU) { /* System time or guest time */
            account_guest_time(p, cputime);
        } else {
            account_system_index_time(p, cputime, CPUTIME_SYSTEM);
        }
    }
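    The branch order in irqtime_account_process_tick() can be modeled in a few lines. This is a deliberately simplified sketch, not kernel behavior in full: TICK_NSEC assumes HZ = 250, the "other" time (steal/irq/softirq) is only subtracted and not charged anywhere, and the real account_user_time() and account_guest_time() additionally split time into nice and guest/user buckets.

```python
from collections import Counter

TICK_NSEC = 4_000_000  # assumption: HZ = 250, so one tick is 4 ms


def classify(is_ksoftirqd: bool, user_tick: bool,
             is_idle_task: bool, is_vcpu: bool) -> str:
    """Pick the bucket in the same order as the kernel's if/else chain."""
    if is_ksoftirqd:
        return "softirq"
    if user_tick:
        return "user"
    if is_idle_task:
        return "idle"
    if is_vcpu:
        return "guest"
    return "system"


def account_tick(stats: Counter, ticks: int, other_ns: int, **ctx: bool) -> None:
    """Charge `ticks` ticks, minus already-accounted other time, to one bucket."""
    cputime = TICK_NSEC * ticks
    if other_ns >= cputime:
        return  # everything was already accounted as steal/irq/softirq
    stats[classify(**ctx)] += cputime - other_ns
```

    Note that the ksoftirqd check comes first, so a tick that lands in ksoftirqd is charged to softirq time even though it is ordinary task context.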

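    2. Per-thread time spent at each frequency (cpufreq_times)

    The cpufreq_times module lives in drivers/cpufreq/cpufreq_times.c, and updating it involves two modules: the cpufreq driver and cputime. When the frequency of a cpufreq policy changes, the cpufreq driver calls cpufreq_times_record_transition() via cpufreq_notify_transition (normal switching) or cpufreq_driver_fast_switch (fast switching), telling cpufreq_times which frequency the policy is now running at. When cputime handles a timer interrupt, it calls cpufreq_acct_update_power(), which adds the tick to the current task's counter for the current frequency.

    3. Per-cluster time spent at each frequency (cpufreq_stats)

    The cpufreq_stats module lives in drivers/cpufreq/cpufreq_stats.c. Its update path resembles that of cpufreq_times, except that it depends on only one external module, the cpufreq driver. When the frequency of a cpufreq policy changes, the cpufreq driver calls cpufreq_stats_record_transition() via cpufreq_notify_transition (normal switching) or cpufreq_driver_fast_switch (fast switching), telling cpufreq_stats that a switch is happening and which target frequency it is switching to. cpufreq_stats then calls cpufreq_stats_update(), which reads the current jiffies, subtracts the jiffies value of the previous update, and adds the difference to the time counter of the previous frequency: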

    // drivers/cpufreq/cpufreq_stats.c
    static void cpufreq_stats_update(struct cpufreq_stats *stats, unsigned long long time)
    {
        unsigned long long cur_time = get_jiffies_64();
    
        stats->time_in_state[stats->last_index] += cur_time - time;
        stats->last_time = cur_time;
    }
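    The delta bookkeeping in cpufreq_stats_update() can be mimicked outside the kernel to see how time_in_state fills up. The class below is a minimal model, not the kernel data structure: FreqStats and its method names are invented here, timestamps are plain integers standing in for jiffies, and skipping a transition to the same frequency is an assumption about how the caller behaves.

```python
from typing import List


class FreqStats:
    """Toy model of per-policy time_in_state accounting."""

    def __init__(self, freqs_khz: List[int], initial_khz: int, now: int) -> None:
        self.freqs = list(freqs_khz)
        self.time_in_state = [0] * len(self.freqs)
        self.last_index = self.freqs.index(initial_khz)
        self.last_time = now
        self.total_trans = 0

    def update(self, now: int) -> None:
        """Charge the time since the last update to the frequency in use."""
        self.time_in_state[self.last_index] += now - self.last_time
        self.last_time = now

    def record_transition(self, new_khz: int, now: int) -> None:
        """Close out the old frequency, then switch to the new one."""
        new_index = self.freqs.index(new_khz)
        if new_index == self.last_index:
            return  # assumption: no-op transitions are not counted
        self.update(now)
        self.last_index = new_index
        self.total_trans += 1
```

    The key point is that time is always charged to the frequency that was in use before the switch, so the counters are only as fresh as the most recent update or transition.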

    4. Per-CPU cpuidle time

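    The cpuidle time accounting lives in drivers/cpuidle/cpuidle.c. When a CPU's runqueue has no runnable task, the CPU switches to the idle task, which after several layers of calls ends up in cpuidle_enter_state(). That function timestamps entry to and exit from the idle state with local_clock() and adds the difference to the time_ns counter of the state that was entered: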

    /**
     * cpuidle_enter_state - enter the state and update stats
     * @dev: cpuidle device for this cpu
     * @drv: cpuidle driver for this cpu
     * @index: index into the states table in @drv of the state to enter
     */
    int cpuidle_enter_state(struct cpuidle_device *dev, struct cpuidle_driver *drv, int index) //drivers/cpuidle/cpuidle.c
    {
        int entered_state;
        ktime_t time_start, time_end;
        
        ...
        time_start = ns_to_ktime(local_clock());
        ...
        entered_state = target_state->enter(dev, drv, index);
        ...
        time_end = ns_to_ktime(local_clock());
        ...
        diff = ktime_sub(time_end, time_start);
        ...
        dev->last_residency_ns = diff;
        dev->states_usage[entered_state].time_ns += diff;
        ...
    }
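    The same "timestamp, enter, timestamp, accumulate" pattern can be sketched in user space. This is an illustrative model only: IdleDevice, enter_state and time_us are names made up for the example, the clock is injectable so the logic can be tested, and the per-state usage counter is an addition of this sketch.

```python
import time
from typing import Callable, List


class IdleDevice:
    """Toy model of per-state idle residency accounting."""

    def __init__(self, n_states: int,
                 clock_ns: Callable[[], int] = time.monotonic_ns) -> None:
        self.time_ns: List[int] = [0] * n_states
        self.usage: List[int] = [0] * n_states
        self.last_residency_ns = 0
        self._clock_ns = clock_ns

    def enter_state(self, index: int, enter: Callable[[int], int]) -> int:
        """Run `enter` and charge the elapsed time to the state it reports."""
        start = self._clock_ns()
        entered = enter(index)  # may differ from the requested index
        end = self._clock_ns()
        if entered >= 0:
            diff = end - start
            self.last_residency_ns = diff
            self.time_ns[entered] += diff
            self.usage[entered] += 1
        return entered

    def time_us(self, index: int) -> int:
        """Accumulated residency in microseconds, as a sysfs-style value."""
        return self.time_ns[index] // 1000
```

    As in the kernel snippet, the time is charged to the state actually entered (the return value), not to the state that was requested.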
  • Original article: https://www.cnblogs.com/hellokitty2/p/15666357.html