Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
..............................................................................
 Field          Content
  pid           process id
  tcomm         filename of the executable
  state         state (R is running, S is sleeping, D is sleeping in an
                uninterruptible wait, Z is zombie, T is traced or stopped)
  ppid          process id of the parent process
  pgrp          pgrp of the process
  sid           session id
  tty_nr        tty the process uses
  tty_pgrp      pgrp of the tty
  flags         task flags
  min_flt       number of minor faults
  cmin_flt      number of minor faults with child's
  maj_flt       number of major faults
  cmaj_flt      number of major faults with child's
  utime         user mode jiffies
  stime         kernel mode jiffies
  cutime        user mode jiffies with child's
  cstime        kernel mode jiffies with child's
  priority      priority level
  nice          nice level
  num_threads   number of threads
  start_time    time the process started after system boot
  vsize         virtual memory size
  rss           resident set memory size
  rsslim        current limit in bytes on the rss
  start_code    address above which program text can run
  end_code      address below which program text can run
  start_stack   address of the start of the stack
  esp           current value of ESP
  eip           current value of EIP
  pending       bitmap of pending signals (obsolete)
  blocked       bitmap of blocked signals (obsolete)
  sigign        bitmap of ignored signals (obsolete)
  sigcatch      bitmap of catched signals (obsolete)
  wchan         address where process went to sleep
  0             (place holder)
  0             (place holder)
  exit_signal   signal to send to parent thread on exit
  task_cpu      which CPU the task is scheduled on
  rt_priority   realtime priority
  policy        scheduling policy (man sched_setscheduler)
  blkio_ticks   time spent waiting for block IO
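
To make the field order in Table 1-3 concrete, here is a minimal userspace sketch that pulls utime and stime out of one stat line. The names stat_times, parse_stat_line and read_stat_times are made up for this illustration and are not part of procps or the kernel. The sketch relies on the fact that the stat file prints tcomm wrapped in parentheses, and it ignores every field after stime.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the few fields this sketch extracts. */
struct stat_times {
	int pid;
	char state;
	unsigned long utime;	/* user mode ticks   (field 14) */
	unsigned long stime;	/* kernel mode ticks (field 15) */
};

/* Parse one line in /proc/<pid>/stat format.
 * Returns 0 on success, -1 if the line does not match. */
int parse_stat_line(const char *line, struct stat_times *out)
{
	/* tcomm may itself contain spaces or ')', so find the LAST ')'
	 * instead of splitting the whole line on whitespace. */
	const char *close = strrchr(line, ')');

	if (close == NULL || sscanf(line, "%d", &out->pid) != 1)
		return -1;

	/* After tcomm: state ppid pgrp sid tty_nr tty_pgrp flags
	 * min_flt cmin_flt maj_flt cmaj_flt utime stime ... */
	int n = sscanf(close + 1,
		       " %c %*d %*d %*d %*d %*d %*lu %*lu %*lu %*lu %*lu %lu %lu",
		       &out->state, &out->utime, &out->stime);

	return n == 3 ? 0 : -1;
}

/* Read the first line of a stat file and parse it. */
int read_stat_times(const char *path, struct stat_times *out)
{
	char buf[1024];
	FILE *fp = fopen(path, "r");

	if (fp == NULL)
		return -1;
	char *ok = fgets(buf, sizeof buf, fp);
	fclose(fp);
	return ok != NULL ? parse_stat_line(buf, out) : -1;
}
```

Calling read_stat_times("/proc/3091/stat", &t) would fill t with the totals for the whole process, while the same call on /proc/3091/task/3130/stat would return the times of that single thread.
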
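The fields in Table 1-3 include the process's stime and utime. ps reads this file to obtain the time the process has spent running and computes %CPU from it. So how are the stime and utime values in the stat file obtained? The following functions are defined in fs/proc/array.c: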
int proc_tgid_stat(struct task_struct *task, char *buffer)
{
	return do_task_stat(task, buffer, 1);
}
int proc_tid_stat(struct task_struct *task, char *buffer)
{
	return do_task_stat(task, buffer, 0);
}
static int do_task_stat(struct task_struct *task, char *buffer, int whole)
{
	...
	/* add up live thread stats at the group level */
	if (whole) {
		struct task_struct *t = task;
		do {
			min_flt += t->min_flt;
			maj_flt += t->maj_flt;
			utime = cputime_add(utime, task_utime(t));
			stime = cputime_add(stime, task_stime(t));
			t = next_thread(t);
		} while (t != task);
		min_flt += sig->min_flt;
		maj_flt += sig->maj_flt;
		utime = cputime_add(utime, sig->utime);
		stime = cputime_add(stime, sig->stime);
	}
	...
}
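If whole is 1, the proc filesystem adds up the running times of all the threads in the process; next_thread returns the next thread in that process. At fork time, if the CLONE_THREAD flag is given, meaning the newly created thread belongs to the same thread group as its parent, the fork code adds it to that thread group: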
	if (clone_flags & CLONE_THREAD) {
		p->group_leader = current->group_leader;
		list_add_tail_rcu(&p->thread_group, &p->group_leader->thread_group);
		...
	}
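next_thread simply walks the list that thread_group is linked into, visiting every thread in the thread group. This explains why the %CPU field can exceed 100%: the numerator is the running time of all the threads in the process (the thread group), and at any given moment two different threads of the same group may be running on two different CPUs, so the total running time can exceed the physical time that has actually elapsed (the denominator). Clearly, this can only happen on an SMP system.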
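
The arithmetic behind this can be sketched in userspace. The code below is only an illustration: fake_task is a made-up stand-in for task_struct, its next pointer plays the role of next_thread on the circular thread_group list, and the times already accumulated in sig for exited threads are left out.

```c
/* Userspace stand-in for the kernel's task_struct; illustrative only. */
struct fake_task {
	unsigned long utime;		/* user mode ticks */
	unsigned long stime;		/* kernel mode ticks */
	struct fake_task *next;		/* circular link, like thread_group */
};

/* Sum utime + stime over every thread on the ring.
 * The walk may start at any member, as in do_task_stat. */
unsigned long group_cpu_ticks(const struct fake_task *task)
{
	unsigned long total = 0;
	const struct fake_task *t = task;

	do {
		total += t->utime + t->stime;
		t = t->next;		/* plays the role of next_thread(t) */
	} while (t != task);

	return total;
}

/* %CPU = CPU time consumed / wall-clock time elapsed, both in ticks. */
double cpu_percent(unsigned long cpu_ticks, unsigned long elapsed_ticks)
{
	if (elapsed_ticks == 0)
		return 0.0;
	return 100.0 * (double)cpu_ticks / (double)elapsed_ticks;
}
```

For example, if two threads of one group each ran for 900 ticks on separate CPUs during the same 1000 elapsed ticks, group_cpu_ticks returns 1800 and cpu_percent(1800, 1000) gives 180. On a single CPU the two threads could never accumulate more than 1000 ticks between them in that interval.
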
[root@localhost 3013]# ps aux|grep firefox-bin
root 3091 15.6 26.6 374644 137048 ? Sl 10:05 47:49 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
[root@localhost 3013]# ps aux -L|grep firefox-bin
root 3091 3091 11.3 12 26.6 374644 137056 ? Sl 10:05 34:40 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root 3091 3130 0.0 12 26.6 374644 137056 ? Sl 10:05 0:01 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root 3091 3131 0.1 12 26.6 374644 137056 ? Sl 10:05 0:25 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root 3091 3140 0.0 12 26.6 374644 137056 ? Sl 10:05 0:00 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root 3091 3141 0.0 12 26.6 374644 137056 ? Sl 10:05 0:00 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
...
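The -L option above also lists the other threads together with their TIDs. The thread whose TID equals the PID is the process's first thread, 3091 in this case. Entering its directory shows: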
[root@localhost proc]# cd 3091
[root@localhost 3091]# ls
attr clear_refs cpuset exe io maps mountstats root smaps status
auxv cmdline cwd fd limits mem oom_adj sched stat task
cgroup coredump_filter environ fdinfo loginuid mounts oom_score schedstat statm wchan
[root@localhost 3091]# cd task/
[root@localhost task]# ls
11850 11851 11853 11854 11855 3091 3130 3131 3140 3141 3142 3155 3158
[root@localhost task]# cd 3130
[root@localhost 3130]# ls
attr clear_refs cwd fd loginuid mounts root smaps status
auxv cmdline environ fdinfo maps oom_adj sched stat wchan
cgroup cpuset exe limits mem oom_score schedstat statm
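The task directory inside a process's directory holds one subdirectory per thread, including the first thread (3091 above). In fact, the kernel makes no essential distinction between processes and threads: if the address space is shared at fork time the result is a thread, otherwise it is a process.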
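
The same listing can be walked from a program. This is a small sketch using the standard opendir/readdir calls; is_tid_name and count_threads are names invented for this example.

```c
#include <ctype.h>
#include <dirent.h>
#include <stddef.h>

/* Return 1 if name consists only of decimal digits (i.e. looks like a TID). */
int is_tid_name(const char *name)
{
	if (*name == '\0')
		return 0;
	for (; *name != '\0'; name++)
		if (!isdigit((unsigned char)*name))
			return 0;
	return 1;
}

/* Count the thread entries in a task directory such as /proc/3091/task.
 * Returns -1 if the directory cannot be opened. */
int count_threads(const char *task_dir)
{
	DIR *dir = opendir(task_dir);
	struct dirent *ent;
	int count = 0;

	if (dir == NULL)
		return -1;
	while ((ent = readdir(dir)) != NULL)
		if (is_tid_name(ent->d_name))	/* skips "." and ".." */
			count++;
	closedir(dir);
	return count;
}
```

For the ls output shown above, count_threads("/proc/3091/task") would count 13 entries. A program can inspect its own threads by passing "/proc/self/task".
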
Email:wudx05@gmail.com
Blog:http://blog.chinaunix.net/u/22326/