- 前言
Java Thread Dump 是一个非常有用的应用诊断工具, 通过thread dump出来的信息, 可以定位到你需要了解的线程, 以及这个线程的调用栈. 如果配合linux的top命令, 可以找到你的系统中的最耗CPU的线程代码段, 这样才能有针对性地进行优化.
- 场景和实践
2.1. 后台系统一直是在黑盒运行, 除了能暂停一部分任务的执行, 根本无法知道哪些任务耗CPU过多。所以一直以为是业务代码的问题, 经过各种优化(删减没必要的逻辑, 合并写操作)等等优化, 系统负载还是很高. 没什么访问量, 后台任务处理也就是每天几百万的级别, load还是达到了15以上. CPU只有4核,天天收到load告警却无从下手, 于是乎就被迫来分析一把线程.
2.2 系统跑的是java tomcat, 要触发tomcat thread dump很简单, 先找到tomcat对应的进程id, 我们设置为PID
【linux 命令】: ps -ef | grep tomcat
可以找到, 然后给这个进程发送一个QUIT的信号量, 让其触发线程的dump, 下面的操作先别急着动手, 等到看完2.3再动手不迟
【linux 命令】: kill -3 $PID / kill -QUIT $PID
tomcat会把thread dump的内容输出到控制台
【linux 命令】:cd $tomcathome/logs/
查看 catalina.out 文件, 把最后的跟thread相关的内容获取出来.
大致内容如下:
2012-04-13 16:30:41 Full thread dump OpenJDK 64-Bit Server VM (1.6.0-b09 mixed mode): "TP-Processor12" daemon prio=10 tid=0x00000000045acc00 nid=0x7f19 in Object.wait() [0x00000000483d0000..0x00000000483d0a90] java.lang.Thread.State: WAITING (on object monitor) at java.lang.Object.wait(Native Method) - waiting on <0x00002aaab5bfce70> (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable) at java.lang.Object.wait(Object.java:502) at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:662) - locked <0x00002aaab5bfce70> (a org.apache.tomcat.util.threads.ThreadPool$ControlRunnable) at java.lang.Thread.run(Thread.java:636) "TP-Processor11" daemon prio=10 tid=0x00000000048e3c00 nid=0x7f18 in Object.wait() [0x00000000482cf000..0x00000000482cfd10] java.lang.Thread.State: WAITING (on object monitor) .... "VM Thread" prio=10 tid=0x00000000042ff400 nid=0x77de runnable"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000000429c400 nid=0x77d9 runnable "GC task thread#1 (ParallelGC)" prio=10 tid=0x000000000429d800 nid=0x77da runnable "GC task thread#2 (ParallelGC)" prio=10 tid=0x000000000429ec00 nid=0x77db runnable "GC task thread#3 (ParallelGC)" prio=10 tid=0x00000000042a0000 nid=0x77dc runnable "VM Periodic Task Thread" prio=10 tid=0x0000000004348400 nid=0x77e5 waiting on condition JNI global references: 815 Heap PSYoungGen total 320192K, used 178216K [0x00002aaadce00000, 0x00002aaaf1800000, 0x00002aaaf1800000) eden space 303744K, 55% used [0x00002aaadce00000,0x00002aaae718e048,0x00002aaaef6a0000) from space 16448K, 65% used [0x00002aaaf0690000,0x00002aaaf110c1b0,0x00002aaaf16a0000) to space 16320K, 0% used [0x00002aaaef6a0000,0x00002aaaef6a0000,0x00002aaaf0690000) PSOldGen total 460992K, used 425946K [0x00002aaab3a00000, 0x00002aaacfc30000, 0x00002aaadce00000) object space 460992K, 92% used [0x00002aaab3a00000,0x00002aaacd9f6a30,0x00002aaacfc30000) PSPermGen total 56192K, used 55353K [0x00002aaaae600000, 0x00002aaab1ce0000, 0x00002aaab3a00000) object space 56192K, 98% used [0x00002aaaae600000,0x00002aaab1c0e520,0x00002aaab1ce0000)
最后一段是系统的对内存的使用情况.