线上问题出现有个服务多台机器CPU飙升到90%
1、查看系统CPU占用
top -H -p pid
top - 11:51:57 up 110 days, 14:01, 0 users, load average: 7.08, 7.64, 8.15
Threads: 586 total, 11 running, 575 sleeping, 0 stopped, 0 zombie
%Cpu(s): 82.5 us, 7.0 sy, 0.0 ni, 8.0 id, 1.5 wa, 0.0 hi, 1.0 si, 0.0 st
KiB Mem : 8174716 total, 157652 free, 6103332 used, 1913732 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 1741096 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
31810 * 20 0 8485932 4.591g 0 R 53.9 58.9 117:41.64 java
32411 * 20 0 8485932 4.591g 0 R 44.7 58.9 85:29.14 java
31626 * 20 0 8485932 4.591g 0 S 21.7 58.9 0:53.47 java
31716 * 20 0 8485932 4.591g 0 R 19.1 58.9 461:14.88 java
6757 * 20 0 8485932 4.591g 0 R 15.5 58.9 27:29.25 java
7217 * 20 0 8485932 4.591g 0 R 14.8 58.9 27:17.37 java
7216 * 20 0 8485932 4.591g 0 R 12.8 58.9 27:21.51 java
5295 * 20 0 8485932 4.591g 0 R 11.8 58.9 28:12.26 java
31627 * 20 0 8485932 4.591g 0 S 11.5 58.9 15:56.04 java
....
2、查看对应的具体线程占用资源情况,获取16进制线程id
printf “%x ” pid
printf '%x
' 32426
7eaa
3、导出jstack日志
"logback-6" #528 daemon prio=5 os_prio=0 tid=0x00007fda40005000 nid=0x7eab waiting on condition [0x00007fda3c3df000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000765aea438> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - None "logback-5" #527 daemon prio=5 os_prio=0 tid=0x00007fda40004800 nid=0x7eaa waiting on condition [0x00007fda1db8d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000765aea438> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - None
4、使用第2步的id查找jstack日志,logback-5即线程名
"logback-5" #527 daemon prio=5 os_prio=0 tid=0x00007fda40004800 nid=0x7eaa waiting on condition [0x00007fda1db8d000] java.lang.Thread.State: WAITING (parking) at sun.misc.Unsafe.park(Native Method) - parking to wait for <0x0000000765aea438> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1081) at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1074) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1134) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Locked ownable synchronizers: - None
根据线程名查找代码
5、由于该服务日志量很大,且打印的都是trace日志,而且打印的是异步日志
ch.qos.logback.core.AsyncAppenderBase#put private void put(E eventObject) { if (neverBlock) { blockingQueue.offer(eventObject); } else { putUninterruptibly(eventObject); } }
综上所述排查顺序,则缩小打印日志量应该就可以减少日志打印的cpu占用情况,而且应该也能减少CMS 老年代垃圾回收时间。准备关闭trace日志,只使用公司的全链路日志,优化效果应该非常明显。