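Recently I noticed that our Java applications were using a lot of memory and CPU. My first guess was a problem in the business code, so I reported it to the developers, who said the code was fine. Then I found that a dozen or so microservices all showed the same symptoms, which made me suspect that the JVM parameters (really just a handful of startup flags) needed tuning. The developers did not resolve it, so I had to bite the bullet and do it myself.
Here is a summary of the troubleshooting steps:
First, I wrote a script (pasted at the end of this post) to find where the problem was. Its output was as follows: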
[1] Busy(3.2%) thread(30444/0x76ec) stack of java process(30435) under user(root):
"VM Thread" os_prio=0 tid=0x00007f16800de800 nid=0x76ec runnable
[2] Busy(3.0%) thread(30442/0x76ea) stack of java process(30435) under user(root):
"Gang worker#3 (Parallel GC Threads)" os_prio=0 tid=0x00007f1680021800 nid=0x76ea runnable
[3] Busy(3.0%) thread(30441/0x76e9) stack of java process(30435) under user(root):
"Gang worker#2 (Parallel GC Threads)" os_prio=0 tid=0x00007f1680020000 nid=0x76e9 runnable
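As the output shows, "VM Thread" is the thread consuming the most CPU. According to the documentation, VM Thread is an internal JVM thread that carries out VM-level operations on behalf of the other threads, including stop-the-world garbage collection. The Parallel GC Threads entries that follow confirm that the JVM is doing a large amount of GC work. So the cause is fairly clear: heavy GC activity is slowing the application down. To find out what was triggering so much GC, we checked memory usage with the jstat command.
The jstat output showed that FGC (the full GC count) was rising very frequently and that GCT (total GC time) was also high.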
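To make the jstat step concrete, here is a minimal sketch that pulls the FGC and GCT columns out of `jstat -gcutil` output. The function name `gc_summary` is made up for illustration, and the sketch assumes the first input line is the jstat header, so that columns can be located by name rather than by position.

```shell
#!/bin/bash
# Hypothetical helper: read `jstat -gcutil` output on stdin and print the
# FGC (full GC count) and GCT (total GC time, seconds) of every sample line.
# Columns are looked up by header name, so column order does not matter.
gc_summary() {
  awk '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }   # remember header positions
    ("FGC" in col) && ("GCT" in col) {
      printf "FGC=%s GCT=%s\n", $(col["FGC"]), $(col["GCT"])
    }
  '
}
```

It could be used as `jstat -gcutil <pid> 1000 5 | gc_summary`, which takes five samples one second apart. If FGC keeps growing between samples, the JVM is running full GCs continuously.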
Next, look at how much space is allocated to the young and old generations (this is jmap -heap output):
Attaching to process ID 22651, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.211-b12
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 536870912 (512.0MB)
NewSize = 503316480 (480.0MB)
MaxNewSize = 503316480 (480.0MB)
OldSize = 33554432 (32.0MB)
NewRatio = 2
SurvivorRatio = 4
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 419430400 (400.0MB)
used = 297926560 (284.1249084472656MB)
free = 121503840 (115.87509155273438MB)
71.0312271118164% used
Eden Space:
capacity = 335544320 (320.0MB)
used = 297926560 (284.1249084472656MB)
free = 37617760 (35.875091552734375MB)
88.78903388977051% used
From Space:
capacity = 83886080 (80.0MB)
used = 0 (0.0MB)
free = 83886080 (80.0MB)
0.0% used
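OldSize shows that the old generation is only 32 MB, while NewSize is 480 MB. The young generation is far too large and the old generation far too small, which is why full GCs were so frequent.
You can also see this from the New Generation and Eden Space sections: a usage of around 50% is ideal, and a value far from that also points to a problem.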
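The comparison above can be scripted. The following is a small sketch, not a definitive tool: the function name `gen_sizes` is hypothetical, and it assumes the `NewSize = <bytes> (...)` and `OldSize = <bytes> (...)` line format shown in the output above.

```shell
#!/bin/bash
# Hypothetical helper: read `jmap -heap` output on stdin and print the
# configured young and old generation sizes in MB.
gen_sizes() {
  awk '
    $1 == "NewSize" { young = $3 }   # third field is the size in bytes
    $1 == "OldSize" { old = $3 }
    END {
      if (young == "" || old == "") exit 1   # expected lines not found
      printf "young=%dMB old=%dMB\n", young / 1048576, old / 1048576
    }
  '
}
```

Run as `jmap -heap <pid> | gen_sizes`. For the heap configuration shown above it prints `young=480MB old=32MB`, which makes the lopsided split obvious at a glance.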
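In the end, changing -Xmn from the original 480 MB to 200 MB fixed the problem. With this setting the ratio of young generation to old generation is 1:2.
This is about the simplest kind of GC problem there is.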
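As a rough sketch of the arithmetic behind this change: the old generation gets whatever part of the heap the young generation does not take. The helper names `heap_flags` and `old_gen_mb` are invented for illustration, and the 512 MB maximum heap is taken from MaxHeapSize in the jmap output; whether the heap size was left unchanged after tuning is an assumption.

```shell
#!/bin/bash
# Hypothetical helpers for checking heap-sizing flags before restarting a service.

# Print the -Xmx/-Xmn flags for a max heap size and young generation size (MB).
heap_flags() {
  local max_mb=$1 young_mb=$2
  if (( young_mb >= max_mb )); then
    echo "young generation must be smaller than the heap" >&2
    return 1
  fi
  echo "-Xmx${max_mb}m -Xmn${young_mb}m"
}

# Print what is left for the old generation (MB): max heap minus young generation.
old_gen_mb() {
  echo $(( $1 - $2 ))
}
```

`old_gen_mb 512 480` prints 32, matching the OldSize reported above, and `heap_flags 512 200` prints `-Xmx512m -Xmn200m`, which would be passed to the `java` command when starting the service.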
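To summarize, the causes of a full GC (FGC) are:
(1) System.gc() is called. This suggests that the JVM run a full GC, but does not guarantee that one happens.
(2) The old generation runs out of space (it is so full that an OOM would follow without a GC; this is also where the terms Major GC and Full GC tend to be used interchangeably).
(3) The method area (Metaspace in Java 8 and later) runs out of space.
(4) The average size of the objects promoted to the old generation by minor GCs exceeds the free memory in the old generation.
(5) When objects are copied from Eden and survivor space 0 (From Space) to survivor space 1 (To Space), an object larger than the free memory in To Space is moved to the old generation instead, and the old generation's free memory is smaller than that object. This is really just another case of the old generation running out of space.
Here are a few good troubleshooting posts: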
https://blog.csdn.net/ym15229994318ym/article/details/106525945
https://www.cnblogs.com/three-fighter/p/14644152.html # this one is good
The script is as follows:
#!/bin/bash
readonly PROG=`basename $0`
readonly -a COMMAND_LINE=("$0" "$@")
usage() {
    cat <<EOF
Usage: ${PROG} [OPTION]...
Find out the highest cpu consumed threads of java, and print the stack of these threads.
Example: ${PROG} -c 10

Options:
    -p, --pid       find out the highest cpu consumed threads from the specified java process,
                    default from all java process.
    -c, --count     set the thread count to show, default is 5
    -h, --help      display this help and exit
EOF
    exit $1
}
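# Parse options. The assignment is kept separate from `readonly` so that a
# getopt failure is actually detected (`readonly VAR=$(cmd)` hides cmd's status).
ARGS=$(getopt -n "$PROG" -a -o c:p:h -l count:,pid:,help -- "$@") || usage 1
readonly ARGS
eval set -- "${ARGS}"

while true; do
    case "$1" in
    -c|--count)
        count="$2"
        shift 2
        ;;
    -p|--pid)
        pid="$2"
        shift 2
        ;;
    -h|--help)
        usage 0
        ;;
    --)
        shift
        break
        ;;
    esac
done
count=${count:-5}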
redEcho() {
    [ -c /dev/stdout ] && {
        # if stdout is console, turn on color output.
        echo -ne "\033[1;31m"
        echo -n "$@"
        echo -e "\033[0m"
    } || echo "$@"
}

yellowEcho() {
    [ -c /dev/stdout ] && {
        # if stdout is console, turn on color output.
        echo -ne "\033[1;33m"
        echo -n "$@"
        echo -e "\033[0m"
    } || echo "$@"
}

blueEcho() {
    [ -c /dev/stdout ] && {
        # if stdout is console, turn on color output.
        echo -ne "\033[1;36m"
        echo -n "$@"
        echo -e "\033[0m"
    } || echo "$@"
}
# Check the existence of jstack command!
if ! which jstack &> /dev/null; then
    [ -z "$JAVA_HOME" ] && {
        redEcho "Error: jstack not found on PATH!"
        exit 1
    }
    ! [ -f "$JAVA_HOME/bin/jstack" ] && {
        redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack file does NOT exist!"
        exit 1
    }
    ! [ -x "$JAVA_HOME/bin/jstack" ] && {
        redEcho "Error: jstack not found on PATH and $JAVA_HOME/bin/jstack is NOT executable!"
        exit 1
    }
    export PATH="$JAVA_HOME/bin:$PATH"
fi
readonly uuid=$(date +%s)_${RANDOM}_$$

# Remove the temporary jstack dumps when the script exits.
cleanupWhenExit() {
    rm /tmp/${uuid}_* &> /dev/null
}
trap "cleanupWhenExit" EXIT
printStackOfThreads() {
    local line
    local count=1
    while IFS=" " read -a line ; do
        local pid=${line[0]}
        local threadId=${line[1]}
        # jstack reports the native thread id (nid) in hex.
        local threadId0x="0x$(printf %x ${threadId})"
        local user=${line[2]}
        local pcpu=${line[4]}
        local jstackFile=/tmp/${uuid}_${pid}

        # Dump each java process only once and reuse the file for its other threads.
        [ ! -f "${jstackFile}" ] && {
            {
                if [ "${user}" == "${USER}" ]; then
                    jstack ${pid} > ${jstackFile}
                else
                    if [ $UID == 0 ]; then
                        sudo -u ${user} jstack ${pid} > ${jstackFile}
                    else
                        redEcho "[$((count++))] Fail to jstack Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
                        redEcho "User of java process($user) is not current user($USER), need sudo to run again:"
                        yellowEcho " sudo ${COMMAND_LINE[@]}"
                        echo
                        continue
                    fi
                fi
            } || {
                redEcho "[$((count++))] Fail to jstack Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user})."
                echo
                rm ${jstackFile}
                continue
            }
        }

        blueEcho "[$((count++))] Busy(${pcpu}%) thread(${threadId}/${threadId0x}) stack of java process(${pid}) under user(${user}):"
        # Print the stack of this thread: from its nid line to the next blank line.
        sed "/nid=${threadId0x} /,/^$/p" -n ${jstackFile}
    done
}
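# List all threads, keep the java ones (optionally for a single pid),
# sort by CPU usage in descending order and print the stacks of the top ones.
ps -Leo pid,lwp,user,comm,pcpu --no-headers | {
    [ -z "${pid}" ] &&
    awk '$4=="java"{print $0}' ||
    awk -v "pid=${pid}" '$1==pid && $4=="java"{print $0}'
} | sort -k5 -r -n | head --lines "${count}" | printStackOfThreads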
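The key step in the script is matching a busy thread from ps to its entry in the jstack dump: ps reports the thread ID (LWP) in decimal, while jstack prints it in hex as nid. A minimal standalone sketch of that conversion follows; the function name `to_nid` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: convert a decimal Linux thread ID (LWP) into the
# hex form that jstack prints in its nid= field.
to_nid() {
  printf '0x%x\n' "$1"
}
```

For the first thread in the output at the top of this post, `to_nid 30444` prints `0x76ec`, so its stack can also be found by hand with `jstack 30435 | grep "nid=$(to_nid 30444) "`.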