Problem Description
During load testing, the system eventually stopped responding and no longer accepted new requests.
The logs showed the following direct-memory allocation error:
io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 2130706439, max: 2147483648)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:742) ~[netty-all-4.1.47.Final.jar!/:4.1.47.Final]
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:697) ~[netty-all-4.1.47.Final.jar!/:4.1.47.Final]
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:758) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:734) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:245) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:215) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.PoolArena.allocate(PoolArena.java:147) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:356) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:139) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114) ~[netty-all-4.1.47.Final.jar!/:?]
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147) [netty-all-4.1.47.Final.jar!/:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714) [netty-all-4.1.47.Final.jar!/:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650) [netty-all-4.1.47.Final.jar!/:?]
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576) [netty-all-4.1.47.Final.jar!/:?]
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493) [netty-all-4.1.47.Final.jar!/:?]
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989) [netty-all-4.1.47.Final.jar!/:?]
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-all-4.1.47.Final.jar!/:4.1.47.Final]
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) [netty-all-4.1.47.Final.jar!/:?]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_191]
Problem Analysis
Heap Memory
We first suspected a heap-memory leak and inspected heap usage with jmap; usage was normal: -Xmx is set to 8 GB and only about 954 MB was in use.
Had it been a heap overflow, we could dump a snapshot with jmap -dump:format=b,file=heap2.hprof 28624
and analyze it with MAT, JProfiler, or similar tools, while monitoring heap allocation with JConsole and JVisualVM.
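Before taking a full dump, a quick class histogram can show the top heap consumers at a glance (the PID is the one used in this investigation; `:live` triggers a full GC first):

```shell
# Print the 20 biggest classes by live instance footprint
jmap -histo:live 28624 | head -20
```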
jmap -heap 17667
Attaching to process ID 17667, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.191-b12
using thread-local object allocation.
Garbage-First (G1) GC with 4 thread(s)
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 8589934592 (8192.0MB)
NewSize = 1363144 (1.2999954223632812MB)
MaxNewSize = 5152702464 (4914.0MB)
OldSize = 5452592 (5.1999969482421875MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 268435456 (256.0MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 2097152 (2.0MB)
Heap Usage:
G1 Heap:
regions = 4096
capacity = 8589934592 (8192.0MB)
used = 1001361104 (954.9723663330078MB)
free = 7588573488 (7237.027633666992MB)
11.657377518713474% used
G1 Young Generation:
Eden Space:
regions = 54
capacity = 1847590912 (1762.0MB)
used = 113246208 (108.0MB)
free = 1734344704 (1654.0MB)
6.129398410896708% used
Survivor Space:
regions = 0
capacity = 0 (0.0MB)
used = 0 (0.0MB)
free = 0 (0.0MB)
0.0% used
G1 Old Generation:
regions = 429
capacity = 1113587712 (1062.0MB)
used = 888114896 (846.9723663330078MB)
free = 225472816 (215.0276336669922MB)
79.75257686751486% used
43044 interned Strings occupying 5008680 bytes.
Off-Heap Memory
Next, analyze the off-heap memory. -Xmx only limits the Java Heap portion of the Java process. The process's memory consists of the Java Heap, Class, Thread, Code, GC, Internal, Symbol, Native Memory Tracking, and unknown regions.
For example, when the system handles a heavy volume of network requests, the Internal region can grow sharply and consume a lot of memory.
In practice, combine top, jmap, and NMT to analyze the Java process's memory.
Configure NMT for use with jcmd. NMT (Native Memory Tracking) is a native-memory tracking tool
commonly used to analyze JVM memory usage. It is disabled by default and can be enabled with:
-XX:NativeMemoryTracking=[off | summary | detail]
Option    Meaning
off       default
summary   collect summary information only
detail    collect information for each call site
Note: according to the official Java documentation, enabling NMT incurs a 5%-10% performance overhead.
To have the JVM print its memory usage on exit, add:
-XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics
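Putting the flags together, a startup command might look like the following sketch (app.jar is a placeholder; in production, summary is usually preferred given the overhead noted above):

```shell
java -XX:NativeMemoryTracking=detail \
     -XX:+UnlockDiagnosticVMOptions -XX:+PrintNMTStatistics \
     -jar app.jar
```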
View the detailed NMT breakdown (heap, class, internal memory, etc.) with jcmd:
jcmd 17667 VM.native_memory detail scale=MB >temp.txt
17667:
Native Memory Tracking:
Total: reserved=15306MB, committed=11529MB
- Java Heap (reserved=8192MB, committed=5678MB)
(mmap: reserved=8192MB, committed=5678MB)
- Class (reserved=1146MB, committed=138MB)
(classes #23198)
(malloc=4MB #56498)
(mmap: reserved=1142MB, committed=134MB)
- Thread (reserved=461MB, committed=461MB)
(thread #458)
(stack: reserved=459MB, committed=459MB)
(malloc=2MB #2292)
(arena=1MB #912)
- Code (reserved=265MB, committed=118MB)
(malloc=21MB #25211)
(mmap: reserved=244MB, committed=97MB)
- GC (reserved=409MB, committed=315MB)
(malloc=73MB #151121)
(mmap: reserved=336MB, committed=243MB)
- Compiler (reserved=2MB, committed=2MB)
(malloc=2MB #3347)
- Internal (reserved=4778MB, committed=4778MB)
(malloc=4778MB #218361)
- Symbol (reserved=27MB, committed=27MB)
(malloc=23MB #249386)
(arena=4MB #1)
- Native Memory Tracking (reserved=11MB, committed=11MB)
(tracking overhead=11MB)
- Unknown (reserved=16MB, committed=0MB)
(mmap: reserved=16MB, committed=0MB)
Virtual memory map:
[0x00000005c0000000 - 0x00000007c0000000] reserved 8192MB for Java Heap from
[0x00007fce6c249efb] ReservedSpace::initialize(unsigned long, unsigned long, bool, char*, unsigned long, bool)+0xab
[0x00007fce6c24a4e2] ReservedHeapSpace::ReservedHeapSpace(unsigned long, unsigned long, bool, char*)+0x52
[0x00007fce6c216f65] Universe::reserve_heap(unsigned long, unsigned long)+0x65
[0x00007fce6bd6e681] G1CollectedHeap::initialize()+0x151
The Internal region is strikingly large. To compare how memory changes over time (typical for off-heap memory that grows slowly until it blows up), first record a baseline with jcmd pid VM.native_memory baseline.
Then, after letting the system run for a while, inspect the growth with jcmd pid VM.native_memory detail.diff (or summary.diff) for a detail- or summary-level diff. For off-heap issues, focus on the Internal region: pronounced growth there indicates a problem.
jcmd 17667 VM.native_memory baseline
jcmd 13527 VM.native_memory scale=MB detail.diff    # includes detailed allocation info
jcmd 13527 VM.native_memory scale=MB summary.diff   # summary information only
The steps above essentially confirm an off-heap memory leak. To monitor direct memory further, the following class can be loaded at startup.
package netty.util;
import java.lang.reflect.Field;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.util.ReflectionUtils;
import io.netty.util.internal.PlatformDependent;
/** Periodically reports Netty's direct-memory usage to the log. */
public class DirectMemoryReporter {
    private static final Logger LOGGER = LoggerFactory.getLogger(DirectMemoryReporter.class);
    private static final int _1K = 1024;
    private static final String BUSINESS_KEY = "netty_direct_memory";
    private AtomicLong directMemory;

    public void init() {
        // Netty tracks pooled direct allocations in the static field DIRECT_MEMORY_COUNTER
        Field field = ReflectionUtils.findField(PlatformDependent.class, "DIRECT_MEMORY_COUNTER");
        field.setAccessible(true);
        try {
            directMemory = (AtomicLong) field.get(PlatformDependent.class);
            startReport();
        } catch (IllegalAccessException e) {
            LOGGER.error("netty_direct_memory error", e);
        }
    }

    public void startReport() {
        Runnable runnable = this::doReport;
        ScheduledExecutorService service = Executors.newSingleThreadScheduledExecutor();
        // initial delay of 30 seconds, then run every second
        service.scheduleAtFixedRate(runnable, 30, 1, TimeUnit.SECONDS);
    }

    private void doReport() {
        try {
            // use long: the counter exceeds Integer.MAX_VALUE once usage passes 2 GB
            long bytes = directMemory.get();
            LOGGER.info("{}:{}MB({}B)", BUSINESS_KEY, bytes / _1K / _1K, bytes);
        } catch (Exception e) {
            LOGGER.error("netty_direct_memory error", e);
        }
    }
}
Start it with new DirectMemoryReporter().init(). The log then shows the direct-memory footprint every second.
2021-03-02 09:58:59,073|[INFO]|[pool-11-thread-1]|[]|xxx.DirectMemoryReporter|netty_direct_memory:2032MB(2130706439B)
Monitoring showed that direct memory climbed steadily under load until it hit the configured limit (when -XX:MaxDirectMemorySize is not set, the default equals -Xmx).
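For explicit control, the direct-memory ceiling can also be set at startup rather than inherited from -Xmx; the 2 GB values below are illustrative, matching the max reported in the error above:

```shell
# JVM limit for JDK NIO direct buffers
-XX:MaxDirectMemorySize=2g
# Netty's own limit for its internal direct-memory counter (bytes)
-Dio.netty.maxDirectMemory=2147483648
```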
Locating the Leak
The above confirms an off-heap direct-memory leak; the next step is to pinpoint the source. This uses pmap (inspect the memory map), gdb (dump the contents of a suspect address range), strace (trace the threads' memory allocations), and jstack (identify the specific thread).
Note: while pmap output is small, the strace and jstack logs can be huge. In this case the final jstack log was 6 GB and the strace log 1 GB; downloaded locally, no text editor could open them, so do the analysis on the Linux host.
1. Start pmap, strace, and jstack tracing
First save the current memory map with pmap -X pid >pmap1.txt
and start the strace and jstack traces.
Once the load test reproduces the overflow, save the memory map again with pmap -X pid >pmap2.txt
to capture the allocations at overflow time.
pmap
pmap -X pid >> pmap1.txt
strace
Run this before the load test and keep the session open:
strace -o <logfile> -p pid -Ff -Tt -e trace=brk,mmap,munmap
jstack
Use a script test.sh to dump a thread snapshot periodically (every 10 seconds here):
#!/bin/bash
while [ 1 ];
do
jstack -l 16067 >> /opt/jstack_0301.txt
sleep 10
done
./test.sh &
Compare the two snapshots to find candidate leaked addresses.
Analysis showed that the following line
7f4674483000 rw-p 00000000 00:00 0 1575296 1574868 1574868 1535980 1574868 0 0
grew very significantly between the two snapshots.
2. Use gdb to dump the suspect memory
Using the offset calculation, dump the contents of the possibly-leaked memory with gdb. This step requires the leaked address confirmed above; if the content contains timestamps, convert them to wall-clock time.
gdb
The end address can be written as the start plus a decimal offset: with start address 0x7f4674483000 and a 1 GB offset (1x1024x1024x1024 = 1073741824), the end address is 0x7f4674483000+1073741824. Note that addresses must begin with 0x; pmap output omits the 0x prefix.
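The end-address arithmetic can also be done directly in the shell; the addresses below are the ones from this investigation:

```shell
# Compute the gdb end address: start address + 1 GiB offset
start=0x7f4674483000
offset=$((1024 * 1024 * 1024))       # 1 GiB in bytes
printf '0x%x\n' $((start + offset))  # end address to pass to gdb: 0x7f46b4483000
```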
gdb attach pid
c
dump memory <file> <start-address> <end-address>
Open the file to see the interpreted memory contents,
e.g. with vi or hexdump.
A full example:
[root@host-x-x-x-x ~]# gdb attach 16067
GNU gdb (GDB) 9.1
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
... (output omitted)
warning: Cannot parse .gnu_debugdata section; LZMA support was disabled at compile time
--Type <RET> for more, q to quit, c to continue without paging--c   # type c and press Enter to reach the gdb prompt
(gdb) dump memory out.bin 0x7f4674483000 0x7f4674483000+1073741824   # dump the region; compute the offset as needed, e.g. 64 MB
(gdb) q   # type q to exit
A debugging session is active.
Inferior 1 [process 16067] will be detached.
Quit anyway? (y or n) y   # confirm with y
Detaching from program: /root/jdk-8u191/bin/java, process 16067
[Inferior 1 (process 16067) detached]
The dump is written to out.bin; open it to inspect the recovered contents.
[root@host-x-x-x-x ~]# ls -alh out.bin
-rw------- 1 root root 1.0G Mar 2 11:01 out.bin
[root@host-10-33-144-227 ~]# vi out.bin
^@^@^@^@^@^@^@^@^B^P^@^A^@^@^@^@^@^@^@^A^@^@^@^@^G^@^@ó^@^@^@^@DATA^M
2 xxx@xxx^M
3 To: xxx^M
4 Message-ID: <651317388.574761.1614644538528.JavaMail.root@host-xxx>^M
5 Subject: =?UTF-8?B?5rWL6K+V6YKu5Lu2?=^M
6 MIME-Version: 1.0^M
7 Content-Type: multipart/mixed; ^M
8 boundary="----=_Part_574760_1068795689.1614644538528"^M
9 ^M
10 ------=_Part_574760_1068795689.1614644538528^M
11 Content-Type: text/plain; charset=UTF-8^M
12 Content-Transfer-Encoding: base64^M
13 ^M
14 6L+Z5piv5LiA5p2h5rWL6K+V6YKu5Lu277yBClRoaXMgaXMgYSB0ZXN0IGVtYWlsIQDATA^
[root@host-10-33-144-227 ~]# hexdump -C out.bin   # view as hex
00219420 41 55 54 48 20 4c 4f 47 49 4e 0d 0a 40 68 6d 61 |AUTH LOGIN..@xx|
00219430 69 6c 2e 68 75 61 77 65 69 2e 63 6f 6d 0d 0a 54 |xx.xxx.com..T|
00219440 6f 3a 20 6c 68 6e 39 34 31 34 40 68 6d 61 69 6c |o: xxx|
00219450 2e 68 75 61 77 65 69 2e 63 6f 6d 0d 0a 4d 65 73 |.xxx.com..Mes|
00219460 73 61 67 65 2d 49 44 3a 20 3c 31 33 38 38 32 34 |sage-ID: <138824|
00219470 33 34 39 2e 35 38 31 31 31 33 2e 31 36 31 34 36 |349.581113.16146|
00219480 34 34 39 35 33 32 32 30 2e 4a 61 76 61 4d 61 69 |44953220.JavaMai|
00219490 6c 2e 72 6f 6f 74 40 68 6f 73 74 2d 31 30 2d 33 |l.root@host-10-3|
002194a0 33 2d 31 34 34 2d 31 39 30 3e 0d 0a 53 75 62 6a |3-144-190>..Subj|
002194b0 65 63 74 3a 20 3d 3f 55 54 46 2d 38 3f 42 3f 35 |ect: =?UTF-8?B?5|
002194c0 72 57 4c 36 4b 2b 56 36 59 4b 75 35 4c 75 32 3f |rWL6K+V6YKu5Lu2?|
002194d0 3d 0d 0a 4d 49 4d 45 2d 56 65 72 73 69 6f 6e 3a |=..MIME-Version:|
002194e0 20 31 2e 30 0d 0a 43 6f 6e 74 65 6e 74 2d 54 79 | 1.0..Content-Ty|
002194f0 70 65 3a 20 6d 75 6c 74 69 70 61 72 74 2f 6d 69 |pe: multipart/mi|
00219500 78 65 64 3b 20 0d 0a 09 62 6f 75 6e 64 61 72 79 |xed; ...boundary|
00219510 3d 22 2d 2d 2d 2d 3d 5f 50 61 72 74 5f 35 38 31 |="----=_Part_581|
00219520 31 31 32 5f 37 38 38 34 33 32 36 31 34 2e 31 36 |112_788432614.16|
00219530 31 34 36 34 34 39 35 33 32 32 30 22 0d 0a 0d 0a |14644953220"....|
00219540 2d 2d 2d 2d 2d 2d 3d 5f 50 61 72 74 5f 35 38 31 |------=_Part_581|
00219550 31 31 32 5f 37 38 38 34 33 32 36 31 34 2e 31 36 |112_788432614.16|
00219560 31 34 36 34 34 39 35 33 32 32 30 0d 0a 43 6f 6e |14644953220..Con|
00219570 74 65 6e 74 2d 54 79 70 65 3a 20 74 65 78 74 2f |tent-Type: text/|
00219580 70 6c 61 69 6e 3b 20 63 68 61 72 73 65 74 3d 55 |plain; charset=U|
00219590 54 46 2d 38 0d 0a 43 6f 6e 74 65 6e 74 2d 54 72 |TF-8..Content-Tr|
002195a0 61 6e 73 66 65 72 2d 45 6e 63 6f 64 69 6e 67 3a |ansfer-Encoding:|
002195b0 20 62 61 73 65 36 34 0d 0a 0d 0a 36 4c 2b 5a 35 | base64....6L+Z5|
002195c0 70 69 76 35 4c 69 41 35 70 32 68 35 72 57 4c 36 |piv5LiA5p2h5rWL6|
002195d0 4b 2b 56 36 59 4b 75 35 4c 75 32 37 37 79 42 43 |K+V6YKu5Lu277yBC|
002195e0 6c 52 6f 61 58 4d 67 61 58 4d 67 59 53 42 30 5a |lRoaXMgaXMgYSB0Z|
002195f0 58 4e 30 49 47 56 74 59 57 6c 73 49 51 3d 3d 0d |XN0IGVtYWlsIQ==.|
00219600 0a 2d 2d 2d 2d 2d 2d 3d 5f 50 61 72 74 5f 35 38 |.------=_Part_58|
00219610 31 31 31 32 5f 37 38 38 34 33 32 36 31 34 2e 31 |1112_788432614.1|
00219620 36 31 34 36 34 34 39 35 33 32 32 30 2d 2d 0d 0a |614644953220--..|
00219630 2e 0d 0a 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00219640 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00219820 45 48 4c 4f 20 68 6f 73 74 2d 31 30 2d 33 33 2d |EHLO host-xxx-|
00219830 31 34 34 2d 31 39 30 0d 0a 00 00 00 00 00 00 00 |xx-xx.........|
00219840 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
From the content, determine the timestamp of the data (to the second; drop the trailing 3 millisecond digits), e.g.:
[root@host-x-x-x-x ~]# date -d @1614625596
Tue Mar 2 03:06:36 CST 2021
3. Use strace to find the TID that requested the memory
Search the strace log for the leaked address found in the previous step to see which thread requested it, and whether that address was only mmap'ed and never munmap'ed; if it was never released, that thread is the leak suspect.
The analysis above confirmed that address 0x7f4674483000 leaked, so find the thread that allocated it:
[root@host-x-x-x-x opt]# grep "0x7f4674483000" strace_0301.txt
25897 08:22:40.071076 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4674483000 <0.000524>
The result shows that thread 25897 only mmap'ed this 16 MB region and never released it with munmap, indicating a leak.
Filtering the log by the allocation size, 16781312 bytes:
[root@host-10-33-144-227 opt]# grep 16781312 strace_0301.txt |wc -l
96
[root@host-10-33-144-227 opt]# grep 16781312 strace_0301.txt
25900 22:48:40.293407 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46d34e2000 <0.001026>
25898 22:48:40.779514 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46d24e1000 <0.000931>
25895 22:48:42.072311 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 22:48:43.126104 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46d04df000 <0.000846>
25896 22:48:43.453893 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 22:48:54.240661 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 22:48:55.871836 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 22:48:58.918825 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46cc4db000 <0.000926>
25899 23:40:52.070042 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 23:40:52.076028 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ca4d9000 <0.001435>
25894 23:40:53.138579 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 23:40:53.993883 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 23:40:54.021415 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c74d6000 <0.000961>
25896 23:40:55.046603 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c64d5000 <0.001035>
25898 23:41:01.209776 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c54d4000 <0.001205>
25893 23:41:38.545986 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 00:32:43.584853 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 00:32:45.662573 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 00:32:46.247013 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 00:32:47.807808 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46c04cf000 <0.000738>
25894 00:32:48.105049 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46bf4ce000 <0.000832>
25895 00:32:48.646481 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46be4cd000 <0.000840>
25897 00:32:51.705296 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 00:33:31.796557 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 01:24:53.214635 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 01:24:54.268791 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 01:24:55.509481 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b94c8000 <0.001084>
25894 01:24:55.975972 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 01:24:56.313316 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b74c6000 <0.000877>
25896 01:24:57.998530 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b64c5000 <0.001148>
25895 01:25:02.973712 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 01:25:39.658599 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b44c3000 <0.000751>
25898 02:16:08.550683 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b34c2000 <0.000872>
25894 02:16:09.253646 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b24c1000 <0.000981>
25900 02:16:09.868975 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 02:16:10.299096 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46b04bf000 <0.000774>
25895 02:16:10.898468 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46af4be000 <0.000921>
25896 02:16:11.490967 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ae4bd000 <0.000816>
25899 02:16:16.064957 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ad4bc000 <0.001132>
25893 02:16:56.362348 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 03:07:37.304262 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46ab4ba000 <0.000832>
25899 03:07:37.985234 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46aa4b9000 <0.000840>
25895 03:07:40.860608 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a94b8000 <0.001047>
25900 03:07:41.419694 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 03:07:43.400255 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 03:07:43.572092 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a64b5000 <0.001165>
25894 03:07:49.068627 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a54b4000 <0.001139>
25893 03:08:22.889127 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 04:00:37.946945 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a34b2000 <0.001101>
25900 04:00:41.183475 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 04:00:41.470902 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46a14b0000 <0.000641>
25896 04:00:42.676048 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 04:00:43.223683 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469f4ae000 <0.001021>
25897 04:00:45.400673 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469e4ad000 <0.000564>
25895 04:00:46.895991 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 04:01:24.135505 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 04:52:44.120418 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469b4aa000 <0.000958>
25894 04:52:46.038504 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469a4a9000 <0.001086>
25900 04:52:46.404094 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46994a8000 <0.001060>
25896 04:52:46.780690 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 04:52:47.319693 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 04:52:47.591897 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 04:52:54.528811 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46954a4000 <0.000924>
25893 04:53:27.404271 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 05:44:25.134003 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 05:44:26.525414 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25900 05:44:27.165630 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 05:44:28.995850 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f469049f000 <0.000962>
25896 05:44:31.020882 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 05:44:31.657577 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 05:44:33.198010 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f468d49c000 <0.000796>
25893 05:45:08.843821 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25898 06:37:21.368773 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f468b49a000 <0.000512>
25899 06:37:23.990442 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f468a499000 <0.000969>
25894 06:37:24.215510 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4689498000 <0.001021>
25897 06:37:26.174882 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 06:37:26.414563 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4687496000 <0.000920>
25900 06:37:26.507630 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 06:37:39.847076 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 06:37:51.091554 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 07:31:27.933343 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25896 07:31:29.163123 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4682491000 <0.000441>
25898 07:31:29.324907 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4681490000 <0.000298>
25900 07:31:31.294559 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25893 07:31:31.443325 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 07:31:39.104532 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467e48d000 <0.000974>
25895 07:31:43.078283 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25897 07:31:52.607503 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467c48b000 <0.001193>
25898 08:22:19.669250 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467b48a000 <0.000921>
25900 08:22:20.093744 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f467a489000 <0.000832>
25896 08:22:22.625889 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4679488000 <0.001093>
25893 08:22:23.076138 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25894 08:22:27.951371 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25895 08:22:38.615218 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
25899 08:22:38.734819 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4675484000 <0.000974>
25897 08:22:40.071076 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f4674483000 <0.000524>
The total memory requested is 1536.375 MB (16781312 * 96 = 1611005952 bytes), which matches the possibly-leaked region identified in step 2:
7f4674483000 rw-p 00000000 00:00 0 1575296 1574868 1574868 1535980 1574868 0 0
whose size, 1575296 KB / 1024 = 1538.375 MB, is essentially the same. We therefore conclude these mappings are the leaked memory.
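The totals quoted above can be sanity-checked in the shell:

```shell
# 96 mmap calls of 16781312 bytes each
total=$((16781312 * 96))
echo "$total"                                               # -> 1611005952 bytes
awk -v t="$total" 'BEGIN { printf "%.3f\n", t / 1048576 }'  # -> 1536.375 MB
```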
4. Find the thread in the jstack log
Convert the TID 25897 obtained above to hex, 6529 (printf "%x\n" 25897), and search for it in the jstack log. Correlating the strace timestamp with the jstack snapshot times and the thread's nid identifies the thread (a thread pool is in use, so threads are reused).
[root@host-x-x-x-x opt]# printf "%x\n" 25897
6529
[root@host-x-x-x-x opt]# grep 6529 -C 10 jstack_0301.txt |more
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"WorkerPetalGroup-9-5" #236 prio=5 os_prio=0 tid=0x0000000002d4b000 nid=0x6529 waiting for monitor entry [0x00007f47142cc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1906)
- waiting to lock <0x000000009bfd68b8> (a java.lang.Object)
at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
at org.apache.activemq.ActiveMQMessageProducerSupport.send(ActiveMQMessageProducerSupport.java:269)
at org.apache.james.queue.activemq.ActiveMQMailQueue.produceMail(ActiveMQMailQueue.java:203)
at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:275)
at org.apache.james.queue.api.MailQueue.enQueue(MailQueue.java:96)
at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:304)
--
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Locked ownable synchronizers:
- None
"WorkerPetalGroup-9-5" #236 prio=5 os_prio=0 tid=0x0000000002d4b000 nid=0x6529 waiting for monitor entry [0x00007f47142cc000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.activemq.ActiveMQSession.send(ActiveMQSession.java:1906)
- waiting to lock <0x000000009bfd68b8> (a java.lang.Object)
at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:288)
at org.apache.activemq.ActiveMQMessageProducer.send(ActiveMQMessageProducer.java:223)
at org.apache.activemq.ActiveMQMessageProducerSupport.send(ActiveMQMessageProducerSupport.java:269)
at org.apache.james.queue.activemq.ActiveMQMailQueue.produceMail(ActiveMQMailQueue.java:203)
at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:275)
at org.apache.james.queue.api.MailQueue.enQueue(MailQueue.java:96)
at org.apache.james.queue.jms.JMSMailQueue.enQueue(JMSMailQueue.java:304)
--
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
The thread WorkerPetalGroup-9-5 is one of our custom Netty worker threads. To confirm it further, match on time, e.g. 22:48:58:
25897 22:48:58.918825 mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f46cc4db000 <0.000926>
Search the jstack log for 2021-03-01 22:48:5 (jstack runs every 10 seconds), then from that hit (at line 8093667) search downward for the thread.
This finds the following:
8114644 "WorkerPetalGroup-9-5" #236 prio=5 os_prio=0 tid=0x0000000002d4b000 nid=0x6529 runnable [0x00007f47142cd000]
8114645 java.lang.Thread.State: RUNNABLE
8114646 at sun.nio.ch.NativeThread.current(Native Method)
8114647 at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:326)
8114648 - locked <0x000000009a3d1f10> (a java.lang.Object)
8114649 - locked <0x000000009a3d1ef0> (a java.lang.Object)
8114650 at io.netty.buffer.PooledByteBuf.setBytes(PooledByteBuf.java:253)
8114651 at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:1133)
8114652 at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:350)
8114653 at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:148)
8114654 at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
8114655 at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
8114656 at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
8114657 at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
8114658 at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
8114659 at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
8114660 at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
8114661 at java.lang.Thread.run(Thread.java:748)
8114662
8114663 Locked ownable synchronizers:
8114664 - None
These samples show that every allocation goes through
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163)
Looking up
io.netty.channel.nio.AbstractNioByteChannel.NioByteUnsafe#read
in the source confirms that the memory is allocated in this method.
5. Analyze the code
Netty reclaims off-heap memory via ByteBuf's reference counter, readable through refCnt(). Our handler extends io.netty.channel.SimpleChannelInboundHandler, whose channelRead calls the release method that frees the off-heap memory (it is freed only when the reference count drops to 0):
@Override
public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
boolean release = true;
try {
if (acceptInboundMessage(msg)) {
@SuppressWarnings("unchecked")
I imsg = (I) msg;
channelRead0(ctx, imsg);
} else {
release = false;
ctx.fireChannelRead(msg);
}
} finally {
if (autoRelease && release) {
ReferenceCountUtil.release(msg);
}
}
}
Set a breakpoint on the ReferenceCountUtil.release(msg) call and observe msg.refCnt() before and after it. If the value is still non-zero after release, the off-heap memory can never be freed; check the code for msg.retain() calls (each retain() increments the reference count by 1).
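The retain/release contract can be illustrated with a minimal stand-in; RefCountedBuffer below is a hypothetical sketch of the counting semantics, not a Netty class:

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Minimal sketch of Netty-style reference counting (not a Netty class). */
public class RefCountedBuffer {
    private final AtomicInteger refCnt = new AtomicInteger(1); // starts at 1 on allocation

    public int refCnt() { return refCnt.get(); }

    public RefCountedBuffer retain() {   // each retain() adds one reference
        refCnt.incrementAndGet();
        return this;
    }

    public boolean release() {           // memory is freed only when the count hits 0
        if (refCnt.decrementAndGet() == 0) {
            // the real implementation would return the direct memory here
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        RefCountedBuffer msg = new RefCountedBuffer();
        msg.retain();                  // an extra retain() somewhere in the pipeline...
        boolean freed = msg.release(); // ...means this release() does NOT free the memory
        System.out.println("refCnt after release: " + msg.refCnt() + ", freed: " + freed);
    }
}
```

This is exactly the failure mode the breakpoint check reveals: an unmatched retain() leaves refCnt() at 1 after release(), so the buffer's direct memory is never returned.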
Miscellaneous
Custom Netty thread names
Because other components also depend on Netty, set distinct names for the business threads to tell them apart:
// specify the thread pools and their thread names
bossGroup = new NioEventLoopGroup(new DefaultThreadFactory("BossPetalGroup"));
workerGroup = new NioEventLoopGroup(ioWorker, new DefaultThreadFactory("WorkerPetalGroup"));
Tools used
1. JVisualVM/JConsole to monitor JVM resources.
2. For the heap, MAT (Memory Analyzer Tool) to analyze leaked heap memory and thread resources; JProfiler to analyze object instances.
3. For off-heap memory, NMT (Native Memory Tracking) together with jcmd (more capable than jmap).
4. For memory analysis, the Linux tools pmap combined with strace to trace allocations, and gdb to dump and interpret memory contents.
5. For threads, jstack combined with the tools in 4.
6. jps to list Java processes; jinfo -flags to inspect JVM flags.
7. free -g to check system memory.
8. netstat -aon | grep port to check port status.
9. jstat -gcutil 31974 1000 10 to watch JVM GC activity.
Netty leak-prevention basics
InBound messages
In AbstractNioByteChannel.NioByteUnsafe.read(), a ByteBuf is created and sent into the handler chain via pipeline.fireChannelRead(byteBuf). Under the "whoever handles it last releases it" principle, each handler can treat a message in one of three ways:
- Pass the original message on untouched with ctx.fireChannelRead(msg); nothing needs releasing.
- Transform it into a new message and pass that on with ctx.fireChannelRead(newMsg); the original message must be release()d.
- Stop propagating, i.e. never call ctx.fireChannelRead(msg); then the original message must certainly be release()d.
Since every handler may pass the message along, and no handler knows whether it is the last one configured in the chain, Netty appends a TailHandler at the very end of the pipeline; if the message arriving there is still ReferenceCounted, it is released.