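I recently ran into a problem: the monitoring platform suddenly raised an alert for a GC overhead limit exceeded exception.
My first guess was that the heap had overflowed, so I used jmap and jstack to pull the heap dump and the thread stack files.
Below is the log produced by the thread stack dump.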
jstack pid > xxx.log    (thread dump; pid is the process ID)
"DubboClientHandler-172.16.3.244:20885-thread-168" #5165 daemon prio=5 os_prio=0 tid=0x00007f6604070000 nid=0x1151 waiting on condition [0x00007f65c31f8000]
   java.lang.Thread.State: TIMED_WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000731228070> (a java.util.concurrent.SynchronousQueue$TransferStack)
        at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
        at java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
        at java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:362)
        at java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:941)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1066)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
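Clearly this Dubbo thread is not running: it is parked, waiting for another thread to hand something over to it. Strictly speaking, jstack reports the state as TIMED_WAITING rather than BLOCKED, and the getTask frame shows that this pool worker is waiting for its next task.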
Another thread stands out:
"DubboClientReconnectTimer-thread-3" #13057 daemon prio=5 os_prio=0 tid=0x00007f01e8e8d000 nid=0x4631 waiting on condition [0x00007f01dd5a6000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000730e115a8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1088)
        at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:748)
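This entry shows that the Dubbo client's reconnect thread is also parked (state WAITING) rather than running. Why would the client reconnect at all? Because Dubbo's heartbeat check found that the connection to the server had timed out; typically, after about one minute, it initiates a reconnect. [The consumer and the provider rely on the heartbeat mechanism to keep their long-lived connection alive.]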
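Both stacks above pass through ThreadPoolExecutor.getTask, which is where a pool worker parks when it has no task to run. The following is a minimal sketch, not Dubbo's code: the class name IdleWorkerStates and the helper idleState are made up for illustration, and the two Executors factories are stand-ins assumed to resemble the queues seen in the dumps (a SynchronousQueue with a keep-alive timeout, and a DelayedWorkQueue). It reproduces the two states with plain JDK pools:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

class IdleWorkerStates {

    /** Runs one trivial task on the pool, then reports the state its worker settles into. */
    static Thread.State idleState(ExecutorService pool) {
        AtomicReference<Thread> worker = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        try {
            pool.execute(() -> {
                worker.set(Thread.currentThread());
                done.countDown();
            });
            done.await();
            // The worker needs a moment to go back to getTask() and park;
            // poll (for at most about 5 seconds) until it is no longer RUNNABLE.
            Thread.State state = worker.get().getState();
            for (int i = 0; i < 500 && state == Thread.State.RUNNABLE; i++) {
                Thread.sleep(10);
                state = worker.get().getState();
            }
            return state;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        // Assumption: a cached pool stands in for the DubboClientHandler pool.
        System.out.println("cached pool worker:    " + idleState(Executors.newCachedThreadPool()));
        // Assumption: a one-thread scheduled pool stands in for the reconnect timer.
        System.out.println("scheduled pool worker: " + idleState(Executors.newScheduledThreadPool(1)));
    }
}
```

If those assumptions hold, a worker parked like this is idle rather than stuck on a lock, so the number of such threads in the dump says more than any single stack does.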
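Putting this together: the Dubbo service that the client calls was timing out and responding far too slowly, and the client kept reconnecting.
The root cause: timeouts in the third-party service made the client-side consumer respond slowly and time out badly; large numbers of threads piled up without being released, which led to the memory overflow...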
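To check whether threads really are piling up, it can help to count the entries in the dump by pool and state. The following is a rough sketch that assumes the dump was saved as described earlier (jstack pid > xxx.log). The class ThreadDumpTally and its tally method are hypothetical helpers, not part of any JDK tool, and grouping by the thread name with its trailing number stripped is just one possible convention:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadDumpTally {
    // A jstack entry starts with the quoted thread name.
    private static final Pattern HEADER = Pattern.compile("^\"([^\"]+)\"");
    // The state is reported as "java.lang.Thread.State: <STATE>".
    private static final Pattern STATE = Pattern.compile("java\\.lang\\.Thread\\.State: (\\w+)");

    /** Counts dump entries per "<name without trailing number> <state>". */
    static Map<String, Integer> tally(String dump) {
        Map<String, Integer> counts = new TreeMap<>();
        String group = null;
        for (String line : dump.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.find()) {
                // "…-thread-168" and "…-thread-169" fall into the same group.
                group = header.group(1).replaceAll("-?\\d+$", "");
            }
            Matcher state = STATE.matcher(line);
            if (group != null && state.find()) {
                counts.merge(group + " " + state.group(1), 1, Integer::sum);
                group = null; // count each thread once
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the saved dump, e.g. xxx.log
        String dump = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        tally(dump).forEach((key, n) -> System.out.println(n + "\t" + key));
    }
}
```

A group whose count runs into the hundreds or thousands is a good place to start looking.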