Parts of this post are based on a CSDN article.
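In our test environment, some code was injected through an agent, including several Timers.
When Tomcat was restarted through the startup script, a stop process always hung, so Tomcat could not restart properly.
Running jstack against the Tomcat process showed no deadlocked threads; only two threads were in TIMED_WAITING.
Those two threads belonged to the two plain java.util.Timer instances injected by the agent. Using a plain Timer is strongly discouraged.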
    /**
     * 1. Poll nodes from the route-node queue and send them to GRCC once
     *    Config.Message.NODES of them have accumulated.
     */
    new Timer("route-nodes-to-grcc-timer-1").scheduleAtFixedRate(new TimerTask() {
        @Override
        public void run() {
            try {
                // This loop never exits, so run() never returns to the Timer.
                while (true) {
                    try {
                        if (isCanPoll) {
                            Node node = nodesLinkedQueue.poll();
                            if (null != node) {
                                nodes.add(node);
                                if (nodes.size() >= Config.Message.NODES) {
                                    try {
                                        semaphore.acquire();
                                        if (!nodes.isEmpty()) {
                                            sendToGRCC(nodes);
                                        }
                                    } catch (Exception e) {
                                        logger.error("Consumer Task failed to send link info to GRCC: " + e.getMessage());
                                    } finally {
                                        semaphore.release();
                                    }
                                }
                            }
                        }
                        Thread.sleep(10L);
                    } catch (Exception e) {
                        logger.error("nodesLinkedQueue poll failed: " + e.getMessage());
                    }
                }
            } catch (Exception e) {
                logger.error("schedule execution failed: " + e.getMessage());
            }
        }
    }, 1000L, 1000L);

    /**
     * 2. Send the route-node info to GRCC once every Config.Message.INTERVAL.
     */
    new Timer("route-nodes-to-grcc-timer-2").scheduleAtFixedRate(new TimerTask() {
        @Override
        public void run() {
            // Acquire the lock and send only when nodes is not empty.
            if (!nodes.isEmpty()) {
                try {
                    semaphore.acquire();
                    isCanPoll = false;
                    if (!nodes.isEmpty()) {
                        sendToGRCC(nodes);
                    }
                } catch (Exception e) {
                    logger.error("Schedule Task failed to send link info to GRCC: " + e.getMessage());
                } finally {
                    semaphore.release();
                    isCanPoll = true;
                }
            }
        }
    }, 1000L, Config.Message.INTERVAL);
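One likely reason these Timers get in the way of a clean stop is that a Timer's worker thread is normally a non-daemon thread, and nothing in the code above ever calls cancel(). The following is a minimal sketch, not the injected code itself: TimerDaemonDemo and workerIsDaemon are hypothetical names, and the sketch only shows how the isDaemon constructor flag of java.util.Timer changes the worker thread.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; not part of the agent code discussed in this post.
final class TimerDaemonDemo {

    /** Runs one task on the given Timer and reports whether its worker thread is a daemon. */
    public static boolean workerIsDaemon(Timer timer) {
        final AtomicBoolean daemon = new AtomicBoolean();
        final CountDownLatch ran = new CountDownLatch(1);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                // Executed on the Timer's single worker thread.
                daemon.set(Thread.currentThread().isDaemon());
                ran.countDown();
            }
        }, 0L);
        try {
            if (!ran.await(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("timer task did not run");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            // Without cancel(), a non-daemon Timer thread would keep the JVM alive.
            timer.cancel();
        }
        return daemon.get();
    }

    public static void main(String[] args) {
        System.out.println("isDaemon=false -> daemon worker: "
                + workerIsDaemon(new Timer("demo-timer-1", false)));
        System.out.println("isDaemon=true  -> daemon worker: "
                + workerIsDaemon(new Timer("demo-timer-2", true)));
    }
}
```

Even with a daemon Timer, a task whose run() never returns still blocks every other task scheduled on the same Timer, because each Timer has only one worker thread.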
Every time Tomcat was restarted and the process hung, the Tomcat log also printed memory-leak warnings like the following:
    02-Jan-2019 19:37:46.918 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [dubbo-remoting-client-heartbeat-thread-2] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
     sun.misc.Unsafe.park(Native Method)
     java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
     java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     java.lang.Thread.run(Thread.java:745)
    02-Jan-2019 19:37:46.920 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [pool-13-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
     sun.misc.Unsafe.park(Native Method)
     java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
     java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     java.lang.Thread.run(Thread.java:745)
    02-Jan-2019 19:37:46.922 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [pool-14-thread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
     sun.misc.Unsafe.park(Native Method)
     java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:215)
     java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2078)
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1093)
     java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:809)
     java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1067)
     java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1127)
     java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
     java.lang.Thread.run(Thread.java:745)
    02-Jan-2019 19:37:46.923 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.clearReferencesThreads The web application [fx-route] appears to have started a thread named [Abandoned connection cleanup thread] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
     java.lang.Object.wait(Native Method)
     java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
     java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
     com.mysql.jdbc.NonRegisteringDriver$1.run(NonRegisteringDriver.java:93)
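After some searching, I found the cause described in the CSDN article mentioned above: threads that are never stopped, such as Timer threads.
After replacing the Timers with a ScheduledExecutorService, Tomcat restarted normally.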
    ScheduledExecutorService executor =
            Executors.newScheduledThreadPool(2, new DefaultNamedThreadFactory("route-nodes-to-grcc"));

    /**
     * 1. Poll nodes from the route-node queue and send them to GRCC once
     *    Config.Message.NODES of them have accumulated.
     */
    executor.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            try {
                // This loop never exits, so it permanently occupies one pool thread.
                while (true) {
                    try {
                        if (isCanPoll) {
                            Node node = nodesLinkedQueue.poll();
                            if (null != node) {
                                nodes.add(node);
                                if (nodes.size() >= Config.Message.NODES) {
                                    try {
                                        semaphore.acquire();
                                        if (!nodes.isEmpty()) {
                                            sendToGRCC(nodes);
                                        }
                                    } catch (Exception e) {
                                        logger.error("Consumer Task failed to send link info to GRCC: " + e.getMessage());
                                    } finally {
                                        semaphore.release();
                                    }
                                }
                            }
                        }
                        Thread.sleep(10L);
                    } catch (Exception e) {
                        logger.error("nodesLinkedQueue poll failed: " + e.getMessage());
                    }
                }
            } catch (Exception e) {
                logger.error("schedule execution failed: " + e.getMessage());
            }
        }
    }, 1000L, 1000L, TimeUnit.MILLISECONDS);

    /**
     * 2. Send the route-node info to GRCC once every Config.Message.INTERVAL.
     */
    executor.scheduleAtFixedRate(new Runnable() {
        @Override
        public void run() {
            // Acquire the lock and send only when nodes is not empty.
            if (!nodes.isEmpty()) {
                try {
                    semaphore.acquire();
                    isCanPoll = false;
                    if (!nodes.isEmpty()) {
                        sendToGRCC(nodes);
                    }
                } catch (Exception e) {
                    logger.error("Schedule Task failed to send link info to GRCC: " + e.getMessage());
                } finally {
                    semaphore.release();
                    isCanPoll = true;
                }
            }
        }
    }, 1000L, Config.Message.INTERVAL, TimeUnit.MILLISECONDS);
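The version above still keeps a while (true) loop inside the first task, and nothing ever shuts the executor down. A possible refinement, sketched under assumptions, is to let each run drain the queue once and return, and to stop the executor explicitly when the application stops. NodeBatcher, drainOnce, flush and stop are hypothetical names; batches are only collected in memory here, where the real code would call sendToGRCC.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the route-node sender in this post.
final class NodeBatcher {
    private final Queue<String> queue = new ConcurrentLinkedQueue<>();
    private final List<String> pending = new ArrayList<>();      // guarded by "this"
    private final List<List<String>> sent = new ArrayList<>();   // guarded by "this"
    private final int batchSize;
    private final ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);

    NodeBatcher(int batchSize) {
        this.batchSize = batchSize;
    }

    public void offer(String node) {
        queue.offer(node);
    }

    /** One bounded pass: drain what is queued right now, then return the thread to the pool. */
    public synchronized void drainOnce() {
        String node;
        while ((node = queue.poll()) != null) {
            pending.add(node);
            if (pending.size() >= batchSize) {
                flush();
            }
        }
    }

    /** Sends whatever is pending; the real code would call sendToGRCC here. */
    public synchronized void flush() {
        if (!pending.isEmpty()) {
            sent.add(new ArrayList<>(pending));
            pending.clear();
        }
    }

    public synchronized int sentBatches() {
        return sent.size();
    }

    public void start(long drainMillis, long flushMillis) {
        executor.scheduleWithFixedDelay(this::drainOnce, drainMillis, drainMillis, TimeUnit.MILLISECONDS);
        executor.scheduleWithFixedDelay(this::flush, flushMillis, flushMillis, TimeUnit.MILLISECONDS);
    }

    /** Stops scheduling and waits briefly for running tasks; returns true if the pool terminated. */
    public boolean stop() {
        executor.shutdown();
        try {
            if (!executor.awaitTermination(2, TimeUnit.SECONDS)) {
                executor.shutdownNow();
                return executor.awaitTermination(2, TimeUnit.SECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        NodeBatcher batcher = new NodeBatcher(3);
        batcher.start(10L, 100L);
        for (int i = 0; i < 7; i++) {
            batcher.offer("node-" + i);
        }
        Thread.sleep(300L);
        System.out.println("terminated=" + batcher.stop() + ", batches=" + batcher.sentBatches());
    }
}
```

In a web application, stop() would typically be called from a shutdown callback such as ServletContextListener.contextDestroyed, so that Tomcat no longer finds the threads running when it undeploys the application.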
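Alibaba's Java coding guidelines include the following two rules:
- [Rule] Thread resources must be provided by a thread pool; creating threads explicitly in application code is not allowed.
Note: A thread pool reduces the time spent creating and destroying threads and the associated system overhead, and addresses resource shortages. Without a thread pool, the system may create a large number of similar threads, exhausting memory or causing excessive context switching.
- [Rule] Thread pools must not be created with Executors; use ThreadPoolExecutor directly instead. This makes the pool's operating rules explicit to whoever writes the code and avoids the risk of resource exhaustion.
Note: The thread pools returned by Executors have the following drawbacks:
1) FixedThreadPool and SingleThreadPool:
The request queue may grow up to Integer.MAX_VALUE entries, so a large backlog of requests can build up and cause an OOM.
2) CachedThreadPool and ScheduledThreadPool:
The number of threads may grow up to Integer.MAX_VALUE, so a large number of threads can be created and cause an OOM.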
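Applied to the code in this post, the second rule suggests constructing the pools directly instead of going through Executors. The following is only a sketch under assumptions: Pools, namedDaemonFactory, boundedPool and scheduledPool are hypothetical helper names, and the hand-written thread factory stands in for the DefaultNamedThreadFactory used earlier, whose behavior is not shown in this post.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper class for building pools with explicit limits.
final class Pools {

    /** Names threads "<prefix>-thread-N" and marks them as daemon threads. */
    public static ThreadFactory namedDaemonFactory(final String prefix) {
        final AtomicInteger seq = new AtomicInteger(1);
        return new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, prefix + "-thread-" + seq.getAndIncrement());
                t.setDaemon(true);
                return t;
            }
        };
    }

    /** Fixed-size pool with a bounded queue; tasks beyond the capacity are rejected. */
    public static ThreadPoolExecutor boundedPool(String prefix, int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
                threads, threads, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                namedDaemonFactory(prefix),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Scheduled pool built directly, with cancelled tasks removed from its queue. */
    public static ScheduledThreadPoolExecutor scheduledPool(String prefix, int threads) {
        ScheduledThreadPoolExecutor pool =
                new ScheduledThreadPoolExecutor(threads, namedDaemonFactory(prefix));
        pool.setRemoveOnCancelPolicy(true);
        return pool;
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledThreadPoolExecutor pool = scheduledPool("route-nodes-to-grcc", 2);
        pool.scheduleWithFixedDelay(
                () -> System.out.println("tick on " + Thread.currentThread().getName()),
                0L, 100L, TimeUnit.MILLISECONDS);
        Thread.sleep(350L);
        pool.shutdown();
    }
}
```

The bounded queue makes overload visible as a RejectedExecutionException instead of an ever-growing backlog, which is the failure mode the guideline warns about.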