• Load-test problem analysis and resolution retrospective


    I.
    Symptom: during the load test, interface responses were slow and some requests timed out; later one of the containers restarted automatically, and a dump directory was generated after the restart.
    Analysis: the guess was that an OOM had occurred and that the container was configured to dump automatically on OOM. The dump files obtained included a thread dump, gcutil output, and netstat output.
    Resolution:
    (1)
    Starting from the jstack output and searching for the deadlock and BLOCKED keywords in turn (a grep sketch for locating them follows the traces), the following thread stacks in the BLOCKED state were found:

    "asyncExecutor-1093" #9490 prio=5 os_prio=0 tid=0x00007f12b400f800 nid=0x25c7 waiting for monitor entry [0x00007f0d1aa03000]
       java.lang.Thread.State: BLOCKED (on object monitor)
    	at org.apache.logging.log4j.core.appender.rolling.RollingFileManager.checkRollover(RollingFileManager.java:217)
    	- waiting to lock <0x000000065804dcc8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    	at org.apache.logging.log4j.core.appender.RollingRandomAccessFileAppender.append(RollingRandomAccessFileAppender.java:207)
    	at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
    	at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
    	at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
    	at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
    	at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
    	at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
    	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
    	at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
    	at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
    	at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
    	at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2091)
    	at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2011)
    	at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1884)
    	at com.lala.sgs.lalalog.log4j2.Log4j2Logger.error(Log4j2Logger.java:520)
    
    "http-nio-8888-exec-4" #363 daemon prio=5 os_prio=0 tid=0x00007f0f48003000 nid=0x222 waiting for monitor entry [0x00007f11abeb8000]
       java.lang.Thread.State: BLOCKED (on object monitor)
    	at org.apache.logging.log4j.core.layout.TextEncoderHelper.copyDataToDestination(TextEncoderHelper.java:61)
    	- waiting to lock <0x000000065804dcc8> (a org.apache.logging.log4j.core.appender.rolling.RollingRandomAccessFileManager)
    	at org.apache.logging.log4j.core.layout.TextEncoderHelper.encodeTextWithCopy(TextEncoderHelper.java:57)
    	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encodeWithThreadLocals(StringBuilderEncoder.java:70)
    	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:63)
    	at org.apache.logging.log4j.core.layout.StringBuilderEncoder.encode(StringBuilderEncoder.java:32)
    	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:219)
    	at org.apache.logging.log4j.core.layout.PatternLayout.encode(PatternLayout.java:57)
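
    For reference, a quick way to locate such threads in a large jstack file is a simple grep; this is just a sketch and the dump file name is illustrative:

    # Show each BLOCKED state line with a few lines of surrounding stack
    grep -n -B 2 -A 8 "java.lang.Thread.State: BLOCKED" thread_dump.txt
    # Count how many threads are in the BLOCKED state
    grep -c "java.lang.Thread.State: BLOCKED" thread_dump.txt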
    

    These BLOCKED threads no longer appeared after introducing the disruptor framework and configuring async loggers (asyncLogger). (Reference: https://www.cnblogs.com/waterystone/p/11170540.html)
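
    For completeness, a minimal sketch of what enabling Log4j2 async logging typically involves (the dependency version is only an example; per-logger <AsyncLogger>/<AsyncRoot> elements in log4j2.xml are the alternative if only some loggers should be asynchronous):

    # 1. Put the LMAX Disruptor on the classpath, e.g. the Maven artifact com.lmax:disruptor (3.x)
    # 2. Make all loggers asynchronous via a system property on the JVM command line:
    -DLog4jContextSelector=org.apache.logging.log4j.core.async.AsyncLoggerContextSelector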
    (2) After the logging performance problem was fixed, the load test was re-run; similar performance problems then showed up in sending to Kafka and in the Redis connection pool.

    "http-nio-8888-exec-295" #1570 daemon prio=5 os_prio=0 tid=0x00007f0fa40f6000 nid=0x6d7 waiting for monitor entry [0x00007f0d839e1000]
       java.lang.Thread.State: BLOCKED (on object monitor)
    	at kafka.producer.Producer.send(Producer.scala:72)
    	- waiting to lock <0x0000000656965bf8> (a java.lang.Object)
    	at kafka.javaapi.producer.Producer.send(Producer.scala:33)
    	at com.lala.kafka.api.produce.KafkaProducer.send(KafkaProducer.java:65)
    	at com.lala.kafka.api.produce.KafkaProducer.sendBytes(KafkaProducer.java:54)
    	at com.lala.kafka.api.produce.ProducerPool.sendBytes(ProducerPool.java:49)
    	at com.lala.kafka.api.produce.BaseProducer.sendString(BaseProducer.java:14)
    

    This was resolved by increasing the poolSize (the number of producer instances/threads used for sending to Kafka).
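
    The legacy (0.8-era) kafka.javaapi.producer.Producer serializes sends on an internal lock, which is exactly what the BLOCKED stack above shows, so pooling several producer instances spreads the contention across independent locks. The internal ProducerPool class is not shown in this post; the sketch below only illustrates the idea, with a hypothetical class name, broker list and pool size:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Properties;
    import java.util.concurrent.atomic.AtomicLong;

    import kafka.javaapi.producer.Producer;
    import kafka.producer.KeyedMessage;
    import kafka.producer.ProducerConfig;

    /** Illustrative round-robin pool of legacy Kafka producers; not the internal ProducerPool. */
    public class SimpleProducerPool {

        private final List<Producer<String, String>> producers = new ArrayList<>();
        private final AtomicLong counter = new AtomicLong();

        public SimpleProducerPool(int poolSize, String brokerList) {
            Properties props = new Properties();
            props.put("metadata.broker.list", brokerList); // e.g. "broker1:9092,broker2:9092" (placeholder)
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            for (int i = 0; i < poolSize; i++) {
                // Each Producer instance has its own internal send lock, so a larger
                // poolSize reduces the chance of threads blocking on Producer.send().
                producers.add(new Producer<String, String>(new ProducerConfig(props)));
            }
        }

        public void send(String topic, String message) {
            int idx = (int) ((counter.getAndIncrement() & Long.MAX_VALUE) % producers.size());
            producers.get(idx).send(new KeyedMessage<String, String>(topic, message));
        }
    }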

    "asyncExecutor-253" #1470 prio=5 os_prio=0 tid=0x00007f151000b000 nid=0x672 waiting on condition [0x00007f0d857da000]
       java.lang.Thread.State: WAITING (parking)
    	at sun.misc.Unsafe.park(Native Method)
    	- parking to wait for  <0x0000000659daa5c8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
    	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
    	at org.apache.commons.pool2.impl.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:524)
    	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:438)
    	at org.apache.commons.pool2.impl.GenericObjectPool.borrowObject(GenericObjectPool.java:361)
    	at redis.clients.util.Pool.getResource(Pool.java:49)
    	at redis.clients.jedis.JedisSentinelPool.getResource(JedisSentinelPool.java:209)
    	at redis.clients.jedis.JedisSentinelPool.getResource(JedisSentinelPool.java:17)
    	at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.fetchJedisConnector(JedisConnectionFactory.java:194)
    	at org.springframework.data.redis.connection.jedis.JedisConnectionFactory.getConnection(JedisConnectionFactory.java:348)
    	at org.springframework.data.redis.core.RedisConnectionUtils.doGetConnection(RedisConnectionUtils.java:129)
    	at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:92)
    	at org.springframework.data.redis.core.RedisConnectionUtils.getConnection(RedisConnectionUtils.java:79)
    	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:194)
    	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:169)
    	at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:157)
    

    The Redis pool was using the default configuration (the defaults can be found in GenericObjectPoolConfig: maxTotal = 8, maxIdle = 8, minIdle = 0), which is far too small under load.
    This was resolved by increasing the maxTotal, maxIdle and minIdle values.
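
    A minimal sketch of raising those limits for the Jedis sentinel pool behind Spring Data Redis (1.x-era API, as suggested by the stack above); the concrete numbers, master name and sentinel address below are placeholders, not the values actually used:

    import org.springframework.data.redis.connection.RedisSentinelConfiguration;
    import org.springframework.data.redis.connection.jedis.JedisConnectionFactory;

    import redis.clients.jedis.JedisPoolConfig;

    public class RedisPoolSetup {

        public static JedisConnectionFactory connectionFactory() {
            // GenericObjectPoolConfig defaults (maxTotal = 8, maxIdle = 8, minIdle = 0)
            // are far too small under load pressure; raise them explicitly.
            JedisPoolConfig poolConfig = new JedisPoolConfig();
            poolConfig.setMaxTotal(200);       // placeholder value
            poolConfig.setMaxIdle(50);         // placeholder value
            poolConfig.setMinIdle(20);         // placeholder value
            poolConfig.setMaxWaitMillis(2000); // fail fast instead of parking indefinitely

            RedisSentinelConfiguration sentinelConfig = new RedisSentinelConfiguration()
                    .master("mymaster")                // placeholder master name
                    .sentinel("sentinel-host", 26379); // placeholder sentinel address

            // When this factory is declared as a Spring bean, afterPropertiesSet()
            // is invoked automatically by the container.
            return new JedisConnectionFactory(sentinelConfig, poolConfig);
        }
    }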
    (3) The load test is still in progress.
    (4) Inspecting GC behaviour with jstat -gcutil showed that young GCs (YGC) were quite frequent:

      S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
     12.42   0.00  96.21  50.59  90.24  86.46   1770  201.800     8    0.950  202.750
    

    The JVM options were:

    -server -Xmx6g -Xms6g -Xmn256m -XX:PermSize=128m -Xss256k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70
    

    The young generation turned out to be far too small (-Xmn256m against a 6 GB heap), which explains the very frequent young GCs, so it was enlarged. The options were changed to:

    -server -Xmx6g -Xms6g -Xmn3072m -Xss512k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintClassHistogram -Xloggc:../logs/gc.log
    

    After re-running the load test, the GC situation improved: in the jstat sample the young-GC count dropped from 1770 to 116 and total GC time from about 202.8 s to about 30.8 s.

      S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT   
     15.07   0.00  77.15  32.25  90.37  86.32    116   24.551     8    6.216   30.767
    

    How to obtain the dumps (reposted):
    (1) Obtaining a thread dump

    Using jstack
    jstack is a tool bundled with the JDK that dumps the thread stacks of the JVM with the given process ID (PID).
    
    # Print the thread stacks to standard output
    jstack PID

    # Print the thread stacks to standard output, including lock information
    jstack -l PID

    # Force-print the thread stacks; if jstack PID does not respond (the JVM process may be hung), add the -F flag
    jstack -F PID
    Using jcmd
    jcmd is a tool bundled with the JDK for sending commands to a JVM process; depending on the command, it can replace or partially replace jstack, jmap, etc. The Thread.print command prints the JVM's thread stacks.
    
    # Equivalent to jstack PID
    jcmd PID Thread.print

    # Equivalent to jstack -l PID
    jcmd PID Thread.print -l
    Using kill -3
    kill sends a signal (SIGNAL) to the given process; by default it sends the terminate (TERM) signal, i.e. kill PID is equivalent to kill -15 PID or kill -TERM PID. A JVM process listens for the QUIT signal (value 3); on receiving it, the JVM prints the current thread stacks plus a heap usage summary, so compared with jstack you additionally get the heap summary (jstack, however, can take -l to print lock information). Note that this output typically goes to the JVM process's standard output, not to the terminal that issued kill.
    
    kill -3 PID
    # or
    kill -QUIT PID
    

    (2) Obtaining a heap dump

    # Dump the JVM heap to the specified file; if the heap holds many objects this can take a long time. The format sub-option only supports b, i.e. binary format
    jmap -dump:format=b,file=FILE_WITH_PATH PID

    # If the JVM process does not respond to the command, try adding -F
    jmap -F -dump:format=b,file=FILE_WITH_PATH PID

    # To dump only the live objects in the heap, add the live sub-option; note that live is not supported together with -F
    jmap -dump:live,format=b,file=FILE_WITH_PATH PID
    

    (3) Obtaining a heap summary

    # The -heap option shows heap information for the given JVM process, including the heap parameter values and the size and utilisation of the young and old generations
    jmap -heap PID

    # Likewise, if the JVM process does not respond, try adding -F
    jmap -F -heap PID
    

    (4) gcutil and gccapacity

    jstat -gcutil PID
    jstat -gccapacity PID
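
    During a load test it is usually more useful to sample these continuously; jstat also accepts a sampling interval (in milliseconds) and a sample count, for example:

    # Sample GC utilisation every second, 30 times
    jstat -gcutil PID 1000 30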
    