• 解决hbase频繁掉的问题


    1、情况说明,测试集群,6台hdfs,一台hbase

    在使用hbase的时候,出现hbase总是挂掉问题

    2、错误现象:

    2020-06-05 15:28:27,670 WARN  [RS_OPEN_META-bb-cc-aa:16020-0-MetaLogRoller] wal.ProtobufLogWriter: Failed to write trailer, non-fatal, continuing...
    java.io.IOException: All datanodes xxx:50010 are bad. Aborting...
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:1137)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:933)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
    2020-06-05 15:28:27,670 WARN  [RS_OPEN_META-xxx4:16020-0-MetaLogRoller] wal.FSHLog: Riding over failed WAL close of hdfs://xxxx:9000/hbase/WALs/xxx,16020,1591341651357/xxx%2C16020%2C1591341651357.meta.1591341967425.meta, cause="All datanodes xxx:50010 are bad. Aborting...", errors=2; THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
    2020-06-05 15:28:27,671 INFO  [RS_OPEN_META-xxx:16020-0-MetaLogRoller] wal.FSHLog: Rolled WAL /hbase/WALs/xxx,16020,1591341651357/xxx%2C16020%2C1591341651357.meta.1591341967425.meta with entries=61, filesize=47.94 KB; new WAL /hbase/WALs/xxx,16020,1591341651357/xxx%2C16020%2C1591341651357.meta.1591342107655.meta
    2020-06-05 15:28:53,482 WARN  [ResponseProcessor for block BP-705947195-xxx-1495826397385:blk_1080155959_6415222] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-705947195-xxx.82-1495826397385:blk_1080155959_6415222
    java.io.EOFException: Premature EOF: no length prefix available
            at org.apache.hadoop.hdfs.protocolPB.PBHelper.vintPrefixed(PBHelper.java:2000)
            at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:176)
            at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer$ResponseProcessor.run(DFSOutputStream.java:798)
    2020-06-05 15:29:27,704 WARN  [ResponseProcessor for block BP-705947195-xxx-1495826397385:blk_1080156011_6415223] hdfs.DFSClient: DFSOutputStream ResponseProcessor exception  for block BP-705947195-xxx-1495826397385:blk_1080156011_6415223
    java.io.EOFException: Premature EOF: no length prefix available

    3、分析

    看日志出现上述的问题是由于hbase在flush的时候,写入数据到hdfs出现的异常,这种测试数据量本身就很少,可能flush的时候,仅仅只有几十kb的数据量,不应该出现datanode写入异常。

    4、尝试解决办法:

    1、尝试socket超时,修改超时时间:
    
    <property>
    <name>dfs.client.socket-timeout</name>
    <value>6000000</value>
    </property>
    <property>
    <name>dfs.datanode.socket.write.timeout</name>
    <value>6000000</value>
    </property>
    
    2、尝试调整jvm
    export HBASE_REGIONSERVER_OPTS=" -XX:+UseG1GC -Xmx8g -Xms8g -XX:+UnlockExperimentalVMOptions -XX:MaxGCPauseMillis=100 -XX:-ResizePLAB -XX:+ParallelRefProcEnabled -XX:+AlwaysPreTouch -XX:ParallelGCThreads=32 -XX:ConcGCThreads=8 -XX:G1HeapWastePercent=3 -XX:InitiatingHeapOccupancyPercent=35 -XX:G1MixedGCLiveThresholdPercent=85 -XX:MaxDirectMemorySize=12g -XX:G1NewSizePercent=1 -XX:G1MaxNewSizePercent=15 -verbose:gc -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintAdaptiveSizePolicy -XX:PrintSafepointStatisticsCount=1 -XX:PrintFLSStatistics=1 -Xloggc:${HBASE_LOG_DIR}/gc-hbase-regionserver-`hostname`.log"
    
    3、尝试修改Zookeeper的超时时间以及HBase超时后不abort变为restart
    <property>
        <name>zookeeper.session.timeout</name>
        <value>600000</value>
    </property>
    
    <property>
        <name>hbase.regionserver.restart.on.zk.expire</name>
        <value>true</value>
    </property>
    
    4、修改hdfs-site.xml 尝试修改副本失败的时候分配策略
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
        <value>true</value>
    </property>
    
    <property>
        <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
        <value>ALWAYS</value>  # 可以改成其他参数值
    </property>
    
    5、修改datanode处理线程数量
    <property>
        <name>dfs.datanode.max.transfer.threads</name>
        <value>8192</value>
    </property>

    等待看是否还有问题

    借鉴:

    https://blog.csdn.net/microGP/article/details/81234065

  • 相关阅读:
    【Android游戏开发之八】游戏中添加音频详解MediaPlayer与SoundPool的利弊以及各个在游戏中的用途!
    【Android游戏开发之九】(细节处理)触屏事件中的Bug解决方案以及禁止横屏和竖屏切换!
    【Android游戏开发之七】(游戏开发中需要的样式)再次剖析游戏开发中对SurfaceView中添加组件方案!
    前端要给力之:URL应该有多长?
    【Android游戏开发之三】剖析 SurfaceView ! Callback以及SurfaceHolder!!
    charactersFound方法中的陷阱
    前端要给力之:分解对象构造过程new()
    结合UIImageView实现图片的移动和缩放
    【Android游戏开发之一】设置全屏以及绘画简单的图形
    扩展BaseAdapter实现在ListView中浏览文件
  • 原文地址:https://www.cnblogs.com/yjt1993/p/13098737.html
Copyright © 2020-2023  润新知