1、Cannot allocate memory
报错信息:
Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000005c5330000, 8502706176, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 8502706176 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /root/hs_err_pid9168.log
日志
上面报错信息提示,查看更多,去/root/hs_err_pid9168.log里面查看。
#查看 vim /root/hs_err_pid9168.log #内容 # # There is insufficient memory for the Java Runtime Environment to continue. # Native memory allocation (mmap) failed to map 8502706176 bytes for committing reserved memory. # Possible reasons: # The system is out of physical RAM or swap space # In 32 bit mode, the process size limit was hit # Possible solutions: # Reduce memory load on the system # Increase physical memory or swap space # Check if swap backing store is full # Use 64 bit Java on a 64 bit OS # Decrease Java heap size (-Xmx/-Xms) # Decrease number of Java threads # Decrease Java thread stack sizes (-Xss) # Set larger code cache with -XX:ReservedCodeCacheSize= # This output file may be truncated or incomplete. # # Out of Memory Error (os_linux.cpp:2743), pid=9168, tid=0x00007f22fdcce700 # # JRE version: (8.0_191-b12) (build ) # Java VM: Java HotSpot(TM) 64-Bit Server VM (25.191-b12 mixed mode linux-amd64 compressed oops) # Core dump written. Default location: /root/core or core.9168 # --------------- T H R E A D --------------- Current thread (0x00007f22f4016000): JavaThread "Unknown thread" [_thread_in_vm, id=9255, stack(0x00007f22fdbcf000,0x00007f22fdccf000)] Stack: [0x00007f22fdbcf000,0x00007f22fdccf000], sp=0x00007f22fdccd4c0, free space=1017k Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code) V [libjvm.so+0xace425] VMError::report_and_die()+0x2c5 V [libjvm.so+0x4deb77] report_vm_out_of_memory(char const*, int, unsigned long, VMErrorType, char const*)+0x67 V [libjvm.so+0x90c570] os::pd_commit_memory(char*, unsigned long, unsigned long, bool)+0x100 V [libjvm.so+0x903eaf] os::commit_memory(char*, unsigned long, unsigned long, bool)+0x1f V [libjvm.so+0xaca93c] VirtualSpace::initialize(ReservedSpace, unsigned long)+0x20c V [libjvm.so+0x5ea477] CardGeneration::CardGeneration(ReservedSpace, unsigned long, int, GenRemSet*)+0xc7 V [libjvm.so+0x5eb842] GenerationSpec::init(ReservedSpace, int, GenRemSet*)+0x182 V [libjvm.so+0x5d699f] GenCollectedHeap::initialize()+0x20f V [libjvm.so+0xa922ba] Universe::initialize_heap()+0x16a V [libjvm.so+0xa92593] universe_init()+0x33 V [libjvm.so+0x62f0f0] init_globals()+0x50 V [libjvm.so+0xa74c57] Threads::create_vm(JavaVMInitArgs*, bool*)+0x257 V [libjvm.so+0x6d49ff] JNI_CreateJavaVM+0x4f C [libjli.so+0x7e74] JavaMain+0x84 C [libpthread.so.0+0x7dd5] start_thread+0xc5 --------------- P R O C E S S --------------- Java Threads: ( => current thread ) Other Threads: =>0x00007f22f4016000 (exited) JavaThread "Unknown thread" [_thread_in_vm, id=9255, stack(0x00007f22fdbcf000,0x00007f22fdccf000)] VM state:not at safepoint (not fully initialized) VM Mutex/Monitor currently owned by a thread: None GC Heap History (0 events): No events Deoptimization events (0 events): No events Classes redefined (0 events): No events Internal exceptions (0 events): No events Events (0 events): No events Dynamic libraries: 00400000-00401000 r-xp 00000000 fd:00 202099235 /usr/local/soft/jdk/jdk1.8.0_191/bin/java 00600000-00601000 r--p 00000000 fd:00 202099235 /usr/local/soft/jdk/jdk1.8.0_191/bin/java 00601000-00602000 rw-p 00001000 fd:00 202099235 /usr/local/soft/jdk/jdk1.8.0_191/bin/java 00dc0000-00dfa000 rw-p 00000000 00:00 0 [heap] 5c0000000-5c5330000 rw-p 00000000 00:00 0 7f22e5000000-7f22e5270000 rwxp 00000000 00:00 0 7f22e5270000-7f22f4000000 ---p 00000000 00:00 0 7f22f4000000-7f22f4043000 rw-p 00000000 00:00 0 7f22f4043000-7f22f8000000 ---p 00000000 00:00 0 7f22f9b03000-7f22f9ec1000 rw-p 00000000 00:00 0 7f22f9ec1000-7f22fae97000 ---p 00000000 00:00 0 7f22fae97000-7f22fae98000 rw-p 00000000 00:00 0 7f22fae98000-7f22fae99000 ---p 00000000 00:00 0 7f22fae99000-7f22fafa3000 rw-p 00000000 00:00 0 7f22fafa3000-7f22fb359000 ---p 00000000 00:00 0 7f22fb359000-7f22fb373000 r-xp 00000000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so 7f22fb373000-7f22fb573000 ---p 0001a000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so 7f22fb573000-7f22fb574000 r--p 0001a000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so 7f22fb574000-7f22fb575000 rw-p 0001b000 fd:00 134320156 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/libzip.so 7f22fb575000-7f22fb581000 r-xp 00000000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so 7f22fb581000-7f22fb780000 ---p 0000c000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so 7f22fb780000-7f22fb781000 r--p 0000b000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so 7f22fb781000-7f22fb782000 rw-p 0000c000 fd:00 1048275 /usr/lib64/libnss_files-2.17.so 7f22fcdb2000-7f22fcfb2000 ---p 00ce2000 fd:00 19060 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so 7f22fcfb2000-7f22fd048000 r--p 00ce2000 fd:00 19060 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so 7f22fd048000-7f22fd079000 rw-p 00d78000 fd:00 19060 /usr/local/soft/jdk/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so 7f22fd079000-7f22fd0b4000 rw-p 00000000 00:00 0 7f22fd476000-7f22fd47a000 r--p 001c2000 fd:00 34193 /usr/lib64/libc-2.17.so 7f22fd47a000-7f22fd47c000 rw-p 001c6000 fd:00 34193 /usr/lib64/libc-2.17.so 7f22fd47c000-7f22fd481000 rw-p 00000000 00:00 0 7f22fd481000-7f22fd483000 r-xp 00000000 fd:00 34199 /usr/lib64/libdl-2.17.so 7f22fd483000-7f22fd683000 ---p 00002000 fd:00 34199 /usr/lib64/libdl-2.17.so 7f22fd683000-7f22fd684000 r--p 00002000 fd:00 34199 /usr/lib64/libdl-2.17.so 7f22fd684000-7f22fd685000 rw-p 00003000 fd:00 34199 /usr/lib64/libdl-2.17.so 7f22fd685000-7f22fd69c000 r-xp 00000000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so 7f22fd69c000-7f22fd89b000 ---p 00017000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so 7f22fd89b000-7f22fd89c000 r--p 00016000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so 7f22fd89c000-7f22fd89d000 rw-p 00017000 fd:00 67296956 /usr/local/soft/jdk/jdk1.8.0_191/lib/amd64/jli/libjli.so 7f22fd89d000-7f22fd8b4000 r-xp 00000000 fd:00 1051052 /usr/lib64/libpthread-2.17.so 7f22fd8b4000-7f22fdab3000 ---p 00017000 fd:00 1051052 /usr/lib64/libpthread-2.17.so 7f22fdab3000-7f22fdab4000 r--p 00016000 fd:00 1051052 /usr/lib64/libpthread-2.17.so 7f22fdab4000-7f22fdab5000 rw-p 00017000 fd:00 1051052 /usr/lib64/libpthread-2.17.so 7f22fdab5000-7f22fdab9000 rw-p 00000000 00:00 0 7f22fdab9000-7f22fdadb000 r-xp 00000000 fd:00 33871 /usr/lib64/ld-2.17.so 7f22fdbc6000-7f22fdbce000 rw-s 00000000 fd:00 503948 /tmp/hsperfdata_root/9168 7f22fdbce000-7f22fdbd2000 ---p 00000000 00:00 0 7f22fdbd2000-7f22fdcd3000 rw-p 00000000 00:00 0 7f22fdcd4000-7f22fdcd8000 rw-p 00000000 00:00 0 7f22fdcd8000-7f22fdcd9000 r--p 00000000 00:00 0 7f22fdcd9000-7f22fdcda000 rw-p 00000000 00:00 0 7f22fdcda000-7f22fdcdb000 r--p 00021000 fd:00 33871 /usr/lib64/ld-2.17.so 7f22fdcdb000-7f22fdcdc000 rw-p 00022000 fd:00 33871 /usr/lib64/ld-2.17.so 7f22fdcdc000-7f22fdcdd000 rw-p 00000000 00:00 0 7fff013af000-7fff013d3000 rw-p 00000000 00:00 0 [stack] 7fff013ec000-7fff013ee000 r-xp 00000000 00:00 0 [vdso] ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall] VM Arguments: Launcher Type: SUN_STANDARD Environment Variables: JAVA_HOME=/usr/local/soft/jdk/jdk1.8.0_191 LD_LIBRARY_PATH=:/opt/hadoop-3.1.2/lib:/opt/hadoop-3.1.2/lib/native SHELL=/bin/bash Signal Handlers: SIGSEGV: [libjvm.so+0xaced60], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGBUS: [libjvm.so+0xaced60], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGFPE: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGPIPE: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGXFSZ: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGILL: [libjvm.so+0x907ca0], sa_mask[0]=11111111011111111101111111111110, sa_flags=SA_RESTART|SA_SIGINFO SIGUSR1: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none SIGUSR2: [libjvm.so+0x907b70], sa_mask[0]=00000000000000000000000000000000, sa_flags=SA_RESTART|SA_SIGINFO SIGHUP: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none SIGINT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none SIGTERM: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none SIGQUIT: SIG_DFL, sa_mask[0]=00000000000000000000000000000000, sa_flags=none --------------- S Y S T E M --------------- OS:CentOS Linux release 7.5.1804 (Core) uname:Linux 3.10.0-862.el7.x86_64 #1 SMP Fri Apr 20 16:44:24 UTC 2018 x86_64 libc:glibc 2.17 NPTL 2.17 rlimit: STACK 8192k, CORE infinity, NPROC 71318, NOFILE 102400, AS infinity load average:4.18 4.20 4.11 /proc/meminfo: MemTotal: 18281924 kB MemFree: 199928 kB MemAvailable: 1747332 kB Buffers: 0 kB Cached: 1806360 kB SwapCached: 2488 kB Active: 15057324 kB Inactive(anon): 1729572 kB Active(file): 865640 kB Inactive(file): 864956 kB Unevictable: 0 kB Mlocked: 0 kB SwapTotal: 5238780 kB SwapFree: 5196284 kB Dirty: 11104 kB Writeback: 0 kB AnonPages: 15843208 kB Mapped: 70700 kB Shmem: 75776 kB Slab: 202108 kB SReclaimable: 156948 kB SUnreclaim: 45160 kB KernelStack: 14960 kB PageTables: 38372 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 14379740 kB HardwareCorrupted: 0 kB AnonHugePages: 2625536 kB CmaTotal: 0 kB CmaFree: 0 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 2048 kB DirectMap4k: 81792 kB DirectMap2M: 5111808 kB DirectMap1G: 13631488 kB container (cgroup) information: container_type: cgroupv1 cpu_cpuset_cpus: 0 cpu_memory_nodes: 0 active_processor_count: 1 cpu_quota: -1 cpu_period: 100000 cpu_shares: -1 memory_limit_in_bytes: -1 memory_and_swap_limit_in_bytes: -1 memory_soft_limit_in_bytes: -1 memory_usage_in_bytes: 18073210880 memory_max_usage_in_bytes: 0 CPU:total 1 (initial active 1) (1 cores per cpu, 1 threads per core) family 6 model 85 stepping 4, cmov, cx8, fxsr, mmx, sse, sse2, sse3, ssse3, sse4.1, sse 4.2, popcnt, avx, avx2, aes, clmul, erms, rtm, 3dnowpref, lzcnt, tsc, tscinvbit, bmi1, bmi2, adx /proc/cpuinfo: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 85 model name : Intel(R) Xeon(R) Gold 5118 CPU @ 2.30GHz stepping : 4 microcode : 0x200004d cpu MHz : 2294.738 cache size : 16896 KB physical id : 0 siblings : 1 core id : 0 cpu cores : 1 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 22 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss syscall nx pdpe1gb rdtscp lm constant _tsc arch_perfmon nopl xtopology tsc_reliable nonstop_tsc eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer ae s xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm mpx avx512f avx512dq rdseed adx smap cl flushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec ibpb ibrs stibp arat pku ospke spec_ctrl intel_stibp arch_capabilities bogomips : 4589.47 clflush size : 64 cache_alignment : 64 address sizes : 43 bits physical, 48 bits virtual power management: Memory: 4k page, physical 18281924k(199928k free), swap 5238780k(5196284k free) vm_info: Java HotSpot(TM) 64-Bit Server VM (25.191-b12) for linux-amd64 JRE (1.8.0_191-b12), built on Oct 6 2018 05:43:09 by "java_re" with gcc 7.3.0 time: Tue Sep 17 09:54:53 2019 elapsed time: 0 seconds (0d 0h 0m 0s)
原因分析
明显是由于内存不够,查看内存占用
df -h
解决
#重启hbase、hadoop stop-hbase.sh stop-all.sh start-all.sh start-hbase.sh #清除缓存 sync ; echo 1 >/proc/sys/vm/drop_caches echo 2 >/proc/sys/vm/drop_caches echo 3 >/proc/sys/vm/drop_caches #再次查看内存占用 df -h
2、Application is added to the scheduler and is not yet activated
再次迁移快照时,任务一直停留着不动
查看web,显示如下:
Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details :
AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>; Queue Resource Limit for AM = <memory:0, vCores:0>;
User AM Resource Limit of the queue = <memory:0, vCores:0>; Queue AM Resource Usage = <memory:0, vCores:0>;
hbase shell能正常进入,但是输入命令,报错
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2977) at org.apache.hadoop.hbase.master.MasterRpcServices.getCompletedSnapshots(MasterRpcServices.java:949) at org.apache.hadoop.hbase.shaded.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java) at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:413) at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:130) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:324) at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:304)
jps查看,HRegionServer没有起来
分析HRegionServer的日志
tailf /opt/hbase-2.1.4/logs/hbase-root-regionserver-hbase2.log -n 500 #报错信息
2019-09-17 10:59:22,539 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-0] coordination.ZkSplitLogWorkerCoordination: successfully transitioned task /hbase/splitWAL/WALs%2Fhbase2%2C16020%2C1568617738890-splitting%2Fhbase2%252C16020%252C1568617738890.1568684725860 to final state ERR hbase2,16020,1568688376805
2019-09-17 10:59:22,539 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-0] handler.WALSplitterHandler: Worker hbase2,16020,1568688376805 done with task org.apache.hadoop.hbase.coordination.ZkSplitLogWorkerCoordination$ZkSplitTaskDetails@9885422 in 200ms. Status = ERR
2019-09-17 10:59:23,158 INFO [SplitLogWorker-hbase2:16020] coordination.ZkSplitLogWorkerCoordination: worker hbase2,16020,1568688376805 acquired task /hbase/splitWAL/WALs%2Fhbase2%2C16020%2C1568617738890-splitting%2Fhbase2%252C16020%252C1568617738890.1568683832472
2019-09-17 10:59:23,179 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] wal.WALSplitter: Splitting WAL=hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472, length=138158040
2019-09-17 10:59:23,183 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] util.FSHDFSUtils: Recover lease on dfs file hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472
2019-09-17 10:59:23,183 INFO [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1] util.FSHDFSUtils: Recovered lease, attempt=0 on file=hdfs://hbase2:9000/hbase/WALs/hbase2,16020,1568617738890-splitting/hbase2%2C16020%2C1568617738890.1568683832472 after 0ms
2019-09-17 10:59:23,201 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter:
Found old edits file. It could be the result of a previous failed split attempt.
Deleting hdfs://hbase2:9000/hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,222 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-0] wal.WALSplitter: Found old edits file.
It could be the result of a previous failed split attempt.
Deleting hdfs://hbase2:9000/hbase/default/tsdb/2501a70608674eab4974e7f8006dac12/recovered.edits/0000000000007214923-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,235 WARN [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-2] wal.WALSplitter: Found old edits file. It could be the result of a previous failed split attempt. Deleting hdfs://hbase2:9000/hbase/default/tsdb/8d5fe84d54f4170f35d33dff7b830444/recovered.edits/0000000000006173823-hbase2%2C16020%2C1568617738890.1568683832472.temp, length=0
2019-09-17 10:59:23,411 WARN [Thread-8138] hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1388)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
2019-09-17 10:59:23,412 ERROR [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter: Got while writing log entry to log
java.io.IOException: File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.appendBuffer(WALSplitter.java:1601)
at org.apache.hadoop.hbase.wal.WALSplitter$LogRecoveredEditsOutputSink.append(WALSplitter.java:1559)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.writeBuffer(WALSplitter.java:1084)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.doRun(WALSplitter.java:1076)
at org.apache.hadoop.hbase.wal.WALSplitter$WriterThread.run(WALSplitter.java:1046)
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /hbase/default/tsdb/0d0a4577bfb611d1f8f7b903e296b38f/recovered.edits/0000000000007153973-hbase2%2C16020%2C1568617738890.1568683832472.temp could only be written to 0 of the 1 minReplication nodes. There are 1 datanode(s) running and 1 node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2121)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:295)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2702)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:875)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:561)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:872)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:818)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2678)
at org.apache.hadoop.ipc.Client.call(Client.java:1476)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy18.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:418)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy19.addBlock(Unknown Source)
at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:372)
at com.sun.proxy.$Proxy20.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.locateFollowingBlock(DFSOutputStream.java:1603)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1388)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)
2019-09-17 10:59:23,412 ERROR [RS_LOG_REPLAY_OPS-regionserver/hbase2:16020-1-Writer-1] wal.WALSplitter: Exiting thread
分析日志,发现,在反复切割1568617738890文件,那我们将这个文件删除,再重启
#停止hbase stop-hbase.sh #查看WALs hdfs dfs -ls -R /hbase/WALs #删除WALs hdfs dfs -rm -R /hbase/WALs #清空zk里面的hbase zkCli.sh
rmr /hbase #启动hbase start-hbase.sh
再次jps查看
发现HRegionServer成功启动,去hbase shell里面输命令
3、Unexpected error starting NodeStatusUpdater
问题描述
每次迁移快照时,就停留在2019-09-20 14:08:39,184 INFO [main] mapreduce.Job: Running job: job_1568959466252_0001,去web上查看任务,可看到其诊断提示:
Application is added to the scheduler and is not yet activated. Skipping AM assignment as cluster resource is empty. Details :
AM Partition = <DEFAULT_PARTITION>; AM Resource Request = <memory:2048, vCores:1>;
Queue Resource Limit for AM = <memory:0, vCores:0>; User AM Resource Limit of the queue = <memory:0, vCores:0>;
Queue AM Resource Usage = <memory:0, vCores:0>;
退回8088页面,查看到Active Nodes为0:
去50070页面,可以看到Live Nodes为2:
到这里,我估计是两个子节点,出现了问题,jps查看每个节点:
看到ResourceManager上面的在hbase0上时,我想起,yarn.resourcemanager.hostname这个配置,当时集群搭建时,我全部给的是本机的hostname,这样肯定是不对的,于是去查两个子节点的日志:
tailf /opt/sfot/hadoop/hadoop-3.1.3/hadoop-root-nodemanager-hbase1.log -n 500
果不其然,看到了以下报错:
2019-09-20 11:53:52,458 ERROR org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl: Unexpected error starting NodeStatusUpdater java.net.ConnectException: Call From hbase1/192.168.0.211 to hbase1:8031 failed on connection exception: java.net.ConnectException:
Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused at sun.reflect.GeneratedConstructorAccessor28.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:423) at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831) at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755) at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1515) at org.apache.hadoop.ipc.Client.call(Client.java:1457) at org.apache.hadoop.ipc.Client.call(Client.java:1367) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228) at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(P
在主节点上查看8031端口:
netstat -ntual |grep 8031
解决
去2个子节点将配置文件中有关yarn.resourcemanager.hostname的配置修改过来:
vim /opt/soft/hadoop/hadoop-3.1.2/etc/hadoop/yarn-site.xml
验证
重启hadoop、hbase之后,再在主节点上查看8031端口
再次迁移快照:
hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot -snapshot snapshot_tsdb_212 -copy-from hdfs://192.168.0.210:9000/hbase -copy-to hdfs://192.168.0.212:9000/hbase -mappers 20 -bandwidth 1024
发现成功
并且在web上查看Active Nodes为2
PS
到这里,想起要提醒一下,与伪分布式不同的是,要注意hbase的配置文件hbase-site.xml,在指定hbase.rootdir时,保持一致,我这里主节点是hbase0,3个节点的配置都如下:
<property>
<name>hbase.rootdir</name>
<value>hdfs://hbase0:9000/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hbase0,hbase1,hbase3</value>
</property>
<property>
<name>hbase.unsafe.stream.capability.enforce</name>
<value>false</value>
</property>
<property>
<name>hbase.wal.provider</name>
<value>filesystem</value>
</property>
<property>
<name>hbase.tmp.dir</name>
<value>/opt/soft/hbase/hbase-2.1.4/tmpdata</value>
</property>
<property>
<name>hfile.block.cache.size</name>
<value>0.2</value>
</property>
<property>
<name>hbase.snapshot.enabled</name>
<value>true</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>180000</value>
</property>