• Dashu's troubleshooting notes (51): an HBase region stuck in RIT state (timed out)


    An HBase region was stuck in the RIT (region-in-transition) state. Running move/assign/unassign on the region had no effect, and neither did assigns/unassigns issued through hbck2.

    Checking HBase's current lock state showed the following:

    hbase(main):003:0> list_locks
    NAMESPACE(default)                                                                                                                                                                                                                                      
    Lock type: SHARED, count: 1                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                            
    TABLE(apache_atlas_janus)                                                                                                                                                                                                                               
    Lock type: SHARED, count: 1                                                                                                                                                                                                                             
                                                                                                                                                                                                                                                            
    REGION(05021681c404140ffcee58ea06f6c7d1)                                                                                                                                                                                                                
    Lock type: EXCLUSIVE, procedure: {"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}
                                                                                                                                                                                                                                                            
    Took 0.0737 seconds                                                                                                                                                                                                                                     
    => [{"resourceType"=>"NAMESPACE", "resourceName"=>"default", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"TABLE", "resourceName"=>"apache_atlas_janus", "lockType"=>"SHARED", "sharedLockCount"=>1}, {"resourceType"=>"REGION", "resourceName"=>"05021681c404140ffcee58ea06f6c7d1", "lockType"=>"EXCLUSIVE", "exclusiveLockOwnerProcedure"=>{"className"=>"org.apache.hadoop.hbase.master.assignment.UnassignProcedure", "procId"=>"2", "submittedTime"=>"1655288642895", "owner"=>"root", "state"=>"WAITING_TIMEOUT", "stackId"=>[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111], "lastUpdate"=>"1655350334877", "timeout"=>600000, "stateMessage"=>[{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}], "locked"=>true}, "sharedLockCount"=>0}]
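The `namespace` and `qualifier` values in the lock output are Base64-encoded, and `submittedTime` / `lastUpdate` are epoch milliseconds. Below is a minimal Python sketch for making them readable. The helper names `decode_b64` and `millis_to_local` are hypothetical, and the default +08:00 offset is an assumption chosen to match the timezone this cluster's shell reports.

```python
import base64
from datetime import datetime, timedelta, timezone


def decode_b64(value):
    """Decode a Base64 string from the shell output into UTF-8 text."""
    return base64.b64decode(value).decode("utf-8")


def millis_to_local(ms, utc_offset_hours=8):
    """Format an epoch-milliseconds value at a fixed UTC offset (assumed +08:00)."""
    tz = timezone(timedelta(hours=utc_offset_hours))
    seconds = int(ms) // 1000  # drop the millisecond part
    return datetime.fromtimestamp(seconds, tz).strftime("%Y-%m-%d %H:%M:%S %z")


if __name__ == "__main__":
    # Values copied from the list_locks output above.
    print(decode_b64("ZGVmYXVsdA=="))              # default
    print(decode_b64("YXBhY2hlX2F0bGFzX2phbnVz"))  # apache_atlas_janus
    print(millis_to_local("1655288642895"))        # 2022-06-15 18:24:02 +0800
    print(millis_to_local("1655350334877"))        # 2022-06-16 11:32:14 +0800
```

The decoded names match the NAMESPACE and TABLE entries in the same output, and the two timestamps show that the procedure holding the lock had been retrying for roughly 17 hours.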
    

    The region carries a lock that was taken by the procedure with procId=2. Listing all procedures:

    hbase(main):001:0> list_procedures
     PID Name State Submitted Last_Update Parameters
     2 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT 2022-06-15 18:24:02 +0800 2022-06-16 11:32:14 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}, "attempt"=>112}]
     3 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:25:02 +0800 2022-06-15 18:25:02 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
     4 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-15 18:26:03 +0800 2022-06-15 18:26:03 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
     37 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:43:25 +0800 2022-06-16 10:43:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
     38 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:44:25 +0800 2022-06-16 10:44:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
     39 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:45:25 +0800 2022-06-16 10:45:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
     40 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 10:45:48 +0800 2022-06-16 10:45:48 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
     41 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:46:25 +0800 2022-06-16 10:46:25 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
     42 org.apache.hadoop.hbase.master.assignment.UnassignProcedure RUNNABLE 2022-06-16 10:54:54 +0800 2022-06-16 10:54:54 +0800 [{"transitionState"=>"REGION_TRANSITION_DISPATCH", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "hostingServer"=>{"hostName"=>"hadoop-server1", "port"=>16020, "startCode"=>"1653675967711"}}]
     43 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:11:45 +0800 2022-06-16 11:11:45 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
     44 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:13:14 +0800 2022-06-16 11:13:14 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}, "override"=>true}]
     45 org.apache.hadoop.hbase.master.procedure.DisableTableProcedure RUNNABLE 2022-06-16 11:17:20 +0800 2022-06-16 11:17:20 +0800 [{}, {"userInfo"=>{"effectiveUser"=>"root"}, "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "skipTableStateCheck"=>false}]
     1556 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE 2022-06-16 11:30:11 +0800 2022-06-16 11:30:11 +0800 [{"transitionState"=>"REGION_TRANSITION_QUEUE", "regionInfo"=>{"regionId"=>"1611567786769", "tableName"=>{"namespace"=>"ZGVmYXVsdA==", "qualifier"=>"YXBhY2hlX2F0bGFzX2phbnVz"}, "startKey"=>"szMzLw==", "endKey"=>"zMzMyA==", "offline"=>false, "split"=>false, "replicaId"=>0}}]
    13 row(s)
    Took 0.6656 seconds
    

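    The commands run earlier had spawned many procedures that were all trying to operate on this region, and every one of them was stuck behind the first procedure, because that procedure (procId=2, in WAITING_TIMEOUT after 112 attempts) holds the lock.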

    hbase hbck -j hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar bypass -o -r $PROCEDURE_PID

    Bypassing these procedures with hbck2 resolved the problem.
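With a dozen stuck procedures, collecting the PIDs by hand is tedious. The following Python sketch pulls the PIDs out of saved `list_procedures` output and prints one bypass command per PID, reusing the jar path and the `-o -r` flags from the command above. The function names and the marker-based filter are hypothetical, the parsing assumes the PID is the first column followed by the procedure class name, and the sample text is abbreviated with `...`. The script only builds command strings and does not execute anything.

```python
import re
import shlex

# Jar path as used in the post; adjust to the local install.
HBCK2_JAR = "hbase-operator-tools-1.1.0/hbase-hbck2/hbase-hbck2-1.1.0.jar"


def extract_pids(list_procedures_output, marker):
    """Return PIDs of procedure rows whose text contains `marker`."""
    pids = []
    for line in list_procedures_output.splitlines():
        # A procedure row starts with the PID, then the class name.
        match = re.match(r"\s*(\d+)\s+org\.apache\.", line)
        if match and marker in line:
            pids.append(int(match.group(1)))
    return pids


def bypass_commands(pids, jar=HBCK2_JAR):
    """Build one hbck2 bypass command string per PID (nothing is run)."""
    return [f"hbase hbck -j {shlex.quote(jar)} bypass -o -r {pid}" for pid in pids]


if __name__ == "__main__":
    # Abbreviated sample rows; real rows carry the full parameter list.
    sample = """
 PID Name State Submitted Last_Update Parameters
 2 org.apache.hadoop.hbase.master.assignment.UnassignProcedure WAITING_TIMEOUT ... "regionId"=>"1611567786769" ...
 45 org.apache.hadoop.hbase.master.procedure.DisableTableProcedure RUNNABLE ... [{}, ...]
 1556 org.apache.hadoop.hbase.master.assignment.AssignProcedure RUNNABLE ... "regionId"=>"1611567786769" ...
13 row(s)
"""
    for command in bypass_commands(extract_pids(sample, '"regionId"=>"1611567786769"')):
        print(command)
```

Filtering on the `regionId` keeps the list limited to procedures that target the stuck region. Review the printed commands before running any of them, since bypassing a procedure can leave state that needs manual repair.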

    Reference:
    https://stackoverflow.com/questions/56321514/how-to-abort-kill-a-procedure-in-hbase

  • Original article: https://www.cnblogs.com/barneywill/p/16381778.html