• YARN ResourceManager fails and exits right after startup: "Invalid resource request, no resources requested"


    After changing the YARN configuration in ambari-server and restarting the service, the ResourceManager failed to start. The error was rather odd: "Invalid resource request, no resources requested"! Details below:

    2018-08-21 16:06:16,639 FATAL resourcemanager.ResourceManager (ResourceManager.java:main(1495)) - Error starting ResourceManager
    org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
            at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1213)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1254)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1250)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1250)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.serviceStart(ResourceManager.java:1301)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.main(ResourceManager.java:1492)
    Caused by: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
            at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.validateAndCreateResourceRequest(RMAppManager.java:489)
            at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.createAndPopulateNewRMApp(RMAppManager.java:389)
            at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recoverApplication(RMAppManager.java:357)
            at org.apache.hadoop.yarn.server.resourcemanager.RMAppManager.recover(RMAppManager.java:568)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.recover(ResourceManager.java:1464)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$RMActiveServices.serviceStart(ResourceManager.java:825)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
            ... 10 more
    2018-08-21 16:06:16,656 INFO  zookeeper.ZooKeeper (ZooKeeper.java:close(684)) - Session: 0x36546c044dc0113 closed
    2018-08-21 16:06:16,656 INFO  zookeeper.ClientCnxn (ClientCnxn.java:run(524)) - EventThread shut down
    2018-08-21 16:06:16,741 INFO  resourcemanager.ResourceManager (LogAdapter.java:info(49)) - SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down ResourceManager at ep-bd01/192.168.58.11

    Extensive searching online turned up nothing. In the end I rebooted the host and restarted all services, and everything came up! But restarting the RM again failed, for the same reason as above.

    1. Configured RM HA. This time the service started, but both configured RM nodes stayed in standby state! I went on to modify the configuration files countless more times, with no effect; the error message stayed the same. (The rmadmin query sketched below is a quick way to confirm each RM's state.)
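
    To check each ResourceManager's HA state, the rmadmin state query can be run against every RM ID. This is a minimal sketch; rm1 and rm2 are assumed to be the IDs defined in this cluster's yarn.resourcemanager.ha.rm-ids, so substitute your own:

    # Sketch: query each ResourceManager's HA state (active / standby).
    # rm1/rm2 must match the IDs listed in yarn.resourcemanager.ha.rm-ids.
    yarn rmadmin -getServiceState rm1
    yarn rmadmin -getServiceState rm2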

    2. Manually transitioned the RM on one host to active; it failed with the same error:

    [root@ep-bd01 zookeeper]# yarn rmadmin -transitionToActive --forceactive --forcemanual rm1
    You have specified the --forcemanual flag. This flag is dangerous, as it can induce a split-brain scenario that WILL CORRUPT your HDFS namespace, possibly irrecoverably.
    
    It is recommended not to use this flag, but instead to shut down the cluster and disable automatic failover if you prefer to manually manage your HA state.
    
    You may abort safely by answering 'n' or hitting ^C now.
    
    Are you sure you want to continue? (Y or N) y
    ......
    ......
    18/08/29 14:31:10 WARN ha.ActiveStandbyElector: Exception handling the winning of election
    org.apache.hadoop.ha.ServiceFailedException: RM could not transition to Active
            at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:146)
            at org.apache.hadoop.ha.ActiveStandbyElector.becomeActive(ActiveStandbyElector.java:896)
            at org.apache.hadoop.ha.ActiveStandbyElector.processResult(ActiveStandbyElector.java:476)
            at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:611)
            at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:510)
    Caused by: org.apache.hadoop.ha.ServiceFailedException: Error when transitioning to Active mode
            at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:325)
            at org.apache.hadoop.yarn.server.resourcemanager.ActiveStandbyElectorBasedElectorService.becomeActive(ActiveStandbyElectorBasedElectorService.java:144)
            ... 4 more
    Caused by: org.apache.hadoop.service.ServiceStateException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, no resources requested
            at org.apache.hadoop.service.ServiceStateException.convert(ServiceStateException.java:105)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:203)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.startActiveServices(ResourceManager.java:1213)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1254)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager$1.run(ResourceManager.java:1250)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
            at org.apache.hadoop.yarn.server.resourcemanager.ResourceManager.transitionToActive(ResourceManager.java:1250)
            at org.apache.hadoop.yarn.server.resourcemanager.AdminService.transitionToActive(AdminService.java:320)
            ... 5 more

    At last! After several more days of searching and thinking it over, I concluded this error message is probably new in HDP 3.0. It resembles an issue I found online with the same symptom, the RM starting successfully and then dying immediately; that report suggested the failure is triggered while the RM recovers application state, so I rushed to try the fix.
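
    Before clearing anything, it is worth confirming that the RM really is configured to recover application state from a ZooKeeper-backed store. A minimal check, assuming the usual HDP client config symlink at /etc/hadoop/conf (adjust the path if your layout differs):

    # Sketch: confirm RM recovery is enabled and which state store it uses.
    # yarn.resourcemanager.zk-state-store.parent-path defaults to /rmstore,
    # which is the znode path cleaned up below.
    grep -A1 -E "yarn.resourcemanager.(recovery.enabled|store.class|zk-state-store.parent-path)" \
        /etc/hadoop/conf/yarn-site.xml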

    In short: use the ZooKeeper client to delete every child node under /rmstore/ZKRMStateRoot/RMAppRoot.

    Then restart the RM. To my surprise, the problem that had plagued me for days was gone just like that; see the output below (allow me a moment to celebrate first).

    [root@ep-bd03 pg_log]# sudo -u zookeeper /usr/hdp/3.0.0.0-1634/zookeeper/bin/zkCli.sh

    Connecting to localhost:2181
    ...(ZooKeeper client environment and classpath logging trimmed)...
    Welcome to ZooKeeper!
    2018-08-29 15:04:02,484 - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1279] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x3658450e5f202da, negotiated timeout = 30000

    WATCHER::

    WatchedEvent state:SyncConnected type:None path:null

    [zk: localhost:2181(CONNECTED) 0] ls /rmstore
    [ZKRMStateRoot]
    [zk: localhost:2181(CONNECTED) 1] ls /rmstore/ZKRMStateRoot
    [ReservationSystemRoot, RMAppRoot, AMRMTokenSecretManagerRoot, EpochNode, RMDTSecretManagerRoot, RMVersionNode]

    [zk: localhost:2181(CONNECTED) 6] ls /rmstore/ZKRMStateRoot/RMAppRoot
    [application_1534904073745_0001, HIERARCHIES, application_1534904073745_0003, application_1534904073745_0002]

    [zk: localhost:2181(CONNECTED) 3] rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1534904073745_0001
    [zk: localhost:2181(CONNECTED) 4] rmr /rmstore/ZKRMStateRoot/RMAppRoot/HIERARCHIES
    [zk: localhost:2181(CONNECTED) 5] rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1534904073745_0003
    [zk: localhost:2181(CONNECTED) 5] rmr /rmstore/ZKRMStateRoot/RMAppRoot/application_1534904073745_0002

    [zk: localhost:2181(CONNECTED) 7] ls /rmstore/ZKRMStateRoot/RMAppRoot
    []
    [zk: localhost:2181(CONNECTED) 8] 
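
    For the record, the same cleanup can apparently be done without a ZooKeeper client by using the ResourceManager's own CLI. A sketch, assuming these options are available in this HDP 3.0 build; stop all ResourceManagers first:

    # Sketch: wipe the entire RM state store in one step (run as the yarn user).
    sudo -u yarn yarn resourcemanager -format-state-store

    # Or remove a single application's recovery record by its application ID:
    sudo -u yarn yarn resourcemanager -remove-application-from-state-store application_1534904073745_0001
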
  • Original post: https://www.cnblogs.com/dajianshi/p/9516155.html