• YARN job hangs at 95%


    Scenario:
    Total resources: 18 GB memory, 9 vcores
    Remaining resources: 14 GB, 5 vcores
    4 running applications, each using 1 GB and 1 vcore
    5 applications stuck in the ACCEPTED state


    Configuration:

    Dynamic Resource Pool Configuration:

    Min resources: 1 vcore, 512 MB

    Max resources: 9 vcores, 18 GB

    Max running apps: 6

    Max Application Master share: 0.4 (limits the fraction of the resource pool's fair share that may be used to run Application Masters. For example, at 1.0 the AMs in a leaf pool can use up to 100% of the pool's memory and CPU fair share; at -1.0 the feature is disabled and the AM share is not checked. The default is 0.5.)

    Three nodes, each with 6 GB memory and 3 vcores
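
    Plugging the numbers above into the FairScheduler's AM-share admission check makes the hang visible arithmetically. The sketch below is a simplified, hypothetical model (it assumes the pool's fair share equals the full cluster, and that each running application's single 1 GB / 1 vcore container is its Application Master):

```python
# Simplified model of the FairScheduler AM-share admission check:
# a new AM is admitted only if total AM usage plus the new AM still
# fits inside max_am_share of the pool's fair share, in BOTH the
# memory and the vcore dimension.

def can_start_am(am_mem_mb, am_vcores,
                 used_am_mem_mb, used_am_vcores,
                 fair_share_mem_mb, fair_share_vcores,
                 max_am_share):
    mem_cap = fair_share_mem_mb * max_am_share
    vcore_cap = fair_share_vcores * max_am_share
    return (used_am_mem_mb + am_mem_mb <= mem_cap
            and used_am_vcores + am_vcores <= vcore_cap)

# Scenario: 18 GB / 9 vcore cluster, 4 AMs of 1 GB / 1 vcore running,
# a 5th ACCEPTED application asks for the same AM container.
print(can_start_am(1024, 1, 4 * 1024, 4, 18 * 1024, 9, 0.4))  # False
print(can_start_am(1024, 1, 4 * 1024, 4, 18 * 1024, 9, 0.8))  # True
```

    At 0.4 the vcore cap is 9 × 0.4 = 3.6, so with 4 AM vcores already in use the fifth AM cannot be admitted and its application sits in ACCEPTED; raising the share to 0.8 lifts the cap to 7.2 vcores.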

    yarn-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    
    <!--Autogenerated by Cloudera Manager-->
    <configuration>
      <property>
        <name>yarn.acl.enable</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.admin.acl</name>
        <value>*</value>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>259200</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>sz280111:2181,sz280113:2181,sz280112:2181</value>
      </property>
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
      <property>
        <name>yarn.client.failover-sleep-base-ms</name>
        <value>100</value>
      </property>
      <property>
        <name>yarn.client.failover-sleep-max-ms</name>
        <value>2000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarnRM</value>
      </property>
      <property>
        <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address.rm49</name>
        <value>sz280111:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm49</name>
        <value>sz280111:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm49</name>
        <value>sz280111:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address.rm49</name>
        <value>sz280111:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm49</name>
        <value>sz280111:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.https.address.rm49</name>
        <value>sz280111:8090</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address.rm61</name>
        <value>sz280112:8032</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm61</name>
        <value>sz280112:8030</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm61</name>
        <value>sz280112:8031</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address.rm61</name>
        <value>sz280112:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm61</name>
        <value>sz280112:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.https.address.rm61</name>
        <value>sz280112:8090</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm49,rm61</value>
      </property>
      <property>
        <name>yarn.nodemanager.recovery.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>yarn.nodemanager.recovery.dir</name>
        <value>/qhapp/cdh/var/lib/hadoop-yarn/yarn-nm-recovery</value>
      </property>
      <property>
        <name>yarn.resourcemanager.client.thread-count</name>
        <value>50</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.client.thread-count</name>
        <value>50</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.client.thread-count</name>
        <value>1</value>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
      </property>
      <property>
        <name>yarn.scheduler.increment-allocation-mb</name>
        <value>256</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
      </property>
      <property>
        <name>yarn.scheduler.increment-allocation-vcores</name>
        <value>1</value>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>8</value>
      </property>
      <property>
        <name>yarn.resourcemanager.amliveliness-monitor.interval-ms</name>
        <value>1000</value>
      </property>
      <property>
        <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
        <value>600000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.am.max-attempts</name>
        <value>2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name>
        <value>600000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.nm.liveness-monitor.interval-ms</name>
        <value>1000</value>
      </property>
      <property>
        <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
        <value>600000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
        <value>50</value>
      </property>
      <property>
        <name>yarn.application.classpath</name>
        <value>$HADOOP_CLIENT_CONF_DIR,$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,$HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,$HADOOP_YARN_HOME/*,$HADOOP_YARN_HOME/lib/*</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
      </property>
      <property>
        <name>yarn.nodemanager.container-monitor.interval-ms</name>
        <value>3000</value>
      </property>
      <property>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>1000</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/qhapp/cdh/var/lib/yarn/nm</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/qhapp/cdh/var/log/yarn/container-logs</value>
      </property>
      <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>sz280108:8042</value>
      </property>
      <property>
        <name>yarn.nodemanager.webapp.https.address</name>
        <value>sz280108:8044</value>
      </property>
      <property>
        <name>yarn.nodemanager.address</name>
        <value>sz280108:8041</value>
      </property>
      <property>
        <name>yarn.nodemanager.admin-env</name>
        <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
      </property>
      <property>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,HADOOP_YARN_HOME</value>
      </property>
      <property>
        <name>yarn.nodemanager.container-manager.thread-count</name>
        <value>20</value>
      </property>
      <property>
        <name>yarn.nodemanager.delete.thread-count</name>
        <value>4</value>
      </property>
      <property>
        <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
        <value>100</value>
      </property>
      <property>
        <name>yarn.nodemanager.localizer.address</name>
        <value>sz280108:8040</value>
      </property>
      <property>
        <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
        <value>600000</value>
      </property>
      <property>
        <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
        <value>5120</value>
      </property>
      <property>
        <name>yarn.nodemanager.localizer.client.thread-count</name>
        <value>5</value>
      </property>
      <property>
        <name>yarn.nodemanager.localizer.fetch.thread-count</name>
        <value>4</value>
      </property>
      <property>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
      </property>
      <property>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>6144</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>3</value>
      </property>
      <property>
        <name>yarn.nodemanager.delete.debug-delay-sec</name>
        <value>0</value>
      </property>
      <property>
        <name>yarn.nodemanager.health-checker.script.path</name>
        <value></value>
      </property>
      <property>
        <name>yarn.nodemanager.health-checker.script.opts</name>
        <value></value>
      </property>
      <property>
        <name>yarn.nodemanager.disk-health-checker.interval-ms</name>
        <value>120000</value>
      </property>
      <property>
        <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
        <value>0</value>
      </property>
      <property>
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>90.0</value>
      </property>
      <property>
        <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
        <value>0.25</value>
      </property>
      <property>
        <name>mapreduce.shuffle.max.threads</name>
        <value>80</value>
      </property>
      <property>
        <name>yarn.log.server.url</name>
        <value>http://sz280111:19888/jobhistory/logs/</value>
      </property>
      <property>
        <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
        <value>nobody</value>
      </property>
      <property>
        <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler</value>
      </property>
      <property>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>6</value>
      </property>
    </configuration>

    Solution 1:

    Raise the Max Application Master share from 0.4 to 0.8.

    "Max Application Master share" controls the percentage of total cluster memory and CPU that may be used for AM containers. With several jobs running, each AM consumes the memory and CPU its container needs;
    once the AMs collectively exceed the configured percentage of the cluster, the next AM waits until resources are freed before it can start.

    Reference: https://community.hortonworks.com/questions/77454/tez-job-hang-waiting-for-am-container-to-be-alloca.html
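
    If the fair scheduler is configured by hand rather than through Cloudera Manager's Dynamic Resource Pool page, the equivalent knob is the maxAMShare element of a queue in fair-scheduler.xml. A hypothetical fragment (the queue name root.default is only an example):

```xml
<allocations>
  <queue name="root.default">
    <!-- Allow AM containers to use up to 80% of this queue's fair share -->
    <maxAMShare>0.8</maxAMShare>
    <maxRunningApps>6</maxRunningApps>
  </queue>
</allocations>
```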

  • Original article: https://www.cnblogs.com/treehesoft/p/6952756.html