• A Detailed Guide to Using Hadoop 3.2.0


    1. Overview

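    Hadoop 3 has been out for quite a while, and several of the features added across its releases are genuinely useful. At the time of writing, the latest release is Hadoop 3.2.0. Below I share some problems I ran into while using Hadoop 3, together with how I solved them.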

    2. Content

    2.1 Base software packages

    Before using these components, a few things need to be prepared:

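    • Hadoop-3.2.0 installation package (downloading the Hadoop-3.2.0 source code as well is recommended, since a later step needs it)
    • Maven-3.6.1 (to compile the Hadoop-3.2.0 source code)
    • ProtoBuf-2.5.0 (to compile the Hadoop-3.2.0 source code)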

    2.2 Deployment environment

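    SSH, user creation, passwordless login and similar steps are not covered here; see the blog post [Configuring a Highly Available Hadoop Platform]. Under the deployment user, configure the Hadoop environment variables, such as HADOOP_HOME and HADOOP_CONF_DIR.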

    2.2.1 Configuring environment variables

    The details are as follows:

    vi ~/.bash_profile
    
    # Edit the following variables
    export MAVEN_OPTS="-Xms256m -Xmx512m"
    export JAVA_HOME=/data/soft/new/jdk
    export HADOOP_HOME=/data/soft/new/hadoop
    export HADOOP_CONF_DIR=/data/soft/new/hadoop-config
    export HADOOP_YARN_HOME=$HADOOP_HOME
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    
    # MAVEN_OPTS holds JVM options for Maven, not a directory, so it does not belong in PATH
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$HBASE_HOME/bin
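    After reloading the profile (for example with `source ~/.bash_profile`), it can help to confirm that the variables actually reached the shell. The class below is a minimal sketch and not part of Hadoop; the class name `HadoopEnvCheck` and the chosen set of "required" variables are assumptions. It reports which of JAVA_HOME, HADOOP_HOME and HADOOP_CONF_DIR are unset, and whether `$HADOOP_HOME/bin` and `$HADOOP_HOME/sbin` made it onto PATH.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical helper: sanity-checks the variables exported in ~/.bash_profile.
final class HadoopEnvCheck {
    // Assumed minimal set of variables the profile above should export.
    static final List<String> REQUIRED =
            Arrays.asList("JAVA_HOME", "HADOOP_HOME", "HADOOP_CONF_DIR");

    // Returns the required variables that are unset or blank, in REQUIRED order.
    static List<String> missingVariables(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    // True when both $HADOOP_HOME/bin and $HADOOP_HOME/sbin appear as PATH entries
    // (PATH is split on ':' because the profile targets a Linux shell).
    static boolean hadoopOnPath(Map<String, String> env) {
        String home = env.get("HADOOP_HOME");
        String path = env.get("PATH");
        if (home == null || path == null) {
            return false;
        }
        List<String> entries = Arrays.asList(path.split(":"));
        return entries.contains(home + "/bin") && entries.contains(home + "/sbin");
    }

    public static void main(String[] args) {
        Map<String, String> env = System.getenv();
        System.out.println("Missing variables: " + missingVariables(env));
        System.out.println("Hadoop bin/sbin on PATH: " + hadoopOnPath(env));
    }
}
```

    Run it from the deployment user's login shell. A non-empty "Missing variables" list usually means the profile was not re-sourced.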

    2.2.2 Compiling the Hadoop-3.2.0 source code

    Why compile the Hadoop-3.2.0 source code? With Hadoop-3.2.0, submitting a job to YARN may fail with the following exception:

    2019-04-21 22:47:45,307 ERROR [main] org.apache.hadoop.mapreduce.v2.app.MRAppMaster: Error starting MRAppMaster
    org.apache.hadoop.yarn.exceptions.YarnRuntimeException: java.lang.NullPointerException
            at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:178)
            at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.serviceStart(RMCommunicator.java:122)
            at org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator.serviceStart(RMContainerAllocator.java:280)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
            at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$ContainerAllocatorRouter.serviceStart(MRAppMaster.java:979)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
            at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:121)
            at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.serviceStart(MRAppMaster.java:1293)
            at org.apache.hadoop.service.AbstractService.start(AbstractService.java:194)
            at org.apache.hadoop.mapreduce.v2.app.MRAppMaster$6.run(MRAppMaster.java:1761)
            at java.security.AccessController.doPrivileged(Native Method)
            at javax.security.auth.Subject.doAs(Subject.java:422)
            at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
            at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.initAndStartAppMaster(MRAppMaster.java:1757)
            at org.apache.hadoop.mapreduce.v2.app.MRAppMaster.main(MRAppMaster.java:1691)
    Caused by: java.lang.NullPointerException
            at org.apache.hadoop.mapreduce.v2.app.client.MRClientService.getHttpPort(MRClientService.java:177)
            at org.apache.hadoop.mapreduce.v2.app.rm.RMCommunicator.register(RMCommunicator.java:159)
            ... 14 more

    Reading the source code shows that the exception is caused by a piece of code in the class org.apache.hadoop.yarn.server.webproxy.amfilter.AmFilterInitializer. The relevant code, with the fix applied, is:

    if (rmIds != null) {
      List<String> urls = new ArrayList<>();
      for (String rmId : rmIds) {
        String url = getUrlByRmId(yarnConf, rmId);
        // urls.add(url); // original line commented out and replaced by the null check below
        if (url != null) {
          urls.add(url);
        }
      }
      if (!urls.isEmpty()) {
        params.put(RM_HA_URLS, StringUtils.join(",", urls));
      }
    }

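    This is a compatibility issue with the HA configuration in yarn-site.xml. Whether it occurs depends on whether yarn.resourcemanager.webapp.address and yarn.resourcemanager.webapp.https.address are empty. When getUrlByRmId() returns null for an RM ID, the original code adds that null entry to urls. The patched code skips it.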
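    To see the effect of the null check in isolation, here is a standalone sketch of the patched loop. It is not the Hadoop class itself. A plain `Map` stands in for `YarnConfiguration`, and `urlForRmId` is a hypothetical replacement for `getUrlByRmId`. It looks up only the per-RM-ID HTTP key `yarn.resourcemanager.webapp.address.<rmId>`, on the assumption that `yarn.http.policy` is HTTP_ONLY as in the yarn-site.xml below. The RM IDs `hdp-rm01` and `hdp-rm02` are taken from that file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Standalone illustration of the patched AmFilterInitializer loop (names are hypothetical).
final class RmHaUrls {
    // Stand-in for getUrlByRmId(): may return null when the RM ID has no webapp address.
    static String urlForRmId(Map<String, String> conf, String rmId) {
        return conf.get("yarn.resourcemanager.webapp.address." + rmId);
    }

    // Joins the non-null URLs with ','; returns null when none are configured,
    // mirroring the original code, which only sets RM_HA_URLS for a non-empty list.
    static String joinUrls(Map<String, String> conf, List<String> rmIds) {
        List<String> urls = new ArrayList<>();
        for (String rmId : rmIds) {
            String url = urlForRmId(conf, rmId);
            if (url != null) { // the null check introduced by the patch
                urls.add(url);
            }
        }
        return urls.isEmpty() ? null : String.join(",", urls);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("yarn.resourcemanager.webapp.address.hdp-rm01", "nna:8088");
        // hdp-rm02 has no webapp address here, so its lookup yields null and is skipped.
        System.out.println(joinUrls(conf, Arrays.asList("hdp-rm01", "hdp-rm02")));
    }
}
```

    Without the `if (url != null)` guard, the list would contain a null element for `hdp-rm02`, which is the situation the patch avoids.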

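    Prepare the Maven environment. The latest version is recommended, because Hadoop-3.2.0 uses fairly recent Maven plugins. Hadoop still uses ProtoBuf 2.5.0, so keep that version unchanged and just make sure it is configured in the build environment. Then compile the Hadoop-3.2.0 source code with the following command: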

    # To speed up the build, skip running the unit tests and generating Javadoc
    mvn package -Pdist -DskipTests -Dtar -Dmaven.javadoc.skip=true

    Run the command and wait for the build to finish. A successful build ends with output like the following:

    [INFO] ------------------------------------------------------------------------
    [INFO] Reactor Summary for Apache Hadoop Main 3.2.0:
    [INFO]
    [INFO] Apache Hadoop Main ................................. SUCCESS [  1.040 s]
    [INFO] Apache Hadoop Build Tools .......................... SUCCESS [  1.054 s]
    [INFO] Apache Hadoop Project POM .......................... SUCCESS [  0.845 s]
    [INFO] Apache Hadoop Annotations .......................... SUCCESS [  0.546 s]
    [INFO] Apache Hadoop Assemblies ........................... SUCCESS [  0.185 s]
    [INFO] Apache Hadoop Project Dist POM ..................... SUCCESS [  1.460 s]
    [INFO] Apache Hadoop Maven Plugins ........................ SUCCESS [  2.556 s]
    [INFO] Apache Hadoop MiniKDC .............................. SUCCESS [  0.529 s]
    [INFO] Apache Hadoop Auth ................................. SUCCESS [  2.412 s]
    [INFO] Apache Hadoop Auth Examples ........................ SUCCESS [  0.977 s]
    [INFO] Apache Hadoop Common ............................... SUCCESS [ 28.555 s]
    [INFO] Apache Hadoop NFS .................................. SUCCESS [  1.319 s]
    [INFO] Apache Hadoop KMS .................................. SUCCESS [ 11.622 s]
    [INFO] Apache Hadoop Common Project ....................... SUCCESS [  0.049 s]
    [INFO] Apache Hadoop HDFS Client .......................... SUCCESS [05:37 min]
    [INFO] Apache Hadoop HDFS ................................. SUCCESS [ 28.582 s]
    [INFO] Apache Hadoop HDFS Native Client ................... SUCCESS [  0.966 s]
    [INFO] Apache Hadoop HttpFS ............................... SUCCESS [  6.328 s]
    [INFO] Apache Hadoop HDFS-NFS ............................. SUCCESS [  0.859 s]
    [INFO] Apache Hadoop HDFS-RBF ............................. SUCCESS [  3.071 s]
    [INFO] Apache Hadoop HDFS Project ......................... SUCCESS [  0.035 s]
    [INFO] Apache Hadoop YARN ................................. SUCCESS [  0.039 s]
    [INFO] Apache Hadoop YARN API ............................. SUCCESS [  5.060 s]
    [INFO] Apache Hadoop YARN Common .......................... SUCCESS [02:24 min]
    [INFO] Apache Hadoop YARN Registry ........................ SUCCESS [  1.147 s]
    [INFO] Apache Hadoop YARN Server .......................... SUCCESS [  0.041 s]
    [INFO] Apache Hadoop YARN Server Common ................... SUCCESS [01:44 min]
    [INFO] Apache Hadoop YARN NodeManager ..................... SUCCESS [  4.143 s]
    [INFO] Apache Hadoop YARN Web Proxy ....................... SUCCESS [  0.921 s]
    [INFO] Apache Hadoop YARN ApplicationHistoryService ....... SUCCESS [ 12.087 s]
    [INFO] Apache Hadoop YARN Timeline Service ................ SUCCESS [  4.518 s]
    [INFO] Apache Hadoop YARN ResourceManager ................. SUCCESS [  7.887 s]
    [INFO] Apache Hadoop YARN Server Tests .................... SUCCESS [  0.982 s]
    [INFO] Apache Hadoop YARN Client .......................... SUCCESS [  1.712 s]
    [INFO] Apache Hadoop YARN SharedCacheManager .............. SUCCESS [  0.919 s]
    [INFO] Apache Hadoop YARN Timeline Plugin Storage ......... SUCCESS [  1.269 s]
    [INFO] Apache Hadoop YARN TimelineService HBase Backend ... SUCCESS [  0.062 s]
    [INFO] Apache Hadoop YARN TimelineService HBase Common .... SUCCESS [ 26.109 s]
    [INFO] Apache Hadoop YARN TimelineService HBase Client .... SUCCESS [ 33.811 s]
    [INFO] Apache Hadoop YARN TimelineService HBase Servers ... SUCCESS [  0.041 s]
    [INFO] Apache Hadoop YARN TimelineService HBase Server 1.2  SUCCESS [  1.659 s]
    [INFO] Apache Hadoop YARN TimelineService HBase tests ..... SUCCESS [ 44.305 s]
    [INFO] Apache Hadoop YARN Router .......................... SUCCESS [  1.186 s]
    [INFO] Apache Hadoop YARN Applications .................... SUCCESS [  0.049 s]
    [INFO] Apache Hadoop YARN DistributedShell ................ SUCCESS [  0.843 s]
    [INFO] Apache Hadoop YARN Unmanaged Am Launcher ........... SUCCESS [  0.571 s]
    [INFO] Apache Hadoop MapReduce Client ..................... SUCCESS [  0.136 s]
    [INFO] Apache Hadoop MapReduce Core ....................... SUCCESS [  3.399 s]
    [INFO] Apache Hadoop MapReduce Common ..................... SUCCESS [  1.819 s]
    [INFO] Apache Hadoop MapReduce Shuffle .................... SUCCESS [  1.289 s]
    [INFO] Apache Hadoop MapReduce App ........................ SUCCESS [  2.320 s]
    [INFO] Apache Hadoop MapReduce HistoryServer .............. SUCCESS [  1.450 s]
    [INFO] Apache Hadoop MapReduce JobClient .................. SUCCESS [  2.856 s]
    [INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [  0.969 s]
    [INFO] Apache Hadoop YARN Services ........................ SUCCESS [  0.041 s]
    [INFO] Apache Hadoop YARN Services Core ................... SUCCESS [ 13.856 s]
    [INFO] Apache Hadoop YARN Services API .................... SUCCESS [  1.034 s]
    [INFO] Apache Hadoop Image Generation Tool ................ SUCCESS [  0.715 s]
    [INFO] Yet Another Learning Platform ...................... SUCCESS [  0.946 s]
    [INFO] Apache Hadoop YARN Site ............................ SUCCESS [  0.065 s]
    [INFO] Apache Hadoop YARN UI .............................. SUCCESS [  0.048 s]
    [INFO] Apache Hadoop YARN Project ......................... SUCCESS [  8.150 s]
    [INFO] Apache Hadoop MapReduce HistoryServer Plugins ...... SUCCESS [  0.525 s]
    [INFO] Apache Hadoop MapReduce NativeTask ................. SUCCESS [  0.931 s]
    [INFO] Apache Hadoop MapReduce Uploader ................... SUCCESS [  0.575 s]
    [INFO] Apache Hadoop MapReduce Examples ................... SUCCESS [  0.829 s]
    [INFO] Apache Hadoop MapReduce ............................ SUCCESS [  3.370 s]
    [INFO] Apache Hadoop MapReduce Streaming .................. SUCCESS [  6.949 s]
    [INFO] Apache Hadoop Distributed Copy ..................... SUCCESS [  1.523 s]
    [INFO] Apache Hadoop Archives ............................. SUCCESS [  0.392 s]
    [INFO] Apache Hadoop Archive Logs ......................... SUCCESS [  0.515 s]
    [INFO] Apache Hadoop Rumen ................................ SUCCESS [  0.807 s]
    [INFO] Apache Hadoop Gridmix .............................. SUCCESS [  0.774 s]
    [INFO] Apache Hadoop Data Join ............................ SUCCESS [  0.385 s]
    [INFO] Apache Hadoop Extras ............................... SUCCESS [  0.425 s]
    [INFO] Apache Hadoop Pipes ................................ SUCCESS [  0.055 s]
    [INFO] Apache Hadoop OpenStack support .................... SUCCESS [  0.688 s]
    [INFO] Apache Hadoop Amazon Web Services support .......... SUCCESS [ 54.379 s]
    [INFO] Apache Hadoop Kafka Library support ................ SUCCESS [  3.304 s]
    [INFO] Apache Hadoop Azure support ........................ SUCCESS [01:42 min]
    [INFO] Apache Hadoop Aliyun OSS support ................... SUCCESS [  3.943 s]
    [INFO] Apache Hadoop Client Aggregator .................... SUCCESS [  2.479 s]
    [INFO] Apache Hadoop Scheduler Load Simulator ............. SUCCESS [  1.577 s]
    [INFO] Apache Hadoop Resource Estimator Service ........... SUCCESS [  5.400 s]
    [INFO] Apache Hadoop Azure Data Lake support .............. SUCCESS [02:40 min]
    [INFO] Apache Hadoop Tools Dist ........................... SUCCESS [  7.984 s]
    [INFO] Apache Hadoop Tools ................................ SUCCESS [  0.056 s]
    [INFO] Apache Hadoop Client API ........................... SUCCESS [01:18 min]
    [INFO] Apache Hadoop Client Runtime ....................... SUCCESS [ 51.046 s]
    [INFO] Apache Hadoop Client Packaging Invariants .......... SUCCESS [  1.265 s]
    [INFO] Apache Hadoop Client Test Minicluster .............. SUCCESS [01:41 min]
    [INFO] Apache Hadoop Client Packaging Invariants for Test . SUCCESS [  0.172 s]
    [INFO] Apache Hadoop Client Packaging Integration Tests ... SUCCESS [  0.146 s]
    [INFO] Apache Hadoop Distribution ......................... SUCCESS [ 23.411 s]
    [INFO] Apache Hadoop Client Modules ....................... SUCCESS [  0.045 s]
    [INFO] Apache Hadoop Cloud Storage ........................ SUCCESS [  0.649 s]
    [INFO] Apache Hadoop Cloud Storage Project ................ SUCCESS [  0.060 s]
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time:  24:49 min
    [INFO] Finished at: 2019-04-22T03:21:56+08:00
    [INFO] ------------------------------------------------------------------------

    Finally, upload the compiled jar hadoop-dist/target/hadoop-3.2.0/share/hadoop/yarn/hadoop-yarn-server-web-proxy-3.2.0.jar to the Hadoop cluster, and use it to replace the jar of the same name in $HADOOP_HOME/share/hadoop/yarn.
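    Before overwriting the installed jar, a quick check can catch copying the wrong file. The sketch below uses only `java.util.jar`; the class name `JarCheck` is hypothetical. It compares the CRC-32 of the `AmFilterInitializer` class entry in the rebuilt jar against the one in the installed jar. A differing CRC only shows that the class bytes changed. It does not prove the patch is present, so treat it as a rough signal.

```java
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper: compares the AmFilterInitializer class entry of two jars.
final class JarCheck {
    // Jar entry path of the class patched in the previous step.
    static final String ENTRY =
            "org/apache/hadoop/yarn/server/webproxy/amfilter/AmFilterInitializer.class";

    // Returns the CRC-32 recorded for the class entry, or -1 if the jar does not contain it.
    static long classCrc(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            JarEntry entry = jar.getJarEntry(ENTRY);
            return entry == null ? -1L : entry.getCrc();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarCheck <rebuilt-jar> <installed-jar>");
            System.exit(1);
        }
        long rebuilt = classCrc(args[0]);
        long installed = classCrc(args[1]);
        if (rebuilt < 0) {
            System.out.println("Rebuilt jar has no AmFilterInitializer class; wrong jar?");
        } else if (rebuilt == installed) {
            System.out.println("Class bytes are identical; the jar may not come from the patched source.");
        } else {
            System.out.println("Class bytes differ from the installed jar.");
        }
    }
}
```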

    2.2.3 Configuring the Hadoop files

    I previously covered the Hadoop 2 configuration files. This time I have put together a fresh set of configuration files for Hadoop 3, as follows:

    1.hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <property>
            <name>dfs.nameservices</name>
            <value>cluster1</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.cluster1</name>
            <value>nna,nns</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.cluster1.nna</name>
            <value>nna:9820</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.cluster1.nns</name>
            <value>nns:9820</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.cluster1.nna</name>
            <value>nna:9870</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.cluster1.nns</name>
            <value>nns:9870</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://dn1:8485;dn2:8485;dn3:8485/cluster1</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.cluster1</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/hadoop/.ssh/id_rsa</value>
        </property>
        <!-- If possible, put this directory on a dedicated disk -->
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/data/soft/new/dfs/journal</value>
        </property>
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <!-- If possible, put this directory on a dedicated disk -->
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/data/soft/new/dfs/name</value>
        </property>
        <!-- Physical machines usually have several independent disks; list them separated by commas -->
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/data/soft/new/dfs/data</value>
        </property>
        <!-- Set the replication factor as appropriate; if HDFS has enough space, use 3 -->
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.journalnode.http-address</name>
            <value>0.0.0.0:8480</value>
        </property>
        <property>
            <name>dfs.journalnode.rpc-address</name>
            <value>0.0.0.0:8485</value>
        </property>
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>dn1:2181,dn2:2181,dn3:2181</value>
        </property>
    </configuration>
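    In an HA setup, the NameNode addresses are spread across several keys: dfs.nameservices, dfs.ha.namenodes.<ns>, and dfs.namenode.rpc-address.<ns>.<id>. A typo in any one of them is easy to miss. The sketch below uses only the JDK's DOM parser, not Hadoop's `Configuration` class. The class name `HaNameNodes` is hypothetical. It reads an hdfs-site.xml and prints the RPC address resolved for each NameNode ID, flagging IDs with no address as `<missing>`. It ignores variable substitution and `<final>`, which Hadoop's own parser handles.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: resolves HA NameNode RPC addresses from hdfs-site.xml text.
final class HaNameNodes {
    // Collects <property><name>..</name><value>..</value></property> pairs;
    // properties without a <value> element are skipped.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new HashMap<>();
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            NodeList names = property.getElementsByTagName("name");
            NodeList values = property.getElementsByTagName("value");
            if (names.getLength() > 0 && values.getLength() > 0) {
                props.put(names.item(0).getTextContent().trim(),
                        values.item(0).getTextContent().trim());
            }
        }
        return props;
    }

    // Maps "<nameservice>.<namenodeId>" to the value of dfs.namenode.rpc-address.<ns>.<id>.
    static Map<String, String> rpcAddresses(Map<String, String> props) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String rawNs : props.getOrDefault("dfs.nameservices", "").split(",")) {
            String ns = rawNs.trim();
            String ids = props.get("dfs.ha.namenodes." + ns);
            if (ns.isEmpty() || ids == null) {
                continue;
            }
            for (String rawId : ids.split(",")) {
                String id = rawId.trim();
                String key = "dfs.namenode.rpc-address." + ns + "." + id;
                result.put(ns + "." + id, props.getOrDefault(key, "<missing>"));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String path = args.length > 0 ? args[0] : "hdfs-site.xml";
        String xml = new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
        rpcAddresses(parse(xml)).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

    For the file above, the expected entries are cluster1.nna -> nna:9820 and cluster1.nns -> nns:9820.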

    2.core-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://cluster1</value>
        </property>
        <property>
            <name>io.file.buffer.size</name>
            <value>131072</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/soft/new/dfs/tmp</value>
        </property>
        <property>
            <name>hadoop.proxyuser.root.hosts</name>
            <value>*</value>
        </property>
        <property>
            <name>hadoop.proxyuser.root.groups</name>
            <value>*</value>
        </property>
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>dn1:2181,dn2:2181,dn3:2181</value>
        </property>
    </configuration>

    3.mapred-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>0.0.0.0:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>nna:19888</value>
        </property>
        <property>
            <name>yarn.app.mapreduce.am.resource.mb</name>
            <value>512</value>
        </property>
        <property>
            <name>mapreduce.map.memory.mb</name>
            <value>512</value>
        </property>
        <property>
            <name>mapreduce.map.java.opts</name>
            <value>-Xmx512M</value>
        </property>
        <property>
            <name>mapreduce.reduce.memory.mb</name>
            <value>512</value>
        </property>
        <property>
            <name>mapreduce.reduce.java.opts</name>
            <value>-Xmx512M</value>
        </property>
        <property>
            <name>mapred.child.java.opts</name>
            <value>-Xmx512M</value>
        </property>
        <!-- Load the dependency JARs and the configuration files -->
        <property>
            <name>mapreduce.application.classpath</name>
            <value>/data/soft/new/hadoop-config,/data/soft/new/hadoop/share/hadoop/common/*,/data/soft/new/hadoop/share/hadoop/common/lib/*,/data/soft/new/hadoop/share/hadoop/hdfs/*,/data/soft/new/hadoop/share/hadoop/hdfs/lib/*,/data/soft/new/hadoop/share/hadoop/yarn/*,/data/soft/new/hadoop/share/hadoop/yarn/lib/*,/data/soft/new/hadoop/share/hadoop/mapreduce/*,/data/soft/new/hadoop/share/hadoop/mapreduce/lib/*</value>
        </property>
    </configuration>
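    The 512 MB requested above (yarn.app.mapreduce.am.resource.mb, mapreduce.map.memory.mb, mapreduce.reduce.memory.mb) interacts with the scheduler limits in the yarn-site.xml that follows. That file sets yarn.scheduler.minimum-allocation-mb=1024 and yarn.scheduler.maximum-allocation-mb=2048 and uses the FairScheduler. The sketch below approximates how YARN normalizes a memory request: raise it to the minimum, round it up to an increment, then cap it at the maximum. The increment of 1024 MB is an assumption; for the FairScheduler it should correspond to yarn.scheduler.increment-allocation-mb at its default. The class name is hypothetical, so treat the numbers as an estimate.

```java
// Hypothetical helper approximating YARN's memory-request normalization (values in MB).
final class ContainerSizing {
    static long normalize(long requestMb, long minMb, long maxMb, long incrementMb) {
        long atLeastMin = Math.max(requestMb, minMb);            // never below the minimum
        long rounded = ((atLeastMin + incrementMb - 1) / incrementMb) * incrementMb; // round up
        return Math.min(rounded, maxMb);                          // never above the maximum
    }

    public static void main(String[] args) {
        long min = 1024, max = 2048;
        long increment = 1024; // assumed default increment
        for (long request : new long[] {512, 1500, 3000}) {
            System.out.println(request + " MB requested -> "
                    + normalize(request, min, max, increment) + " MB granted");
        }
    }
}
```

    Under these assumptions, every 512 MB container in this mapred-site.xml would be granted 1024 MB, so the -Xmx512M heaps fit comfortably inside their containers.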

    4.yarn-site.xml

    <?xml version="1.0"?>
    <configuration>
      <property>
        <description>Factory to create client IPC classes.</description>
        <name>yarn.ipc.client.factory.class</name>
      </property>
    
      <property>
        <description>Factory to create server IPC classes.</description>
        <name>yarn.ipc.server.factory.class</name>
      </property>
    
      <property>
        <description>Factory to create serializable records.</description>
        <name>yarn.ipc.record.factory.class</name>
      </property>
      <property>
        <description>RPC class implementation</description>
        <name>yarn.ipc.rpc.class</name>
        <value>org.apache.hadoop.yarn.ipc.HadoopYarnProtoRPC</value>
      </property>
      
      <!-- Resource Manager Configs -->
      <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname.hdp-rm01</name>
        <value>nna</value>
      </property>    
    
      <property>
        <description>The hostname of the RM.</description>
        <name>yarn.resourcemanager.hostname.hdp-rm02</name>
        <value>nns</value>
      </property>
    
      
      <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value></value>
      </property>
    
      <property>
        <description>
          The actual address the server will bind to. If this optional address is
          set, the RPC and webapp servers will bind to this address and the port specified in
          yarn.resourcemanager.address and yarn.resourcemanager.webapp.address, respectively. This
          is most useful for making RM listen to all interfaces by setting to 0.0.0.0.
        </description>
        <name>yarn.resourcemanager.bind-host</name>
        <value></value>
      </property>
    
      <property>
        <description>The number of threads used to handle applications manager requests.</description>
        <name>yarn.resourcemanager.client.thread-count</name>
        <value>50</value>
      </property>
    
      <property>
        <description>The expiry interval for application master reporting.</description>
        <name>yarn.am.liveness-monitor.expiry-interval-ms</name>
        <value>600000</value>
      </property>
    
      <property>
        <description>The Kerberos principal for the resource manager.</description>
        <name>yarn.resourcemanager.principal</name>
      </property>
    
      <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>yarn01-sh:8030</value>
      </property>
    
      <property>
        <description>Number of threads to handle scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.client.thread-count</name>
        <value>50</value>
      </property>
    
      <property>
          <description>
            This configures the HTTP endpoint for Yarn Daemons. The following
            values are supported:
            - HTTP_ONLY : Service is provided only on http
            - HTTPS_ONLY : Service is provided only on https
          </description>
          <name>yarn.http.policy</name>
          <value>HTTP_ONLY</value>
      </property>
    
      <property>
        <description>The http address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>yarn01-sh:8088</value>
      </property>
    
      <property>
        <description>The https address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.https.address</name>
        <value>yarn01-sh:8088</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>yarn01-sh:8031</value>
      </property>
    
      <property>
        <description>Are acls enabled.</description>
        <name>yarn.acl.enable</name>
        <value>false</value>
      </property>
    
      <property>
        <description>ACL of who can be admin of the YARN cluster.</description>
        <name>yarn.admin.acl</name>
        <value>*</value>
      </property>
    
      <property>
        <description>The address of the RM admin interface.</description>
        <name>yarn.resourcemanager.admin.address</name>
        <value></value>
      </property>
    
      <property>
        <description>Number of threads used to handle RM admin interface.</description>
        <name>yarn.resourcemanager.admin.client.thread-count</name>
        <value>1</value>
      </property>
    
      <property>
        <description>Maximum time to wait to establish connection to
        ResourceManager.</description>
        <name>yarn.resourcemanager.connect.max-wait.ms</name>
        <value>900000</value>
      </property>
    
      <property>
        <description>How often to try connecting to the
        ResourceManager.</description>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>30000</value>
      </property>
    
      <property>
        <description>The maximum number of application attempts. It's a global
        setting for all application masters. Each application master can specify
        its individual maximum number of application attempts via the API, but the
        individual number cannot be more than the global upper bound. If it is,
        the resourcemanager will override it. The default number is set to 2, to
        allow at least one retry for AM.</description>
        <name>yarn.resourcemanager.am.max-attempts</name>
        <value>2</value>
      </property>
    
      <property>
        <description>How often to check that containers are still alive. </description>
        <name>yarn.resourcemanager.container.liveness-monitor.interval-ms</name>
        <value>600000</value>
      </property>
    
      <property>
        <description>Flag to enable override of the default kerberos authentication
        filter with the RM authentication filter to allow authentication using
        delegation tokens (fallback to kerberos if the tokens are missing). Only
        applicable when the http authentication type is kerberos.</description>
        <name>yarn.resourcemanager.webapp.delegation-token-auth-filter.enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>How long to wait until a node manager is considered dead.</description>
        <name>yarn.nm.liveness-monitor.expiry-interval-ms</name>
        <value>600000</value>
      </property>
    
      <property>
        <description>Path to file with nodes to include.</description>
        <name>yarn.resourcemanager.nodes.include-path</name>
        <value></value>
      </property>
    
      <property>
        <description>Path to file with nodes to exclude.</description>
        <name>yarn.resourcemanager.nodes.exclude-path</name>
        <value></value>
      </property>
    
      <property>
        <description>Number of threads to handle resource tracker calls.</description>
        <name>yarn.resourcemanager.resource-tracker.client.thread-count</name>
        <value>50</value>
      </property>
    
      <property>
        <description>The class to use as the resource scheduler.</description>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
      </property>
    
      <property>
        <description>The minimum allocation for every container request at the RM,
        in MBs. Memory requests lower than this won't take effect,
        and the specified value will get allocated at minimum.</description>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>1024</value>
      </property>
    
      <property>
        <description>The maximum allocation for every container request at the RM,
        in MBs. Memory requests higher than this won't take effect,
        and will get capped to this value.</description>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
      </property>
    
      <property>
        <description>The minimum allocation for every container request at the RM,
        in terms of virtual CPU cores. Requests lower than this won't take effect,
        and the specified value will get allocated the minimum.</description>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
      </property>
    
      <property>
        <description>The maximum allocation for every container request at the RM,
        in terms of virtual CPU cores. Requests higher than this won't take effect,
        and will get capped to this value.</description>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
      </property>
    
      <property>
        <description>Enable RM to recover state after starting. If true, then 
          yarn.resourcemanager.store.class must be specified. </description>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
    
      <property>
        <description>Enable RM work preserving recovery. This configuration is private
        to YARN for experimenting with the feature.
        </description>
        <name>yarn.resourcemanager.work-preserving-recovery.enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>Set the amount of time RM waits before allocating new
        containers on work-preserving-recovery. Such wait period gives RM a chance
        to settle down resyncing with NMs in the cluster on recovery, before assigning
        new containers to applications.
        </description>
        <name>yarn.resourcemanager.work-preserving-recovery.scheduling-wait-ms</name>
        <value>10000</value>
      </property>
    
      <property>
        <description>The class to use as the persistent store.
    
          If org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
          is used, the store is implicitly fenced; meaning a single ResourceManager
          is able to use the store at any point in time. More details on this
          implicit fencing, along with setting up appropriate ACLs is discussed
          under yarn.resourcemanager.zk-state-store.root-node.acl.
        </description>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
    
      <property>
        <description>The maximum number of completed applications RM state
        store keeps, less than or equal to ${yarn.resourcemanager.max-completed-applications}.
        By default, it equals ${yarn.resourcemanager.max-completed-applications}.
        This ensures that the applications kept in the state store are consistent with
        the applications remembered in RM memory.
        Any values larger than ${yarn.resourcemanager.max-completed-applications} will
        be reset to ${yarn.resourcemanager.max-completed-applications}.
        Note that this value impacts the RM recovery performance. Typically,
        a smaller value indicates better performance on RM recovery.
        </description>
        <name>yarn.resourcemanager.state-store.max-completed-applications</name>
        <value>${yarn.resourcemanager.max-completed-applications}</value>
      </property>
    
      <property>
        <description>Host:Port of the ZooKeeper server to be used by the RM. This
          must be supplied when using the ZooKeeper based implementation of the
          RM state store and/or embedded automatic failover in a HA setting.
        </description>
        <name>yarn.resourcemanager.zk-address</name>
        <value>dn1:2181,dn2:2181,dn3:2181</value>
      </property>
    
      <property>
        <description>Number of times RM tries to connect to ZooKeeper.</description>
        <name>yarn.resourcemanager.zk-num-retries</name>
        <value>1000</value>
      </property>
    
      <property>
        <description>Retry interval in milliseconds when connecting to ZooKeeper.
          When HA is enabled, the value here is NOT used. It is generated
          automatically from yarn.resourcemanager.zk-timeout-ms and
          yarn.resourcemanager.zk-num-retries.
        </description>
        <name>yarn.resourcemanager.zk-retry-interval-ms</name>
        <value>1000</value>
      </property>
    
      <property>
        <description>Full path of the ZooKeeper znode where RM state will be
        stored. This must be supplied when using
        org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore
        as the value for yarn.resourcemanager.store.class</description>
        <name>yarn.resourcemanager.zk-state-store.parent-path</name>
        <value>/rmstore</value>
      </property>
    
      <property>
        <description>ZooKeeper session timeout in milliseconds. Session expiration
        is managed by the ZooKeeper cluster itself, not by the client. This value is
        used by the cluster to determine when the client's session expires.
        Expiration happens when the cluster does not hear from the client within
        the specified session timeout period (i.e. no heartbeat).</description>
        <name>yarn.resourcemanager.zk-timeout-ms</name>
        <value>10000</value>
      </property>
    
      <property>
        <description>ACLs to be used for ZooKeeper znodes.</description>
        <name>yarn.resourcemanager.zk-acl</name>
        <value>world:anyone:rwcda</value>
      </property>
    
      <property>
        <description>
          ACLs to be used for the root znode when using ZKRMStateStore in an HA
          scenario for fencing.
    
          ZKRMStateStore supports implicit fencing to allow a single
          ResourceManager write-access to the store. For fencing, the
          ResourceManagers in the cluster share read-write-admin privileges on the
          root node, but the Active ResourceManager claims exclusive create-delete
          permissions.
    
          By default, when this property is not set, we use the ACLs from
          yarn.resourcemanager.zk-acl for shared admin access and
          rm-address:random-number for username-based exclusive create-delete
          access.
    
          This property allows users to set ACLs of their choice instead of using
          the default mechanism. For fencing to work, the ACLs should be
          carefully set differently on each ResourceManager such that all the
          ResourceManagers have shared admin access and the Active ResourceManager
          takes over (exclusively) the create-delete access.
        </description>
        <name>yarn.resourcemanager.zk-state-store.root-node.acl</name>
      </property>
    
      <property>
        <description>
            Specify the auths to be used for the ACLs specified in both the
            yarn.resourcemanager.zk-acl and
            yarn.resourcemanager.zk-state-store.root-node.acl properties.  This
            takes a comma-separated list of authentication mechanisms, each of the
            form 'scheme:auth' (the same syntax used for the 'addAuth' command in
            the ZK CLI).
        </description>
        <name>yarn.resourcemanager.zk-auth</name>
      </property>
    
      <property>
        <description>URI pointing to the location of the FileSystem path where
        RM state will be stored. This must be supplied when using
        org.apache.hadoop.yarn.server.resourcemanager.recovery.FileSystemRMStateStore
        as the value for yarn.resourcemanager.store.class</description>
        <name>yarn.resourcemanager.fs.state-store.uri</name>
        <value>${hadoop.tmp.dir}/yarn/system/rmstore</value>
        <!--value>hdfs://localhost:9000/rmstore</value-->
      </property>
    
      <property>
        <description>HDFS client retry policy specification. HDFS client retry
        is always enabled. Specified as pairs of sleep-time and number-of-retries,
        (t0, n0), (t1, n1), ...: the first n0 retries sleep t0 milliseconds on
        average, the following n1 retries sleep t1 milliseconds on average, and so on.
        </description>
        <name>yarn.resourcemanager.fs.state-store.retry-policy-spec</name>
        <value>2000, 500</value>
      </property>
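      <!-- Worked example (a sketch of how the spec above is read, not part of the
           original configuration): "2000, 500" is a single (t0, n0) pair, so the HDFS
           client retries up to 500 times, sleeping about 2000 ms on average between
           attempts, i.e. roughly 500 * 2 s = 1000 s (about 17 minutes) before giving up.
           This policy only matters when yarn.resourcemanager.store.class is
           FileSystemRMStateStore. -->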
    
      <property>
        <description>Enable RM high-availability. When enabled,
          (1) The RM starts in the Standby mode by default, and transitions to
          the Active mode when prompted to.
          (2) The nodes in the RM ensemble are listed in
          yarn.resourcemanager.ha.rm-ids
          (3) The id of each RM either comes from yarn.resourcemanager.ha.id
          if yarn.resourcemanager.ha.id is explicitly specified or can be
          figured out by matching yarn.resourcemanager.address.{id} with local address
          (4) The actual physical addresses come from the configs of the pattern
          - {rpc-config}.{id}</description>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
    
      <property>
        <description>Enable automatic failover.
          By default, it is enabled only when HA is enabled</description>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    
      <property>
        <description>Enable embedded automatic failover.
          By default, it is enabled only when HA is enabled.
          The embedded elector relies on the RM state store to handle fencing,
          and is primarily intended to be used in conjunction with ZKRMStateStore.
        </description>
        <name>yarn.resourcemanager.ha.automatic-failover.embedded</name>
        <value>true</value>
      </property>
    
      <property>
        <description>The base znode path to use for storing leader information,
          when using ZooKeeper based leader election.</description>
        <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
        <value>/yarn-leader-election</value>
      </property>
    
      <property>
        <description>Name of the cluster. In an HA setting,
          this is used to ensure the RM participates in leader
          election for this cluster and ensures it does not affect
          other clusters</description>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yarn01-sh</value>
      </property>
    
      <property>
        <description>The list of RM nodes in the cluster when HA is
          enabled. See description of yarn.resourcemanager.ha.enabled
          for full details on how this is used.</description>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>hdp-rm01,hdp-rm02</value>
      </property>
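      <!-- Illustration (assumed naming, following the {rpc-config}.{id} pattern from the
           yarn.resourcemanager.ha.enabled description): each id listed above needs its own
           per-RM settings, for example yarn.resourcemanager.hostname.hdp-rm01 and
           yarn.resourcemanager.hostname.hdp-rm02, each set to the host running that RM. -->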
    
      <property>
        <description>The id (string) of the current RM. When HA is enabled, this
          is an optional config. The id of current RM can be set by explicitly
          specifying yarn.resourcemanager.ha.id or figured out by matching
          yarn.resourcemanager.address.{id} with the local address.
          See description of yarn.resourcemanager.ha.enabled
          for full details on how this is used.</description>
        <name>yarn.resourcemanager.ha.id</name>
        <!--value>rm1</value-->
      </property>
    
      <property>
        <description>When HA is enabled, the class to be used by Clients, AMs and
          NMs to failover to the Active RM. It should extend
          org.apache.hadoop.yarn.client.RMFailoverProxyProvider</description>
        <name>yarn.client.failover-proxy-provider</name>
        <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
      </property>
    
      <property>
        <description>When HA is enabled, the max number of times
          FailoverProxyProvider should attempt failover. When set,
          this overrides the yarn.resourcemanager.connect.max-wait.ms. When
          not set, this is inferred from
          yarn.resourcemanager.connect.max-wait.ms.</description>
        <name>yarn.client.failover-max-attempts</name>
        <!--value>15</value-->
      </property>
    
      <property>
        <description>When HA is enabled, the sleep base (in milliseconds) to be
          used for calculating the exponential delay between failovers. When set,
          this overrides the yarn.resourcemanager.connect.* settings. When
          not set, yarn.resourcemanager.connect.retry-interval.ms is used instead.
        </description>
        <name>yarn.client.failover-sleep-base-ms</name>
        <!--value>500</value-->
      </property>
    
      <property>
        <description>When HA is enabled, the maximum sleep time (in milliseconds)
          between failovers. When set, this overrides the
          yarn.resourcemanager.connect.* settings. When not set,
          yarn.resourcemanager.connect.retry-interval.ms is used instead.</description>
        <name>yarn.client.failover-sleep-max-ms</name>
        <!--value>15000</value-->
      </property>
    
      <property>
        <description>When HA is enabled, the number of retries per
          attempt to connect to a ResourceManager. In other words,
          it is the ipc.client.connect.max.retries to be used during
          failover attempts</description>
        <name>yarn.client.failover-retries</name>
        <value>0</value>
      </property>
    
      <property>
        <description>When HA is enabled, the number of retries per
          attempt to connect to a ResourceManager on socket timeouts. In other
          words, it is the ipc.client.connect.max.retries.on.timeouts to be used
          during failover attempts</description>
        <name>yarn.client.failover-retries-on-socket-timeouts</name>
        <value>60</value>
      </property>
    
      <property>
        <description>The maximum number of completed applications the RM keeps.</description>
        <name>yarn.resourcemanager.max-completed-applications</name>
        <value>10000</value>
      </property>
    
      <property>
        <description>Interval at which the delayed token removal thread runs</description>
        <name>yarn.resourcemanager.delayed.delegation-token.removal-interval-ms</name>
        <value>30000</value>
      </property>
    
      <property>
        <description>If true, ResourceManager will have proxy-user privileges.
        Use case: In a secure cluster, YARN requires the user's HDFS delegation tokens to
        do localization and log aggregation on behalf of the user. If this is set to true,
        ResourceManager is able to request new HDFS delegation tokens on behalf of
        the user. This is needed by long-running services, because the HDFS tokens
        will eventually expire and YARN requires new valid tokens to do localization
        and log aggregation. Note that to enable this use case, the corresponding
        HDFS NameNode has to configure ResourceManager as the proxy-user so that
        ResourceManager can itself ask for new tokens on behalf of the user when
        tokens are past their max-life-time.</description>
        <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>Interval for rolling over the master key used to generate
            application tokens.
        </description>
        <name>yarn.resourcemanager.am-rm-tokens.master-key-rolling-interval-secs</name>
        <value>86400</value>
      </property>
    
      <property>
        <description>Interval for rolling over the master key used to generate
            container tokens. It is expected to be much greater than
            yarn.nm.liveness-monitor.expiry-interval-ms and
            yarn.rm.container-allocation.expiry-interval-ms. Otherwise the
            behavior is undefined.
        </description>
        <name>yarn.resourcemanager.container-tokens.master-key-rolling-interval-secs</name>
        <value>86400</value>
      </property>
    
      <property>
        <description>The heartbeat interval in milliseconds for every NodeManager in the cluster.</description>
        <name>yarn.resourcemanager.nodemanagers.heartbeat-interval-ms</name>
        <value>1000</value>
      </property>
    
      <property>
        <description>The minimum allowed version of a connecting nodemanager.  The valid values are
          NONE (no version checking), EqualToRM (the nodemanager's version is equal to
          or greater than the RM version), or a Version String.</description>
        <name>yarn.resourcemanager.nodemanager.minimum.version</name>
        <value>NONE</value>
      </property>
    
      <property>
        <description>Enable a set of periodic monitors (specified in
            yarn.resourcemanager.scheduler.monitor.policies) that affect the
            scheduler.</description>
        <name>yarn.resourcemanager.scheduler.monitor.enable</name>
        <value>false</value>
      </property>
    
      <property>
        <description>The list of SchedulingEditPolicy classes that interact with
            the scheduler. A particular module may be incompatible with the
            scheduler, other policies, or a configuration of either.</description>
        <name>yarn.resourcemanager.scheduler.monitor.policies</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.monitor.capacity.ProportionalCapacityPreemptionPolicy</value>
      </property>
    
      <property>
        <description>The class to use as the configuration provider.
        If org.apache.hadoop.yarn.LocalConfigurationProvider is used,
        the local configuration will be loaded.
        If org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider is used,
        the configuration which will be loaded should be uploaded to remote File system first.
        </description>
        <name>yarn.resourcemanager.configuration.provider-class</name>
        <value>org.apache.hadoop.yarn.LocalConfigurationProvider</value>
        <!-- <value>org.apache.hadoop.yarn.FileSystemBasedConfigurationProvider</value> -->
      </property>
    
      <property>
        <description>The setting that controls whether the RM publishes YARN
        system metrics to the timeline server.</description>
        <name>yarn.resourcemanager.system-metrics-publisher.enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>Number of worker threads that send the yarn system metrics
        data.</description>
        <name>yarn.resourcemanager.system-metrics-publisher.dispatcher.pool-size</name>
        <value>10</value>
      </property>
    
      <!-- Node Manager Configs -->
      <property>
        <description>The hostname of the NM.</description>
        <name>yarn.nodemanager.hostname</name>
        <value>0.0.0.0</value>
      </property>
      
      <property>
        <description>The address of the container manager in the NM.</description>
        <name>yarn.nodemanager.address</name>
        <value>${yarn.nodemanager.hostname}:0</value>
      </property>
    
      <property>
        <description>
          The actual address the server will bind to. If this optional address is
          set, the RPC and webapp servers will bind to this address and the port specified in
          yarn.nodemanager.address and yarn.nodemanager.webapp.address, respectively. This is
          most useful for making NM listen to all interfaces by setting to 0.0.0.0.
        </description>
        <name>yarn.nodemanager.bind-host</name>
        <value></value>
      </property>
    
      <property>
        <description>Environment variables that should be forwarded from the NodeManager's environment to the container's.</description>
        <name>yarn.nodemanager.admin-env</name>
        <value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value>
      </property>
    
      <property>
        <description>Environment variables that containers may override rather than use NodeManager's default.</description>
        <name>yarn.nodemanager.env-whitelist</name>
        <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,HADOOP_YARN_HOME</value>
      </property>
    
      <property>
        <description>The class that executes (launches) the containers.</description>
        <name>yarn.nodemanager.container-executor.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value>
        <!--<value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>-->
      </property>
    
      <property>
        <description>Number of threads container manager uses.</description>
        <name>yarn.nodemanager.container-manager.thread-count</name>
        <value>20</value>
      </property>
    
      <property>
        <description>Number of threads used in cleanup.</description>
        <name>yarn.nodemanager.delete.thread-count</name>
        <value>4</value>
      </property>
    
      <property>
        <description>
          Number of seconds after an application finishes before the nodemanager's 
          DeletionService will delete the application's localized file directory
          and log directory.
          
          To diagnose Yarn application problems, set this property's value large
          enough (for example, to 600 = 10 minutes) to permit examination of these
          directories. After changing the property's value, you must restart the 
          nodemanager in order for it to have an effect.
    
          The roots of Yarn applications' work directories are configurable with
          the yarn.nodemanager.local-dirs property (see below), and the roots
          of the Yarn applications' log directories are configurable with the
          yarn.nodemanager.log-dirs property (see also below).
        </description>
        <name>yarn.nodemanager.delete.debug-delay-sec</name>
        <value>0</value>
      </property>
    
      <property>
        <description>List of directories to store localized files in. An 
          application's localized file directory will be found in:
          ${yarn.nodemanager.local-dirs}/usercache/${user}/appcache/application_${appid}.
          Individual containers' work directories, called container_${contid}, will
          be subdirectories of this.
       </description>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/soft/new/dfs/nm-local-dir</value>
      </property>
    
      <property>
        <description>It limits the maximum number of files which will be localized
          in a single local directory. If the limit is reached then sub-directories
          will be created and new files will be localized in them. If it is set to
          a value less than or equal to 36 [which are sub-directories (0-9 and then
          a-z)] then NodeManager will fail to start. For example, [for public
          cache] if this is configured with a value of 40 (4 files +
          36 sub-directories) and the local-dir is "/tmp/local-dir1" then it will
          allow 4 files to be created directly inside "/tmp/local-dir1/filecache".
          For files that are localized further it will create a sub-directory "0"
          inside "/tmp/local-dir1/filecache" and will localize files inside it
          until it becomes full. If a file is removed from a sub-directory that
          is marked full, then that sub-directory will be reused to
          localize files.
       </description>
        <name>yarn.nodemanager.local-cache.max-files-per-directory</name>
        <value>8192</value>
      </property>
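      <!-- Worked example (arithmetic based on the description above): with a value of
           8192, 36 entries are taken by the sub-directories 0-9 and a-z, so up to
           8192 - 36 = 8156 files are localized directly in a directory before files
           start going into a sub-directory. -->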
    
      <property>
        <description>Address where the localizer IPC is.</description>
        <name>yarn.nodemanager.localizer.address</name>
        <value>${yarn.nodemanager.hostname}:8040</value>
      </property>
    
      <property>
        <description>Interval in between cache cleanups.</description>
        <name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name>
        <value>600000</value>
      </property>
    
      <property>
        <description>Target size of localizer cache in MB, per nodemanager. It is
          a target retention size that only includes resources with PUBLIC and 
          PRIVATE visibility and excludes resources with APPLICATION visibility
        </description>
        <name>yarn.nodemanager.localizer.cache.target-size-mb</name>
        <value>11010048</value>
      </property>
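      <!-- Unit check (arithmetic only): the value is in MB, so 11010048 MB
           = 10.5 * 1024 * 1024 MB, i.e. about 10.5 TB (binary units) of target cache
           size per NodeManager. The yarn-default.xml default is 10240 (10 GB). Because
           this is a target retention size, a value larger than the disks behind
           yarn.nodemanager.local-dirs effectively means the cache is never trimmed by size. -->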
    
      <property>
        <description>Number of threads to handle localization requests.</description>
        <name>yarn.nodemanager.localizer.client.thread-count</name>
        <value>5</value>
      </property>
    
      <property>
        <description>Number of threads to use for localization fetching.</description>
        <name>yarn.nodemanager.localizer.fetch.thread-count</name>
        <value>4</value>
      </property>
    
      <property>
        <description>
          Where to store container logs. An application's localized log directory 
          will be found in ${yarn.nodemanager.log-dirs}/application_${appid}.
          Individual containers' log directories will be below this, in directories 
          named container_${contid}. Each container directory will contain the files
          stderr, stdout, and syslog generated by that container.
        </description>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/soft/new/dfs/userlogs</value>
      </property>
    
      <property>
        <description>Whether to enable log aggregation. Log aggregation collects
          each container's logs and moves these logs onto a file-system, e.g.
          HDFS, after the application completes. Users can configure the
          "yarn.nodemanager.remote-app-log-dir" and
          "yarn.nodemanager.remote-app-log-dir-suffix" properties to determine
          where these logs are moved to. Users can access the logs via the
          Application Timeline Server.
        </description>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
    
      <property>
        <description>How long to keep aggregation logs before deleting them. -1 disables.
        Be careful: setting this too small will spam the NameNode.</description>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>-1</value>
      </property>
      
      <property>
        <description>How long to wait between aggregated log retention checks.
        If set to 0 or a negative value then the value is computed as one-tenth
        of the aggregated log retention time. Be careful: setting this too small
        will spam the NameNode.</description>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>-1</value>
      </property>
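      <!-- Worked example (hypothetical value, not used in this config): with
           yarn.log-aggregation.retain-seconds set to 604800 (7 days) and the check
           interval left at -1, the interval becomes 604800 / 10 = 60480 s (16.8 hours).
           With retain-seconds at -1 as configured here, aggregated logs are never deleted. -->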
    
      <property>
        <description>Time in seconds to retain user logs. Only applicable if
        log aggregation is disabled
        </description>
        <name>yarn.nodemanager.log.retain-seconds</name>
        <value>10800</value>
      </property>
    
      <property>
        <description>Where to aggregate logs to.</description>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/tmp/logs</value>
      </property>
    
      <property>
        <description>The remote log dir will be created at 
          {yarn.nodemanager.remote-app-log-dir}/${user}/{thisParam}
        </description>
        <name>yarn.nodemanager.remote-app-log-dir-suffix</name>
        <value>logs</value>
      </property>
      <!-- Set according to the physical memory actually available on the machine -->
      <property>
        <description>Amount of physical memory, in MB, that can be allocated 
        for containers.</description>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>2048</value>
      </property>
    
      <property>
        <description>Whether physical memory limits will be enforced for
        containers.</description>
        <name>yarn.nodemanager.pmem-check-enabled</name>
        <value>true</value>
      </property>
    
      <property>
        <description>Whether virtual memory limits will be enforced for
        containers.</description>
        <name>yarn.nodemanager.vmem-check-enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>Ratio of virtual memory to physical memory when
        setting memory limits for containers. Container allocations are
        expressed in terms of physical memory, and virtual memory usage
        is allowed to exceed this allocation by this ratio.
        </description>
        <name>yarn.nodemanager.vmem-pmem-ratio</name>
        <value>2.1</value>
      </property>
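      <!-- Worked example (arithmetic only): a container allocated 1024 MB of physical
           memory may use up to 1024 * 2.1 = 2150.4 MB of virtual memory. Because
           yarn.nodemanager.vmem-check-enabled is false above, this virtual limit is not
           enforced here; only the physical limit (pmem-check-enabled = true) is. -->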
      <!-- Set according to the number of CPU cores actually available on the machine -->
      <property>
        <description>Number of vcores that can be allocated
        for containers. This is used by the RM scheduler when allocating
        resources for containers. This is not used to limit the number of
        physical cores used by YARN containers.</description>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>2</value>
      </property>
    
      <property>
        <description>Percentage of CPU that can be allocated
        for containers. This setting allows users to limit the amount of
        CPU that YARN containers use. Currently functional only
        on Linux using cgroups. The default is to use 100% of CPU.
        </description>
        <name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name>
        <value>100</value>
      </property>
    
      <property>
        <description>NM Webapp address.</description>
        <name>yarn.nodemanager.webapp.address</name>
        <value>${yarn.nodemanager.hostname}:8042</value>
      </property>
    
      <property>
        <description>How often to monitor containers.</description>
        <name>yarn.nodemanager.container-monitor.interval-ms</name>
        <value>3000</value>
      </property>
    
      <property>
        <description>Class that calculates containers' current resource utilization.</description>
        <name>yarn.nodemanager.container-monitor.resource-calculator.class</name>
      </property>
    
      <property>
        <description>Frequency of running node health script.</description>
        <name>yarn.nodemanager.health-checker.interval-ms</name>
        <value>600000</value>
      </property>
    
      <property>
        <description>Timeout period for the node health check script.</description>
        <name>yarn.nodemanager.health-checker.script.timeout-ms</name>
        <value>1200000</value>
      </property>
    
      <property>
        <description>The health check script to run.</description>
        <name>yarn.nodemanager.health-checker.script.path</name>
        <value></value>
      </property>
    
      <property>
        <description>The arguments to pass to the health check script.</description>
        <name>yarn.nodemanager.health-checker.script.opts</name>
        <value></value>
      </property>
    
      <property>
        <description>Frequency of running disk health checker code.</description>
        <name>yarn.nodemanager.disk-health-checker.interval-ms</name>
        <value>120000</value>
      </property>
      
      <property>
        <description>Whether to enable the disk health checker.</description>
        <name>yarn.nodemanager.disk-health-checker.enable</name>
        <value>false</value>
      </property>
    
      <property>
        <description>The minimum fraction of the number of disks that must be healthy for
        the nodemanager to launch new containers. This corresponds to both
        yarn.nodemanager.local-dirs and yarn.nodemanager.log-dirs, i.e. if fewer
        healthy local-dirs (or log-dirs) than this fraction are available, then
        new containers will not be launched on this node.</description>
        <name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name>
        <value>0.25</value>
      </property>
    
      <property>
        <description>The maximum percentage of disk space utilization allowed after 
        which a disk is marked as bad. Values can range from 0.0 to 100.0. 
        If the value is greater than or equal to 100, the nodemanager will check
        for a full disk. This applies to yarn.nodemanager.local-dirs and
        yarn.nodemanager.log-dirs.</description>
        <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
        <value>96.0</value>
      </property>
    
      <property>
        <description>The minimum space that must be available on a disk for
        it to be used. This applies to yarn.nodemanager.local-dirs and
        yarn.nodemanager.log-dirs.</description>
        <name>yarn.nodemanager.disk-health-checker.min-free-space-per-disk-mb</name>
        <value>0</value>
      </property>
    
      <property>
        <description>The path to the Linux container executor.</description>
        <name>yarn.nodemanager.linux-container-executor.path</name>
      </property>
    
      <property>
        <description>The class which should help the LCE handle resources.</description>
        <name>yarn.nodemanager.linux-container-executor.resources-handler.class</name>
        <value>org.apache.hadoop.yarn.server.nodemanager.util.DefaultLCEResourcesHandler</value>
        <!-- <value>org.apache.hadoop.yarn.server.nodemanager.util.CgroupsLCEResourcesHandler</value> -->
      </property>
    
      <property>
        <description>The cgroups hierarchy under which to place YARN processes (cannot contain commas).
        If yarn.nodemanager.linux-container-executor.cgroups.mount is false (that is, if cgroups have
        been pre-configured), then this cgroups hierarchy must already exist and be writable by the
        NodeManager user, otherwise the NodeManager may fail.
        Only used when the LCE resources handler is set to the CgroupsLCEResourcesHandler.</description>
        <name>yarn.nodemanager.linux-container-executor.cgroups.hierarchy</name>
        <value>/hadoop-yarn</value>
      </property>
    
      <property>
        <description>Whether the LCE should attempt to mount cgroups if not found.
        Only used when the LCE resources handler is set to the CgroupsLCEResourcesHandler.</description>
        <name>yarn.nodemanager.linux-container-executor.cgroups.mount</name>
        <value>false</value>
      </property>
    
      <property>
        <description>Where the LCE should attempt to mount cgroups if not found. Common locations
        include /sys/fs/cgroup and /cgroup; the default location can vary depending on the Linux
        distribution in use. This path must exist before the NodeManager is launched.
        Only used when the LCE resources handler is set to the CgroupsLCEResourcesHandler, and
        yarn.nodemanager.linux-container-executor.cgroups.mount is true.</description>
        <name>yarn.nodemanager.linux-container-executor.cgroups.mount-path</name>
      </property>
    
      <property>
        <description>This determines which of the two modes that LCE should use on a non-secure
        cluster.  If this value is set to true, then all containers will be launched as the user 
        specified in yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user.  If 
        this value is set to false, then containers will run as the user who submitted the 
        application.
        </description>
        <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users</name>
        <value>true</value>
      </property>
    
      <property>
        <description>The UNIX user that containers will run as when Linux-container-executor
        is used in nonsecure mode (a use case for this is using cgroups) if the
        yarn.nodemanager.linux-container-executor.nonsecure-mode.limit-users is set 
        to true.</description>
        <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name>
        <value>nobody</value>
      </property>
    
      <property>
        <description>The allowed pattern for UNIX user names enforced by
        Linux-container-executor when used in nonsecure mode (use case for this
        is using cgroups). The default value is taken from /usr/sbin/adduser</description>
        <name>yarn.nodemanager.linux-container-executor.nonsecure-mode.user-pattern</name>
        <value>^[_.A-Za-z0-9][-@_.A-Za-z0-9]{0,255}?[$]?$</value>
      </property>
    
      <property>
        <description>This flag determines whether apps should run with strict resource limits
        or be allowed to consume spare resources if they need them. For example, turning the
        flag on will restrict apps to use only their share of CPU, even if the node has spare
        CPU cycles. The default value is false i.e. use available resources. Please note that
        turning this flag on may reduce job throughput on the cluster.</description>
        <name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name>
        <value>false</value>
      </property>
    
      <property>
        <description>TFile compression type used to compress aggregated logs.</description>
        <name>yarn.nodemanager.log-aggregation.compression-type</name>
        <value>none</value>
      </property>
    
      <property>
        <description>The kerberos principal for the node manager.</description>
        <name>yarn.nodemanager.principal</name>
        <value></value>
      </property>
    
      <property>
        <description>A valid service name may only contain a-zA-Z0-9_ and cannot start with a digit</description>
        <name>yarn.nodemanager.aux-services</name>
        <!--<value>mapreduce_shuffle,spark_shuffle</value>-->
        <value>mapreduce_shuffle</value>
      </property>
      
      <property>
        <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
        <value>org.apache.spark.network.yarn.YarnShuffleService</value>
      </property>
    
      <property>
        <description>No. of ms to wait between sending a SIGTERM and SIGKILL to a container</description>
        <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
        <value>250</value>
      </property>
    
      <property>
        <description>Max time to wait for a process to come up when trying to cleanup a container</description>
        <name>yarn.nodemanager.process-kill-wait.ms</name>
        <value>2000</value>
      </property>
    
      <property>
        <description>The minimum allowed version of a resourcemanager that a nodemanager will connect to.  
          The valid values are NONE (no version checking), EqualToNM (the resourcemanager's version is 
          equal to or greater than the NM version), or a Version String.</description>
        <name>yarn.nodemanager.resourcemanager.minimum.version</name>
        <value>NONE</value>
      </property>
    
      <property>
        <description>Max number of threads in NMClientAsync to process container
        management events</description>
        <name>yarn.client.nodemanager-client-async.thread-pool-max-size</name>
        <value>500</value>
      </property>
    
      <property>
        <description>Max time to wait to establish a connection to NM</description>
        <name>yarn.client.nodemanager-connect.max-wait-ms</name>
        <value>900000</value>
      </property>
    
      <property>
        <description>Time interval between each attempt to connect to NM</description>
        <name>yarn.client.nodemanager-connect.retry-interval-ms</name>
        <value>10000</value>
      </property>
    
      <property>
        <description>
          Maximum number of proxy connections to cache for node managers. If set
          to a value greater than zero then the cache is enabled and the NMClient
          and MRAppMaster will cache the specified number of node manager proxies.
          There will be at most one proxy per node manager. For example, setting it
          to 5 ensures that the client caches at most 5 proxies, for 5 different
          node managers. Connections for these proxies are timed out if idle for
          longer than the system-wide idle timeout period.
          Note that this could cause issues on large clusters as many connections
          could linger simultaneously and lead to a large number of connection
          threads. The token used for authentication will be used only at
          connection creation time. If a new token is received then the earlier
          connection should be closed in order to use the new token. This and
          (yarn.client.nodemanager-client-async.thread-pool-max-size) are related
          and should be in sync (no need for them to be equal).
          If the value of this property is zero then the connection cache is
          disabled and connections will use a zero idle timeout to prevent too
          many connection threads on large clusters.
        </description>
        <name>yarn.client.max-cached-nodemanagers-proxies</name>
        <value>0</value>
      </property>
      
      <property>
        <description>Enable the node manager to recover after starting</description>
        <name>yarn.nodemanager.recovery.enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>The local filesystem directory in which the node manager will
        store state when recovery is enabled.</description>
        <name>yarn.nodemanager.recovery.dir</name>
        <value>${hadoop.tmp.dir}/yarn-nm-recovery</value>
      </property>
    
      <!--Map Reduce configuration-->
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    
      <property>
        <name>mapreduce.job.jar</name>
        <value/>
      </property>
    
      <property>
        <name>mapreduce.job.hdfs-servers</name>
        <value>${fs.defaultFS}</value>
      </property>
    
      <!-- WebAppProxy Configuration-->
      
      <property>
        <description>The kerberos principal for the proxy, if the proxy is not
        running as part of the RM.</description>
        <name>yarn.web-proxy.principal</name>
        <value/>
      </property>
      
      <property>
        <description>Keytab for WebAppProxy, if the proxy is not running as part of 
        the RM.</description>
        <name>yarn.web-proxy.keytab</name>
      </property>
      
      <property>
        <description>The address for the web proxy as HOST:PORT, if this is not
        given then the proxy will run as part of the RM</description>
        <name>yarn.web-proxy.address</name>
        <value>nna:8090</value>
      </property>
    
      <!-- Applications' Configuration-->
      
      <property>
        <description>
          CLASSPATH for YARN applications. A comma-separated list
          of CLASSPATH entries. When this value is empty, the following default
          CLASSPATH for YARN applications would be used. 
          For Linux:
          $HADOOP_CONF_DIR,
          $HADOOP_COMMON_HOME/share/hadoop/common/*,
          $HADOOP_COMMON_HOME/share/hadoop/common/lib/*,
          $HADOOP_HDFS_HOME/share/hadoop/hdfs/*,
          $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,
          $HADOOP_YARN_HOME/share/hadoop/yarn/*,
          $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
          For Windows:
          %HADOOP_CONF_DIR%,
          %HADOOP_COMMON_HOME%/share/hadoop/common/*,
          %HADOOP_COMMON_HOME%/share/hadoop/common/lib/*,
          %HADOOP_HDFS_HOME%/share/hadoop/hdfs/*,
          %HADOOP_HDFS_HOME%/share/hadoop/hdfs/lib/*,
          %HADOOP_YARN_HOME%/share/hadoop/yarn/*,
          %HADOOP_YARN_HOME%/share/hadoop/yarn/lib/*
        </description>
        <name>yarn.application.classpath</name>
        <value>/data/soft/new/hadoop-config,/data/soft/new/hadoop/share/hadoop/common/*,/data/soft/new/hadoop/share/hadoop/common/lib/*,/data/soft/new/hadoop/share/hadoop/hdfs/*,/data/soft/new/hadoop/share/hadoop/hdfs/lib/*,/data/soft/new/hadoop/share/hadoop/yarn/*,/data/soft/new/hadoop/share/hadoop/yarn/lib/*,/data/soft/new/hadoop/share/hadoop/mapreduce/*,/data/soft/new/hadoop/share/hadoop/mapreduce/lib/*</value>
      </property>
    
      <!-- Timeline Service's Configuration-->
    
      <property>
        <description>Indicate to clients whether timeline service is enabled or not.
        If enabled, clients will put entities and events to the timeline server.
        </description>
        <name>yarn.timeline-service.enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>The hostname of the timeline service web application.</description>
        <name>yarn.timeline-service.hostname</name>
        <value>0.0.0.0</value>
      </property>
    
      <property>
        <description>This is the default address for the timeline server to start the
        RPC server.</description>
        <name>yarn.timeline-service.address</name>
        <value>${yarn.timeline-service.hostname}:10200</value>
      </property>
    
      <property>
        <description>The http address of the timeline service web application.</description>
        <name>yarn.timeline-service.webapp.address</name>
        <value>${yarn.timeline-service.hostname}:8188</value>
      </property>
    
      <property>
        <description>The https address of the timeline service web application.</description>
        <name>yarn.timeline-service.webapp.https.address</name>
        <value>${yarn.timeline-service.hostname}:8190</value>
      </property>
    
      <property>
        <description>
          The actual address the server will bind to. If this optional address is
          set, the RPC and webapp servers will bind to this address and the port specified in
          yarn.timeline-service.address and yarn.timeline-service.webapp.address, respectively.
          This is most useful for making the service listen to all interfaces by setting to
          0.0.0.0.
        </description>
        <name>yarn.timeline-service.bind-host</name>
        <value></value>
      </property>
    
      <property>
        <description>Store class name for timeline store.</description>
        <name>yarn.timeline-service.store-class</name>
        <value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value>
      </property>
    
      <property>
        <description>Enable age off of timeline store data.</description>
        <name>yarn.timeline-service.ttl-enable</name>
        <value>true</value>
      </property>
    
      <property>
        <description>Time to live for timeline store data in milliseconds.</description>
        <name>yarn.timeline-service.ttl-ms</name>
        <value>604800000</value>
      </property>
    
      <property>
        <description>Store file name for leveldb timeline store.</description>
        <name>yarn.timeline-service.leveldb-timeline-store.path</name>
        <value>${hadoop.tmp.dir}/yarn/timeline</value>
      </property>
    
      <property>
        <description>Length of time to wait between deletion cycles of leveldb timeline store in milliseconds.</description>
        <name>yarn.timeline-service.leveldb-timeline-store.ttl-interval-ms</name>
        <value>300000</value>
      </property>
    
      <property>
        <description>Size of read cache for uncompressed blocks for leveldb timeline store in bytes.</description>
        <name>yarn.timeline-service.leveldb-timeline-store.read-cache-size</name>
        <value>104857600</value>
      </property>
    
      <property>
        <description>Size of cache for recently read entity start times for leveldb timeline store in number of entities.</description>
        <name>yarn.timeline-service.leveldb-timeline-store.start-time-read-cache-size</name>
        <value>10000</value>
      </property>
    
      <property>
        <description>Size of cache for recently written entity start times for leveldb timeline store in number of entities.</description>
        <name>yarn.timeline-service.leveldb-timeline-store.start-time-write-cache-size</name>
        <value>10000</value>
      </property>
    
      <property>
        <description>Handler thread count to serve the client RPC requests.</description>
        <name>yarn.timeline-service.handler-thread-count</name>
        <value>40</value>
      </property>
    
      <property>
        <name>yarn.timeline-service.http-authentication.type</name>
        <value>simple</value>
        <description>
          Defines authentication used for the timeline server HTTP endpoint.
          Supported values are: simple | kerberos | #AUTHENTICATION_HANDLER_CLASSNAME#
        </description>
      </property>
    
      <property>
        <name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name>
        <value>true</value>
        <description>
          Indicates if anonymous requests are allowed by the timeline server when using
          'simple' authentication.
        </description>
      </property>
    
      <property>
        <description>The Kerberos principal for the timeline server.</description>
        <name>yarn.timeline-service.principal</name>
        <value></value>
      </property>
    
      <property>
        <description>
        Default maximum number of retries for the timeline service client.
        </description>
        <name>yarn.timeline-service.client.max-retries</name>
        <value>30</value>
      </property>
    
      <property>
        <description>
        Default retry time interval for the timeline service client.
        </description>
        <name>yarn.timeline-service.client.retry-interval-ms</name>
        <value>1000</value>
      </property>
    
      <!-- Other configuration -->
      <property>
        <description>The interval that the yarn client library uses to poll the
        completion status of the asynchronous API of application client protocol.
        </description>
        <name>yarn.client.application-client-protocol.poll-interval-ms</name>
        <value>200</value>
      </property>
    
      <property>
        <description>RSS usage of a process computed via 
        /proc/pid/stat is not very accurate as it includes shared pages of a
        process. /proc/pid/smaps provides useful information like
        Private_Dirty, Private_Clean, Shared_Dirty, Shared_Clean which can be used
        for computing more accurate RSS. When this flag is enabled, RSS is computed
        as Min(Shared_Dirty, Pss) + Private_Clean + Private_Dirty. It excludes
        read-only shared mappings in RSS computation.  
        </description>
        <name>yarn.nodemanager.container-monitor.procfs-tree.smaps-based-rss.enabled</name>
        <value>false</value>
      </property>
    
      <!-- YARN registry -->
    
      <property>
        <description>
          Is the registry enabled: does the RM start it up,
          create the user and system paths, and purge
          service records when containers, application attempts
          and applications complete
        </description>
        <name>hadoop.registry.rm.enabled</name>
        <value>false</value>
      </property>
    
      <property>
        <description>
        </description>
        <name>hadoop.registry.zk.root</name>
        <value>/registry</value>
      </property>
    
      <property>
        <description>
          Zookeeper session timeout in milliseconds
        </description>
        <name>hadoop.registry.zk.session.timeout.ms</name>
        <value>60000</value>
      </property>
    
      <property>
        <description>
          Zookeeper connection timeout in milliseconds
        </description>
        <name>hadoop.registry.zk.connection.timeout.ms</name>
        <value>15000</value>
      </property>
    
      <property>
        <description>
          Zookeeper connection retry count before failing
        </description>
        <name>hadoop.registry.zk.retry.times</name>
        <value>5</value>
      </property>
    
      <property>
        <description>
        </description>
        <name>hadoop.registry.zk.retry.interval.ms</name>
        <value>1000</value>
      </property>
    
      <property>
        <description>
          Zookeeper retry limit in milliseconds during
          exponential backoff.
    
          This places a limit even
          if the retry times and interval limit, combined
          with the backoff policy, result in a long retry
          period
        </description>
        <name>hadoop.registry.zk.retry.ceiling.ms</name>
        <value>60000</value>
      </property>
    
      <property>
        <description>
          List of hostname:port pairs defining the
          zookeeper quorum binding for the registry
        </description>
        <name>hadoop.registry.zk.quorum</name>
        <value>dn1:2181,dn2:2181,dn3:2181</value>
      </property>
    
      <property>
        <description>
          Key to set if the registry is secure. Turning it on
          changes the permissions policy from "open access"
          to restrictions on kerberos with the option of
          a user adding one or more auth key pairs down their
          own tree.
        </description>
        <name>hadoop.registry.secure</name>
        <value>false</value>
      </property>
    
      <property>
        <description>
          A comma separated list of Zookeeper ACL identifiers with
          system access to the registry in a secure cluster.
    
          These are given full access to all entries.
    
          If there is an "@" at the end of a SASL entry it
          instructs the registry client to append the default kerberos domain.
        </description>
        <name>hadoop.registry.system.acls</name>
        <value>sasl:yarn@, sasl:mapred@, sasl:hdfs@</value>
      </property>
    
      <property>
        <description>
          The kerberos realm: used to set the realm of
          system principals which do not declare their realm,
          and any other accounts that need the value.
    
          If empty, the default realm of the running process
          is used.
    
          If neither is known and the realm is needed, then the registry
          service/client will fail.
        </description>
        <name>hadoop.registry.kerberos.realm</name>
        <value></value>
      </property>
    
      <property>
        <description>
          Key to define the JAAS context. Used in secure
          mode
        </description>
        <name>hadoop.registry.jaas.context</name>
        <value>Client</value>
      </property>
    
      <property>
        <description>Defines how often NMs wake up to upload log files.
        The default value is -1, which means the logs are uploaded when
        the application finishes. By setting this property, logs can be uploaded
        periodically while the application is running. The minimum value that
        can be set is 3600.
        </description>
        <name>yarn.nodemanager.log-aggregation.roll-monitoring-interval-seconds</name>
        <value>-1</value>
      </property>
    
      <!-- Load the queue configuration file -->
      <property>
        <name>yarn.scheduler.fair.allocation.file</name>
        <value>/data/soft/new/hadoop-config/fair-scheduler.xml</value>
      </property>
    
      <property>
        <name>yarn.scheduler.increment-allocation-mb</name>
        <value>256</value>
      </property>
    
      <property>
        <name>yarn.scheduler.fair.preemption</name>
        <value>true</value>
      </property>
    
      <property>
        <name>yarn.scheduler.fair.allow-undeclared-pools</name>
        <value>false</value>
      </property>
    
      <property>
        <name>yarn.scheduler.fair.user-as-default-queue</name>
        <value>false</value>
      </property>
    </configuration>
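The yarn.application.classpath value above hard-codes the /data/soft/new/... paths. As a minimal sketch, assuming HADOOP_HOME and HADOOP_CONF_DIR are exported as in section 2.2.1 and the standard share/hadoop layout is used, the same comma-separated string can be generated from those variables instead of typed by hand. The function name build_yarn_classpath is hypothetical.

```shell
#!/usr/bin/env bash
# Build the comma-separated value for yarn.application.classpath from
# HADOOP_CONF_DIR and HADOOP_HOME (hypothetical helper; the module order
# matches the hard-coded value in the yarn-site.xml above).
build_yarn_classpath() {
  local conf_dir="${HADOOP_CONF_DIR:?HADOOP_CONF_DIR is not set}"
  local home="${HADOOP_HOME:?HADOOP_HOME is not set}"
  local entries=("$conf_dir")
  local module
  for module in common hdfs yarn mapreduce; do
    # Quoted so the '*' wildcards are kept literally, as YARN expects.
    entries+=("$home/share/hadoop/$module/*" "$home/share/hadoop/$module/lib/*")
  done
  # Join the array elements with commas.
  local IFS=,
  echo "${entries[*]}"
}
```

For example, `HADOOP_HOME=/data/soft/new/hadoop HADOOP_CONF_DIR=/data/soft/new/hadoop-config build_yarn_classpath` prints the same string as the `<value>` above. The output of `yarn classpath` on a node is a useful cross-check of what Hadoop itself resolves.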

    5.fair-scheduler.xml

    <?xml version="1.0"?>
    <allocations>
        <queue name="root">
            <aclSubmitApps>hadoop</aclSubmitApps>
            <aclAdministerApps>hadoop</aclAdministerApps>
            <!-- CPU and memory settings for the default queue -->
            <queue name="default">
                <maxRunningApps>10</maxRunningApps>
                <minResources>1024mb,1vcores</minResources>
                <maxResources>6144mb,6vcores</maxResources>
                <schedulingPolicy>fair</schedulingPolicy>
                <weight>1.0</weight>
                <aclSubmitApps>hadoop</aclSubmitApps>
                <aclAdministerApps>hadoop</aclAdministerApps>
            </queue>
            <!-- CPU and memory settings for queue queue_1024_01 -->
            <queue name="queue_1024_01">
                <maxRunningApps>10</maxRunningApps>
                <minResources>1024mb,1vcores</minResources>
                <maxResources>4096mb,3vcores</maxResources>
                <schedulingPolicy>fair</schedulingPolicy>
                <weight>1.0</weight>
                <aclSubmitApps>hadoop</aclSubmitApps>
                <aclAdministerApps>hadoop</aclAdministerApps>
            </queue>
        </queue>
        
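        <!-- Preemption timeouts in the allocation file are in seconds, so 600000 is
             roughly 7 days; if 10 minutes was intended, use 600 instead. -->
        <fairSharePreemptionTimeout>600000</fairSharePreemptionTimeout>
        <defaultMinSharePreemptionTimeout>600000</defaultMinSharePreemptionTimeout>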
    </allocations>
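With allow-undeclared-pools and user-as-default-queue both set to false in yarn-site.xml, a job that names no queue lands in root.default. One way to send a job to queue_1024_01 instead is the mapreduce.job.queuename property. The sketch below assumes the example JAR path used in section 4, and that the job is submitted as the hadoop user, the only user the aclSubmitApps entries above admit when YARN ACLs are enabled. The function name and the HADOOP_CMD override (used for dry runs) are hypothetical, not Hadoop features.

```shell
#!/usr/bin/env bash
# Submit the bundled WordCount example to a specific FairScheduler queue.
# HADOOP_CMD is a hypothetical override (defaults to "hadoop"); setting
# HADOOP_CMD=echo prints the command instead of running it.
submit_wordcount_to_queue() {
  local queue="$1" input="$2" output="$3"
  local jar="${HADOOP_HOME:?HADOOP_HOME is not set}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar"
  # WordCount parses generic options, so -D may follow the program name.
  "${HADOOP_CMD:-hadoop}" jar "$jar" wordcount \
    -Dmapreduce.job.queuename="$queue" "$input" "$output"
}
```

Usage: `submit_wordcount_to_queue root.queue_1024_01 /tmp/wc /tmp/res_queue`; the job should then appear under queue_1024_01 on the scheduler page.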

    Note that in Hadoop2 the DataNode host addresses are stored in the slaves file, whereas in Hadoop3 this file has been replaced by the workers file.
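When reusing a Hadoop2 configuration directory, the host list can be carried over rather than retyped. A minimal sketch, assuming the slaves file sits in the directory passed as the first argument (for example $HADOOP_CONF_DIR); migrate_slaves_to_workers is a hypothetical helper, and it backs up any existing workers file first, since Hadoop3 ships one containing localhost.

```shell
#!/usr/bin/env bash
# Copy a Hadoop2 "slaves" host list into the Hadoop3 "workers" file
# (hypothetical helper). Blank lines and comment lines are dropped.
migrate_slaves_to_workers() {
  local conf_dir="${1:?usage: migrate_slaves_to_workers CONF_DIR}"
  if [ ! -f "$conf_dir/slaves" ]; then
    echo "no slaves file in $conf_dir, nothing to migrate" >&2
    return 1
  fi
  # Keep a backup of the current workers file, if any.
  if [ -f "$conf_dir/workers" ]; then
    cp "$conf_dir/workers" "$conf_dir/workers.bak"
  fi
  sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' \
    "$conf_dir/slaves" > "$conf_dir/workers"
}
```

After running `migrate_slaves_to_workers "$HADOOP_CONF_DIR"`, the workers file should list one DataNode host per line, which is what start-dfs.sh reads in Hadoop3.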

    3.Starting Hadoop3

    When starting Hadoop3 for the first time, you need to register in ZK and format the NameNode, as follows:

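    # hadoop-daemon.sh and mr-jobhistory-daemon.sh still work in Hadoop3 but are
    # deprecated in favor of "hdfs --daemon" and "mapred --daemon".
    
    # 1. Start the JournalNode process (used by QJM) on every JournalNode host
    hdfs --daemon start journalnode
    
    # 2. Register in ZK (ZooKeeper must already be running)
    hdfs zkfc -formatZK
    
    # 3. Format the NameNode (on the first NameNode host only)
    hdfs namenode -format
    
    # 4. Start the NameNode that was just formatted
    hdfs --daemon start namenode
    
    # 5. On the Standby NameNode host, sync the metadata
    hdfs namenode -bootstrapStandby
    
    # 6. Start HDFS and YARN
    start-dfs.sh
    start-yarn.sh
    
    # 7. Start the history server (in Hadoop3 the proxyserver is already started by the YARN start script)
    mapred --daemon start historyserver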
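After these steps it is worth confirming that each host runs the daemons it should. A minimal sketch, assuming `jps` is on the PATH: it reads `jps` output on stdin and reports any expected daemon that is missing. The function name and the daemon list you pass in are assumptions about your layout, not something Hadoop defines.

```shell
#!/usr/bin/env bash
# Report which expected daemons are absent from `jps` output on stdin
# (hypothetical helper). Returns 1 if anything is missing.
check_daemons() {
  local running missing=0 daemon
  # jps prints "<pid> <MainClassName>"; keep only the class names.
  running=$(awk '{print $2}')
  for daemon in "$@"; do
    if ! grep -qx "$daemon" <<<"$running"; then
      echo "MISSING: $daemon"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, on a NameNode host that also runs a ResourceManager: `jps | check_daemons NameNode DFSZKFailoverController JournalNode ResourceManager`. On worker hosts, check for DataNode and NodeManager instead.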

    4.Submitting a test job

    $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar ships with example programs. To verify the WordCount algorithm, do the following:

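    # 1. Prepare the input data
    cat > /tmp/wc <<'EOF'
    a a
    c s
    EOF
    
    # 2. Upload it to HDFS
    hdfs dfs -put /tmp/wc /tmp
    
    # 3. Submit the WordCount job (the output directory /tmp/res must not exist yet)
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.0.jar wordcount /tmp/wc /tmp/res
    
    # 4. View the result; for the input above it is (tab-separated):
    #    a 2
    #    c 1
    #    s 1
    hdfs dfs -cat /tmp/res/part-r-00000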
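To check the job output against an independent calculation, the same counts can be computed locally with coreutils. This is a sketch under the assumption that words are split on whitespace (as the WordCount example does) and that keys are sorted in byte order with a single reducer; local_wordcount is a hypothetical name.

```shell
#!/usr/bin/env bash
# Count words in a local file and print "word<TAB>count" sorted by word,
# mirroring the WordCount output format (hypothetical helper).
local_wordcount() {
  tr -s '[:space:]' '\n' < "$1" \
    | sed '/^$/d' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{printf "%s\t%s\n", $2, $1}'
}
```

Usage: `diff <(local_wordcount /tmp/wc) <(hdfs dfs -cat /tmp/res/part-r-00000)` prints nothing when the two results agree.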

    5.Preview

    5.1 HDFS web UI results

    5.2 YARN web UI results

    5.3 Queue page results

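    6.Summary

    When compiling the Hadoop-3.2.0 source code, pay attention to the Maven remote repository configuration. An unreachable remote repository often causes dependency JAR downloads to fail, which breaks the build. Configuring a reachable remote repository in Maven's settings.xml file is enough to fix this.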
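A minimal sketch of that settings.xml change: it writes a mirror that redirects requests for Maven Central to another repository. The mirror id and the default URL (the Aliyun public repository, a common choice in mainland China) are assumptions; substitute any repository reachable from your network. write_maven_mirror_settings is a hypothetical helper, and it backs up an existing settings.xml because the file is overwritten.

```shell
#!/usr/bin/env bash
# Write a Maven settings.xml with a mirror for "central" (hypothetical helper).
# $1: target file (default ~/.m2/settings.xml), $2: mirror URL (assumed default).
write_maven_mirror_settings() {
  local target="${1:-$HOME/.m2/settings.xml}"
  local url="${2:-https://maven.aliyun.com/repository/public}"
  mkdir -p "$(dirname "$target")"
  if [ -f "$target" ]; then
    cp "$target" "$target.bak"
  fi
  cat > "$target" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<settings>
  <mirrors>
    <mirror>
      <id>reachable-central-mirror</id>
      <mirrorOf>central</mirrorOf>
      <name>Mirror of Maven Central</name>
      <url>${url}</url>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

If you already maintain a settings.xml with other entries, merge the `<mirror>` element into it by hand instead of overwriting the file.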

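    7.Closing remarks

    That's all for this post. If you run into any problems while studying, feel free to join the group for discussion or send me an email, and I will do my best to answer. Let's keep learning together!

    In addition, I have published the books 《Kafka并不难学》 (Kafka Is Not Hard to Learn) and 《Hadoop大数据挖掘从入门到进阶实战》 (Hadoop Big Data Mining: From Beginner to Advanced Practice). If you are interested, you can buy them through the purchase link in the announcement bar. Thank you all for your support. Follow the WeChat official account below and, following the prompts, you can get the books' teaching videos for free.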

  • Original article: https://www.cnblogs.com/smartloli/p/10753998.html