Supplement: download the Hadoop source code.
I. The YARN framework: resource scheduling
(1) YARN framework flow diagram
Note: the YARN framework only handles resource management. When a program is to be run, YARN allocates nodes, memory, CPU and other resources to it, but how the program actually runs is not YARN's concern, so YARN knows nothing about MapReduce's execution logic. It is precisely this loose coupling that makes YARN so broadly applicable: it can host other kinds of programs as well.
Supplement: it is the MapReduce framework that knows the execution logic of the map-reduce program we write. Our own map-reduce code contains no task-assignment logic; that management logic is encapsulated inside the MapReduce framework as the MRAppMaster class, which manages the execution of the entire map-reduce job (it is the "manager" of the map-reduce program).
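To make the division of labor concrete, here is a minimal sketch of a WordCount driver. The class name WCRunner is assumed for illustration; WCMapper is the mapper class recorded in job.xml below. Note that the driver only *describes* the job (which jar, which mapper, which input/output paths) and contains no task-assignment logic at all; that is left to YARN and MRAppMaster.

```java
// Hypothetical driver for the wc.jar example; it declares the job but never
// schedules tasks itself -- submission hands everything to YARN/MRAppMaster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import cn.hadoop.mr.wc.WCMapper;   // the mapper class named in job.xml

public class WCRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(WCRunner.class);             // this jar is uploaded as job.jar
        job.setMapperClass(WCMapper.class);            // recorded as mapreduce.job.map.class
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);

        FileInputFormat.setInputPaths(job, new Path("hdfs://hadoopH1:9000/wc/input"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://hadoopH1:9000/wc/output"));

        job.waitForCompletion(true);                   // submit to YARN and wait for completion
    }
}
```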
Key point: in step 6, it is the NodeManager that actively sends heartbeats to the ResourceManager to check whether a job is waiting; a NodeManager (i.e., a DataNode) only picks up the job when it has the required resources.
MRAppMaster is started by the YARN framework (launched dynamically, on a node chosen at random).
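For reference, the ApplicationMaster side of this handshake goes through YARN's public AMRMClient API. The following is only a hedged sketch of what a generic ApplicationMaster does (MRAppMaster does something equivalent internally): register with the ResourceManager, request a container, and heartbeat via allocate(). The host name, port and container size are illustrative, and such code only works when launched inside a container that the ResourceManager itself started.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SketchAppMaster {
    public static void main(String[] args) throws Exception {
        // Connect to the ResourceManager and register this ApplicationMaster.
        AMRMClient<ContainerRequest> rm = AMRMClient.createAMRMClient();
        rm.init(new YarnConfiguration());
        rm.start();
        rm.registerApplicationMaster("hadoopH1", 0, "");   // host/port/tracking URL: illustrative

        // Ask for one container: 1024 MB of memory, 1 vcore.
        Resource capability = Resource.newInstance(1024, 1);
        rm.addContainerRequest(new ContainerRequest(capability, null, null, Priority.newInstance(0)));

        // allocate() also serves as the AM's heartbeat to the ResourceManager.
        List<Container> granted = new ArrayList<>();
        while (granted.isEmpty()) {
            granted.addAll(rm.allocate(0.0f).getAllocatedContainers());
            Thread.sleep(1000);
        }

        // A real AM would now launch its tasks in the granted containers via NMClient.
        rm.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "done", "");
        rm.stop();
    }
}
```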
(2) Supplement: the resources that RunJar uploads to HDFS:
1. job.jar is simply the wc.jar package we generated.
2. job.split contains data such as:
SPL/org.apache.hadoop.mapreduce.lib.input.FileSplit(hdfs://hadoopH1:9000/wc/input/wcdata.txt;
This is the HDFS path and file name of the input data.
3. job.splitmetainfo contains data such as:
META-SPhadoopH1;
This is the hostname of the node.
4. job.xml contains the cluster's full configuration: all defaults, plus the site-file overrides and the settings our driver made programmatically. An excerpt:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<configuration>
  <!-- values set programmatically by our driver (Hadoop records this source as "programatically") -->
  <property><name>mapreduce.job.name</name><value>wc.jar</value><source>programatically</source></property>
  <property><name>mapreduce.job.jar</name><value>/tmp/hadoop-yarn/staging/hadoop/.staging/job_1582165983362_0009/job.jar</value><source>programatically</source></property>
  <property><name>mapreduce.job.map.class</name><value>cn.hadoop.mr.wc.WCMapper</value><source>programatically</source></property>
  <property><name>mapreduce.map.output.key.class</name><value>org.apache.hadoop.io.Text</value><source>programatically</source></property>
  <property><name>mapreduce.job.output.key.class</name><value>org.apache.hadoop.io.Text</value><source>programatically</source></property>
  <property><name>mapreduce.job.output.value.class</name><value>org.apache.hadoop.io.LongWritable</value><source>programatically</source></property>
  <property><name>mapreduce.input.fileinputformat.inputdir</name><value>hdfs://hadoopH1:9000/wc/input</value><source>programatically</source></property>
  <property><name>mapreduce.input.fileinputformat.numinputfiles</name><value>1</value><source>programatically</source></property>
  <property><name>mapreduce.output.fileoutputformat.outputdir</name><value>hdfs://hadoopH1:9000/wc/output</value><source>programatically</source></property>
  <property><name>mapreduce.job.maps</name><value>1</value><source>programatically</source></property>
  <property><name>mapreduce.job.working.dir</name><value>hdfs://hadoopH1:9000/user/hadoop</value><source>programatically</source></property>
  <property><name>mapreduce.job.user.name</name><value>hadoop</value><source>programatically</source></property>
  <property><name>mapreduce.job.submithostname</name><value>hadoopH1</value><source>programatically</source></property>
  <property><name>mapreduce.job.submithostaddress</name><value>192.168.58.100</value><source>programatically</source></property>
  <property><name>mapred.mapper.new-api</name><value>true</value><source>programatically</source></property>
  <property><name>mapred.reducer.new-api</name><value>true</value><source>programatically</source></property>
  <property><name>fs.defaultFS</name><value>hdfs://hadoopH1:9000/</value><source>programatically</source></property>
  <!-- overrides taken from the cluster's site files -->
  <property><name>dfs.replication</name><value>1</value><source>hdfs-site.xml</source></property>
  <property><name>dfs.namenode.name.dir</name><value>/home/hadoop/App/hadoop-2.7.1/data/name</value><source>hdfs-site.xml</source></property>
  <property><name>dfs.datanode.data.dir</name><value>/home/hadoop/App/hadoop-2.7.1/data/data</value><source>hdfs-site.xml</source></property>
  <property><name>yarn.nodemanager.aux-services</name><value>mapreduce_shuffle</value><source>yarn-site.xml</source></property>
  <property><name>yarn.log-aggregation-enable</name><value>true</value><source>yarn-site.xml</source></property>
  <!-- plus every default value, for example: -->
  <property><name>yarn.resourcemanager.scheduler.address</name><value>${yarn.resourcemanager.hostname}:8030</value><source>yarn-default.xml</source></property>
  <property><name>yarn.app.mapreduce.am.staging-dir</name><value>/tmp/hadoop-yarn/staging</value><source>mapred-default.xml</source></property>
</configuration>
(The remaining several hundred properties are the unmodified defaults from core-default.xml, hdfs-default.xml, mapred-default.xml and yarn-default.xml; they are omitted here.)
<property><name>hadoop.registry.rm.enabled</name><value>false</value><source>core-default.xml</source></property> <property><name>mapreduce.job.reducer.preempt.delay.sec</name><value>0</value><source>mapred-default.xml</source></property> <property><name>mapreduce.shuffle.ssl.enabled</name><value>false</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.vmem-pmem-ratio</name><value>2.1</value><source>yarn-default.xml</source></property> <property><name>yarn.nodemanager.container-manager.thread-count</name><value>20</value><source>yarn-default.xml</source></property> <property><name>dfs.encrypt.data.transfer</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>dfs.block.access.key.update.interval</name><value>600</value><source>hdfs-default.xml</source></property> <property><name>hadoop.tmp.dir</name><value>/home/hadoop/App/hadoop-2.7.1/data/tmp</value><source>core-site.xml</source></property> <property><name>dfs.namenode.audit.loggers</name><value>default</value><source>hdfs-default.xml</source></property> <property><name>fs.AbstractFileSystem.har.impl</name><value>org.apache.hadoop.fs.HarFs</value><source>core-default.xml</source></property> <property><name>yarn.nodemanager.localizer.cache.target-size-mb</name><value>10240</value><source>yarn-default.xml</source></property> <property><name>yarn.app.mapreduce.shuffle.log.backups</name><value>0</value><source>mapred-default.xml</source></property> <property><name>yarn.http.policy</name><value>HTTP_ONLY</value><source>yarn-default.xml</source></property> <property><name>dfs.client.short.circuit.replica.stale.threshold.ms</name><value>1800000</value><source>hdfs-default.xml</source></property> <property><name>yarn.timeline-service.webapp.https.address</name><value>${yarn.timeline-service.hostname}:8190</value><source>yarn-default.xml</source></property> <property><name>yarn.resourcemanager.amlauncher.thread-count</name><value>50</value><source>yarn-default.xml</source></property> <property><name>mapreduce.jobtracker.persist.jobstatus.hours</name><value>1</value><source>mapred-default.xml</source></property> <property><name>tfile.fs.output.buffer.size</name><value>262144</value><source>core-default.xml</source></property> <property><name>dfs.namenode.checkpoint.check.period</name><value>60</value><source>hdfs-default.xml</source></property> <property><name>dfs.datanode.dns.interface</name><value>default</value><source>hdfs-default.xml</source></property> <property><name>fs.ftp.host.port</name><value>21</value><source>core-default.xml</source></property> <property><name>mapreduce.task.io.sort.mb</name><value>100</value><source>mapred-default.xml</source></property> <property><name>dfs.namenode.inotify.max.events.per.rpc</name><value>1000</value><source>hdfs-default.xml</source></property> <property><name>hadoop.security.group.mapping.ldap.search.attr.group.name</name><value>cn</value><source>core-default.xml</source></property> <property><name>dfs.namenode.avoid.read.stale.datanode</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.output.fileoutputformat.compress.type</name><value>RECORD</value><source>mapred-default.xml</source></property> <property><name>mapreduce.reduce.skip.proc.count.autoincr</name><value>true</value><source>mapred-default.xml</source></property> <property><name>file.bytes-per-checksum</name><value>512</value><source>core-default.xml</source></property> 
<property><name>mapreduce.job.userlog.retain.hours</name><value>24</value><source>mapred-default.xml</source></property> <property><name>dfs.datanode.http.address</name><value>0.0.0.0:50075</value><source>hdfs-default.xml</source></property> <property><name>dfs.image.compress</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>ha.health-monitor.check-interval.ms</name><value>1000</value><source>core-default.xml</source></property> <property><name>dfs.permissions.enabled</name><value>true</value><source>hdfs-default.xml</source></property> <property><name>yarn.resourcemanager.resource-tracker.client.thread-count</name><value>50</value><source>yarn-default.xml</source></property> <property><name>dfs.client.domain.socket.data.traffic</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>dfs.image.compression.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value><source>hdfs-default.xml</source></property> <property><name>dfs.datanode.address</name><value>0.0.0.0:50010</value><source>hdfs-default.xml</source></property> <property><name>dfs.block.access.token.enable</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.reduce.input.buffer.percent</name><value>0.0</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.linux-container-executor.cgroups.strict-resource-usage</name><value>false</value><source>yarn-default.xml</source></property> <property><name>mapreduce.tasktracker.local.dir.minspacestart</name><value>0</value><source>mapred-default.xml</source></property> <property><name>dfs.blockreport.intervalMsec</name><value>21600000</value><source>hdfs-default.xml</source></property> <property><name>ha.health-monitor.rpc-timeout.ms</name><value>45000</value><source>core-default.xml</source></property> <property><name>dfs.client.failover.connection.retries</name><value>0</value><source>hdfs-default.xml</source></property> <property><name>dfs.namenode.kerberos.internal.spnego.principal</name><value>${dfs.web.authentication.kerberos.principal}</value><source>hdfs-default.xml</source></property> <property><name>yarn.scheduler.maximum-allocation-mb</name><value>8192</value><source>yarn-default.xml</source></property> <property><name>yarn.resourcemanager.leveldb-state-store.path</name><value>${hadoop.tmp.dir}/yarn/system/rmstore</value><source>yarn-default.xml</source></property> <property><name>mapreduce.task.files.preserve.failedtasks</name><value>false</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.delete.thread-count</name><value>4</value><source>yarn-default.xml</source></property> <property><name>mapreduce.output.fileoutputformat.compress.codec</name><value>org.apache.hadoop.io.compress.DefaultCodec</value><source>mapred-default.xml</source></property> <property><name>map.sort.class</name><value>org.apache.hadoop.util.QuickSort</value><source>mapred-default.xml</source></property> <property><name>mapreduce.job.classloader</name><value>false</value><source>mapred-default.xml</source></property> <property><name>hadoop.registry.zk.retry.ceiling.ms</name><value>60000</value><source>core-default.xml</source></property> <property><name>mapreduce.jobtracker.tasktracker.maxblacklists</name><value>4</value><source>mapred-default.xml</source></property> <property><name>io.seqfile.compress.blocksize</name><value>1000000</value><source>core-default.xml</source></property> 
<property><name>dfs.blocksize</name><value>134217728</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.task.profile.maps</name><value>0-2</value><source>mapred-default.xml</source></property> <property><name>mapreduce.jobtracker.staging.root.dir</name><value>${hadoop.tmp.dir}/mapred/staging</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.localizer.cache.cleanup.interval-ms</name><value>600000</value><source>yarn-default.xml</source></property> <property><name>mapreduce.jobtracker.http.address</name><value>0.0.0.0:50030</value><source>mapred-default.xml</source></property> <property><name>mapreduce.job.reduce.class</name><value>cn.hadoop.mr.wc.WCReducer</value><source>programatically</source></property> <property><name>mapreduce.job.dir</name><value>/tmp/hadoop-yarn/staging/hadoop/.staging/job_1582165983362_0009</value><source>programatically</source></property> <property><name>dfs.client.mmap.cache.timeout.ms</name><value>3600000</value><source>hdfs-default.xml</source></property> <property><name>hadoop.security.java.secure.random.algorithm</name><value>SHA1PRNG</value><source>core-default.xml</source></property> <property><name>fs.client.resolve.remote.symlinks</name><value>true</value><source>core-default.xml</source></property> <property><name>mapreduce.tasktracker.local.dir.minspacekill</name><value>0</value><source>mapred-default.xml</source></property> <property><name>nfs.mountd.port</name><value>4242</value><source>hdfs-default.xml</source></property> <property><name>yarn.nodemanager.disk-health-checker.min-healthy-disks</name><value>0.25</value><source>yarn-default.xml</source></property> <property><name>mapreduce.tasktracker.taskmemorymanager.monitoringinterval</name><value>5000</value><source>mapred-default.xml</source></property> <property><name>dfs.namenode.resource.du.reserved</name><value>104857600</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.job.end-notification.retry.interval</name><value>1000</value><source>mapred-default.xml</source></property> <property><name>mapreduce.jobhistory.loadedjobs.cache.size</name><value>5</value><source>mapred-default.xml</source></property> <property><name>dfs.client.datanode-restart.timeout</name><value>30</value><source>hdfs-default.xml</source></property> <property><name>yarn.nodemanager.local-dirs</name><value>${hadoop.tmp.dir}/nm-local-dir</value><source>yarn-default.xml</source></property> <property><name>dfs.datanode.block.id.layout.upgrade.threads</name><value>12</value><source>hdfs-default.xml</source></property> <property><name>hadoop.registry.jaas.context</name><value>Client</value><source>core-default.xml</source></property> <property><name>yarn.timeline-service.webapp.address</name><value>${yarn.timeline-service.hostname}:8188</value><source>yarn-default.xml</source></property> <property><name>mapreduce.jobhistory.address</name><value>0.0.0.0:10020</value><source>mapred-default.xml</source></property> <property><name>mapreduce.jobtracker.persist.jobstatus.active</name><value>true</value><source>mapred-default.xml</source></property> <property><name>file.blocksize</name><value>67108864</value><source>core-default.xml</source></property> <property><name>dfs.datanode.readahead.bytes</name><value>4194304</value><source>hdfs-default.xml</source></property> <property><name>yarn.sharedcache.cleaner.period-mins</name><value>1440</value><source>yarn-default.xml</source></property> 
<property><name>dfs.namenode.http-address</name><value>0.0.0.0:50070</value><source>hdfs-default.xml</source></property> <property><name>hadoop.work.around.non.threadsafe.getpwuid</name><value>false</value><source>core-default.xml</source></property> <property><name>yarn.resourcemanager.configuration.provider-class</name><value>org.apache.hadoop.yarn.LocalConfigurationProvider</value><source>yarn-default.xml</source></property> <property><name>yarn.nodemanager.recovery.enabled</name><value>false</value><source>yarn-default.xml</source></property> <property><name>yarn.resourcemanager.hostname</name><value>hadoopH1</value><source>yarn-site.xml</source></property> <property><name>fs.s3n.multipart.uploads.enabled</name><value>false</value><source>core-default.xml</source></property> <property><name>dfs.namenode.fs-limits.max-component-length</name><value>255</value><source>hdfs-default.xml</source></property> <property><name>ha.failover-controller.cli-check.rpc-timeout.ms</name><value>20000</value><source>core-default.xml</source></property> <property><name>ftp.client-write-packet-size</name><value>65536</value><source>core-default.xml</source></property> <property><name>mapreduce.reduce.shuffle.parallelcopies</name><value>5</value><source>mapred-default.xml</source></property> <property><name>mapreduce.jobhistory.principal</name><value>jhs/_HOST@REALM.TLD</value><source>mapred-default.xml</source></property> <property><name>hadoop.http.authentication.simple.anonymous.allowed</name><value>true</value><source>core-default.xml</source></property> <property><name>yarn.log-aggregation.retain-seconds</name><value>-1</value><source>yarn-default.xml</source></property> <property><name>yarn.nodemanager.windows-container.cpu-limit.enabled</name><value>false</value><source>yarn-default.xml</source></property> <property><name>yarn.timeline-service.http-authentication.simple.anonymous.allowed</name><value>true</value><source>yarn-default.xml</source></property> <property><name>dfs.namenode.secondary.https-address</name><value>0.0.0.0:50091</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.job.ubertask.maxreduces</name><value>1</value><source>mapred-default.xml</source></property> <property><name>fs.s3a.connection.establish.timeout</name><value>5000</value><source>core-default.xml</source></property> <property><name>yarn.nodemanager.health-checker.interval-ms</name><value>600000</value><source>yarn-default.xml</source></property> <property><name>dfs.namenode.fs-limits.max-xattr-size</name><value>16384</value><source>hdfs-default.xml</source></property> <property><name>fs.s3a.multipart.purge</name><value>false</value><source>core-default.xml</source></property> <property><name>hadoop.security.kms.client.encrypted.key.cache.num.refill.threads</name><value>2</value><source>core-default.xml</source></property> <property><name>yarn.timeline-service.store-class</name><value>org.apache.hadoop.yarn.server.timeline.LeveldbTimelineStore</value><source>yarn-default.xml</source></property> <property><name>mapreduce.shuffle.transfer.buffer.size</name><value>131072</value><source>mapred-default.xml</source></property> <property><name>yarn.resourcemanager.zk-num-retries</name><value>1000</value><source>yarn-default.xml</source></property> <property><name>mapreduce.jobtracker.jobhistory.task.numberprogresssplits</name><value>12</value><source>mapred-default.xml</source></property> 
<property><name>yarn.sharedcache.store.in-memory.staleness-period-mins</name><value>10080</value><source>yarn-default.xml</source></property> <property><name>yarn.nodemanager.webapp.address</name><value>${yarn.nodemanager.hostname}:8042</value><source>yarn-default.xml</source></property> <property><name>yarn.app.mapreduce.client-am.ipc.max-retries</name><value>3</value><source>mapred-default.xml</source></property> <property><name>ha.failover-controller.new-active.rpc-timeout.ms</name><value>60000</value><source>core-default.xml</source></property> <property><name>mapreduce.jobhistory.client.thread-count</name><value>10</value><source>mapred-default.xml</source></property> <property><name>fs.trash.interval</name><value>0</value><source>core-default.xml</source></property> <property><name>mapreduce.fileoutputcommitter.algorithm.version</name><value>1</value><source>mapred-default.xml</source></property> <property><name>mapreduce.reduce.skip.maxgroups</name><value>0</value><source>mapred-default.xml</source></property> <property><name>mapreduce.map.output.value.class</name><value>org.apache.hadoop.io.LongWritable</value><source>programatically</source></property> <property><name>dfs.namenode.top.windows.minutes</name><value>1,5,25</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.reduce.memory.mb</name><value>1024</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.health-checker.script.timeout-ms</name><value>1200000</value><source>yarn-default.xml</source></property> <property><name>dfs.datanode.du.reserved</name><value>0</value><source>hdfs-default.xml</source></property> <property><name>dfs.namenode.resource.check.interval</name><value>5000</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.client.progressmonitor.pollinterval</name><value>1000</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.hostname</name><value>0.0.0.0</value><source>yarn-default.xml</source></property> <property><name>yarn.resourcemanager.ha.enabled</name><value>false</value><source>yarn-default.xml</source></property> <property><name>dfs.ha.log-roll.period</name><value>120</value><source>hdfs-default.xml</source></property> <property><name>yarn.scheduler.minimum-allocation-vcores</name><value>1</value><source>yarn-default.xml</source></property> <property><name>dfs.client.block.write.replace-datanode-on-failure.best-effort</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>yarn.app.mapreduce.am.container.log.limit.kb</name><value>0</value><source>mapred-default.xml</source></property> <property><name>hadoop.http.authentication.signature.secret.file</name><value>${user.home}/hadoop-http-auth-signature-secret</value><source>core-default.xml</source></property> <property><name>mapreduce.jobhistory.move.interval-ms</name><value>180000</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.container-executor.class</name><value>org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor</value><source>yarn-default.xml</source></property> <property><name>hadoop.security.authorization</name><value>false</value><source>core-default.xml</source></property> <property><name>dfs.storage.policy.enabled</name><value>true</value><source>hdfs-default.xml</source></property> <property><name>dfs.datanode.https.address</name><value>0.0.0.0:50475</value><source>hdfs-default.xml</source></property> 
<property><name>yarn.nodemanager.localizer.address</name><value>${yarn.nodemanager.hostname}:8040</value><source>yarn-default.xml</source></property> <property><name>mapreduce.jobhistory.recovery.store.fs.uri</name><value>${hadoop.tmp.dir}/mapred/history/recoverystore</value><source>mapred-default.xml</source></property> <property><name>dfs.namenode.replication.min</name><value>1</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.shuffle.connection-keep-alive.enable</name><value>false</value><source>mapred-default.xml</source></property> <property><name>dfs.namenode.top.num.users</name><value>10</value><source>hdfs-default.xml</source></property> <property><name>hadoop.common.configuration.version</name><value>0.23.0</value><source>core-default.xml</source></property> <property><name>yarn.app.mapreduce.task.container.log.backups</name><value>0</value><source>mapred-default.xml</source></property> <property><name>hadoop.security.groups.negative-cache.secs</name><value>30</value><source>core-default.xml</source></property> <property><name>mapreduce.ifile.readahead</name><value>true</value><source>mapred-default.xml</source></property> <property><name>yarn.nodemanager.resource.percentage-physical-cpu-limit</name><value>100</value><source>yarn-default.xml</source></property> <property><name>mapreduce.job.max.split.locations</name><value>10</value><source>mapred-default.xml</source></property> <property><name>dfs.datanode.max.locked.memory</name><value>0</value><source>hdfs-default.xml</source></property> <property><name>hadoop.registry.zk.quorum</name><value>localhost:2181</value><source>core-default.xml</source></property> <property><name>fs.s3a.threads.keepalivetime</name><value>60</value><source>core-default.xml</source></property> <property><name>mapreduce.jobhistory.joblist.cache.size</name><value>20000</value><source>mapred-default.xml</source></property> <property><name>mapreduce.job.end-notification.max.attempts</name><value>5</value><source>mapred-default.xml</source></property> <property><name>dfs.image.transfer.timeout</name><value>60000</value><source>hdfs-default.xml</source></property> <property><name>dfs.client.read.shortcircuit.skip.checksum</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>nfs.rtmax</name><value>1048576</value><source>hdfs-default.xml</source></property> <property><name>dfs.namenode.edit.log.autoroll.check.interval.ms</name><value>300000</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.reduce.shuffle.connect.timeout</name><value>180000</value><source>mapred-default.xml</source></property> <property><name>dfs.datanode.failed.volumes.tolerated</name><value>0</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.jobhistory.webapp.address</name><value>0.0.0.0:19888</value><source>mapred-default.xml</source></property> <property><name>fs.s3a.connection.timeout</name><value>50000</value><source>core-default.xml</source></property> <property><name>dfs.client.mmap.retry.timeout.ms</name><value>300000</value><source>hdfs-default.xml</source></property> <property><name>yarn.sharedcache.nm.uploader.replication.factor</name><value>10</value><source>yarn-default.xml</source></property> <property><name>dfs.datanode.data.dir.perm</name><value>700</value><source>hdfs-default.xml</source></property> <property><name>hadoop.http.authentication.token.validity</name><value>36000</value><source>core-default.xml</source></property> 
<property><name>ipc.client.connect.max.retries.on.timeouts</name><value>45</value><source>core-default.xml</source></property> <property><name>yarn.nodemanager.docker-container-executor.exec-name</name><value>/usr/bin/docker</value><source>yarn-default.xml</source></property> <property><name>yarn.app.mapreduce.am.job.committer.cancel-timeout</name><value>60000</value><source>mapred-default.xml</source></property> <property><name>dfs.ha.fencing.ssh.connect-timeout</name><value>30000</value><source>core-default.xml</source></property> <property><name>mapreduce.reduce.log.level</name><value>INFO</value><source>mapred-default.xml</source></property> <property><name>mapreduce.reduce.shuffle.merge.percent</name><value>0.66</value><source>mapred-default.xml</source></property> <property><name>ipc.client.fallback-to-simple-auth-allowed</name><value>false</value><source>core-default.xml</source></property> <property><name>io.serializations</name><value>org.apache.hadoop.io.serializer.WritableSerialization,org.apache.hadoop.io.serializer.avro.AvroSpecificSerialization,org.apache.hadoop.io.serializer.avro.AvroReflectSerialization</value><source>core-default.xml</source></property> <property><name>fs.s3.block.size</name><value>67108864</value><source>core-default.xml</source></property> <property><name>yarn.nodemanager.linux-container-executor.nonsecure-mode.local-user</name><value>nobody</value><source>yarn-default.xml</source></property> <property><name>hadoop.kerberos.kinit.command</name><value>kinit</value><source>core-default.xml</source></property> <property><name>hadoop.security.kms.client.encrypted.key.cache.expiry</name><value>43200000</value><source>core-default.xml</source></property> <property><name>yarn.resourcemanager.fs.state-store.uri</name><value>${hadoop.tmp.dir}/yarn/system/rmstore</value><source>yarn-default.xml</source></property> <property><name>yarn.admin.acl</name><value>*</value><source>yarn-default.xml</source></property> <property><name>dfs.namenode.delegation.token.max-lifetime</name><value>604800000</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.reduce.merge.inmem.threshold</name><value>1000</value><source>mapred-default.xml</source></property> <property><name>net.topology.impl</name><value>org.apache.hadoop.net.NetworkTopology</value><source>core-default.xml</source></property> <property><name>yarn.resourcemanager.ha.automatic-failover.enabled</name><value>true</value><source>yarn-default.xml</source></property> <property><name>dfs.datanode.use.datanode.hostname</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>dfs.heartbeat.interval</name><value>3</value><source>hdfs-default.xml</source></property> <property><name>yarn.resourcemanager.scheduler.class</name><value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value><source>yarn-default.xml</source></property> <property><name>io.map.index.skip</name><value>0</value><source>core-default.xml</source></property> <property><name>dfs.namenode.handler.count</name><value>10</value><source>hdfs-default.xml</source></property> <property><name>yarn.resourcemanager.webapp.https.address</name><value>${yarn.resourcemanager.hostname}:8090</value><source>yarn-default.xml</source></property> <property><name>yarn.nodemanager.admin-env</name><value>MALLOC_ARENA_MAX=$MALLOC_ARENA_MAX</value><source>yarn-default.xml</source></property> 
<property><name>hadoop.security.crypto.cipher.suite</name><value>AES/CTR/NoPadding</value><source>core-default.xml</source></property> <property><name>mapreduce.task.profile.map.params</name><value>${mapreduce.task.profile.params}</value><source>mapred-default.xml</source></property> <property><name>mapreduce.jobtracker.jobhistory.block.size</name><value>3145728</value><source>mapred-default.xml</source></property> <property><name>hadoop.security.crypto.buffer.size</name><value>8192</value><source>core-default.xml</source></property> <property><name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name><value>org.apache.hadoop.mapred.ShuffleHandler</value><source>yarn-default.xml</source></property> <property><name>mapreduce.cluster.acls.enabled</name><value>false</value><source>mapred-default.xml</source></property> <property><name>yarn.sharedcache.uploader.server.address</name><value>0.0.0.0:8046</value><source>yarn-default.xml</source></property> <property><name>fs.s3a.threads.max</name><value>256</value><source>core-default.xml</source></property> <property><name>fs.har.impl.disable.cache</name><value>true</value><source>core-default.xml</source></property> <property><name>mapreduce.tasktracker.map.tasks.maximum</name><value>2</value><source>mapred-default.xml</source></property> <property><name>ipc.client.connect.timeout</name><value>20000</value><source>core-default.xml</source></property> <property><name>yarn.nodemanager.remote-app-log-dir-suffix</name><value>logs</value><source>yarn-default.xml</source></property> <property><name>fs.df.interval</name><value>60000</value><source>core-default.xml</source></property> <property><name>hadoop.util.hash.type</name><value>murmur</value><source>core-default.xml</source></property> <property><name>mapreduce.jobhistory.minicluster.fixed.ports</name><value>false</value><source>mapred-default.xml</source></property> <property><name>mapreduce.jobtracker.jobhistory.lru.cache.size</name><value>5</value><source>mapred-default.xml</source></property> <property><name>yarn.app.mapreduce.shuffle.log.limit.kb</name><value>0</value><source>mapred-default.xml</source></property> <property><name>dfs.client.failover.max.attempts</name><value>15</value><source>hdfs-default.xml</source></property> <property><name>dfs.client.use.datanode.hostname</name><value>false</value><source>hdfs-default.xml</source></property> <property><name>ha.zookeeper.acl</name><value>world:anyone:rwcda</value><source>core-default.xml</source></property> <property><name>mapreduce.jobtracker.maxtasks.perjob</name><value>-1</value><source>mapred-default.xml</source></property> <property><name>mapreduce.job.speculative.speculative-cap-running-tasks</name><value>0.1</value><source>mapred-default.xml</source></property> <property><name>mapreduce.map.sort.spill.percent</name><value>0.80</value><source>mapred-default.xml</source></property> <property><name>file.stream-buffer-size</name><value>4096</value><source>core-default.xml</source></property> <property><name>yarn.resourcemanager.ha.automatic-failover.embedded</name><value>true</value><source>yarn-default.xml</source></property> <property><name>yarn.resourcemanager.nodemanager.minimum.version</name><value>NONE</value><source>yarn-default.xml</source></property> <property><name>hadoop.fuse.connection.timeout</name><value>300</value><source>hdfs-default.xml</source></property> 
<property><name>mapreduce.tasktracker.instrumentation</name><value>org.apache.hadoop.mapred.TaskTrackerMetricsInst</value><source>mapred-default.xml</source></property> <property><name>io.seqfile.sorter.recordlimit</name><value>1000000</value><source>core-default.xml</source></property> <property><name>yarn.sharedcache.webapp.address</name><value>0.0.0.0:8788</value><source>yarn-default.xml</source></property> <property><name>yarn.app.mapreduce.am.resource.mb</name><value>1536</value><source>mapred-default.xml</source></property> <property><name>mapreduce.framework.name</name><value>yarn</value><source>mapred-site.xml</source></property> <property><name>mapreduce.job.reduce.slowstart.completedmaps</name><value>0.05</value><source>mapred-default.xml</source></property> <property><name>yarn.resourcemanager.client.thread-count</name><value>50</value><source>yarn-default.xml</source></property> <property><name>mapreduce.cluster.temp.dir</name><value>${hadoop.tmp.dir}/mapred/temp</value><source>mapred-default.xml</source></property> <property><name>dfs.client.mmap.enabled</name><value>true</value><source>hdfs-default.xml</source></property> <property><name>mapreduce.jobhistory.intermediate-done-dir</name><value>${yarn.app.mapreduce.am.staging-dir}/history/done_intermediate</value><source>mapred-default.xml</source></property> <property><name>fs.s3a.attempts.maximum</name><value>10</value><source>core-default.xml</source></property> </configuration> job.xml
II. The MapReduce framework (used together with the YARN framework)
(1) Framework flow diagram
Note: both MRAppMaster and the YarnChild processes (which host the map tasks and reduce tasks) are created dynamically.
III. Source-code analysis of job submission
(1) The client submits the job
wcjob.waitForCompletion(true); // submit the job to the cluster and wait for it to complete
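For context, here is a minimal driver sketch of the kind of code that leads up to this call. WCReducer and the input/output paths appear in the job.xml dump above; the WCRunner class name, WCMapper, and the exact key/value classes are assumptions, so treat it as an illustration rather than the original driver.

package cn.hadoop.mr.wc;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver class; WCMapper/WCReducer are the user-written word-count classes.
public class WCRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job wcjob = Job.getInstance(conf);

        wcjob.setJarByClass(WCRunner.class);            // this jar becomes job.jar in the staging dir
        wcjob.setMapperClass(WCMapper.class);           // assumption: cn.hadoop.mr.wc.WCMapper
        wcjob.setReducerClass(WCReducer.class);         // cn.hadoop.mr.wc.WCReducer (see job.xml)

        wcjob.setOutputKeyClass(Text.class);            // matches mapreduce.job.output.key.class above
        wcjob.setOutputValueClass(LongWritable.class);  // assumption

        // Input/output paths as recorded in job.xml above
        FileInputFormat.setInputPaths(wcjob, new Path("hdfs://hadoopH1:9000/wc/input"));
        FileOutputFormat.setOutputPath(wcjob, new Path("hdfs://hadoopH1:9000/wc/output"));

        // Everything analysed in sections (2)-(7) below happens inside this call
        boolean success = wcjob.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}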
(二)job.java进行提交
public boolean waitForCompletion(boolean verbose)
    throws IOException, InterruptedException, ClassNotFoundException {
  if (state == JobState.DEFINE) {
    submit();                                  // perform the actual submission
  }
  if (verbose) {
    monitorAndPrintJob();                      // monitor the job and print its status
  } else {
    // get the completion poll interval from the client.
    int completionPollIntervalMillis =
        Job.getCompletionPollInterval(cluster.getConf());
    while (!isComplete()) {
      try {
        Thread.sleep(completionPollIntervalMillis);
      } catch (InterruptedException ie) {
      }
    }
  }
  return isSuccessful();                       // report whether the job finished successfully
}
(3) The job submits its information to the cluster
public void submit()
    throws IOException, InterruptedException, ClassNotFoundException {
  ensureState(JobState.DEFINE);                // double-check that the job is still in the DEFINE state
  setUseNewAPI();                              // switch to the new MapReduce API
  connect();                                   // create the Cluster object (see below)
  final JobSubmitter submitter =
      getJobSubmitter(cluster.getFileSystem(), cluster.getClient());
  status = ugi.doAs(new PrivilegedExceptionAction<JobStatus>() {
    public JobStatus run() throws IOException, InterruptedException, ClassNotFoundException {
      return submitter.submitJobInternal(Job.this, cluster);
    }
  });
  state = JobState.RUNNING;
  LOG.info("The url to track the job: " + getTrackingURL());
}
(4) connect() in the Job class assigns the cluster field
private synchronized void connect()
    throws IOException, InterruptedException, ClassNotFoundException {
  if (cluster == null) {
    cluster = ugi.doAs(new PrivilegedExceptionAction<Cluster>() {
      public Cluster run() throws IOException, InterruptedException, ClassNotFoundException {
        return new Cluster(getConfiguration());
      }
    });
  }
}
(5) Constructing the cluster in Cluster.java
public Cluster(Configuration conf) throws IOException {
  this(null, conf);
}

public Cluster(InetSocketAddress jobTrackAddr, Configuration conf) throws IOException {
  this.conf = conf;                                  // the configuration
  this.ugi = UserGroupInformation.getCurrentUser();  // the current user
  initialize(jobTrackAddr, conf);
}
(6) Cluster initialization
private void initialize(InetSocketAddress jobTrackAddr, Configuration conf)
    throws IOException {
  synchronized (frameworkLoader) {
    for (ClientProtocolProvider provider : frameworkLoader) { // iterate over all registered providers until one successfully creates a client
      ClientProtocol clientProtocol = null;
      try {
        if (jobTrackAddr == null) {
          clientProtocol = provider.create(conf);
        } else {
          clientProtocol = provider.create(jobTrackAddr, conf);
        }
        if (clientProtocol != null) {
          clientProtocolProvider = provider;
          client = clientProtocol;
          LOG.debug("Picked " + provider.getClass().getName()
              + " as the ClientProtocolProvider");
          break;
        } else {
          LOG.debug("Cannot pick " + provider.getClass().getName()
              + " as the ClientProtocolProvider - returned null protocol");
        }
      } catch (Exception e) {
        LOG.info("Failed to use " + provider.getClass().getName()
            + " due to error: " + e.getMessage());
      }
    }
  }
  if (null == clientProtocolProvider || null == client) {
    throw new IOException(
        "Cannot initialize Cluster. Please check your configuration for "
            + MRConfig.FRAMEWORK_NAME
            + " and the correspond server addresses.");
  }
}
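Where does frameworkLoader get its providers? As far as I can tell it is a java.util.ServiceLoader over ClientProtocolProvider, so implementations are discovered from META-INF/services entries on the classpath. A small standalone sketch of that discovery mechanism (the class name ListProviders is only for illustration):

import java.util.ServiceLoader;
import org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider;

public class ListProviders {
    public static void main(String[] args) {
        // ServiceLoader instantiates every implementation registered under
        // META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider.
        // Cluster.initialize() iterates the providers the same way and keeps the first
        // one whose create(conf) returns a non-null ClientProtocol.
        for (ClientProtocolProvider provider : ServiceLoader.load(ClientProtocolProvider.class)) {
            System.out.println(provider.getClass().getName());
            // with the usual MapReduce client jars on the classpath this typically lists
            // LocalClientProtocolProvider and YarnClientProtocolProvider
        }
    }
}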
1. Creating the cluster client
(1) The local provider
public class LocalClientProtocolProvider extends ClientProtocolProvider {
  @Override
  public ClientProtocol create(Configuration conf) throws IOException {
    String framework =
        conf.get(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
    if (!MRConfig.LOCAL_FRAMEWORK_NAME.equals(framework)) {
      return null;                       // framework is not "local", so no LocalJobRunner is created
    }
    conf.setInt(JobContext.NUM_MAPS, 1);
    return new LocalJobRunner(conf);
  }
  // ... (remaining methods omitted)
}
Reading the configuration shows that the framework in use is yarn (mapreduce.framework.name=yarn, set in mapred-site.xml).
Since "yarn" is not equal to "local", the local cluster client is not created!
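To make this check concrete, here is a small standalone sketch of the same lookup the provider performs (constant names are from MRConfig; JobConf is used so that mapred-site.xml, where mapreduce.framework.name=yarn is set on this cluster, gets loaded):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.MRConfig;

public class FrameworkCheck {
    public static void main(String[] args) {
        // JobConf pulls in mapred-default.xml and mapred-site.xml in addition to core-site.xml
        Configuration conf = new JobConf();
        // MRConfig.FRAMEWORK_NAME       = "mapreduce.framework.name"
        // MRConfig.LOCAL_FRAMEWORK_NAME = "local"
        // MRConfig.YARN_FRAMEWORK_NAME  = "yarn"
        String framework = conf.get(MRConfig.FRAMEWORK_NAME, MRConfig.LOCAL_FRAMEWORK_NAME);
        System.out.println(framework);   // prints "yarn" on this cluster
        // Because "yarn" is not "local", LocalClientProtocolProvider.create() returns null,
        // and YarnClientProtocolProvider.create() returns a YARNRunner instead.
    }
}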
(2) Creating the YARN cluster client
public class YarnClientProtocolProvider extends ClientProtocolProvider {
  @Override
  public ClientProtocol create(Configuration conf) throws IOException {
    if (MRConfig.YARN_FRAMEWORK_NAME.equals(conf.get(MRConfig.FRAMEWORK_NAME))) {
      return new YARNRunner(conf);       // framework is "yarn", so a YARNRunner is returned
    }
    return null;
  }
  // ... (remaining methods omitted)
}
The cluster client has been obtained successfully:
if (clientProtocol != null) {
  clientProtocolProvider = provider;     // assign the member fields
  client = clientProtocol;
  LOG.debug("Picked " + provider.getClass().getName()
      + " as the ClientProtocolProvider");
  break;                                 // leave the loop; no other provider is tried
}
(7) submitJobInternal(): submitting the job to the cluster
JobStatus submitJobInternal(Job job, Cluster cluster)
    throws ClassNotFoundException, InterruptedException, IOException {
  Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf); // staging path for the job resources, obtained through the cluster (ResourceManager side)
  JobID jobId = submitClient.getNewJobID();       // get a new job id, used to name the submit directory
  job.setJobID(jobId);
  Path submitJobDir = new Path(jobStagingArea, jobId.toString()); // build the full submit path
  JobStatus status = null;
  try {
    copyAndConfigureFiles(job, submitJobDir);     // upload the job jar and supporting files to the staging path on HDFS
    Path submitJobFile = JobSubmissionFiles.getJobConfPath(submitJobDir);
    int maps = writeSplits(job, submitJobDir);    // compute the input splits and write job.split / job.splitmetainfo
    // Write job file to submit dir
    writeConf(conf, submitJobFile);               // write the job description (job.xml) to the staging path
    // ... (the actual submission via submitClient.submitJob(...) is omitted here)
  } finally {
    // (on failure, the staging directory is cleaned up here)
  }
  return status;
}
The full staging path for the job's resources:
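For reference, a short sketch of how that full path is put together from the staging area and the job id. The concrete values below are taken from the job.xml dump above (mapreduce.job.dir), not computed; in submitJobInternal they come from JobSubmissionFiles.getStagingDir(...) and submitClient.getNewJobID():

import org.apache.hadoop.fs.Path;

public class StagingPathExample {
    public static void main(String[] args) {
        // values as recorded for this run in job.xml (see the dump above)
        Path jobStagingArea = new Path("/tmp/hadoop-yarn/staging/hadoop/.staging");
        String jobId = "job_1582165983362_0009";
        Path submitJobDir = new Path(jobStagingArea, jobId);
        System.out.println(submitJobDir);
        // prints /tmp/hadoop-yarn/staging/hadoop/.staging/job_1582165983362_0009,
        // matching the mapreduce.job.dir entry in job.xml
    }
}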