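I'll skip the preparation work (setting up SSH connections and so on) and focus on the contents of the configuration files and the startup procedure, using the servers 192.168.157.100 through 192.168.157.105 as an example:
1. core-site.xml: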
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-kf100.jd.com:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp/hadoop-${user.name}</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hadoop.groups</name>
<value>*</value>
</property>
</configuration>
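Before copying a file like this to every node, it can help to confirm that it parses and that the key settings are what you expect. The following is only a minimal sketch, not part of Hadoop: load_hadoop_conf is a hypothetical helper name, and the embedded XML is a shortened copy of the core-site.xml above.

```python
import xml.etree.ElementTree as ET


def load_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop <configuration> document."""
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError("root element must be <configuration>, got <%s>" % root.tag)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is None:
            continue  # skip malformed entries that have no <name>
        props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


# Shortened copy of the core-site.xml shown above (illustration only).
CORE_SITE = """<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-kf100.jd.com:8020</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>"""

conf = load_hadoop_conf(CORE_SITE)
print(conf["fs.defaultFS"])                           # hdfs://hadoop-kf100.jd.com:8020
print(int(conf["io.file.buffer.size"]) // 1024, "KiB")  # 128 KiB
```

The same function works on the other *-site.xml files, since they all share the configuration/property/name/value layout.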
2. hadoop-env.sh:
Add the JDK installation directory: export JAVA_HOME=/export/servers/jdk1.6.0_25
3. hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-kf100.jd.com:9001</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/usr/local/hadoop/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/usr/local/hadoop/dfs/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
4. mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-kf100.jd.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-kf100.jd.com:19888</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx768M</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx1024M</value>
</property>
</configuration>
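Note that mapreduce.map.java.opts and mapreduce.reduce.java.opts only set the JVM heap; the YARN container limit comes from mapreduce.map.memory.mb and mapreduce.reduce.memory.mb, which this file does not set. The sketch below checks whether a heap fits inside its container. It assumes both container sizes are left at 1024 MB (which I understand to be the stock default) and uses an 80% heap-to-container ratio as a rule of thumb; both numbers and the helper names are assumptions, not values from the configuration above.

```python
import re


def xmx_megabytes(java_opts):
    """Extract the -Xmx heap size from a JVM option string, in MB (None if absent)."""
    m = re.search(r"-Xmx(\d+)([kKmMgG])", java_opts)
    if m is None:
        return None
    size, unit = int(m.group(1)), m.group(2).lower()
    return {"k": size / 1024.0, "m": float(size), "g": size * 1024.0}[unit]


def heap_fits(java_opts, container_mb, headroom=0.8):
    """True if the heap leaves room for non-heap JVM memory inside the container.

    headroom=0.8 is an assumed rule of thumb, not a Hadoop setting.
    """
    heap = xmx_megabytes(java_opts)
    return heap is not None and heap <= container_mb * headroom


ASSUMED_CONTAINER_MB = 1024  # assumption: memory.mb properties left unset

print(heap_fits("-Xmx768M", ASSUMED_CONTAINER_MB))   # True
print(heap_fits("-Xmx1024M", ASSUMED_CONTAINER_MB))  # False
```

Under these assumptions the reduce heap of 1024 MB would not fit, so it may be worth setting mapreduce.reduce.memory.mb explicitly to something larger than the heap.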
5. slaves:
hadoop-kf101.jd.com
hadoop-kf102.jd.com
hadoop-kf103.jd.com
hadoop-kf104.jd.com
hadoop-kf105.jd.com
6. yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>hadoop-kf100.jd.com:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>hadoop-kf100.jd.com:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>hadoop-kf100.jd.com:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>hadoop-kf100.jd.com:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>hadoop-kf100.jd.com:8088</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/usr/local/hadoop-2.2.0/etc/hadoop/fair-scheduler.xml</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
</configuration>
7. fair-scheduler.xml (sets the number of apps allowed per queue):
<allocations>
<queue name="erpmerge">
<minResources>671193 mb, 378 vcores</minResources>
<maxResources>851151 mb, 480 vcores</maxResources>
<maxRunningApps>200</maxRunningApps>
<weight>1.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
<user name="erpmerge">
<maxRunningApps>200</maxRunningApps>
</user>
<queue name="mart_cfo">
<minResources>671193 mb, 378 vcores</minResources>
<maxResources>851151 mb, 480 vcores</maxResources>
<maxRunningApps>200</maxRunningApps>
<weight>1.0</weight>
<schedulingPolicy>fair</schedulingPolicy>
</queue>
<user name="mart_cfo">
<maxRunningApps>200</maxRunningApps>
</user>
<userMaxAppsDefault>100</userMaxAppsDefault>
<defaultQueueSchedulingPolicy>fair</defaultQueueSchedulingPolicy>
</allocations>
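Allocation files are easy to get wrong by hand, so a quick consistency check can be useful. The sketch below is an illustration only: parse_resources, check_queues and total_min_resources are hypothetical helper names, the embedded XML is a trimmed copy of the file above, and the parsing is my own approximation of the "X mb, Y vcores" format rather than the scheduler's actual parser.

```python
import re
import xml.etree.ElementTree as ET


def parse_resources(text):
    """Parse 'X mb, Y vcores' (whitespace optional) into a (mb, vcores) tuple."""
    mb = re.search(r"(\d+)\s*mb", text)
    vcores = re.search(r"(\d+)\s*vcores", text)
    if mb is None or vcores is None:
        raise ValueError("expected 'X mb, Y vcores', got %r" % text)
    return int(mb.group(1)), int(vcores.group(1))


def check_queues(xml_text):
    """Return the names of queues whose minResources exceed their maxResources."""
    bad = []
    for queue in ET.fromstring(xml_text).findall("queue"):
        lo_text = queue.findtext("minResources")
        hi_text = queue.findtext("maxResources")
        if lo_text is None or hi_text is None:
            continue  # nothing to compare for this queue
        lo, hi = parse_resources(lo_text), parse_resources(hi_text)
        if lo[0] > hi[0] or lo[1] > hi[1]:
            bad.append(queue.get("name"))
    return bad


def total_min_resources(xml_text):
    """Sum minResources over all queues, to compare against cluster capacity."""
    mins = [parse_resources(q.findtext("minResources"))
            for q in ET.fromstring(xml_text).findall("queue")
            if q.findtext("minResources") is not None]
    return sum(m[0] for m in mins), sum(m[1] for m in mins)


# Trimmed copy of the allocation file shown above (illustration only).
ALLOCATIONS = """<allocations>
<queue name="erpmerge">
<minResources>671193 mb,378vcores</minResources>
<maxResources>851151 mb,480vcores</maxResources>
</queue>
<queue name="mart_cfo">
<minResources>671193 mb,378vcores</minResources>
<maxResources>851151 mb,480vcores</maxResources>
</queue>
</allocations>"""

print(check_queues(ALLOCATIONS))         # []
print(total_min_resources(ALLOCATIONS))  # (1342386, 756)
```

The summed minimums are worth comparing with what the NodeManagers actually offer, since guaranteed shares larger than the cluster cannot all be honored.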
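The configuration is identical on every node. The startup procedure is as follows (on a brand-new cluster, run hdfs namenode -format once before the first start):
namenode: sh hadoop-daemon.sh start namenode
datanode: sh hadoop-daemons.sh start datanode
jobhistory: sh mr-jobhistory-daemon.sh start historyserver
resourcemanager and nodemanager: sh start-yarn.sh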
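Once the daemons are up, a simple way to confirm they are listening is to try a TCP connection to the addresses configured above. This is a minimal sketch, assuming it runs on a machine that can resolve and reach hadoop-kf100.jd.com; port_open and report are hypothetical helper names, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts and DNS failures
        return False


# Host and ports taken from the configuration files above.
ENDPOINTS = {
    "NameNode RPC": ("hadoop-kf100.jd.com", 8020),
    "ResourceManager web UI": ("hadoop-kf100.jd.com", 8088),
    "JobHistory web UI": ("hadoop-kf100.jd.com", 19888),
}


def report(endpoints=ENDPOINTS):
    """Map each endpoint name to True/False depending on whether its port is open."""
    return {name: port_open(host, port) for name, (host, port) in endpoints.items()}
```

Calling report() after startup returns a dictionary such as {"NameNode RPC": True, ...}; any False entry points at the daemon whose log should be checked first.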