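Apache Hadoop Cluster Expansion: A Hands-On Case Study
Author: 尹正杰 (Yin Zhengjie)
Copyright notice: this is an original work. Reproduction is not permitted, and violators will be held legally liable.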
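I. Build a fully distributed cluster
Recommended reading from the author: https://www.cnblogs.com/yinzhengjie2020/p/12424192.html
II. Install the Hadoop runtime environment on the new node
Recommended reading from the author: https://www.cnblogs.com/yinzhengjie2020/p/12422758.html
III. Hands-on: adding the new node to the fully distributed cluster
1>. Check the NameNode WebUI
2>. Copy the HDFS cluster's configuration files to the new node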
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop]# scp hdfs-site.xml yarn-site.xml mapred-site.xml core-site.xml 172.200.4.107:${HADOOP_HOME}/etc/hadoop/
root@172.200.4.107's password: 
hdfs-site.xml                                 100% 1155   493.4KB/s   00:00
yarn-site.xml                                 100% 2062   855.7KB/s   00:00
mapred-site.xml                               100% 1841   919.3KB/s   00:00
core-site.xml                                 100% 1199   676.2KB/s   00:00
[root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop]# 
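The copy above can be scripted so that the same four files go to any new node. Note that `${HADOOP_HOME}` in the scp target is expanded by the local shell on hadoop101 before scp runs, so the command relies on both nodes using the same installation path. The sketch below makes the same assumption. The function name `print_conf_copy_cmds` is hypothetical (it is not part of Hadoop), and it only prints the scp commands so they can be reviewed first; it also assumes the paths contain no spaces.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hadoop): print one scp command per
# configuration file, so the commands can be reviewed before running them.
# $1 = target host, $2 = configuration directory (same path on both nodes).
print_conf_copy_cmds() {
  local target="$1" conf_dir="$2" f
  for f in hdfs-site.xml yarn-site.xml mapred-site.xml core-site.xml; do
    printf 'scp %s/%s %s:%s/\n' "$conf_dir" "$f" "$target" "$conf_dir"
  done
}

# Example: review the commands, then pipe them to `sh` to execute them.
# print_conf_copy_cmds 172.200.4.107 "${HADOOP_HOME}/etc/hadoop"
# print_conf_copy_cmds 172.200.4.107 "${HADOOP_HOME}/etc/hadoop" | sh
```

Printing first and executing second keeps the destructive part (overwriting the remote files) as an explicit, separate step.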
3>. On the new node, verify that the configuration files were copied successfully
[root@hadoop107.yinzhengjie.org.cn ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6

#Hadoop cluster
172.200.4.101 hadoop101.yinzhengjie.org.cn
172.200.4.102 hadoop102.yinzhengjie.org.cn
172.200.4.103 hadoop103.yinzhengjie.org.cn
172.200.4.104 hadoop104.yinzhengjie.org.cn
172.200.4.105 hadoop105.yinzhengjie.org.cn
172.200.4.106 hadoop106.yinzhengjie.org.cn
172.200.4.107 hadoop107.yinzhengjie.org.cn
[root@hadoop107.yinzhengjie.org.cn ~]# 
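Because the configuration files refer to the other nodes by hostname, the new node has to be able to resolve each of them. Instead of reading /etc/hosts by eye, the check can be sketched roughly as follows. The function name `hosts_has_all` is hypothetical, and the sketch only looks at whole-line comments, not at trailing `#` comments on an entry line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every given hostname appears as a
# whole field on a non-comment line of the hosts file.
# $1 = hosts file, remaining arguments = hostnames to look for.
hosts_has_all() {
  local file="$1" h
  shift
  for h in "$@"; do
    awk -v name="$h" '
      /^[[:space:]]*#/ { next }   # skip lines that are entirely a comment
      { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
      END { exit !found }
    ' "$file" || { echo "missing: $h" >&2; return 1; }
  done
}

# Example on the new node:
# hosts_has_all /etc/hosts hadoop101.yinzhengjie.org.cn hadoop107.yinzhengjie.org.cn
```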
[root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>3</value>
        <description>Number of HDFS replicas</description>
    </property>

    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop105.yinzhengjie.org.cn:50090</value>
        <description>Host and port of the Hadoop Secondary NameNode</description>
    </property>
</configuration>
[root@hadoop107.yinzhengjie.org.cn ~]# 
[root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->

    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>How reducers fetch data</description>
    </property>

    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop106.yinzhengjie.org.cn</value>
        <description>Address of the YARN ResourceManager</description>
    </property>

    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
        <description>Enables or disables log aggregation. The default is false (disabled); setting it to true turns log aggregation on.</description>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
        <description>How long to keep aggregated logs before deleting them, in seconds. The default of -1 disables deletion. Be careful: setting this too small will flood the NameNode with requests.</description>
    </property>

    <property>
        <name>yarn.log-aggregation.retain-check-interval-seconds</name>
        <value>3600</value>
        <description>Interval, in seconds, between checks for aggregated-log retention. If set to 0 or a negative value, it is computed as one-tenth of the aggregated-log retention time. Be careful: setting this too small will flood the NameNode with requests.</description>
    </property>
</configuration>
[root@hadoop107.yinzhengjie.org.cn ~]# 
[root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>Run MapReduce on YARN</description>
    </property>

    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop101.yinzhengjie.org.cn:10020</value>
        <description>JobHistory server address; the default port is 10020</description>
    </property>

    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop101.yinzhengjie.org.cn:19888</value>
        <description>JobHistory server WebUI address; the default port is 19888</description>
    </property>

    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/yinzhengjie/jobhistory/tmp</value>
        <description>HDFS directory where MapReduce jobs write their history files</description>
    </property>

    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/yinzhengjie/jobhistory/manager</value>
        <description>HDFS directory where the MR JobHistory server manages history files</description>
    </property>
</configuration>
[root@hadoop107.yinzhengjie.org.cn ~]# 
[root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop101.yinzhengjie.org.cn:9000</value>
        <description>RPC address of the HDFS NameNode</description>
    </property>

    <property>
        <name>hadoop.tmp.dir</name>
        <value>/yinzhengjie/softwares/hadoop-2.10.0/data/tmp</value>
        <description>Directory where Hadoop stores the files it generates at runtime</description>
    </property>
</configuration>
[root@hadoop107.yinzhengjie.org.cn ~]# 
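Of everything printed above, the value that matters most for the next step is `fs.defaultFS`, since that is the address the new DataNode will contact. With Hadoop on the PATH, `hdfs getconf -confKey fs.defaultFS` prints the effective value. As a rough alternative that works on the file alone, the sketch below pulls a single value out with awk. The function name `conf_value` is hypothetical, and the sketch assumes each `<name>` and `<value>` element sits on its own line, as in the files shown here; it is not a general XML parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the <value> that follows <name>KEY</name>
# in a Hadoop *-site.xml file.
# $1 = path to the XML file, $2 = property name.
# Assumes one element per line; this is a sketch, not an XML parser.
conf_value() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character "<value>" and 8-character "</value>" tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Example on the new node:
# conf_value "${HADOOP_HOME}/etc/hadoop/core-site.xml" fs.defaultFS
```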
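4>. Start the DataNode process on the new node (fs.defaultFS in core-site.xml records the NameNode's RPC address, so once the DataNode starts it automatically sends a registration request to that address)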
[root@hadoop107.yinzhengjie.org.cn ~]# jps
4339 Jps
[root@hadoop107.yinzhengjie.org.cn ~]# 
[root@hadoop107.yinzhengjie.org.cn ~]# hadoop-daemon.sh start datanode
starting datanode, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/hadoop-root-datanode-hadoop107.yinzhengjie.org.cn.out
[root@hadoop107.yinzhengjie.org.cn ~]# 
[root@hadoop107.yinzhengjie.org.cn ~]# jps
4433 Jps
4364 DataNode
[root@hadoop107.yinzhengjie.org.cn ~]# 
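The before-and-after `jps` comparison above can be turned into a check that a script can act on. A minimal sketch, assuming `jps` prints one `<pid> <name>` pair per line as in the output above; the function name `has_daemon` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `jps`-style lines ("<pid> <name>") on stdin
# and succeed only if a process with the given name is listed.
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Example on the new node (commented out because it needs a Hadoop install):
# hadoop-daemon.sh start datanode
# jps | has_daemon DataNode && echo "DataNode is running"
```

A running process does not by itself prove that registration succeeded; if the node does not show up on the NameNode, the logs directory named in the startup message is the place to look.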
5>. Check the NameNode WebUI again
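The same confirmation is available from the command line: `hdfs dfsadmin -report` lists the DataNodes known to the NameNode. The sketch below checks whether a given host appears among the live ones. It assumes the report has a `Live datanodes` section (and, when applicable, a `Dead datanodes` section) with one `Hostname: <fqdn>` line per node, which is how Hadoop 2.x formats it as far as I know; the function name `report_has_live_host` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfsadmin -report` output on stdin and
# succeed only if the given host is listed in the "Live datanodes" section.
# The section and "Hostname:" line formats are assumptions about the report.
report_has_live_host() {
  awk -v host="$1" '
    /^Live datanodes/ { live = 1 }
    /^Dead datanodes/ { live = 0 }
    live && $0 == "Hostname: " host { found = 1 }
    END { exit !found }
  '
}

# Example (commented out because it needs a running cluster):
# hdfs dfsadmin -report | report_has_live_host hadoop107.yinzhengjie.org.cn \
#   && echo "hadoop107 has registered with the NameNode"
```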