                 Apache Hadoop Cluster Expansion: A Hands-On Case Study

                                            Author: 尹正杰 (Yin Zhengjie)

    Copyright notice: this is an original work. Reproduction is not permitted; violators will be held legally liable.

    I. Setting up a fully distributed cluster

      Recommended reading from the author:
        https://www.cnblogs.com/yinzhengjie2020/p/12424192.html

    II. Installing the Hadoop runtime environment on the new node

      Recommended reading from the author:
        https://www.cnblogs.com/yinzhengjie2020/p/12422758.html

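    III. Adding a new node to the fully distributed cluster: a hands-on case

    1>. Check the NameNode WebUI

    2>. Copy the cluster's configuration files (HDFS, YARN, MapReduce, and core) to the new node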

    [root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop]# scp hdfs-site.xml yarn-site.xml mapred-site.xml core-site.xml 172.200.4.107:${HADOOP_HOME}/etc/hadoop/
    root@172.200.4.107's password: 
    hdfs-site.xml                                 100% 1155   493.4KB/s   00:00
    yarn-site.xml                                 100% 2062   855.7KB/s   00:00
    mapred-site.xml                               100% 1841   919.3KB/s   00:00
    core-site.xml                                 100% 1199   676.2KB/s   00:00
    [root@hadoop101.yinzhengjie.org.cn /yinzhengjie/softwares/hadoop-2.10.0/etc/hadoop]# 
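
In the scp command above, `${HADOOP_HOME}` is expanded by the local shell on hadoop101 before scp runs, so the files land in the right place only because both hosts use the same installation path. The following is a minimal sketch that makes this dependency explicit. `push_conf` and the `DRY_RUN` switch are hypothetical names introduced here for illustration, and the sketch assumes the same configuration directory exists on both hosts.

```shell
#!/bin/sh
# push_conf TARGET [CONF_DIR]
# Copy the four site files to TARGET with a single scp call.
# push_conf and DRY_RUN are hypothetical names, not part of Hadoop.
push_conf() {
  target="$1"                                  # e.g. 172.200.4.107
  conf_dir="${2:-${HADOOP_HOME}/etc/hadoop}"   # assumed identical on both hosts

  # Build the command in the positional parameters so paths stay quoted.
  set -- scp
  for f in core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml; do
    set -- "$@" "${conf_dir}/${f}"
  done
  set -- "$@" "${target}:${conf_dir}/"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"      # only print the command
  else
    "$@"           # run scp
  fi
}
```

Calling `push_conf 172.200.4.107` performs the copy. Setting `DRY_RUN=1` beforehand prints the command instead, which is a cheap way to inspect the resolved paths first.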

    3>. On the new node, verify that the configuration files were copied over successfully

    [root@hadoop107.yinzhengjie.org.cn ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    
    #Hadoop cluster
    172.200.4.101 hadoop101.yinzhengjie.org.cn
    172.200.4.102 hadoop102.yinzhengjie.org.cn
    172.200.4.103 hadoop103.yinzhengjie.org.cn
    172.200.4.104 hadoop104.yinzhengjie.org.cn
    172.200.4.105 hadoop105.yinzhengjie.org.cn
    172.200.4.106 hadoop106.yinzhengjie.org.cn
    172.200.4.107 hadoop107.yinzhengjie.org.cn
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    Note: make sure "/etc/hosts" contains an entry for every cluster host, because the configuration files refer to the nodes by hostname.
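
The hostname requirement can be checked mechanically instead of by eye. The following is a small sketch that is not part of the original walkthrough: `missing_hosts` is a hypothetical helper that prints every name from its argument list that has no entry in the given hosts file.

```shell
#!/bin/sh
# missing_hosts FILE HOST...
# Print each HOST that does not appear as a name in FILE (hosts-file format).
# missing_hosts is a hypothetical helper name.
missing_hosts() {
  hosts_file="$1"
  shift
  for h in "$@"; do
    awk -v name="$h" '
      /^[ \t]*#/ { next }                 # skip comment lines
      {
        for (i = 2; i <= NF; i++) {       # field 1 is the IP address
          if ($i ~ /^#/) break            # stop at a trailing comment
          if ($i == name) found = 1
        }
      }
      END { exit (found ? 0 : 1) }
    ' "$hosts_file" || echo "$h"
  done
}

# Example:
#   missing_hosts /etc/hosts hadoop101.yinzhengjie.org.cn hadoop107.yinzhengjie.org.cn
```

Empty output means every listed name resolves through the file.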
    [root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/hdfs-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    
        <property>
            <name>dfs.replication</name>
            <value>3</value>
            <description>Number of HDFS block replicas</description>
        </property>
    
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop105.yinzhengjie.org.cn:50090</value>
            <description>HTTP address of the SecondaryNameNode host</description>
        </property>
    
    </configuration>
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    [root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/yarn-site.xml
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
            <description>How reducers fetch map output (the shuffle auxiliary service)</description>
        </property>
    
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop106.yinzhengjie.org.cn</value>
            <description>Hostname of the YARN ResourceManager</description>
        </property>
    
    
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
            <description>Whether to enable log aggregation. The default is false (disabled); setting it to true turns log aggregation on.</description>
        </property>
    
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>604800</value>
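            <description>How long, in seconds, to keep aggregated logs before deleting them. The default of -1 disables deletion. Setting this value too small will flood the NameNode with requests.</description>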
        </property>
    
    
        <property>
            <name>yarn.log-aggregation.retain-check-interval-seconds</name>
            <value>3600</value>
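            <description>Interval, in seconds, between checks of aggregated log retention. If set to 0 or a negative value, it is computed as one tenth of the aggregated log retention time. Setting this value too small will flood the NameNode with requests.</description>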
        </property>
    
    </configuration>
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    [root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
            <description>Run MapReduce on YARN</description>
        </property>
    
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop101.yinzhengjie.org.cn:10020</value>
            <description>JobHistory server address; the default port is 10020</description>
        </property>
    
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop101.yinzhengjie.org.cn:19888</value>
            <description>JobHistory server WebUI address; the default port is 19888</description>
        </property>
    
    
        <property>
            <name>mapreduce.jobhistory.intermediate-done-dir</name>
            <value>/yinzhengjie/jobhistory/tmp</value>
            <description>HDFS directory where MapReduce jobs write their history files</description>
        </property>
    
        <property>
            <name>mapreduce.jobhistory.done-dir</name>
            <value>/yinzhengjie/jobhistory/manager</value>
            <description>HDFS directory where the MR JobHistory server manages history files</description>
        </property>
    
    </configuration>
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    [root@hadoop107.yinzhengjie.org.cn ~]# cat ${HADOOP_HOME}/etc/hadoop/core-site.xml
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop101.yinzhengjie.org.cn:9000</value>
            <description>RPC address of the HDFS NameNode</description>
        </property>
    
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/yinzhengjie/softwares/hadoop-2.10.0/data/tmp</value>
            <description>Directory for files that Hadoop generates at runtime</description>
        </property>
    
    </configuration>
    [root@hadoop107.yinzhengjie.org.cn ~]# 
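
Reading four full files with cat works, but only a handful of values matter to the new node. As a sketch, under the assumption that each `<name>` and `<value>` sits on its own line (as in the listings above), the hypothetical helper `get_prop` below pulls a single property out of a site file. On a node with Hadoop installed, `hdfs getconf -confKey fs.defaultFS` reports the effective value without any parsing.

```shell
#!/bin/sh
# get_prop FILE NAME
# Print the <value> that follows <name>NAME</name> in a Hadoop site file.
# Assumes <name> and <value> are on separate lines; get_prop is a
# hypothetical helper name.
get_prop() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { want = 1; next }
    want && /<value>/ {
      sub(/.*<value>/, "")       # drop everything up to the opening tag
      sub(/<\/value>.*/, "")     # drop the closing tag and anything after it
      print
      exit
    }
  ' "$1"
}

# Example:
#   get_prop "${HADOOP_HOME}/etc/hadoop/core-site.xml" fs.defaultFS
```

Comparing the printed values on the new node with those on hadoop101 confirms that the copy carried over the settings the DataNode depends on.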

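    4>. Start the DataNode process on the new node (core-site.xml records the NameNode's RPC address, so once started the DataNode automatically sends a registration request to that address)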

    [root@hadoop107.yinzhengjie.org.cn ~]# jps
    4339 Jps
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    [root@hadoop107.yinzhengjie.org.cn ~]# hadoop-daemon.sh start datanode
    starting datanode, logging to /yinzhengjie/softwares/hadoop-2.10.0/logs/hadoop-root-datanode-hadoop107.yinzhengjie.org.cn.out
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    [root@hadoop107.yinzhengjie.org.cn ~]# jps
    4433 Jps
    4364 DataNode
    [root@hadoop107.yinzhengjie.org.cn ~]# 
    [root@hadoop107.yinzhengjie.org.cn ~]# 
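
Starting the daemon by hand registers the node, but in Hadoop 2.x the cluster-wide scripts such as start-dfs.sh launch DataNodes on the hosts listed in `etc/hadoop/slaves`. If you want those scripts to manage the new node as well (an assumption about your workflow; the walkthrough itself does not show this step), the hostname has to be added to that file on the node where you run them. `add_worker` below is a hypothetical helper that appends a hostname only when it is not already listed.

```shell
#!/bin/sh
# add_worker FILE HOST
# Append HOST to FILE unless an identical line already exists.
# add_worker is a hypothetical helper name.
add_worker() {
  workers_file="$1"
  host="$2"
  # -x matches whole lines; -F treats the hostname literally,
  # so the dots are not regex wildcards.
  if ! grep -qxF "$host" "$workers_file" 2>/dev/null; then
    printf '%s\n' "$host" >> "$workers_file"
  fi
}

# Example:
#   add_worker "${HADOOP_HOME}/etc/hadoop/slaves" hadoop107.yinzhengjie.org.cn
```

Because the helper is idempotent, running it again leaves the file unchanged.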

    5>. Check the NameNode WebUI again
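
The WebUI check has a command-line counterpart: `hdfs dfsadmin -report -live` lists the DataNodes that the NameNode currently considers live, with one `Hostname:` line per node. The sketch below, built around the hypothetical helper `is_live_datanode`, reads that report on standard input and succeeds if the given host appears in it. It assumes the Hadoop 2.x report format and that the DataNode registered under the hostname you pass in.

```shell
#!/bin/sh
# is_live_datanode HOST
# Read a DataNode report on stdin; succeed if it has a "Hostname: HOST" line.
# is_live_datanode is a hypothetical helper name.
is_live_datanode() {
  # -x: whole-line match, -F: literal string, -q: no output, status only
  grep -qxF "Hostname: $1"
}

# Example (run on a node with the Hadoop client configured):
#   hdfs dfsadmin -report -live | is_live_datanode hadoop107.yinzhengjie.org.cn \
#     && echo "registered"
```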

  • Original article: https://www.cnblogs.com/yinzhengjie2020/p/12489223.html