• Hadoop: dynamically adding and removing DataNode nodes, and restoring them


    1. Configure the system environment

    Hostname, SSH mutual trust, environment variables, and so on.

    JDK installation is omitted here; make sure the JDK path on the DataNode matches JAVA_HOME in etc/hadoop/hadoop-env.sh. The Hadoop version used is 2.7.5.

    Edit /etc/sysconfig/network

    Then run:
    hostname <new-hostname>
    Log out of the session and log back in, and the new hostname will be in effect.

    [root@localhost ~]# hostname
    localhost.localdomain
    [root@localhost ~]# hostname -i
    ::1 127.0.0.1
    [root@localhost ~]#
    [root@localhost ~]# cat /etc/sysconfig/network
    # Created by anaconda
    NETWORKING=yes
    HOSTNAME=slave2
    GATEWAY=192.168.48.2
    # oracle-rdbms-server-11gR2-preinstall : Add NOZEROCONF=yes
    NOZEROCONF=yes
    [root@localhost ~]# hostname slave2
    [root@localhost ~]# hostname
    slave2
    [root@localhost ~]# su - hadoop
    Last login: Sat Feb 24 14:25:48 CST 2018 on pts/1
    [hadoop@slave2 ~]$ su - root

    Create the DataNode directories and change their owner

    (For the exact path values, refer to dfs.name.dir, dfs.data.dir, dfs.tmp.dir, etc. in hdfs-site.xml and core-site.xml under /usr/hadoop/hadoop-2.7.5/etc/hadoop on the NameNode.)

    su - root

    # mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/data

    # chmod -R 777 /usr/local/hadoop-2.7.5/tmp

    # chown -R hadoop:hadoop /usr/local/hadoop-2.7.5

    [root@slave2 ~]# mkdir -p /usr/local/hadoop-2.7.5/tmp/dfs/data
    [root@slave2 ~]# chmod -R 777 /usr/local/hadoop-2.7.5/tmp
    [root@slave2 ~]# chown -R hadoop:hadoop /usr/local/hadoop-2.7.5
    [root@slave2 ~]# pwd
    /root
    [root@slave2 ~]# cd /usr/local/
    [root@slave2 local]# ll
    total 0
    drwxr-xr-x. 2 root   root   46 Mar 21  2017 bin
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 etc
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 games
    drwxr-xr-x  3 hadoop hadoop 16 Feb 24 18:18 hadoop-2.7.5
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 include
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 lib
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 lib64
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 libexec
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 sbin
    drwxr-xr-x. 5 root   root   46 Dec 17  2015 share
    drwxr-xr-x. 2 root   root    6 Jun 10  2014 src
    [root@slave2 local]#

    SSH mutual trust, i.e. passwordless login from master to slave2

    master:

    [root@hadoop-master ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.48.129    hadoop-master
    192.168.48.132    slave1
    192.168.48.131    slave2
    [hadoop@hadoop-master ~]$ scp /usr/hadoop/.ssh/authorized_keys hadoop@slave2:/usr/hadoop/.ssh
    The authenticity of host 'slave2 (192.168.48.131)' can't be established.
    ECDSA key fingerprint is 1e:cd:d1:3d:b0:5b:62:45:a3:63:df:c7:7a:0f:b8:7c.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'slave2,192.168.48.131' (ECDSA) to the list of known hosts.
    hadoop@slave2's password: 
    authorized_keys         
    [hadoop@hadoop-master ~]$ ssh hadoop@slave2
    Last login: Sat Feb 24 18:27:33 2018
    [hadoop@slave2 ~]$
    [hadoop@slave2 ~]$ exit
    logout
    Connection to slave2 closed.
    [hadoop@hadoop-master ~]$
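    If the key pair has not been generated yet, the trust setup can be sketched as follows. This is a minimal sketch: the key path `/tmp/demo_id_rsa` is a demo value (normally `~/.ssh/id_rsa`), and the `hadoop@slave2` target assumes the hosts file shown above.

```shell
# Minimal sketch: create a key pair for passwordless SSH.
# /tmp/demo_id_rsa is a demo path; on a real host use ~/.ssh/id_rsa.
KEYFILE=/tmp/demo_id_rsa
rm -f "$KEYFILE" "$KEYFILE.pub"            # clean slate for the demo
ssh-keygen -t rsa -N "" -f "$KEYFILE" -q   # non-interactive key generation
# Push the public key to the new DataNode (asks for the password once):
#   ssh-copy-id -i "$KEYFILE.pub" hadoop@slave2
# After that, `ssh hadoop@slave2` should not prompt for a password.
ls "$KEYFILE" "$KEYFILE.pub"
```

    Using `ssh-copy-id` appends the key to the remote `authorized_keys` and fixes its permissions, which avoids the manual `scp` of `authorized_keys` shown above overwriting existing keys.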
    

    2. Edit the slaves file on the NameNode, adding the new node

    [hadoop@hadoop-master hadoop]$ pwd
    /usr/hadoop/hadoop-2.7.5/etc/hadoop
    [hadoop@hadoop-master hadoop]$ vi slaves 
    slave1
    slave2
    
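    The edit can also be scripted so the node is appended only if it is not already listed. In this sketch the file path and node name are demo values; on a real cluster the file is `$HADOOP_HOME/etc/hadoop/slaves`.

```shell
# Demo: add a new worker to the slaves file without duplicating entries.
SLAVES_FILE=/tmp/slaves_demo            # normally $HADOOP_HOME/etc/hadoop/slaves
NEW_NODE=slave2
printf 'slave1\n' > "$SLAVES_FILE"      # existing content, for the demo
grep -qxF "$NEW_NODE" "$SLAVES_FILE" || echo "$NEW_NODE" >> "$SLAVES_FILE"
cat "$SLAVES_FILE"
```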

    3. Copy hadoop-2.7.5 from the NameNode to the new node, then delete the contents of the data and logs directories on the new node

    Master

    [hadoop@hadoop-master ~]$ scp -r hadoop-2.7.5 hadoop@slave2:/usr/hadoop
    
    Slave2
    [hadoop@slave2 hadoop-2.7.5]$ ll
    total 124
    drwxr-xr-x 2 hadoop hadoop  4096 Feb 24 14:29 bin
    drwxr-xr-x 3 hadoop hadoop    19 Feb 24 14:30 etc
    drwxr-xr-x 2 hadoop hadoop   101 Feb 24 14:30 include
    drwxr-xr-x 3 hadoop hadoop    19 Feb 24 14:29 lib
    drwxr-xr-x 2 hadoop hadoop  4096 Feb 24 14:29 libexec
    -rw-r--r-- 1 hadoop hadoop 86424 Feb 24 18:44 LICENSE.txt
    drwxrwxr-x 2 hadoop hadoop  4096 Feb 24 14:30 logs
    -rw-r--r-- 1 hadoop hadoop 14978 Feb 24 18:44 NOTICE.txt
    -rw-r--r-- 1 hadoop hadoop  1366 Feb 24 18:44 README.txt
    drwxr-xr-x 2 hadoop hadoop  4096 Feb 24 14:29 sbin
    drwxr-xr-x 4 hadoop hadoop    29 Feb 24 14:30 share
    [hadoop@slave2 hadoop-2.7.5]$ pwd
    /usr/hadoop/hadoop-2.7.5
    [hadoop@slave2 hadoop-2.7.5]$ rm -R logs/*
    

    4. Start the DataNode and NodeManager processes on the new node

    Before the steps below, confirm that the host being added does not appear in the etc/hadoop/excludes file on the NameNode or on this DataNode.

    [hadoop@slave2 hadoop-2.7.5]$ sbin/hadoop-daemon.sh start datanode
    starting datanode, logging to /usr/hadoop/hadoop-2.7.5/logs/hadoop-hadoop-datanode-slave2.out
    [hadoop@slave2 hadoop-2.7.5]$ sbin/yarn-daemon.sh start nodemanager
    starting nodemanager, logging to /usr/hadoop/hadoop-2.7.5/logs/yarn-hadoop-nodemanager-slave2.out
    [hadoop@slave2 hadoop-2.7.5]$
    [hadoop@slave2 hadoop-2.7.5]$ jps
    3897 DataNode
    6772 NodeManager
    8189 Jps
    [hadoop@slave2 ~]$

    5. Refresh the nodes on the NameNode

    [hadoop@hadoop-master ~]$ hdfs dfsadmin -refreshNodes
    Refresh nodes successful
    [hadoop@hadoop-master ~]$ sbin/start-balancer.sh

    6. Check the cluster status on the NameNode,

    and confirm the new node has joined correctly

    [hadoop@hadoop-master hadoop]$ hdfs dfsadmin -report
    Configured Capacity: 58663657472 (54.63 GB)
    Present Capacity: 15487176704 (14.42 GB)
    DFS Remaining: 15486873600 (14.42 GB)
    DFS Used: 303104 (296 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 5
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    
    -------------------------------------------------
    Live datanodes (2):
    
    Name: 192.168.48.131:50010 (slave2)
    Hostname: 183.221.250.11
    Decommission Status : Normal
    Configured Capacity: 38588669952 (35.94 GB)
    DFS Used: 8192 (8 KB)
    Non DFS Used: 36887191552 (34.35 GB)
    DFS Remaining: 1701470208 (1.58 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 4.41%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 01 19:36:33 PST 2018
    
    
    Name: 192.168.48.132:50010 (slave1)
    Hostname: slave1
    Decommission Status : Normal
    Configured Capacity: 20074987520 (18.70 GB)
    DFS Used: 294912 (288 KB)
    Non DFS Used: 6289289216 (5.86 GB)
    DFS Remaining: 13785403392 (12.84 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 68.67%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Thu Mar 01 19:36:35 PST 2018
    
    
    [hadoop@hadoop-master hadoop]$
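    To script the check rather than eyeball the report, the live node names can be pulled out of the `hdfs dfsadmin -report` text with `sed`. This sketch assumes the Hadoop 2.x text format shown above and uses a sample string instead of a live cluster.

```shell
# Demo: extract "hostname address" pairs from dfsadmin -report output.
report='Live datanodes (2):

Name: 192.168.48.131:50010 (slave2)
Decommission Status : Normal

Name: 192.168.48.132:50010 (slave1)
Decommission Status : Normal'
# On a real cluster: report=$(hdfs dfsadmin -report)
echo "$report" | sed -n 's/^Name: \([^ ]*\) (\(.*\))$/\2 \1/p'
# → slave2 192.168.48.131:50010
# → slave1 192.168.48.132:50010
```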

    7. Dynamically remove a DataNode

    7.1 Edit hdfs-site.xml on the NameNode:

    reduce dfs.replication as appropriate and add a dfs.hosts.exclude property

    [hadoop@hadoop-master hadoop]$ pwd
    /usr/hadoop/hadoop-2.7.5/etc/hadoop
    [hadoop@hadoop-master hadoop]$ cat hdfs-site.xml
    <configuration>
    <property>
          <name>dfs.replication</name>
          <value>3</value>
    </property>
      <property>
          <name>dfs.name.dir</name>
          <value>/usr/local/hadoop-2.7.5/tmp/dfs/name</value>
    </property>
        <property>
          <name>dfs.data.dir</name>
          <value>/usr/local/hadoop-2.7.5/tmp/dfs/data</value>
        </property>
    <property>
        <name>dfs.hosts.exclude</name>
        <value>/usr/hadoop/hadoop-2.7.5/etc/hadoop/excludes</value>
      </property>
    
    </configuration>
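    A quick way to double-check which excludes file the NameNode will read is to grep the value out of hdfs-site.xml. This is a sketch that works for the flat layout shown above (one tag per line); `/tmp/hdfs-site-demo.xml` is a demo copy of the config.

```shell
# Demo: read the dfs.hosts.exclude path out of an hdfs-site.xml.
cat > /tmp/hdfs-site-demo.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.hosts.exclude</name>
    <value>/usr/hadoop/hadoop-2.7.5/etc/hadoop/excludes</value>
  </property>
</configuration>
EOF
grep -A1 '<name>dfs.hosts.exclude</name>' /tmp/hdfs-site-demo.xml \
  | sed -n 's/.*<value>\(.*\)<\/value>.*/\1/p'
# → /usr/hadoop/hadoop-2.7.5/etc/hadoop/excludes
```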

    7.2 Create an excludes file under the corresponding path (etc/hadoop/) on the NameNode,

    and add the IPs or hostnames of the DataNodes to be removed

    [hadoop@hadoop-master hadoop]$ pwd
    /usr/hadoop/hadoop-2.7.5/etc/hadoop

    [hadoop@hadoop-master hadoop]$ vi excludes    # add the line: slave2 (192.168.48.131)
    [hadoop@hadoop-master hadoop]$

    7.3 Refresh all DataNodes from the NameNode

    hdfs dfsadmin -refreshNodes
    sbin/start-balancer.sh

    7.4 Check the cluster status on the NameNode,

    and confirm the node has been removed; slave2 no longer appears in the report

    [hadoop@hadoop-master hadoop]$ hdfs dfsadmin -report 

    Alternatively, watch the web UI (ip:50070), where the DataNode gradually changes to Dead.

    http://192.168.48.129:50070/

    Once the node's Admin State in the Datanodes tab has changed from "In Service" to "Decommissioned", the removal has succeeded.

    7.5 Stop the related processes on the removed node

    [hadoop@slave2 hadoop-2.7.5]$ jps
    9530 Jps
    3897 DataNode
    6772 NodeManager
    [hadoop@slave2 hadoop-2.7.5]$ sbin/hadoop-daemon.sh stop datanode
    stopping datanode
    [hadoop@slave2 hadoop-2.7.5]$ sbin/yarn-daemon.sh stop nodemanager
    stopping nodemanager
    [hadoop@slave2 hadoop-2.7.5]$ jps
    9657 Jps
    [hadoop@slave2 hadoop-2.7.5]$ 

    8. Restore a removed node

    Remove the node's entry from the excludes file created in step 7.2, then repeat steps 4, 5 and 6.
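    The excludes edit can be sketched as below. The file path is a demo value; on a real cluster edit the file configured in dfs.hosts.exclude and then run `hdfs dfsadmin -refreshNodes` on the NameNode.

```shell
# Demo: restore a decommissioned node by deleting it from excludes.
EXCLUDES=/tmp/excludes_demo           # normally $HADOOP_HOME/etc/hadoop/excludes
printf 'slave2\n' > "$EXCLUDES"       # current content, for the demo
sed -i '/^slave2$/d' "$EXCLUDES"      # drop the node being restored
# Then: hdfs dfsadmin -refreshNodes on the NameNode, and restart the
# DataNode/NodeManager processes on slave2 (steps 4-6).
wc -l < "$EXCLUDES"                   # → 0, file is now empty
```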

  • Original post: https://www.cnblogs.com/pu20065226/p/8493316.html