• Hadoop Setup (7): HDFS HA Setup


    Hadoop Setup (3): Building the Hadoop Cluster

    Hadoop Setup (5): Building the ZooKeeper Cluster

    This post builds on the Hadoop cluster set up in those earlier posts.

    First, back up the configuration and data on all three nodes:

    the /opt/modules/hadoop-2.8.2/etc/hadoop directory

    the /opt/modules/hadoop-2.8.2/tmp directory

    cp -r hadoop/ backup-hadoop
    cp -r tmp/ backup-tmp
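The backup step above can be sketched as one loop; the `echo` is a dry-run guard (drop it to actually copy), and the paths follow the layout used throughout this series:

```shell
# One-loop version of the backup step; "echo" is a dry-run guard.
HADOOP_HOME=/opt/modules/hadoop-2.8.2
for d in etc/hadoop tmp; do
  echo cp -r "$HADOOP_HOME/$d" "$HADOOP_HOME/backup-$(basename "$d")"
done
```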

       

    I. hdfs-site.xml configuration

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <!-- mycluster is a user-defined nameservice ID; the settings below refer to it -->
        <property>
            <name>dfs.nameservices</name>
            <value>mycluster</value>
        </property>
        <!-- Identifiers for the two NameNodes -->
        <property>
            <name>dfs.ha.namenodes.mycluster</name>
            <value>nn1,nn2</value>
        </property>
        <!-- RPC host and port of each NameNode -->
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn1</name>
            <value>centos01:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn2</name>
            <value>centos02:8020</value>
        </property>
        <!-- Web UI address of each NameNode -->
        <property>
            <name>dfs.namenode.http-address.mycluster.nn1</name>
            <value>centos01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn2</name>
            <value>centos02:50070</value>
        </property>
        <!-- URL of the JournalNode group -->
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://centos01:8485;centos02:8485;centos03:8485/mycluster</value>
        </property>
        <!-- Directory where JournalNodes store edits and state -->
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/opt/modules/hadoop-2.8.2/tmp/dfs/jn</value>
        </property>
        <!-- Class clients use to locate the active NameNode -->
        <property>
            <name>dfs.client.failover.proxy.provider.mycluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Fencing method for the HA cluster -->
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <!-- Private key used by the sshfence method -->
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <!-- hadoop is the current user name -->
            <value>/home/hadoop/.ssh/id_rsa</value>
        </property>

        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/data</value>
        </property>
    </configuration>
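After editing, a quick grep (my addition; plain text matching, no Hadoop needed) can confirm the HA-critical keys made it into the file:

```shell
# Sanity-check that the HA-critical keys exist in hdfs-site.xml.
CONF=/opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml
for key in dfs.nameservices dfs.ha.namenodes.mycluster \
           dfs.namenode.shared.edits.dir dfs.ha.fencing.methods; do
  if grep -q "<name>$key</name>" "$CONF" 2>/dev/null; then
    echo "$key: ok"
  else
    echo "$key: MISSING"
  fi
done
```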

    II. core-site.xml configuration

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <!-- <value>hdfs://centos01:9000</value> -->
            <value>hdfs://mycluster</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/opt/modules/hadoop-2.8.2/tmp</value>
        </property>
    </configuration>

    Changing hdfs://centos01:9000 to hdfs://mycluster makes Hadoop resolve the logical nameservice to the two NameNodes at startup.

    Send hdfs-site.xml and core-site.xml to the other two nodes:

    scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
    scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
    scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/
    scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/
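The four scp commands above can be collapsed into one loop (a sketch; it assumes the passwordless ssh for user hadoop set up in the earlier posts, and the `echo` is a dry-run guard — drop it to actually copy):

```shell
# Push the two HA config files to both other nodes in one loop.
CONF_DIR=/opt/modules/hadoop-2.8.2/etc/hadoop
for host in centos02 centos03; do
  for f in hdfs-site.xml core-site.xml; do
    echo scp "$CONF_DIR/$f" "hadoop@$host:$CONF_DIR/"
  done
done
```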

    III. Startup and testing

    1. Start the JournalNode processes

    Delete all files under $HADOOP_HOME/tmp on every node.

    On each of the three nodes, go to the Hadoop installation directory and start the JournalNode process:

    sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start journalnode

     

    2. Format the NameNode

    Run the following on centos01 (*on the node where the NameNode runs):

    bin/hdfs namenode -format

    If the output contains the following line, the format succeeded:
    common.Storage: Storage directory /opt/modules/hadoop-2.8.2/tmp/dfs/name has been successfully formatted.

    3. Start NameNode 1 (the active NameNode)

    Go to the Hadoop installation directory on centos01 and start NameNode 1:

    sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start namenode

    Starting it generates the fsimage metadata.

    4. Copy NameNode 1's metadata

    On centos02, go to the Hadoop installation directory and run the following to copy NameNode 1's metadata over from centos01 (alternatively, copy $HADOOP_HOME/tmp on centos01 to the same location on centos02):

    sh /opt/modules/hadoop-2.8.2/bin/hdfs namenode -bootstrapStandby

    If the output contains the following line, the copy succeeded:
    common.Storage: Storage directory /opt/modules/hadoop-2.8.2/tmp/dfs/name has been successfully formatted.

    5. Start NameNode 2 (the standby NameNode)

    Go to the Hadoop installation directory on centos02 and start NameNode 2:

    sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start namenode

    After starting, open http://192.168.0.171:50070 in a browser to check NameNode 1's state,

    and http://192.168.0.172:50070 to check NameNode 2's state.

    Both states should be standby.

    6. Set NameNode 1's state to Active

    Go to the Hadoop installation directory on centos01 and run:

    sh /opt/modules/hadoop-2.8.2/bin/hdfs haadmin -transitionToActive nn1

    Refresh http://192.168.0.171:50070 to check NameNode 1's state.

    The state should change to Active.

    Note that the DataNodes have not been started yet.

    7. Restart HDFS

    Go to the Hadoop installation directory on centos01.

    Stop HDFS:

    sh  sbin/stop-dfs.sh

    Start HDFS:

    sh  sbin/start-dfs.sh

    8. Set NameNode 1's state to Active again

    After the restart, the NameNode, DataNode, and other processes are running, but NameNode 1 must be set to Active again:

    sh /opt/modules/hadoop-2.8.2/bin/hdfs haadmin -transitionToActive nn1

    Check the state with:

    bin/hdfs haadmin -getServiceState nn1
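A small loop (a sketch; assumes $HADOOP_HOME/bin is on PATH) can query both NameNodes' HA state in one pass, printing "unknown" when the command is unavailable:

```shell
# Query the HA state of both NameNodes (ids from this post's hdfs-site.xml).
for nn in nn1 nn2; do
  state=$(hdfs haadmin -getServiceState "$nn" 2>/dev/null || true)
  echo "$nn: ${state:-unknown}"
done
```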

    9. Run jps on each node to check the state

    View the processes on every node:

    jps

    10. Test HDFS

    Kill the NameNode process on centos01 (kill -9 35396, using the pid shown by jps), then manually activate NameNode 2 on centos02 (as in step 6).

    Without ZooKeeper, failover after a fault must be done manually.

    IV. Automatic failover with ZooKeeper (ZooKeeper cluster + ZKFailoverController (ZKFC) processes)

    ZooKeeper's main roles here are failure detection and active-NameNode election.

    1. Enable automatic failover

    On centos01, edit the hdfs-site.xml file and add the following:

        <!-- Enable automatic failover; mycluster is the user-defined nameservice ID -->
        <property>
            <name>dfs.ha.automatic-failover.enabled.mycluster</name>
            <value>true</value>
        </property>

    The complete file:

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>

        <!-- mycluster is a user-defined nameservice ID; the settings below refer to it -->
        <property>
            <name>dfs.nameservices</name>
            <value>mycluster</value>
        </property>
        <!-- Identifiers for the two NameNodes -->
        <property>
            <name>dfs.ha.namenodes.mycluster</name>
            <value>nn1,nn2</value>
        </property>
        <!-- RPC host and port of each NameNode -->
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn1</name>
            <value>centos01:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn2</name>
            <value>centos02:8020</value>
        </property>
        <!-- Web UI address of each NameNode -->
        <property>
            <name>dfs.namenode.http-address.mycluster.nn1</name>
            <value>centos01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn2</name>
            <value>centos02:50070</value>
        </property>
        <!-- URL of the JournalNode group -->
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://centos01:8485;centos02:8485;centos03:8485/mycluster</value>
        </property>
        <!-- Directory where JournalNodes store edits and state -->
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/opt/modules/hadoop-2.8.2/tmp/dfs/jn</value>
        </property>
        <!-- Class clients use to locate the active NameNode -->
        <property>
            <name>dfs.client.failover.proxy.provider.mycluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Fencing method for the HA cluster -->
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <!-- Private key used by the sshfence method -->
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <!-- hadoop is the current user name -->
            <value>/home/hadoop/.ssh/id_rsa</value>
        </property>

        <!-- Enable automatic failover; mycluster is the user-defined nameservice ID -->
        <property>
            <name>dfs.ha.automatic-failover.enabled.mycluster</name>
            <value>true</value>
        </property>

        <!-- Timeout for the sshfence method -->
        <property>
            <name>dfs.ha.fencing.ssh.connect-timeout</name>
            <value>30000</value>
        </property>

        <property>
            <name>ha.failover-controller.cli-check.rpc-timeout.ms</name>
            <value>60000</value>
        </property>

        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/opt/modules/hadoop-2.8.2/tmp/dfs/data</value>
        </property>
    </configuration>

    2. Point HDFS at the ZooKeeper cluster

    On centos01, edit the core-site.xml file and add the following:

       <!-- ZooKeeper ensemble nodes and ports -->
       <property>
            <name>ha.zookeeper.quorum</name>
            <value>centos01:2181,centos02:2181,centos03:2181</value>
        </property>

    The complete file:

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <!-- <value>hdfs://centos01:9000</value> -->
            <value>hdfs://mycluster</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>file:/opt/modules/hadoop-2.8.2/tmp</value>
        </property>
        <!-- ZooKeeper ensemble nodes and ports -->
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>centos01:2181,centos02:2181,centos03:2181</value>
        </property>
        <!-- Timeout for Hadoop's connection to ZooKeeper -->
        <property>
            <name>ha.zookeeper.session-timeout.ms</name>
            <value>1000</value>
            <description>ms</description>
        </property>
    </configuration>

    3. Sync to the other nodes

    Send the updated hdfs-site.xml and core-site.xml to the other two nodes:

    scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
    scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos02:/opt/modules/hadoop-2.8.2/etc/hadoop/
    scp /opt/modules/hadoop-2.8.2/etc/hadoop/hdfs-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/
    scp /opt/modules/hadoop-2.8.2/etc/hadoop/core-site.xml hadoop@centos03:/opt/modules/hadoop-2.8.2/etc/hadoop/

    4. Stop the HDFS cluster

    Go to the Hadoop installation directory on centos01.

    Stop HDFS:

    sh  sbin/stop-dfs.sh

    5. Start the ZooKeeper cluster

    Log in to each node and start it:

    sh /opt/modules/zookeeper-3.4.14/bin/zkServer.sh start

    6. Initialize the HA state in ZooKeeper

    On centos01, in the Hadoop installation directory, run the following to create the znode that stores automatic-failover data:

    sh bin/hdfs zkfc -formatZK
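After -formatZK, the failover data lives under /hadoop-ha/&lt;nameservice&gt; in ZooKeeper, which can be confirmed with the zkCli shipped with ZooKeeper (a sketch; the `echo` is a dry-run guard — drop it to actually query):

```shell
# Confirm the znode created by zkfc -formatZK (dry run via echo).
ZK_BIN=/opt/modules/zookeeper-3.4.14/bin
echo "$ZK_BIN/zkCli.sh" -server centos01:2181 ls /hadoop-ha
```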

     

    7. Start the HDFS cluster

    Go to the Hadoop installation directory on centos01.

    Start HDFS:

    sh  sbin/start-dfs.sh

    8. Start the ZKFC daemons

    Start the ZKFC process manually on every node that runs a NameNode (centos01 and centos02 here):

    sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh start zkfc

    To stop it:

    sh /opt/modules/hadoop-2.8.2/sbin/hadoop-daemon.sh stop zkfc
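Both ZKFC daemons can also be started from centos01 in one loop (a sketch; assumes passwordless ssh, and the `echo` is a dry-run guard):

```shell
# Start ZKFC on both NameNode hosts (dry run via echo).
HADOOP_SBIN=/opt/modules/hadoop-2.8.2/sbin
for host in centos01 centos02; do
  echo ssh "hadoop@$host" "$HADOOP_SBIN/hadoop-daemon.sh start zkfc"
done
```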

    The NameNode started first becomes Active.

    9. Test automatic HDFS failover

    Check the processes on each node.

    Upload a file to test:

    hdfs dfs -mkdir /input
    hdfs dfs -put /opt/modules/hadoop-2.8.2/README.txt /input
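The failover drill itself can be sketched as follows (kill the local NameNode JVM, give ZKFC time to promote the standby, then check the other NameNode's state; assumes a JDK for jps and Hadoop on PATH, with nn2 being the id from this post's hdfs-site.xml):

```shell
# Kill the local NameNode and check that the other one takes over.
pid=$(jps 2>/dev/null | awk '$2 == "NameNode" {print $1; exit}')
if [ -n "$pid" ]; then
  kill -9 "$pid"
  sleep 10   # allow ZKFC time to fail over
fi
hdfs haadmin -getServiceState nn2 2>/dev/null || echo "check the web UI"
```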

     

    Kill the NameNode process on centos02 to test.

     

    Access the web UIs again: centos01's state should now be active.

    ⚠️ Problem: after one NameNode is killed, the other does not become Active automatically.

    The dfs.ha.fencing.methods value is sshfence, which depends on the fuser command; install the package below on both NameNode nodes to fix this.

    Running fuser reports "command not found":
    fuser

    Install it:
    sudo yum install -y psmisc 
    [hadoop@centos02 hadoop-2.8.2]$ sudo yum install -y psmisc
    [sudo] password for hadoop:
    Loaded plugins: fastestmirror
    Determining fastest mirrors
    Resolving Dependencies
    ---> Package psmisc.x86_64.0.22.20-16.el7 will be installed
    --> Finished Dependency Resolution
    Installing:
     psmisc    x86_64    22.20-16.el7    base    141 k
    Total download size: 141 k
    Installed size: 475 k
    Installed:
      psmisc.x86_64 0:22.20-16.el7
    Complete!
    [hadoop@centos02 hadoop-2.8.2]$ fuser
    No process specification given
    Usage: fuser [-fMuvw] [-a|-s] [-4|-6] [-c|-m|-n SPACE] [-k [-i] [-SIGNAL]] NAME...
           fuser -l
           fuser -V
    Show which processes use the named files, sockets, or filesystems.
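Since sshfence runs fuser on the failed host, both NameNode nodes need the psmisc package; a one-pass check/install can be sketched as (assumes passwordless ssh, and the `echo` is a dry-run guard):

```shell
# Verify (or install) fuser on both NameNode hosts (dry run via echo).
for host in centos01 centos02; do
  echo ssh "hadoop@$host" 'command -v fuser || sudo yum install -y psmisc'
done
```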
  • Original post: https://www.cnblogs.com/xuchen0117/p/12466245.html