• (Repost) Hadoop Rolling Upgrade


    Hadoop 2.0 supports HA, which makes it possible to upgrade HDFS online without stopping the service.

    Note that rolling upgrade is only supported from Hadoop 2.4.0 onward.

    JNs are relatively stable and in most cases do not need to be upgraded along with HDFS. The rolling upgrade procedure described here therefore only covers NNs and DNs, not JNs and ZKNs.

    This walkthrough uses a non-federated cluster with Kerberos authentication (just keep the existing configuration; no extra adjustments are needed), upgrading from Hadoop 2.7.7 to Hadoop 2.8.5.


    Pre-upgrade checks

    Check that the current HDFS service is healthy:

    [hadoop@hadoop001 ~]$ hdfs dfsadmin -report    #check for abnormal datanodes

    [hadoop@hadoop001 ~]$ hdfs fsck /              #check that the HDFS filesystem is healthy

    ..................................................................................................Status: HEALTHY

     Total size:    20242735151 B (Total open files size: 332 B)

     Total dirs:    821

     Total files:   1198

     Total symlinks:                0 (Files currently being written: 5)

     Total blocks (validated):      1122 (avg. block size 18041653 B) (Total open file blocks (not validated): 4)

     Minimally replicated blocks:   1122 (100.0 %)

     Over-replicated blocks:        0 (0.0 %)

     Under-replicated blocks:       67 (5.9714794 %)

     Mis-replicated blocks:         0 (0.0 %)

     Default replication factor:    3

     Average block replication:     3.0

     Corrupt blocks:                0

     Missing replicas:              469 (12.2294655 %)

     Number of data-nodes:          3

     Number of racks:               1

    FSCK ended at Mon Sep 16 11:09:05 CST 2019 in 91 milliseconds

    Check that the NameNode active/standby pair is healthy:

    [hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn1        #check active/standby state

    standby

    [hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn2        #check active/standby state

    active

    [hadoop@hadoop001 ~]$ ssh hadoop002

    Last login: Thu Sep 5 19:01:30 2019 from 172.16.40.43

    [hadoop@hadoop002 ~]$ hadoop-daemon.sh stop namenode         #test active/standby failover

    stopping namenode

    [hadoop@hadoop002 ~]$ exit

    logout

    Connection to hadoop002 closed.

    [hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn2     #verify the failover

    19/09/16 11:14:02 INFO ipc.Client: Retrying connect to server: hadoop002:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

    Operation failed: Call From hadoop001 to hadoop002:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused

    [hadoop@hadoop001 ~]$ hdfs haadmin -getServiceState nn1     #verify the failover

    active

    Metadata backup

    Back up the NameNode metadata on both the active and standby nodes, including the JournalNode edit logs. This step is not mandatory, but it is a safeguard in case the upgrade fails or has to be rolled back!
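    A minimal backup sketch follows. `backup_dir` is a hypothetical helper and the example paths are assumptions; the real source directories come from dfs.namenode.name.dir and dfs.journalnode.edits.dir in your hdfs-site.xml.

```shell
# Hypothetical helper: archive one metadata directory into a dated tarball.
backup_dir() {
  # $1 = directory to back up, $2 = destination directory for the archive
  mkdir -p "$2"
  tar czf "$2/$(basename "$1")-$(date +%Y%m%d).tar.gz" \
      -C "$(dirname "$1")" "$(basename "$1")"
}

# e.g. on each NameNode / JournalNode host (paths are placeholders):
# backup_dir /opt/beh/data/namenode /opt/beh/backup
# backup_dir /opt/beh/data/journal  /opt/beh/backup
```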

    Before preparing the upgrade, HDFS must be out of safe mode. If it is in safe mode and cannot be exited manually, it will exit automatically once the HDFS filesystem check completes!
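    To script this check, the safe-mode state can be read with `hdfs dfsadmin -safemode get`; below is a sketch with a hypothetical helper that interprets its output ("Safe mode is ON" / "Safe mode is OFF").

```shell
# Hypothetical helper: return 0 only when the given status line says
# safe mode is OFF.
safemode_is_off() {
  case "$1" in
    *"Safe mode is OFF"*) return 0 ;;
    *)                    return 1 ;;
  esac
}

# usage:
# safemode_is_off "$(hdfs dfsadmin -safemode get)" || echo "still in safe mode"
```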

    Prepare the upgrade

    [hadoop@hadoop001 ~]$  hdfs dfsadmin -rollingUpgrade prepare

    PREPARE rolling upgrade ...

    Preparing for upgrade. Data is being saved for rollback.

    Run "dfsadmin -rollingUpgrade query" to check the status

    for proceeding with rolling upgrade

      Block Pool ID: BP-686481837-192.168.40.42-1563178388776

         Start Time: Mon Sep 16 11:38:36 CST 2019 (=1568605116802)

      Finalize Time:

    #at this point the metadata directory contains rollback image files

    -rw-r--r-- 1 hadoop users  155196 Sep 16 11:38 fsimage_rollback_0000000000002953069

    -rw-r--r-- 1 hadoop users      71 Sep 16 11:38 fsimage_rollback_0000000000002953069.md5

    Check the upgrade state: verify whether HDFS is currently in a rolling-upgrade state. If the output is not as shown below, pause and investigate before proceeding.

    [hadoop@hadoop001 ~]$ hdfs dfsadmin -rollingUpgrade query

    QUERY rolling upgrade ...

    There is no rolling upgrade in progress or rolling upgrade has already been finalized.

    Upgrade the NameNodes

    #upgrade the active and standby nodes

    Stop the service

    [hadoop@hadoop001 ~]$ hadoop-daemon.sh stop namenode

    #after stopping the service, replace the Hadoop install directory and sync the configuration files. If Ranger is deployed, re-run its enable script. Watch whether the other node switches to the active state.

    Replace with the higher-version package

    [hadoop@hadoop001 core]$ mv hadoop/ hadoop-2.7.7

    [hadoop@hadoop001 core]$ scp -r hadoop hadoop@192.168.40.41:$PWD    #in this test environment the 2.8.5 package was copied over from another cluster with scp; normally you would extract hadoop-2.8.5.tar.gz into this directory

    Replace the configuration files

    [hadoop@hadoop001 etc]$ pwd
    /opt/beh/core/hadoop/etc

    [hadoop@hadoop001 etc]$ mv hadoop/ hadoop-2.8.5

    [hadoop@hadoop001 etc]$ cp -r /opt/beh/core/hadoop-2.7.7/etc/hadoop/ .
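    The swap above can be condensed into one hypothetical helper, mirroring the batch script at the end of this post (paths follow this cluster's /opt/beh/core layout; adjust for your own):

```shell
# Hypothetical helper: set the old release aside, copy in the new one,
# then carry the old configs over so they are preserved.
swap_release() {
  # $1 = parent dir (e.g. /opt/beh/core), $2 = old version tag (e.g. 2.7.7),
  # $3 = unpacked new release directory
  mv "$1/hadoop" "$1/hadoop-$2"
  cp -r "$3" "$1/hadoop"
  rm -r "$1/hadoop/etc/hadoop"
  cp -r "$1/hadoop-$2/etc/hadoop" "$1/hadoop/etc/"
}

# e.g. swap_release /opt/beh/core 2.7.7 /tmp/hadoop-2.8.5
```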

    Note: if the node also runs a JournalNode, restart the journalnode and zkfc first; they do not need a separate upgrade step. After starting them, watch the logs for anomalies.

    [hadoop@hadoop001 hadoop]$ hadoop-daemon.sh stop journalnode

    stopping journalnode

    [hadoop@hadoop001 hadoop]$ hadoop-daemon.sh start journalnode

    starting journalnode, logging to /opt/beh/logs/hadoop/hadoop-hadoop-journalnode-hadoop001.out

    [hadoop@hadoop001 hadoop]$ hadoop-daemon.sh stop zkfc

    stopping zkfc

    [hadoop@hadoop001 hadoop]$ hadoop-daemon.sh start zkfc

    starting zkfc, logging to /opt/beh/logs/hadoop/hadoop-hadoop-zkfc-hadoop001.out

    [hadoop@hadoop001 hadoop]$

    Upgrade the NameNode

    [hadoop@hadoop001 hadoop]$ hdfs namenode -rollingUpgrade started    #run in the foreground until it leaves safe mode

    The reported blocks 3200 has reached the threshold 1.0000 of total blocks 3200. The number of live datanodes 9 has reached the minimum number 0. In safe mode extension. Safe mode will be turned off automatically in 9 seconds.

    19/09/11 16:43:55 INFO hdfs.StateChange: STATE* Leaving safe mode after 33 secs

    19/09/11 16:43:55 INFO hdfs.StateChange: STATE* Safe mode is OFF

    19/09/11 16:43:55 INFO hdfs.StateChange: STATE* Network topology has 1 racks and 9 datanodes

    19/09/11 16:43:55 INFO hdfs.StateChange: STATE* UnderReplicatedBlocks has 0 blocks

    Press Ctrl+C to stop the foreground process, then start the NameNode

    [hadoop@hadoop001 current]$ hadoop-daemon.sh start namenode

    Query the rolling-upgrade status

    [hadoop@hadoop001 hadoop]$  hdfs dfsadmin -rollingUpgrade query

    QUERY rolling upgrade ...

    Proceed with rolling upgrade:

      Block Pool ID: BP-686481837-192.168.40.42-1563178388776

         Start Time: Mon Sep 16 11:38:36 CST 2019 (=1568605116802)

      Finalize Time:

    After the upgrade, the NameNode startup log contains entries such as:

    2019-09-11 16:56:38,972 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: Reported DataNode version '2.7.7' of DN DatanodeRegistration(0.0.0.0:50010, datanodeUuid=eb9a5cc2-e1e0-4d65-98c2-596a39336f36, infoPort=0, infoSecurePort=50475, ipcPort=50020, storageInfo=lv=-56;cid=CID-1f4dc3a9-7d17-46f7-9a0f-02578c683842;nsid=1047123487;c=0) does not match NameNode version '2.8.5'. Note: This is normal during a rolling upgrade.

    #upgrade the other NameNode by repeating the steps above; remember to restart the journalnode first after replacing the install directory

    [hadoop@hadoop002 core]$ mv hadoop/ hadoop-2.7.7

    [hadoop@hadoop001 core]$ scp -r hadoop hadoop@hadoop002:$PWD

    [hadoop@hadoop002 hadoop]$ hadoop-daemon.sh stop journalnode

    [hadoop@hadoop002 hadoop]$ hadoop-daemon.sh start journalnode

    [hadoop@hadoop002 hadoop]$ hadoop-daemon.sh stop zkfc

    [hadoop@hadoop002 hadoop]$ hadoop-daemon.sh start zkfc

    [hadoop@hadoop002 current]$ hadoop-daemon.sh stop namenode

    [hadoop@hadoop002 hadoop]$ hdfs haadmin -getAllServiceState    #check that the NameNodes failed over correctly

    hadoop001:8020                            active   

    19/09/16 17:54:02 INFO ipc.Client: Retrying connect to server: hadoop002:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=1, sleepTime=1000 MILLISECONDS)

    [hadoop@hadoop002 hadoop]$ hdfs namenode -rollingUpgrade started

    [hadoop@hadoop002 current]$ hadoop-daemon.sh start namenode

    Upgrade the DataNodes

    To upgrade a datanode, first replace the hadoop directory with the higher version while keeping the configuration files (that is, preserve the $HADOOP_HOME/etc/hadoop directory), then run:

    [hadoop@hadoop002 core]$ hdfs dfsadmin -shutdownDatanode hadoop003:50020 upgrade

    The active NameNode log shows:

    2019-09-11 17:09:04,216 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(blk_1073766114_25448, newGS=25470, newLength=83, newNodes=[192.168.40.15:50010, 192.168.40.14:50010, 192.168.40.21:50010], client=DFSClient_NONMAPREDUCE_1761359517_14)

    2019-09-11 17:09:04,217 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(blk_1073766114_25448 => blk_1073766114_25470) success

    The datanode shuts down its service once the upgrade request has been processed.

    [hadoop@hadoop001 ~]$ hdfs dfsadmin -getDatanodeInfo hadoop003:50020    #check whether the datanode has shut down, i.e. the command reports a connection error
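    Instead of re-running that check by hand, the wait can be sketched as a generic polling loop. `wait_until_down` is a hypothetical helper; with HDFS the probe would be `hdfs dfsadmin -getDatanodeInfo host:50020`, which starts failing with a connection error once the datanode has shut down.

```shell
# Hypothetical helper: poll a probe command until it stops succeeding.
wait_until_down() {
  # $1 = probe command (succeeds while the service is still up)
  # $2 = max attempts, one second apart
  i=0
  while eval "$1" >/dev/null 2>&1; do
    i=$((i + 1))
    [ "$i" -ge "$2" ] && return 1
    sleep 1
  done
  return 0
}

# e.g. wait_until_down "hdfs dfsadmin -getDatanodeInfo hadoop003:50020" 30
```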

    Start the datanode on hadoop003

    [hadoop@hadoop003 ~]$ hadoop-daemon.sh start datanode

    The VERSION file changes. Before the upgrade:

    [hadoop@hadoop003 current]$ more VERSION

    #Tue Aug 27 16:27:14 CST 2019

    storageID=DS-824be616-eee5-4954-88a4-c752de40e7e2

    clusterID=CID-1f4dc3a9-7d17-46f7-9a0f-02578c683842

    cTime=0

    datanodeUuid=ebf453db-e511-4c50-a76c-f87fd83db864

    storageType=DATA_NODE

    layoutVersion=-56

    After the upgrade:

    [hadoop@hadoop003 current]$ more VERSION

    #Wed Sep 11 17:25:31 CST 2019

    storageID=DS-824be616-eee5-4954-88a4-c752de40e7e2

    clusterID=CID-1f4dc3a9-7d17-46f7-9a0f-02578c683842

    cTime=0

    datanodeUuid=ebf453db-e511-4c50-a76c-f87fd83db864

    storageType=DATA_NODE

    layoutVersion=-57

    Repeat the steps above until every datanode in the cluster has been upgraded.

    Notes on starting the YARN service

    With Kerberos enabled (skip this if you do not use Kerberos), change the ownership and permissions of the file below and restart YARN. (Note: the Hadoop home directory must be mode 755; with 750, tenant users cannot submit YARN jobs.)

    chown root:hadoop /opt/hadoop/bin/container-executor

    chmod 6050 /opt/hadoop/bin/container-executor
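    A quick way to verify the result after the chown/chmod above, as a hypothetical helper (`stat -c` is GNU coreutils syntax):

```shell
# Hypothetical helper: return 0 only if the file is owned by root:hadoop
# with mode 6050 (setuid+setgid, as YARN's LinuxContainerExecutor expects).
check_ce_perms() {
  # $1 = path to container-executor
  [ "$(stat -c '%U:%G %a' "$1")" = "root:hadoop 6050" ]
}

# e.g. check_ce_perms /opt/hadoop/bin/container-executor || echo "perms wrong"
```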

    Distribute the spark-shuffle jar to every node in the cluster, to support Hive on Spark:

    scp spark-2.0.0-yarn-shuffle.jar hadoop@hadoop001:/opt/hadoop/share/hadoop/yarn/lib

    Restart the cluster's YARN service

    Finalize the rolling upgrade

    Once finalize has been executed, the rollback image files in the NameNode metadata directories are deleted and it is no longer possible to roll back to the previous version. It is recommended to let the cluster run for a full cycle before executing the command below.

    [hadoop@hadoop001 hadoop]$ hdfs dfsadmin -rollingUpgrade finalize

    FINALIZE rolling upgrade ...

    Rolling upgrade is finalized.

    Block Pool ID: BP-261222913-172.16.13.12-1564812628651

    Start Time: Wed Sep 11 15:41:21 CST 2019 (=1568187681874)

    Finalize Time: Wed Sep 11 17:54:22 CST 2019 (=1568195662530)

    DataNode upgrade script

    #!/bin/bash
    # DataNode rolling-upgrade helper: for each host listed in ~/datanode,
    # swap in the new release, fix container-executor permissions, then
    # (after the operator confirms and has shut the datanode down by hand)
    # restart the datanode and nodemanager.

    CORE_HOME=/opt/beh/core
    hosts=$(cat ~/datanode)
    for host in $hosts
    do
            ssh hadoop@$host "source ~/.bashrc;
                            echo ------------------------------------------------------------;
                            jps;
                            mv $CORE_HOME/hadoop $CORE_HOME/hadoop-2.7.7"
            scp -r ~/hadoop hadoop@$host:$CORE_HOME
            ssh hadoop@$host "rm -r $CORE_HOME/hadoop/etc/hadoop;
                            cp -r $CORE_HOME/hadoop-2.7.7/etc/hadoop $CORE_HOME/hadoop/etc;
                            sudo chown root:hadoop $CORE_HOME/hadoop/bin/container-executor;
                            sudo chmod 6050 $CORE_HOME/hadoop/bin/container-executor"
            # Commands for the operator to run by hand before confirming:
            echo "hdfs dfsadmin -shutdownDatanode $host:50020 upgrade"
            echo "hdfs dfsadmin -getDatanodeInfo $host:50020"
            if whiptail --title "exec update line" --yesno "hdfs dfsadmin -shutdownDatanode $host:50020 upgrade" 10 60; then
                    ssh hadoop@$host "source ~/.bashrc;
                            hadoop-daemon.sh start datanode;
                            yarn-daemon.sh stop nodemanager;
                            yarn-daemon.sh start nodemanager;
                            echo ------------------------------------------------------------;
                            jps"
            else
                    echo "no update"
            fi
    done

    References

    http://hadoop.apache.org/docs/r2.8.5/hadoop-project-dist/hadoop-hdfs/HdfsRollingUpgrade.html

    Reposted from: https://www.jianshu.com/p/aac25a2e6f85

  • Original post: https://www.cnblogs.com/yjt1993/p/12346462.html