Current cluster
Hostname | IP address | Roles
sht-sgmhadoopnn-01 | 172.16.101.55 | namenode, resourcemanager
sht-sgmhadoopnn-02 | 172.16.101.56 | namenode, resourcemanager
sht-sgmhadoopdn-01 | 172.16.101.58 | datanode, nodemanager, journalnode, zookeeper
sht-sgmhadoopdn-02 | 172.16.101.59 | datanode, nodemanager, journalnode, zookeeper
sht-sgmhadoopdn-03 | 172.16.101.60 | datanode, nodemanager, journalnode, zookeeper
Common to all hosts:
Installation directories: /usr/local/hadoop (symlink to /usr/local/hadoop-2.7.4), /usr/local/zookeeper (symlink to /usr/local/zookeeper-3.4.9)
Installation user: root
After the cluster was deployed, a new datanode, sht-sgmhadoopdn-04 (172.16.101.66), was added to it.
For the original deployment steps, see https://www.cnblogs.com/ilifeilong/p/10610993.html
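1. On the new datanode, set up passwordless SSH login, system environment variables, hostname resolution and so on, exactly as for a fresh installation.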
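A minimal sketch of the hostname-resolution and SSH parts of step 1. It assumes the hosts file is /etc/hosts and that the new node's address is 172.16.101.66 (the IP that shows up as the new datanode in the rsync commands and balancer log of this post). The function name add_host_entry is hypothetical; ssh-copy-id is the standard OpenSSH helper for installing a public key on a remote host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append "IP HOSTNAME" to a hosts file unless HOSTNAME is already mapped.
# Assumption: one mapping per line, whitespace separated, '#' starts a comment line.
add_host_entry() {
  local hosts_file=$1 ip=$2 name=$3
  # awk exits 0 when some non-comment line lists $name among its host names
  if awk -v n="$name" '$1 !~ /^#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
                       END { exit !found }' "$hosts_file" 2>/dev/null; then
    return 0
  fi
  printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
}

# Usage (on every node), then push root's public key from both namenodes:
#   add_host_entry /etc/hosts 172.16.101.66 sht-sgmhadoopdn-04
#   ssh-copy-id root@sht-sgmhadoopdn-04
```

Running it twice is harmless, so the same line can be pushed to every node of the cluster without creating duplicate entries.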
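2. On the active namenode sht-sgmhadoopnn-01, edit the following configuration files:
1) slaves
Add the hostname sht-sgmhadoopdn-04 to the slaves file.
2) hdfs-site.xml
Change the value of dfs.replication to 4. This is the default replication factor for newly written files; files already in HDFS keep their current factor unless it is changed explicitly (for example with hdfs dfs -setrep).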
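Both edits can be scripted. This is a sketch, not the author's procedure: add_slave and set_replication are hypothetical helper names, GNU sed is assumed (for in-place `-i`), and set_replication assumes the `<value>` element sits on the line directly after `<name>dfs.replication</name>`, as in a conventionally formatted hdfs-site.xml.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for step 2, meant to run in /usr/local/hadoop/etc/hadoop on sht-sgmhadoopnn-01.

# Add a host to the slaves file only if it is not listed yet (exact whole-line match).
add_slave() {
  local slaves_file=$1 host=$2
  grep -qxF "$host" "$slaves_file" 2>/dev/null && return 0
  # make sure the last existing line is newline-terminated before appending
  if [ -s "$slaves_file" ] && [ -n "$(tail -c 1 "$slaves_file")" ]; then
    printf '\n' >> "$slaves_file"
  fi
  printf '%s\n' "$host" >> "$slaves_file"
}

# Set dfs.replication: find the <name> line, move to the next line (n), rewrite its <value>.
set_replication() {
  local site_xml=$1 factor=$2
  sed -i "/<name>dfs.replication<\/name>/{n;s|<value>[^<]*</value>|<value>${factor}</value>|}" "$site_xml"
}

# Usage:
#   cd /usr/local/hadoop/etc/hadoop
#   add_slave slaves sht-sgmhadoopdn-04
#   set_replication hdfs-site.xml 4
```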
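3. On the active namenode sht-sgmhadoopnn-01, rsync the two modified files to all other nodes in the cluster (the commands are run from the configuration directory /usr/local/hadoop/etc/hadoop):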
# rsync -az --progress hdfs-site.xml root@172.16.101.56:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.58:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.59:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.60:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress hdfs-site.xml root@172.16.101.66:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.56:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.58:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.59:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.60:/usr/local/hadoop/etc/hadoop/
# rsync -az --progress slaves root@172.16.101.66:/usr/local/hadoop/etc/hadoop/
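The ten rsync calls above follow one pattern, so a loop over the same five hosts does the same job. A sketch: sync_conf_files, REMOTE_CONF_DIR, TARGETS and the DRY_RUN switch are hypothetical names added for illustration, and the loop assumes it is started from the directory that holds hdfs-site.xml and slaves.

```shell
#!/usr/bin/env bash
# Same hosts and destination as the commands above; DRY_RUN=1 only prints the commands.
REMOTE_CONF_DIR=/usr/local/hadoop/etc/hadoop/
TARGETS=(172.16.101.56 172.16.101.58 172.16.101.59 172.16.101.60 172.16.101.66)

sync_conf_files() {
  local host f
  for host in "${TARGETS[@]}"; do
    for f in hdfs-site.xml slaves; do
      if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "rsync -az --progress $f root@$host:$REMOTE_CONF_DIR"
      else
        # stop at the first failed transfer so a half-synced cluster is noticed
        rsync -az --progress "$f" "root@$host:$REMOTE_CONF_DIR" || return 1
      fi
    done
  done
}

# Usage: cd /usr/local/hadoop/etc/hadoop && DRY_RUN=1 sync_conf_files && sync_conf_files
```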
4. On the active namenode sht-sgmhadoopnn-01, sync the Hadoop installation directory to the new node, excluding the data and logs directories:
# rsync -az --progress --exclude=data --exclude=logs /usr/local/hadoop-2.7.4 root@sht-sgmhadoopdn-04:/usr/local/
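The cluster table lists /usr/local/hadoop as a symlink, while the rsync above copies only /usr/local/hadoop-2.7.4. If step 1 did not already create the link on sht-sgmhadoopdn-04, a sketch like the following recreates it. link_hadoop_home is a hypothetical name; its prefix argument exists only so the function can be tried in a scratch directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: point <prefix>/hadoop at <prefix>/<version_dir>.
link_hadoop_home() {
  local prefix=${1:-/usr/local} version_dir=${2:-hadoop-2.7.4}
  [ -d "$prefix/$version_dir" ] || { echo "missing $prefix/$version_dir" >&2; return 1; }
  # -s symbolic, -f replace an existing link, -n do not descend into an existing link to a directory
  ln -sfn "$prefix/$version_dir" "$prefix/hadoop"
}

# Usage from sht-sgmhadoopnn-01:
#   ssh root@sht-sgmhadoopdn-04 "$(declare -f link_hadoop_home); link_hadoop_home"
```

The `-n` flag matters on reruns: without it, ln would create a second link inside hadoop-2.7.4 instead of replacing the existing one.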
5. Start the datanode and nodemanager daemons on the new node:
# hadoop-daemon.sh start datanode
# yarn-daemon.sh start nodemanager
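To confirm on the new node itself that both daemons came up, the JDK's jps tool lists running JVMs by main-class name, which is DataNode and NodeManager for these two daemons. A small sketch that checks for both in jps output; check_daemons is a hypothetical name.

```shell
#!/usr/bin/env bash
# Reads `jps` output on stdin; returns non-zero and names any missing daemon on stderr.
check_daemons() {
  local listing missing=0 d
  listing=$(cat)
  for d in DataNode NodeManager; do
    # jps prints "<pid> <MainClass>", so compare the second field exactly
    if ! printf '%s\n' "$listing" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }'; then
      echo "not running: $d" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on sht-sgmhadoopdn-04:
#   jps | check_daemons && echo "datanode and nodemanager are up"
```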
6. Verify the new node in the web UI of the active or standby namenode and resourcemanager:
http://172.16.101.55:50070/dfshealth.html#tab-datanode
http://172.16.101.55:8088/cluster/nodes
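The same check can be done from a shell on a namenode host. A sketch, assuming the Hadoop 2.7 output formats "Live datanodes (N):" from hdfs dfsadmin -report and "Total Nodes:N" from yarn node -list; both helper names are hypothetical. After sht-sgmhadoopdn-04 has joined, both counts should be 4.

```shell
#!/usr/bin/env bash
# Extract the live-datanode count from `hdfs dfsadmin -report` (assumed "Live datanodes (N):" line).
live_datanodes() { sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p'; }

# Extract the node count from `yarn node -list` (assumed "Total Nodes:N" header line).
yarn_total_nodes() { sed -n 's/^Total Nodes:[[:space:]]*\([0-9]*\).*/\1/p'; }

# Usage:
#   hdfs dfsadmin -report | live_datanodes
#   yarn node -list | yarn_total_nodes
```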
7. Rebalance block data across the datanodes (preferably run from the standby namenode host):
# hdfs balancer -threshold 1
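-threshold 1 means a datanode counts as balanced when its utilization (DFS used / capacity) is within 1 percentage point of the cluster-wide average. The sketch below is a simplified model of that classification using the four labels that appear in the log (over-utilized, aboveAvgUtilized, belowAvgUtilized, underutilized). It ignores storage types, racks and per-iteration move limits, and classify_nodes is a hypothetical name. Input is one line per node: name, used bytes, capacity bytes.

```shell
#!/usr/bin/env bash
# Simplified model of how `hdfs balancer -threshold T` labels datanodes.
# Assumptions: BalancingPolicy.Node, a single storage type (DISK), no rack logic.
classify_nodes() {
  local threshold=$1
  awk -v t="$threshold" '
    { name[NR] = $1; used[NR] = $2; cap[NR] = $3; total_used += $2; total_cap += $3 }
    END {
      if (total_cap == 0) exit 1
      avg = 100 * total_used / total_cap          # cluster-wide utilization, percent
      for (i = 1; i <= NR; i++) {
        if (cap[i] == 0) continue
        diff = 100 * used[i] / cap[i] - avg       # percentage points from the average
        if (diff > t)        label = "over-utilized"
        else if (diff > 0)   label = "aboveAvgUtilized"
        else if (diff >= -t) label = "belowAvgUtilized"
        else                 label = "underutilized"
        print name[i], label
      }
    }'
}

# Usage: classify_nodes 1 < usage.txt
```

For example, four equal-capacity nodes at 7%, 5.5%, 4.5% and 3% used give an average of 5%, so they are labelled over-utilized, aboveAvgUtilized, belowAvgUtilized and underutilized respectively. In the log below, the new node 172.16.101.66 stays in the underutilized group while the older nodes act as sources.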
Output log:
# hdfs balancer -threshold 1
19/03/29 23:59:21 INFO balancer.Balancer: Using a threshold of 1.0
19/03/29 23:59:21 INFO balancer.Balancer: namenodes = [hdfs://mycluster]
19/03/29 23:59:21 INFO balancer.Balancer: parameters = Balancer.Parameters [BalancingPolicy.Node, threshold = 1.0, max idle iteration = 5, number of nodes to be excluded = 0, number of nodes to be included = 0, run during upgrade = false]
Time Stamp Iteration# Bytes Already Moved Bytes Left To Move Bytes Being Moved
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/29 23:59:24 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/29 23:59:24 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/29 23:59:24 INFO balancer.Balancer: 0 over-utilized: []
19/03/29 23:59:24 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/29 23:59:24 INFO balancer.Balancer: Need to move 1.10 GB to make the cluster balanced.
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: Decided to move 635.63 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/29 23:59:24 INFO balancer.Balancer: Decided to move 147.43 MB bytes from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/29 23:59:24 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/29 23:59:24 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/29 23:59:24 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741846_1022 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741838_1014 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741845_1021 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:24 INFO balancer.Dispatcher: Start moving blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:52 INFO balancer.Dispatcher: Successfully moved blk_1073741838_1014 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/29 23:59:52 INFO balancer.Dispatcher: Start moving blk_1073741836_1012 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:00:14 INFO balancer.Dispatcher: Successfully moved blk_1073741836_1012 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:00:14 INFO balancer.Dispatcher: Start moving blk_1073741835_1011 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:00:38 INFO balancer.Dispatcher: Successfully moved blk_1073741835_1011 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:01:44 WARN balancer.Dispatcher: Failed to move blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741837_1013 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22240 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:01:44 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:01:44 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:07 WARN balancer.Dispatcher: Failed to move blk_1073741845_1021 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741845_1021 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22238 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:02:07 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:07 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:11 WARN balancer.Dispatcher: Failed to move blk_1073741839_1015 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22232 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:02:11 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:11 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:35 WARN balancer.Dispatcher: Failed to move blk_1073741846_1022 with size=134217728 from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741846_1022 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22234 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:02:35 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:35 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:02:36 AM 0 384 MB 1.10 GB 783.06 MB
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:02:41 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:02:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:02:41 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:02:41 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:02:41 INFO balancer.Balancer: Need to move 833.58 MB to make the cluster balanced.
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: Decided to move 538.88 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:02:41 INFO balancer.Balancer: Decided to move 244.18 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:02:41 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:02:41 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:02:41 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741842_1018 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741841_1017 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741840_1016 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.io.IOException: Got error, status message Not able to copy block 1073741834 to /172.16.101.66:22256 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741841_1017 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741841_1017 received exception java.io.IOException: Got error, status message Not able to copy block 1073741841 to /172.16.101.66:22258 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741841_1017 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741840_1016 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741840_1016 received exception java.io.IOException: Got error, status message Not able to copy block 1073741840 to /172.16.101.66:22260 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741840_1016 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 WARN balancer.Dispatcher: Failed to move blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 received exception java.io.IOException: Got error, status message Not able to copy block 1073741839 to /172.16.101.66:22262 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741839_1015 from /172.16.101.58:50010, block move is failed
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:02:41 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:02:58 INFO balancer.Dispatcher: Successfully moved blk_1073741842_1018 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:02:58 INFO balancer.Dispatcher: Successfully moved blk_1073741837_1013 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
Mar 30, 2019 12:02:58 AM 1 640 MB 833.58 MB 783.06 MB
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:03:03 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:03:03 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:03:03 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:03:03 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:03:03 INFO balancer.Balancer: Need to move 640.08 MB to make the cluster balanced.
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: Decided to move 474.38 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:03:03 INFO balancer.Balancer: Decided to move 308.67 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:03:03 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:03:03 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:03:03 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741833_1009 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 WARN balancer.Dispatcher: Failed to move blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741828_1004 received exception java.io.IOException: Got error, status message Not able to copy block 1073741828 to /172.16.101.66:22272 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741828_1004 from /172.16.101.58:50010, block move is failed
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:03:03 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:03:03 WARN balancer.Dispatcher: Failed to move blk_1073741826_1002 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 received exception java.io.IOException: Got error, status message Not able to copy block 1073741826 to /172.16.101.66:22274 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 from /172.16.101.58:50010, block move is failed
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:03:03 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:03:47 INFO balancer.Dispatcher: Successfully moved blk_1073741833_1009 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:05:12 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22266 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:05:12 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:05:12 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:05:36 WARN balancer.Dispatcher: Failed to move blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741827_1003 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22270 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:05:36 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:05:36 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:06:11 WARN balancer.Dispatcher: Failed to move blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741832_1008 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22268 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:06:11 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:06:11 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:06:11 AM 2 768 MB 640.08 MB 783.06 MB
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:06:16 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:06:16 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:06:16 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:06:16 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:06:16 INFO balancer.Balancer: Need to move 458.28 MB to make the cluster balanced.
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: Decided to move 413.78 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:06:16 INFO balancer.Balancer: Decided to move 369.28 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:06:16 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:06:16 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:06:16 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 WARN balancer.Dispatcher: Failed to move blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 received exception java.io.IOException: Got error, status message Not able to copy block 1073741834 to /172.16.101.66:22284 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741834_1010 from /172.16.101.58:50010, block move is failed
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:06:16 INFO balancer.Dispatcher: Start moving blk_1073741825_1001 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:16 WARN balancer.Dispatcher: Failed to move blk_1073741825_1001 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741825_1001 received exception java.io.IOException: Got error, status message Not able to copy block 1073741825 to /172.16.101.66:22286 because threads quota is exceeded., copy block BP-698223843-172.16.101.55-1553701973789:blk_1073741825_1001 from /172.16.101.58:50010, block move is failed
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:06:16 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
19/03/30 00:06:19 INFO balancer.Dispatcher: Successfully moved blk_1073741828_1004 with size=21901927 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:49 INFO balancer.Dispatcher: Successfully moved blk_1073741832_1008 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:06:53 INFO balancer.Dispatcher: Successfully moved blk_1073741827_1003 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:36 WARN balancer.Dispatcher: Failed to move blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741826_1002 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22280 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:08:36 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:08:36 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:08:36 AM 3 1.02 GB 458.28 MB 783.06 MB
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:08:41 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:08:41 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:08:41 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:08:41 INFO balancer.Balancer: 1 underutilized: [172.16.101.66:50010:DISK]
19/03/30 00:08:41 INFO balancer.Balancer: Need to move 248.99 MB to make the cluster balanced.
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => underUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: overUtilized => belowAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for SAME_RACK: underUtilized => aboveAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 344.02 MB bytes from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 344.02 MB bytes from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:08:41 INFO balancer.Balancer: Decided to move 95.03 MB bytes from 172.16.101.60:50010:DISK to 172.16.101.66:50010:DISK
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => underUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: overUtilized => belowAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: chooseStorageGroups for ANY_OTHER: underUtilized => aboveAvgUtilized
19/03/30 00:08:41 INFO balancer.Balancer: Will move 783.06 MB in this iteration
19/03/30 00:08:41 INFO balancer.Dispatcher: Limiting threads per target to the specified max.
19/03/30 00:08:41 INFO balancer.Dispatcher: Allocating 5 threads per target.
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741825_1001 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:08:41 INFO balancer.Dispatcher: Start moving blk_1073741848_1024 with size=73209856 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:35 INFO balancer.Dispatcher: Successfully moved blk_1073741848_1024 with size=73209856 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:35 INFO balancer.Dispatcher: Start moving blk_1073741847_1023 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:40 INFO balancer.Dispatcher: Successfully moved blk_1073741839_1015 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:41 INFO balancer.Dispatcher: Successfully moved blk_1073741826_1002 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:57 INFO balancer.Dispatcher: Successfully moved blk_1073741825_1001 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:09:57 INFO balancer.Dispatcher: Successfully moved blk_1073741834_1010 with size=134217728 from 172.16.101.59:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010
19/03/30 00:12:28 WARN balancer.Dispatcher: Failed to move blk_1073741847_1023 with size=134217728 from 172.16.101.58:50010:DISK to 172.16.101.66:50010:DISK through 172.16.101.58:50010: Got error, status message opReplaceBlock BP-698223843-172.16.101.55-1553701973789:blk_1073741847_1023 received exception java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/172.16.101.66:22298 remote=/172.16.101.58:50010], block move is failed
19/03/30 00:12:28 INFO balancer.Dispatcher: DDatanode:172.16.101.58:50010 activateDelay 10.0 seconds
19/03/30 00:12:28 INFO balancer.Dispatcher: DDatanode:172.16.101.66:50010 activateDelay 10.0 seconds
Mar 30, 2019 12:12:28 AM 4 1.59 GB 248.99 MB 783.06 MB
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.movedWinWidth = 5400000 (default=5400000)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.moverThreads = 1000 (default=1000)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.dispatcherThreads = 200 (default=200)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.datanode.balance.max.concurrent.moves = 5 (default=5)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.getBlocks.size = 2147483648 (default=2147483648)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.getBlocks.min-block-size = 10485760 (default=10485760)
19/03/30 00:12:33 INFO balancer.Balancer: dfs.balancer.max-size-to-move = 10737418240 (default=10737418240)
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.59:50010
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.60:50010
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.58:50010
19/03/30 00:12:33 INFO net.NetworkTopology: Adding a new node: /default-rack/172.16.101.66:50010
19/03/30 00:12:33 INFO balancer.Balancer: 0 over-utilized: []
19/03/30 00:12:33 INFO balancer.Balancer: 0 underutilized: []
The cluster is balanced. Exiting...
Mar 30, 2019 12:12:33 AM 5 1.59 GB 0 B -1 B
Mar 30, 2019 12:12:34 AM Balancing took 13.216533333333333 minutes
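A balancer log of this length is easier to judge by counting outcomes than by reading it line by line. The sketch below is a hypothetical helper (summarize_balancer_log is not part of Hadoop); it assumes one log entry per line, as the balancer actually writes them, counts the "Successfully moved" and "Failed to move" entries, and sums the size= of the successful moves. Individual failures such as "threads quota is exceeded" or SocketTimeoutException are not fatal: in this run blk_1073741825_1001 and blk_1073741826_1002 both failed once and were moved successfully in a later iteration.

```shell
# Hypothetical helper: summarize a saved HDFS balancer log read from stdin.
# Assumes one log entry per line (the balancer's normal output format).
summarize_balancer_log() {
  awk '
    /Successfully moved blk_/ {
      ok++
      # Extract the digits after "size=" and add them to the moved total.
      if (match($0, /size=[0-9]+/)) bytes += substr($0, RSTART + 5, RLENGTH - 5)
    }
    /Failed to move blk_/ { failed++ }
    END {
      # %.0f instead of %d so large byte counts are not truncated by older awks.
      printf "moved=%d failed=%d moved_bytes=%.0f\n", ok, failed, bytes
    }'
}
```

Usage, assuming the balancer output was saved to a file named balancer.log (hypothetical name), for example with `hdfs balancer 2>&1 | tee balancer.log`: run `summarize_balancer_log < balancer.log`.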
Check the HDFS cluster load again.
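Besides the NameNode web UI, per-DataNode usage can be read from the command line with `hdfs dfsadmin -report`. A minimal sketch, assuming the Hadoop 2.7 report layout in which each DataNode section starts with a `Name:` line and later contains a `DFS Used%:` line; dfs_used_per_node is a hypothetical helper name. The cluster-wide `DFS Used%` printed before the first `Name:` line is skipped.

```shell
# Hypothetical helper: print "<ip:port> <DFS Used%>" for each DataNode
# found in `hdfs dfsadmin -report` output read from stdin.
dfs_used_per_node() {
  awk '
    # "Name: 172.16.101.58:50010 (hostname)" starts a DataNode section.
    /^Name: /      { node = $2 }
    # "DFS Used%: 5.13%" inside a DataNode section; the cluster summary
    # line appears before any Name: line and is therefore ignored.
    /^DFS Used%: / { if (node != "") print node, $3 }'
}
```

Usage: `hdfs dfsadmin -report | dfs_used_per_node`. After a successful balancer run, each node's percentage should be within the balancer threshold (10 percentage points by default) of the cluster average.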
8. Change the replication factor of existing files and directories in the HDFS cluster
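Existing files keep their original replication factor: dfs.replication is only applied when a file is created, so Hadoop does not re-replicate existing files to the new value automatically. This has to be done manually with setrep (-w waits until replication has finished, which can take a long time for large files; -R is accepted only for backward compatibility, because setrep on a directory already applies to every file under it):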
# hdfs dfs -setrep -R -w 4 /
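Files whose replication factor is still below the target can be listed from `hdfs dfs -ls -R`, whose second column is the replication factor recorded for a file (the target value, not the number of replicas currently present) and `-` for a directory. A hedged sketch: list_below_replication is a hypothetical helper, and because it takes the last field as the path, paths containing spaces would be truncated.

```shell
# Hypothetical helper: read `hdfs dfs -ls -R` output from stdin and print
# the paths of regular files whose replication factor is below $1.
list_below_replication() {
  local target="$1"
  awk -v target="$target" '
    # Regular files have a permission string starting with "-";
    # directories start with "d" and show "-" as their replication.
    $1 ~ /^-/ && ($2 + 0) < (target + 0) { print $NF }'
}
```

Usage: `hdfs dfs -ls -R / | list_below_replication 4`. Empty output after setrep means every file now carries a replication factor of at least 4; whether the replicas actually exist is what fsck checks.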
Output:
Replication 4 set: /CentOS-6.8-x86_64-bin-DVD2.iso
Replication 4 set: /hadoop-2.8.1.tar.gz
Replication 4 set: /slaves
Waiting for /CentOS-6.8-x86_64-bin-DVD2.iso ..................... done
Waiting for /hadoop-2.8.1.tar.gz ... done
Waiting for /slaves ... done
Verify with fsck:
# hdfs fsck /
Connecting to namenode via http://sht-sgmhadoopnn-01:50070/fsck?ugi=root&path=%2F
FSCK started by root (auth:SIMPLE) from /172.16.101.55 for path / at Sat Mar 30 00:22:54 CST 2019
...Status: HEALTHY
Total size: 2645248691 B
Total dirs: 2
Total files: 3
Total symlinks: 0
Total blocks (validated): 22 (avg. block size 120238576 B)
Minimally replicated blocks: 22 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 3
Average block replication: 4.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 4
Number of racks: 1
FSCK ended at Sat Mar 30 00:22:54 CST 2019 in 2 milliseconds
The filesystem under path '/' is HEALTHY
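The same fsck summary can be checked in a script, for example when the replication change is automated. A minimal sketch; check_fsck_replication is a hypothetical helper that assumes the summary labels shown in this output (`Status:`, `Under-replicated blocks:`, `Average block replication:`). It prints the three values and returns success only if the status is HEALTHY, no block is under-replicated, and the average replication reaches the target.

```shell
# Hypothetical helper: read `hdfs fsck <path>` output from stdin and
# verify it against the target replication factor given as $1.
check_fsck_replication() {
  local target="$1"
  awk -v target="$target" '
    /Status: HEALTHY/            { healthy = 1 }
    # "Under-replicated blocks: 0 (0.0 %)": take the number after the colon.
    /Under-replicated blocks:/   { split($0, a, ":"); under = a[2] + 0 }
    # "Average block replication: 4.0"
    /Average block replication:/ { split($0, a, ":"); avg = a[2] + 0 }
    END {
      printf "healthy=%d under_replicated=%d avg_replication=%.1f\n", healthy, under, avg
      # Exit status 0 only when all three conditions hold.
      exit !(healthy && under == 0 && avg >= target + 0)
    }'
}
```

Usage: `hdfs fsck / | check_fsck_replication 4 && echo "replication OK"`.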
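The steps above add the DataNode dynamically, without restarting the HDFS cluster. It is still advisable to restart the HDFS cluster at a suitable time; for instance, the fsck output above still reports "Default replication factor: 3", because the running NameNode has not reloaded the modified hdfs-site.xml.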