• Cloudera Certified Associate Administrator Case Studies: Manage



    Author: 尹正杰 (Yin Zhengjie)

    Copyright notice: this is an original work. Reproduction is prohibited; violations will be pursued under the law.

     

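    I. Download the NameNode image file

    Problem:
      The NameNode of the company cluster failed today, and you want to troubleshoot by analyzing the fsimage file. Download the latest fsimage file, name it "timestamp_xxx", where xxx is the Unix timestamp in seconds of the moment you perform the operation, and upload it to the /yinzhengjie/debug/hdfs/log/ directory on HDFS.
    
    Solution:
      This task uses the dfsadmin and dfs subcommands of the hdfs command plus basic Linux commands. Ideally you can carry it out without consulting the official documentation, or after only a quick look at the commands' help output.

    1>. Download the image file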

    [root@node101.yinzhengjie.org.cn ~]# ll
    total 0
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# hdfs dfsadmin -fetchImage ./            # The HDFS cluster must be running normally, otherwise the download fails
    19/06/15 15:27:57 INFO namenode.TransferFsImage: Opening connection to http://node101.yinzhengjie.org.cn:50070/imagetransfer?getimage=1&txid=latest
    19/06/15 15:27:57 INFO namenode.TransferFsImage: Image Transfer timeout configured to 60000 milliseconds
    19/06/15 15:27:57 INFO namenode.TransferFsImage: Transfer took 0.02s at 3263.16 KB/s
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# ll
    total 64
    -rw-r--r-- 1 root root 64384 Jun 15 15:27 fsimage_0000000000000004578
    [root@node101.yinzhengjie.org.cn ~]# 

    2>. Rename the image file

    [root@node101.yinzhengjie.org.cn ~]# ll
    total 64
    -rw-r--r-- 1 root root 64384 Jun 15 15:27 fsimage_0000000000000004578
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# mv fsimage_0000000000000004578 timestamp_`date +%s`
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# ll
    total 64
    -rw-r--r-- 1 root root 64384 Jun 15 15:27 timestamp_1560583829
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# 

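    3>. Create the target path on HDFS if it does not exist

    [root@node101.yinzhengjie.org.cn ~]# su hdfs        # HDFS uses simple authentication by default and root is not the HDFS superuser, so switch to the hdfs user; otherwise the command fails with "Permission denied"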
    [hdfs@node101.yinzhengjie.org.cn /root]$ 
    [hdfs@node101.yinzhengjie.org.cn /root]$ hdfs dfs -mkdir -p /yinzhengjie/debug/hdfs/log
    [hdfs@node101.yinzhengjie.org.cn /root]$  
    [hdfs@node101.yinzhengjie.org.cn /root]$ hdfs dfs -chmod 777 /yinzhengjie/debug/hdfs/log/
    [hdfs@node101.yinzhengjie.org.cn /root]$ 
    [hdfs@node101.yinzhengjie.org.cn /root]$ exit
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# 

    4>. Upload the renamed image file to HDFS

    [root@node101.yinzhengjie.org.cn ~]# ll
    total 64
    -rw-r--r-- 1 root root 64384 Jun 15 15:27 timestamp_1560583829
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# hdfs dfs -copyFromLocal timestamp_1560583829 /yinzhengjie/debug/hdfs/log/
    [root@node101.yinzhengjie.org.cn ~]# 
    [root@node101.yinzhengjie.org.cn ~]# hdfs dfs -ls /yinzhengjie/debug/hdfs/log/
    Found 1 items
    -rw-r--r--   3 root supergroup      64384 2019-06-15 15:35 /yinzhengjie/debug/hdfs/log/timestamp_1560583829
    [root@node101.yinzhengjie.org.cn ~]# 
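
    The four steps above can be sketched as one small script. This is only an illustration under stated assumptions: the temporary WORK_DIR and the run helper (which prints a command instead of executing it when no hdfs client is installed) are not part of the original procedure, and creating the target directory may still have to be done as the hdfs user, as in step 3.

```shell
#!/usr/bin/env bash
# Sketch: fetch the latest fsimage, rename it to timestamp_<epoch seconds>,
# and upload it to HDFS. WORK_DIR and run() are illustrative assumptions.
TARGET_DIR=/yinzhengjie/debug/hdfs/log
WORK_DIR=$(mktemp -d)
NEW_NAME="timestamp_$(date +%s)"

# Run the command if an hdfs client exists, otherwise just print it.
run() {
  if command -v hdfs >/dev/null 2>&1; then "$@"; else echo "[dry-run] $*"; fi
}

run hdfs dfsadmin -fetchImage "$WORK_DIR"

# fetchImage saves the file as fsimage_<txid>; pick the newest one, if any.
IMAGE=$(ls -t "$WORK_DIR"/fsimage_* 2>/dev/null | head -n 1)
if [ -n "$IMAGE" ]; then
  mv "$IMAGE" "$WORK_DIR/$NEW_NAME"
fi

run hdfs dfs -mkdir -p "$TARGET_DIR"
run hdfs dfs -put "$WORK_DIR/$NEW_NAME" "$TARGET_DIR/"
```

    Once downloaded, the image can be inspected with the Offline Image Viewer, for example `hdfs oiv -p XML -i <fsimage file> -o fsimage.xml`.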

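    II. Manually balance DataNode data

    Problem:
      The company cluster was recently expanded with a batch of new worker nodes, but the new nodes hold no data, so data is unevenly distributed across the cluster.
      You know that the HDFS balancer can fix this. Limit the bandwidth used by the balancer to at most 1 GB and start the balancer with a threshold of 5.

    Solution:
      If an interviewer asks this question, it is basically a free point: we only need to configure and run the balancer.

    1>. Click "HDFS"

    2>. Click Configuration and search for the keyword "dfs.datanode.balance.bandwidth"

    3>. Set the maximum bandwidth that each DataNode may use for balancing to 1 GB

    4>. Search for the keyword "重新平衡阈值" (Rebalancing Threshold), or search for "Threshold" in the English UI

    5>. Set the rebalancing threshold to 5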
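
    The same settings can also be applied from the command line. The sketch below assumes that "1G" means 1 GiB per second (1073741824 bytes), which is an interpretation rather than something the task states; the variable names are illustrative. Note that a value set with -setBalancerBandwidth is a runtime override and is not persisted on the DataNodes.

```shell
# 1 GiB per second in bytes, the unit -setBalancerBandwidth expects (assumed reading of "1G").
BANDWIDTH_BYTES=$((1024 * 1024 * 1024))
# Balancer threshold in percent of disk-usage deviation.
THRESHOLD=5

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfsadmin -setBalancerBandwidth "$BANDWIDTH_BYTES"
  hdfs balancer -threshold "$THRESHOLD"
else
  # No hdfs client on this machine: only show the commands.
  echo "hdfs dfsadmin -setBalancerBandwidth $BANDWIDTH_BYTES"
  echo "hdfs balancer -threshold $THRESHOLD"
fi
```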

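    III. Reduce the HDFS replication factor (from 3 to 2)

    Problem:
      You find that the cluster's HDFS capacity usage has exceeded 80% with the default of three replicas. You now want to change the large files under a certain directory from 3 replicas to 2 to save some capacity.
    
    Solution:
      If an interviewer asks you this, congratulations: it is another free point.

    1>. Upload files to the HDFS cluster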

    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ ll /yinzhengjie/softwares/jdk1.8.0_201/
    total 25980
    drwxr-xr-x 2 10 143     4096 Dec 16 03:45 bin
    -r--r--r-- 1 10 143     3244 Dec 16 03:45 COPYRIGHT
    drwxr-xr-x 3 10 143      132 Dec 16 03:45 include
    -rw-r--r-- 1 10 143  5207434 Dec 12  2018 javafx-src.zip
    drwxr-xr-x 5 10 143      185 Dec 16 03:45 jre
    drwxr-xr-x 5 10 143      245 Dec 16 03:45 lib
    -r--r--r-- 1 10 143       40 Dec 16 03:45 LICENSE
    drwxr-xr-x 4 10 143       47 Dec 16 03:45 man
    -r--r--r-- 1 10 143      159 Dec 16 03:45 README.html
    -rw-r--r-- 1 10 143      424 Dec 16 03:45 release
    -rw-r--r-- 1 10 143 21103945 Dec 16 03:45 src.zip
    -rw-r--r-- 1 10 143   108109 Dec 12  2018 THIRDPARTYLICENSEREADME-JAVAFX.txt
    -r--r--r-- 1 10 143   155002 Dec 16 03:45 THIRDPARTYLICENSEREADME.txt
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir /yinzhengjie/data
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -put /yinzhengjie/softwares/jdk1.8.0_201/* /yinzhengjie/data/
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
    Found 13 items
    -rw-r--r--   3 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
    -rw-r--r--   3 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
    -rw-r--r--   3 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
    -rw-r--r--   3 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
    -rw-r--r--   3 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
    -rw-r--r--   3 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
    -rw-r--r--   3 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
    -rw-r--r--   3 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
    Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
    FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:20:48 CST 2019
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ...................................Status: HEALTHY
     Total size:    397764951 B
     Total dirs:    205
     Total files:    1635
     Total symlinks:        0
     Total blocks (validated):    1614 (avg. block size 246446 B)
     Minimally replicated blocks:    1614 (100.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:    0 (0.0 %)
     Mis-replicated blocks:        0 (0.0 %)
     Default replication factor:    3
     Average block replication:    3.0        # The files under this path currently have 3 replicas
     Corrupt blocks:        0
     Missing replicas:        0 (0.0 %)
     Number of data-nodes:        4
     Number of racks:        1
    FSCK ended at Sat Jun 15 18:20:48 CST 2019 in 78 milliseconds
    
    
    The filesystem under path '/yinzhengjie/data' is HEALTHY
    [hdfs@node101.yinzhengjie.org.cn ~]$ 

    2>. Set the replication factor of the files in an HDFS directory to 2

    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -setrep -R -w 2 /yinzhengjie/data/
    ......
    Replication 2 set: /yinzhengjie/data/man/man1/javadoc.1
    Replication 2 set: /yinzhengjie/data/man/man1/javafxpackager.1
    Replication 2 set: /yinzhengjie/data/man/man1/javah.1
    Replication 2 set: /yinzhengjie/data/man/man1/javap.1
    Replication 2 set: /yinzhengjie/data/man/man1/javapackager.1
    Replication 2 set: /yinzhengjie/data/man/man1/javaws.1
    Replication 2 set: /yinzhengjie/data/man/man1/jcmd.1
    Replication 2 set: /yinzhengjie/data/man/man1/jconsole.1
    Replication 2 set: /yinzhengjie/data/man/man1/jdb.1
    Replication 2 set: /yinzhengjie/data/man/man1/jdeps.1
    Replication 2 set: /yinzhengjie/data/man/man1/jhat.1
    Replication 2 set: /yinzhengjie/data/man/man1/jinfo.1
    Replication 2 set: /yinzhengjie/data/man/man1/jjs.1
    Replication 2 set: /yinzhengjie/data/man/man1/jmap.1
    Replication 2 set: /yinzhengjie/data/man/man1/jmc.1
    Replication 2 set: /yinzhengjie/data/man/man1/jps.1
    Replication 2 set: /yinzhengjie/data/man/man1/jrunscript.1
    Replication 2 set: /yinzhengjie/data/man/man1/jsadebugd.1
    Replication 2 set: /yinzhengjie/data/man/man1/jstack.1
    Replication 2 set: /yinzhengjie/data/man/man1/jstat.1
    Replication 2 set: /yinzhengjie/data/man/man1/jstatd.1
    Replication 2 set: /yinzhengjie/data/man/man1/jvisualvm.1
    Replication 2 set: /yinzhengjie/data/man/man1/keytool.1
    Replication 2 set: /yinzhengjie/data/man/man1/native2ascii.1
    Replication 2 set: /yinzhengjie/data/man/man1/orbd.1
    Replication 2 set: /yinzhengjie/data/man/man1/pack200.1
    Replication 2 set: /yinzhengjie/data/man/man1/policytool.1
    Replication 2 set: /yinzhengjie/data/man/man1/rmic.1
    Replication 2 set: /yinzhengjie/data/man/man1/rmid.1
    Replication 2 set: /yinzhengjie/data/man/man1/rmiregistry.1
    Replication 2 set: /yinzhengjie/data/man/man1/schemagen.1
    Replication 2 set: /yinzhengjie/data/man/man1/serialver.1
    Replication 2 set: /yinzhengjie/data/man/man1/servertool.1
    Replication 2 set: /yinzhengjie/data/man/man1/tnameserv.1
    Replication 2 set: /yinzhengjie/data/man/man1/unpack200.1
    Replication 2 set: /yinzhengjie/data/man/man1/wsgen.1
    Replication 2 set: /yinzhengjie/data/man/man1/wsimport.1
    Replication 2 set: /yinzhengjie/data/man/man1/xjc.1
    Replication 2 set: /yinzhengjie/data/release
    Replication 2 set: /yinzhengjie/data/src.zip
    [hdfs@node101.yinzhengjie.org.cn ~]$
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
    Found 13 items
    -rw-r--r--   2 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
    -rw-r--r--   2 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
    -rw-r--r--   2 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
    -rw-r--r--   2 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
    -rw-r--r--   2 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
    -rw-r--r--   2 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
    -rw-r--r--   2 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
    -rw-r--r--   2 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
    Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
    FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:24:03 CST 2019
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ...................................Status: HEALTHY
     Total size:    397764951 B
     Total dirs:    205
     Total files:    1635
     Total symlinks:        0
     Total blocks (validated):    1614 (avg. block size 246446 B)
     Minimally replicated blocks:    1614 (100.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:    0 (0.0 %)
     Mis-replicated blocks:        0 (0.0 %)
     Default replication factor:    3
     Average block replication:    2.0      # The files under this path now average 2 replicas
     Corrupt blocks:        0
     Missing replicas:        0 (0.0 %)
     Number of data-nodes:        4
     Number of racks:        1
    FSCK ended at Sat Jun 15 18:24:03 CST 2019 in 32 milliseconds
    
    
    The filesystem under path '/yinzhengjie/data' is HEALTHY
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
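
    As a rough worked example of the saving, the raw disk usage before and after can be estimated from the "Total size" line of the fsck report above. This is back-of-the-envelope arithmetic only; the variable names are illustrative and the estimate ignores everything outside this path.

```shell
# "Total size" reported by fsck for /yinzhengjie/data, in bytes.
TOTAL_SIZE=397764951

RAW_AT_3=$((TOTAL_SIZE * 3))      # raw bytes stored with 3 replicas
RAW_AT_2=$((TOTAL_SIZE * 2))      # raw bytes stored with 2 replicas
SAVED=$((RAW_AT_3 - RAW_AT_2))    # raw bytes freed by dropping one replica

echo "raw usage at replication 3: $RAW_AT_3 B"
echo "raw usage at replication 2: $RAW_AT_2 B"
echo "saved: $SAVED B"
```

    Dropping from three replicas to two frees one full copy of the data, that is, one third of the raw space these files occupied.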

      

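    IV. Increase the HDFS replication factor (from 2 to 3)

    Problem:
      During a routine cluster check, you find that a few important files have only two replicas, although the cluster's default replication parameter is 3 and has never been changed. Fix the under-replication of the files under "/yinzhengjie/data/".
    
    Solution:
      Basic HDFS command usage should be second nature; if an interview tests HDFS commands, it is almost always a free point.

    1>. Change the replication factor of all files under the directory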

    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
    Found 13 items
    -rw-r--r--   2 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
    -rw-r--r--   2 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
    -rw-r--r--   2 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
    -rw-r--r--   2 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
    -rw-r--r--   2 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
    -rw-r--r--   2 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
    -rw-r--r--   2 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
    -rw-r--r--   2 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
    Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
    FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:24:03 CST 2019
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ...................................Status: HEALTHY
     Total size:    397764951 B
     Total dirs:    205
     Total files:    1635
     Total symlinks:        0
     Total blocks (validated):    1614 (avg. block size 246446 B)
     Minimally replicated blocks:    1614 (100.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:    0 (0.0 %)
     Mis-replicated blocks:        0 (0.0 %)
     Default replication factor:    3
     Average block replication:    2.0      # The current replication is 2
     Corrupt blocks:        0
     Missing replicas:        0 (0.0 %)
     Number of data-nodes:        4
     Number of racks:        1
    FSCK ended at Sat Jun 15 18:24:03 CST 2019 in 32 milliseconds
    
    
    The filesystem under path '/yinzhengjie/data' is HEALTHY
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -setrep 3  /yinzhengjie/data/
    ......
    Replication 3 set: /yinzhengjie/data/man/man1/javap.1
    Replication 3 set: /yinzhengjie/data/man/man1/javapackager.1
    Replication 3 set: /yinzhengjie/data/man/man1/javaws.1
    Replication 3 set: /yinzhengjie/data/man/man1/jcmd.1
    Replication 3 set: /yinzhengjie/data/man/man1/jconsole.1
    Replication 3 set: /yinzhengjie/data/man/man1/jdb.1
    Replication 3 set: /yinzhengjie/data/man/man1/jdeps.1
    Replication 3 set: /yinzhengjie/data/man/man1/jhat.1
    Replication 3 set: /yinzhengjie/data/man/man1/jinfo.1
    Replication 3 set: /yinzhengjie/data/man/man1/jjs.1
    Replication 3 set: /yinzhengjie/data/man/man1/jmap.1
    Replication 3 set: /yinzhengjie/data/man/man1/jmc.1
    Replication 3 set: /yinzhengjie/data/man/man1/jps.1
    Replication 3 set: /yinzhengjie/data/man/man1/jrunscript.1
    Replication 3 set: /yinzhengjie/data/man/man1/jsadebugd.1
    Replication 3 set: /yinzhengjie/data/man/man1/jstack.1
    Replication 3 set: /yinzhengjie/data/man/man1/jstat.1
    Replication 3 set: /yinzhengjie/data/man/man1/jstatd.1
    Replication 3 set: /yinzhengjie/data/man/man1/jvisualvm.1
    Replication 3 set: /yinzhengjie/data/man/man1/keytool.1
    Replication 3 set: /yinzhengjie/data/man/man1/native2ascii.1
    Replication 3 set: /yinzhengjie/data/man/man1/orbd.1
    Replication 3 set: /yinzhengjie/data/man/man1/pack200.1
    Replication 3 set: /yinzhengjie/data/man/man1/policytool.1
    Replication 3 set: /yinzhengjie/data/man/man1/rmic.1
    Replication 3 set: /yinzhengjie/data/man/man1/rmid.1
    Replication 3 set: /yinzhengjie/data/man/man1/rmiregistry.1
    Replication 3 set: /yinzhengjie/data/man/man1/schemagen.1
    Replication 3 set: /yinzhengjie/data/man/man1/serialver.1
    Replication 3 set: /yinzhengjie/data/man/man1/servertool.1
    Replication 3 set: /yinzhengjie/data/man/man1/tnameserv.1
    Replication 3 set: /yinzhengjie/data/man/man1/unpack200.1
    Replication 3 set: /yinzhengjie/data/man/man1/wsgen.1
    Replication 3 set: /yinzhengjie/data/man/man1/wsimport.1
    Replication 3 set: /yinzhengjie/data/man/man1/xjc.1
    Replication 3 set: /yinzhengjie/data/release
    Replication 3 set: /yinzhengjie/data/src.zip
    [hdfs@node101.yinzhengjie.org.cn ~]$

    2>. Verify that the replication factor was changed

    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls -h /yinzhengjie/data/
    Found 13 items
    -rw-r--r--   3 hdfs supergroup      3.2 K 2019-06-15 18:10 /yinzhengjie/data/COPYRIGHT
    -rw-r--r--   3 hdfs supergroup         40 2019-06-15 18:11 /yinzhengjie/data/LICENSE
    -rw-r--r--   3 hdfs supergroup        159 2019-06-15 18:11 /yinzhengjie/data/README.html
    -rw-r--r--   3 hdfs supergroup    105.6 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME-JAVAFX.txt
    -rw-r--r--   3 hdfs supergroup    151.4 K 2019-06-15 18:11 /yinzhengjie/data/THIRDPARTYLICENSEREADME.txt
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/bin
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/include
    -rw-r--r--   3 hdfs supergroup      5.0 M 2019-06-15 18:10 /yinzhengjie/data/javafx-src.zip
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:10 /yinzhengjie/data/jre
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/lib
    drwxr-xr-x   - hdfs supergroup          0 2019-06-15 18:11 /yinzhengjie/data/man
    -rw-r--r--   3 hdfs supergroup        424 2019-06-15 18:11 /yinzhengjie/data/release
    -rw-r--r--   3 hdfs supergroup     20.1 M 2019-06-15 18:11 /yinzhengjie/data/src.zip
    [hdfs@node101.yinzhengjie.org.cn ~]$
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs fsck /yinzhengjie/data/
    Connecting to namenode via http://node101.yinzhengjie.org.cn:50070/fsck?ugi=hdfs&path=%2Fyinzhengjie%2Fdata
    FSCK started by hdfs (auth:SIMPLE) from /172.30.1.101 for path /yinzhengjie/data at Sat Jun 15 18:37:24 CST 2019
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ....................................................................................................
    ...................................Status: HEALTHY
     Total size:    397764951 B
     Total dirs:    205
     Total files:    1635
     Total symlinks:        0
     Total blocks (validated):    1614 (avg. block size 246446 B)
     Minimally replicated blocks:    1614 (100.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:    0 (0.0 %)
     Mis-replicated blocks:        0 (0.0 %)
     Default replication factor:    3
     Average block replication:    3.0      # The files under this path now have 3 replicas
     Corrupt blocks:        0
     Missing replicas:        0 (0.0 %)
     Number of data-nodes:        4
     Number of racks:        1
    FSCK ended at Sat Jun 15 18:37:24 CST 2019 in 17 milliseconds
    
    
    The filesystem under path '/yinzhengjie/data' is HEALTHY
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
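
    Raising the replication factor without -w returns immediately and the NameNode creates the extra replicas in the background, so it is worth checking the fsck report afterwards. The check can be scripted roughly as follows; avg_replication is a hypothetical helper, and the sample line is copied from the report above so that the snippet runs without a cluster.

```shell
# Print the value of the "Average block replication" line from fsck output on stdin.
avg_replication() {
  awk -F':' '/Average block replication/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Sample line from the report above; on a live cluster, pipe in
# the output of: hdfs fsck /yinzhengjie/data/
SAMPLE=' Average block replication:    3.0'
AVG=$(printf '%s\n' "$SAMPLE" | avg_replication)

if [ "$AVG" = "3.0" ]; then
  echo "replication OK: $AVG"
else
  echo "replication not yet at target: $AVG"
fi
```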

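    V. Copy an HDFS file to another directory with a specified block size

    Problem:
      You find that some large files in the cluster have a 64 MB block size, so MapReduce jobs that read them spawn more map tasks than necessary by default, which wastes resources.
      You decide to back these files up to another directory with a 128 MB block size. Back up the files under "/yinzhengjie/data/input" to "/yinzhengjie/data/output" with a 128 MB block size.

    Solution:
      This question mainly tests your understanding of HDFS. Besides the cluster-wide default, the block size can be set per file, but once a file is written its block size cannot be changed; the only option is to copy the file again.

    1>. Copy an HDFS file to the input directory with a 64 MB block size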

    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir -p /yinzhengjie/data/input
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/debug/hdfs/log
    Found 1 items
    -rw-r--r--   3 root supergroup      64384 2019-06-15 16:37 /yinzhengjie/debug/hdfs/log/timestamp_1560583829
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -Ddfs.block.size=67108864 -cp /yinzhengjie/debug/hdfs/log/timestamp_1560583829 /yinzhengjie/data/input
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls /yinzhengjie/data/input
    Found 1 items
    -rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:44 /yinzhengjie/data/input/timestamp_1560583829
    [hdfs@node101.yinzhengjie.org.cn ~]$ 

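    2>. Confirm the cluster's default block size (as the screenshot shows, the default block size here is already 256 MB, so the block size parameter must be specified for the backup; if the default were 128 MB, the parameter could be omitted)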
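
    One way to check the default from the command line is `hdfs getconf -confKey dfs.blocksize`, which prints the value from the client configuration. The sketch below assumes the value is a plain byte count (it may instead carry a suffix such as "256m", which this arithmetic does not handle); to_mib is a hypothetical helper, and the fallback value is only an assumption matching the 256 MB mentioned in this step.

```shell
# Convert a byte count to whole mebibytes.
to_mib() { echo $(( $1 / 1024 / 1024 )); }

if command -v hdfs >/dev/null 2>&1; then
  DEFAULT_BLOCK=$(hdfs getconf -confKey dfs.blocksize)
else
  # No hdfs client here: assume the 256 MB default described in this step.
  DEFAULT_BLOCK=268435456
fi

echo "default block size: $(to_mib "$DEFAULT_BLOCK") MiB"
```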

    3>. Create the backup directory and copy the data into it

    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -mkdir /yinzhengjie/data/output
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -Ddfs.block.size=134217728 -cp /yinzhengjie/data/input/timestamp_1560583829  /yinzhengjie/data/output
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls  /yinzhengjie/data/input
    Found 1 items
    -rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:44 /yinzhengjie/data/input/timestamp_1560583829
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ hdfs dfs -ls  /yinzhengjie/data/output
    Found 1 items
    -rw-r--r--   3 hdfs supergroup      64384 2019-06-15 18:59 /yinzhengjie/data/output/timestamp_1560583829
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
    [hdfs@node101.yinzhengjie.org.cn ~]$ 
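
    The listing alone does not show the block size, so the copy can be verified with `hdfs dfs -stat`, whose %o format prints a file's block size. The following is a sketch rather than the author's exact procedure: it uses dfs.blocksize, the current name of the deprecated dfs.block.size property, and the variable names are illustrative.

```shell
BLOCK_MB=128
BLOCK_BYTES=$((BLOCK_MB * 1024 * 1024))   # block size must be given in bytes
SRC=/yinzhengjie/data/input/timestamp_1560583829
DST=/yinzhengjie/data/output

if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -D dfs.blocksize="$BLOCK_BYTES" -cp "$SRC" "$DST"
  # %o = block size in bytes, %r = replication factor, %n = file name
  hdfs dfs -stat "%o %r %n" "$DST/timestamp_1560583829"
else
  # No hdfs client on this machine: only show the copy command.
  echo "hdfs dfs -D dfs.blocksize=$BLOCK_BYTES -cp $SRC $DST"
fi
```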

  • Original post: https://www.cnblogs.com/yinzhengjie/p/10995701.html