• Hadoop2.7.6_02_HDFS常用操作


    1. HDFS常用操作

    1.1. 查询

    1.1.1.  浏览器查询

    1.1.2. 命令行查询

    [yun@mini04 bin]$ hadoop fs -ls /
    

      

    1.2. 上传文件

     1 [yun@mini05 zhangliang]$ cat test.info 
     2 111
     3 222
     4 333
     5 444
     6 555
     7 
     8 [yun@mini05 zhangliang]$ hadoop fs -put test.info /   # 上传文件 
     9 [yun@mini05 software]$ ll -h
    10 total 641M
    11 -rw-r--r--  1 yun    yun  190M Jun  8 16:36 CentOS-7.4_hadoop-2.7.6.tar.gz
    12 [yun@mini05 software]$ hadoop fs -put CentOS-7.4_hadoop-2.7.6.tar.gz /  # 上传文件
    13 [yun@mini05 software]$ hadoop fs -ls / 
    14 Found 2 items
    15 -rw-r--r--   3 yun supergroup  198811365 2018-06-10 09:04 /CentOS-7.4_hadoop-2.7.6.tar.gz
    16 -rw-r--r--   3 yun supergroup         21 2018-06-10 09:02 /test.info

    1.2.1. 文件存放位置

     1 [yun@mini04 subdir0]$ pwd   # 根据之前的配置,总共3份相同的文件 
     2 /app/hadoop/tmp/dfs/data/current/BP-925531343-10.0.0.11-1528537498201/current/finalized/subdir0/subdir0
     3 [yun@mini04 subdir0]$ ll -h  # 默认128M 切片 
     4 total 192M
     5 -rw-rw-r-- 1 yun yun   21 Jun 10 09:02 blk_1073741825
     6 -rw-rw-r-- 1 yun yun   11 Jun 10 09:02 blk_1073741825_1001.meta
     7 -rw-rw-r-- 1 yun yun 128M Jun 10 09:04 blk_1073741826
     8 -rw-rw-r-- 1 yun yun 1.1M Jun 10 09:04 blk_1073741826_1002.meta
     9 -rw-rw-r-- 1 yun yun  62M Jun 10 09:04 blk_1073741827
    10 -rw-rw-r-- 1 yun yun 493K Jun 10 09:04 blk_1073741827_1003.meta
    11 [yun@mini04 subdir0]$ cat blk_1073741825
    12 111
    13 222
    14 333
    15 444
    16 555
    17 
    18 [yun@mini04 subdir0]$

    1.2.2. 浏览器访问

    1.3. 文件下载

     1 [yun@mini04 zhangliang]$ pwd
     2 /app/software/zhangliang
     3 [yun@mini04 zhangliang]$ ll
     4 total 0
     5 [yun@mini04 zhangliang]$ hadoop fs -ls /
     6 Found 2 items
     7 -rw-r--r--   3 yun supergroup  198811365 2018-06-10 09:04 /CentOS-7.4_hadoop-2.7.6.tar.gz
     8 -rw-r--r--   3 yun supergroup         21 2018-06-10 09:02 /test.info
     9 [yun@mini04 zhangliang]$ hadoop fs -get /test.info
    10 [yun@mini04 zhangliang]$ hadoop fs -get /CentOS-7.4_hadoop-2.7.6.tar.gz
    11 [yun@mini04 zhangliang]$ ll
    12 total 194156
    13 -rw-r--r-- 1 yun yun 198811365 Jun 10 09:10 CentOS-7.4_hadoop-2.7.6.tar.gz
    14 -rw-r--r-- 1 yun yun        21 Jun 10 09:09 test.info
    15 [yun@mini04 zhangliang]$ cat test.info 
    16 111
    17 222
    18 333
    19 444
    20 555
    21 
    22 [yun@mini04 zhangliang]$

    2. 简单案例

    2.1. 准备数据

     1 [yun@mini05 zhangliang]$ pwd
     2 /app/software/zhangliang
     3 [yun@mini05 zhangliang]$ ll
     4 total 8
     5 -rw-rw-r-- 1 yun yun 33 Jun 10 09:43 test.info
     6 -rw-rw-r-- 1 yun yun 55 Jun 10 09:43 zhang.info
     7 [yun@mini05 zhangliang]$ cat test.info 
     8 111
     9 222
    10 333
    11 444
    12 555
    13 333
    14 222
    15 222
    16 222
    17 
    18 [yun@mini05 zhangliang]$ cat zhang.info 
    19 zxcvbnm
    20 asdfghjkl
    21 qwertyuiop
    22 qwertyuiop
    23 111
    24 qwertyuiop
    25 [yun@mini05 zhangliang]$ hadoop fs -mkdir -p /wordcount/input
    26 [yun@mini05 zhangliang]$ hadoop fs -put test.info zhang.info /wordcount/input
    27 [yun@mini05 zhangliang]$ hadoop fs -ls /wordcount/input
    28 Found 2 items
    29 -rw-r--r--   3 yun supergroup         37 2018-06-10 09:45 /wordcount/input/test.info
    30 -rw-r--r--   3 yun supergroup         55 2018-06-10 09:45 /wordcount/input/zhang.info

    2.1. 运行分析

     1 [yun@mini04 mapreduce]$ pwd
     2 /app/hadoop/share/hadoop/mapreduce
     3 [yun@mini04 mapreduce]$ ll
     4 total 5000
     5 -rw-r--r-- 1 yun yun  545657 Jun  8 16:36 hadoop-mapreduce-client-app-2.7.6.jar
     6 -rw-r--r-- 1 yun yun  777070 Jun  8 16:36 hadoop-mapreduce-client-common-2.7.6.jar
     7 -rw-r--r-- 1 yun yun 1558767 Jun  8 16:36 hadoop-mapreduce-client-core-2.7.6.jar
     8 -rw-r--r-- 1 yun yun  191881 Jun  8 16:36 hadoop-mapreduce-client-hs-2.7.6.jar
     9 -rw-r--r-- 1 yun yun   27830 Jun  8 16:36 hadoop-mapreduce-client-hs-plugins-2.7.6.jar
    10 -rw-r--r-- 1 yun yun   62954 Jun  8 16:36 hadoop-mapreduce-client-jobclient-2.7.6.jar
    11 -rw-r--r-- 1 yun yun 1561708 Jun  8 16:36 hadoop-mapreduce-client-jobclient-2.7.6-tests.jar
    12 -rw-r--r-- 1 yun yun   72054 Jun  8 16:36 hadoop-mapreduce-client-shuffle-2.7.6.jar
    13 -rw-r--r-- 1 yun yun  295697 Jun  8 16:36 hadoop-mapreduce-examples-2.7.6.jar
    14 drwxr-xr-x 2 yun yun    4096 Jun  8 16:36 lib
    15 drwxr-xr-x 2 yun yun      30 Jun  8 16:36 lib-examples
    16 drwxr-xr-x 2 yun yun    4096 Jun  8 16:36 sources
    17 [yun@mini04 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.6.jar wordcount /wordcount/input /wordcount/output  
    18 18/06/10 09:46:09 INFO client.RMProxy: Connecting to ResourceManager at mini02/10.0.0.12:8032
    19 18/06/10 09:46:10 INFO input.FileInputFormat: Total input paths to process : 2
    20 18/06/10 09:46:10 INFO mapreduce.JobSubmitter: number of splits:2
    21 18/06/10 09:46:10 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1528589937101_0002
    22 18/06/10 09:46:10 INFO impl.YarnClientImpl: Submitted application application_1528589937101_0002
    23 18/06/10 09:46:10 INFO mapreduce.Job: The url to track the job: http://mini02:8088/proxy/application_1528589937101_0002/
    24 18/06/10 09:46:10 INFO mapreduce.Job: Running job: job_1528589937101_0002
    25 18/06/10 09:46:21 INFO mapreduce.Job: Job job_1528589937101_0002 running in uber mode : false
    26 18/06/10 09:46:21 INFO mapreduce.Job:  map 0% reduce 0%
    27 18/06/10 09:46:30 INFO mapreduce.Job:  map 100% reduce 0%
    28 18/06/10 09:46:36 INFO mapreduce.Job:  map 100% reduce 100%
    29 18/06/10 09:46:37 INFO mapreduce.Job: Job job_1528589937101_0002 completed successfully
    30 18/06/10 09:46:37 INFO mapreduce.Job: Counters: 49
    31     File System Counters
    32         FILE: Number of bytes read=113
    33         FILE: Number of bytes written=368235
    34         FILE: Number of read operations=0
    35         FILE: Number of large read operations=0
    36         FILE: Number of write operations=0
    37         HDFS: Number of bytes read=311
    38         HDFS: Number of bytes written=65
    39         HDFS: Number of read operations=9
    40         HDFS: Number of large read operations=0
    41         HDFS: Number of write operations=2
    42     Job Counters 
    43         Launched map tasks=2
    44         Launched reduce tasks=1
    45         Data-local map tasks=2
    46         Total time spent by all maps in occupied slots (ms)=12234
    47         Total time spent by all reduces in occupied slots (ms)=3651
    48         Total time spent by all map tasks (ms)=12234
    49         Total time spent by all reduce tasks (ms)=3651
    50         Total vcore-milliseconds taken by all map tasks=12234
    51         Total vcore-milliseconds taken by all reduce tasks=3651
    52         Total megabyte-milliseconds taken by all map tasks=12527616
    53         Total megabyte-milliseconds taken by all reduce tasks=3738624
    54     Map-Reduce Framework
    55         Map input records=16
    56         Map output records=15
    57         Map output bytes=151
    58         Map output materialized bytes=119
    59         Input split bytes=219
    60         Combine input records=15
    61         Combine output records=9
    62         Reduce input groups=8
    63         Reduce shuffle bytes=119
    64         Reduce input records=9
    65         Reduce output records=8
    66         Spilled Records=18
    67         Shuffled Maps =2
    68         Failed Shuffles=0
    69         Merged Map outputs=2
    70         GC time elapsed (ms)=308
    71         CPU time spent (ms)=2700
    72         Physical memory (bytes) snapshot=709656576
    73         Virtual memory (bytes) snapshot=6343041024
    74         Total committed heap usage (bytes)=473956352
    75     Shuffle Errors
    76         BAD_ID=0
    77         CONNECTION=0
    78         IO_ERROR=0
    79         WRONG_LENGTH=0
    80         WRONG_MAP=0
    81         WRONG_REDUCE=0
    82     File Input Format Counters 
    83         Bytes Read=92
    84     File Output Format Counters 
    85         Bytes Written=65
    86 [yun@mini05 zhangliang]$ hadoop fs -ls /wordcount/output  
    87 Found 2 items
    88 -rw-r--r--   3 yun supergroup          0 2018-06-10 09:46 /wordcount/output/_SUCCESS
    89 -rw-r--r--   3 yun supergroup         65 2018-06-10 09:46 /wordcount/output/part-r-00000 
    90 [yun@mini05 zhangliang]$ hadoop fs -cat /wordcount/output/part-r-00000  
    91 111    2
    92 222    4
    93 333    2
    94 444    1
    95 555    1
    96 asdfghjkl    1
    97 qwertyuiop    3
    98 zxcvbnm    1

    3. 案例:开发shell采集脚本

    3.1. 需求说明

      点击流日志每天都10T,在业务应用服务器上,需要准实时上传至数据仓库(Hadoop HDFS)上

    3.2. 需求分析

      一般上传文件都是在凌晨24点操作,由于很多种类的业务数据都要在晚上进行传输,为了减轻服务器的压力,避开高峰期

      如果需要伪实时的上传,则采用定时上传的方式。比如每小时上传一次。

    3.3. web日志模拟

      运行jar包模拟web日志

     1 # 必要的目录  
     2 # 模拟web服务器目录 /app/webservice 
     3 # 模拟web日志目录 /app/webservice/logs  
     4 # 模拟待上传文件存放的目录 /app/webservice/logs/up2hdfs  # 在这个目录上传到HDFS
     5 [yun@mini01 webservice]$ pwd
     6 /app/webservice
     7 [yun@mini01 webservice]$ ll
     8 total 360
     9 drwxrwxr-x 3 yun yun     39 Jun 14 17:40 logs
    10 -rw-r--r-- 1 yun yun 367872 Jun 14 11:32 testlog.jar
    11 # 运行jar包,打印日志
    12 [yun@mini01 webservice]$ java -jar testlog.jar &  
    13 ……………………
    14 # 生成web日志查看
    15 [yun@mini01 logs]$ pwd
    16 /app/webservice/logs
    17 [yun@mini01 logs]$ ll -hrt
    18 total 460K
    19 -rw-rw-r-- 1 yun yun  11K Jun 14 17:28 access.log.7
    20 -rw-rw-r-- 1 yun yun  11K Jun 14 17:28 access.log.6
    21 -rw-rw-r-- 1 yun yun  11K Jun 14 17:28 access.log.5
    22 -rw-rw-r-- 1 yun yun  11K Jun 14 17:28 access.log.4
    23 -rw-rw-r-- 1 yun yun  11K Jun 14 17:29 access.log.3
    24 -rw-rw-r-- 1 yun yun  11K Jun 14 17:29 access.log.2
    25 -rw-rw-r-- 1 yun yun  11K Jun 14 17:29 access.log.1
    26 -rw-rw-r-- 1 yun yun 9.6K Jun 14 17:29 access.log

    3.4. Shell脚本执行

    1 # 脚本测试
    2 [yun@mini01 hadoop]$ pwd
    3 /app/yunwei/hadoop
    4 [yun@mini01 hadoop]$ ll 
    5 -rwxrwxr-x 1 yun yun 2657 Jun 14 17:27 uploadFile2Hdfs.sh
    6 [yun@mini01 hadoop]$ ./uploadFile2Hdfs.sh     
    7 ………………

    3.5. 查看结果

    3.5.1. 脚本执行结果查看

    Web日志转移与上传查看

     1 # web日志转移查看
     2 [yun@mini01 logs]$ pwd
     3 /app/webservice/logs
     4 [yun@mini01 logs]$ ll -hrt
     5 total 28K
     6 -rw-rw-r-- 1 yun yun 7.3K Jun 14 17:36 access.log    ### 表明日志已经转移
     7 drwxrwxr-x 2 yun yun  16K Jun 14 17:40 up2hdfs
     8 
     9 # 待上传文件存放的目录
    10 [yun@mini01 up2hdfs]$ pwd
    11 /app/webservice/logs/up2hdfs
    12 [yun@mini01 up2hdfs]$ ll -hrt 
    13 -rw-rw-r-- 1 yun yun 11K Jun 14 17:35 access.log.7_20180614174001_DONE
    14 -rw-rw-r-- 1 yun yun 11K Jun 14 17:35 access.log.6_20180614174001_DONE
    15 -rw-rw-r-- 1 yun yun 11K Jun 14 17:35 access.log.5_20180614174001_DONE
    16 -rw-rw-r-- 1 yun yun 11K Jun 14 17:35 access.log.4_20180614174001_DONE
    17 -rw-rw-r-- 1 yun yun 11K Jun 14 17:35 access.log.3_20180614174001_DONE
    18 -rw-rw-r-- 1 yun yun 11K Jun 14 17:36 access.log.2_20180614174001_DONE
    19 -rw-rw-r-- 1 yun yun 11K Jun 14 17:36 access.log.1_20180614174001_DONE
    20 ## 文件以 DONE 结尾,表明已上传到HDFS

    脚本日志查看

    # shell脚本日志查看
    [yun@mini01 log]$ pwd
    /app/yunwei/hadoop/log
    [yun@mini01 log]$ ll
    total 28
    -rw-rw-r-- 1 yun yun 24938 Jun 14 17:40 uploadFile2Hdfs.sh.log
    [yun@mini01 log]$ cat uploadFile2Hdfs.sh.log 
    2018-06-14 17:40:08 uploadFile2Hdfs.sh access.log.1_20180614174001        ok
    2018-06-14 17:40:10 uploadFile2Hdfs.sh access.log.2_20180614174001        ok
    2018-06-14 17:40:12 uploadFile2Hdfs.sh access.log.3_20180614174001        ok
    2018-06-14 17:40:14 uploadFile2Hdfs.sh access.log.4_20180614174001        ok
    2018-06-14 17:40:16 uploadFile2Hdfs.sh access.log.5_20180614174001        ok
    2018-06-14 17:40:18 uploadFile2Hdfs.sh access.log.6_20180614174001        ok
    2018-06-14 17:40:20 uploadFile2Hdfs.sh access.log.7_20180614174001        ok

    3.5.2. HDFS命令行上传文件查看

    1 [yun@mini01 ~]$ hadoop fs -ls /data/webservice/20180614
    2 Found 7 items
    3 -rw-r--r--   2 yun supergroup      10268 2018-06-14 17:08 /data/webservice/20180614/access.log.1_20180614174001
    4 -rw-r--r--   2 yun supergroup      10268 2018-06-14 17:08 /data/webservice/20180614/access.log.2_20180614174001
    5 -rw-r--r--   2 yun supergroup      10268 2018-06-14 17:08 /data/webservice/20180614/access.log.3_20180614174001
    6 -rw-r--r--   2 yun supergroup      10268 2018-06-14 17:08 /data/webservice/20180614/access.log.4_20180614174001
    7 -rw-r--r--   2 yun supergroup      10268 2018-06-14 17:08 /data/webservice/20180614/access.log.5_20180614174001
    8 -rw-r--r--   2 yun supergroup      10268 2018-06-14 17:08 /data/webservice/20180614/access.log.6_20180614174001
    9 -rw-r--r--   2 yun supergroup      10268 2018-06-14 17:08 /data/webservice/20180614/access.log.7_20180614174001

    3.5.3. 浏览器查看看

    http://10.0.0.11:50070/explorer.html#/data/webservice/20180614
    
    路径 : /data/webservice/20180614
    

    3.6. 脚本加入定时任务

    1 [yun@mini01 hadoop]$ crontab -l   
    2 
    3 # WEB日志上传HDFS   每小时执行一次
    4 00 */1 * * * /app/yunwei/hadoop/uploadFile2Hdfs.sh    >/dev/null 2>&1

    3.7. 相关的脚本和jar包

    在Git上,路径如下:
    https://github.com/zhanglianghhh/bigdata/tree/master/hadoop/hdfs
    

      

  • 相关阅读:
    Dubbo服务者消费者提供者案例实现
    spring核心组件
    spring为什么要注入接口
    小菜鸡进阶之路_Second week之元组、列表、集合、字典对比.
    小菜鸡进阶之路-First week
    光学公式推到——(物象位置) 1/u+1/v=1/f
    C#问题——调用事件时其他信息: 未将对象引用设置到对象的实例。
    工业相机全局曝光和卷帘曝光的区别
    相机加接圈的作用和缺点
    C#——数组维度/行数/列数/长度区别
  • 原文地址:https://www.cnblogs.com/zhanglianghhh/p/9194508.html
Copyright © 2020-2023  润新知