• hadoop 2.7.3 local environment running the official wordcount - HDFS-based


    Continuing from the previous post, 《hadoop 2.7.3本地环境运行官方wordcount》. We keep testing in local mode; this time we go through hdfs.

    2 Running wordcount with hadoop fs in local mode

    The previous post used the Linux filesystem directly. Now we use hadoop fs. In local mode, hadoop fs is in fact backed by the Linux filesystem as well. The examples below demonstrate this:

    2.1 Verifying the FS

    cd /home/jungle/hadoop/hadoop-local
    ls -l 
    total 116
    drwxr-xr-x. 2 jungle jungle  4096 Jan  6 15:06 bin
    drwxrwxr-x. 4 jungle jungle    31 Jan  6 16:53 dataLocal
    drwxr-xr-x. 3 jungle jungle    19 Jan  6 14:56 etc
    drwxr-xr-x. 2 jungle jungle   101 Jan  6 14:56 include
    drwxr-xr-x. 3 jungle jungle    19 Jan  6 14:56 lib
    drwxr-xr-x. 2 jungle jungle  4096 Jan  6 14:56 libexec
    -rw-r--r--. 1 jungle jungle 84854 Jan  6 14:56 LICENSE.txt
    -rw-r--r--. 1 jungle jungle 14978 Jan  6 14:56 NOTICE.txt
    -rw-r--r--. 1 jungle jungle  1366 Jan  6 14:56 README.txt
    drwxr-xr-x. 2 jungle jungle  4096 Jan  6 14:56 sbin
    drwxr-xr-x. 4 jungle jungle    29 Jan  6 14:56 share
    
    hadoop fs -ls /
    Found 20 items
    -rw-r--r--   1 root root          0 2016-12-30 12:26 /1
    dr-xr-xr-x   - root root      45056 2016-12-30 13:06 /bin
    dr-xr-xr-x   - root root       4096 2016-12-29 20:09 /boot
    drwxr-xr-x   - root root       3120 2017-01-06 18:31 /dev
    drwxr-xr-x   - root root       8192 2017-01-06 18:32 /etc
    drwxr-xr-x   - root root         19 2016-11-05 23:38 /home
    dr-xr-xr-x   - root root       4096 2016-12-30 12:29 /lib
    dr-xr-xr-x   - root root      81920 2016-12-30 13:04 /lib64
    drwxr-xr-x   - root root          6 2016-11-05 23:38 /media
    # ...
    
    # equivalent to: ls -l /home/jungle/hadoop/hadoop-local
    hadoop fs -ls /home/jungle/hadoop/hadoop-local
    Found 11 items
    -rw-r--r--   1 jungle jungle      84854 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/LICENSE.txt
    -rw-r--r--   1 jungle jungle      14978 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/NOTICE.txt
    -rw-r--r--   1 jungle jungle       1366 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/README.txt
    drwxr-xr-x   - jungle jungle       4096 2017-01-06 15:06 /home/jungle/hadoop/hadoop-local/bin
    drwxrwxr-x   - jungle jungle         31 2017-01-06 16:53 /home/jungle/hadoop/hadoop-local/dataLocal
    drwxr-xr-x   - jungle jungle         19 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/etc
    drwxr-xr-x   - jungle jungle        101 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/include
    drwxr-xr-x   - jungle jungle         19 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/lib
    drwxr-xr-x   - jungle jungle       4096 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/libexec
    drwxr-xr-x   - jungle jungle       4096 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/sbin
    drwxr-xr-x   - jungle jungle         29 2017-01-06 14:56 /home/jungle/hadoop/hadoop-local/share	
    

    As shown above, hadoop fs -ls /home/jungle/hadoop/hadoop-local and the Linux command ls -l /home/jungle/hadoop/hadoop-local are equivalent.
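The reason is the default filesystem setting: in standalone (local) mode, `fs.defaultFS` keeps its default value of `file:///`, so `hadoop fs` resolves paths against the local Linux filesystem. For comparison, a pseudo-distributed setup would override it in `etc/hadoop/core-site.xml`, roughly as below (the hostname and port are placeholders, not part of this post's setup):

```xml
<!-- etc/hadoop/core-site.xml
     Left empty (the default) in local mode, which means fs.defaultFS=file:///
     and hadoop fs operates on the local filesystem, as the listings above show. -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <!-- placeholder value for a pseudo-distributed cluster; NOT used in this post -->
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```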

    2.2 Preparing the data

    Next, take the raw data from the previous example and copy it onto hdfs:

    hadoop fs -mkdir -p ./dataHdfs/input 
    
    hadoop fs -ls .
    Found 12 items
    drwxrwxr-x   - jungle jungle         18 2017-01-06 18:44 dataHdfs
    drwxrwxr-x   - jungle jungle         31 2017-01-06 16:53 dataLocal
    # ...
    
    hadoop fs -ls ./dataHdfs/
    Found 1 items
    drwxrwxr-x   - jungle jungle          6 2017-01-06 18:44 dataHdfs/input
    
    # running -put without arguments prints its usage
    hadoop fs -put 
    -put: Not enough arguments: expected 1 but got 0
    Usage: hadoop fs [generic options] -put [-f] [-p] [-l] <localsrc> ... <dst>
    
    # put local files onto hdfs; in local mode this is effectively a Linux copy
    hadoop fs -put dataLocal/input/ ./dataHdfs/
    ls -l dataHdfs/
    total 0
    drwxrwxr-x. 2 jungle jungle 80 Jan  6 18:51 input
    
    ls -l dataHdfs/input/
    total 8
    -rw-r--r--. 1 jungle jungle 37 Jan  6 18:51 file1.txt
    -rw-r--r--. 1 jungle jungle 70 Jan  6 18:51 file2.txt
    
    hadoop fs -ls  ./dataHdfs/
    Found 1 items
    drwxrwxr-x   - jungle jungle         80 2017-01-06 18:51 dataHdfs/input
    
    hadoop fs -ls  ./dataHdfs/input/
    Found 2 items
    -rw-r--r--   1 jungle jungle         37 2017-01-06 18:51 dataHdfs/input/file1.txt
    -rw-r--r--   1 jungle jungle         70 2017-01-06 18:51 dataHdfs/input/file2.txt
    
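Since local mode is backed by the Linux filesystem, `-put` here amounts to a plain recursive copy. A self-contained sketch of that equivalence (no Hadoop required; the `demo/` paths below are made up for illustration):

```shell
# Simulate what `hadoop fs -put dataLocal/input ./dataHdfs/` does in local mode:
# on a file:/// filesystem it is effectively a recursive copy.
mkdir -p demo/dataLocal/input demo/dataHdfs
printf 'hello world. hello hadoop.\n' > demo/dataLocal/input/file1.txt
cp -r demo/dataLocal/input demo/dataHdfs/   # local-mode stand-in for: hadoop fs -put
md5sum demo/dataLocal/input/file1.txt demo/dataHdfs/input/file1.txt
```

Both checksums come out identical, mirroring the md5sum comparison at the end of this post.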
    

    2.3 Running wordcount

    hadoop jar /home/jungle/hadoop/hadoop-local/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount dataHdfs/input/ dataHdfs/output
    # the input and output directories here can be read either as hdfs directories or as linux directories
    
    cat dataHdfs/output/part-r-00000 
    I	1
    am	1
    bye	2
    great	1
    hadoop.	3
    hello	3
    is	1
    jungle.	2
    software	1
    the	1
    world.	2
    
    md5sum dataLocal/outout/part-r-00000  dataHdfs/output/part-r-00000 
    68956fd01404e5fc79e8f84e148f19e8  dataLocal/outout/part-r-00000
    68956fd01404e5fc79e8f84e148f19e8  dataHdfs/output/part-r-00000
    
    

    As you can see, the result is identical to the one under dataLocal/ from the previous post.
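For reference, the wordcount example tokenizes each line on whitespace and sums the occurrences of each token. The same computation can be sketched with standard shell tools (the input lines below are hypothetical stand-ins for file1.txt/file2.txt, whose real contents appear in the previous post):

```shell
# Recreate the wordcount logic locally: split on whitespace, count per token.
mkdir -p wcdemo
printf 'hello world.\nhello hadoop.\n' > wcdemo/file1.txt
printf 'bye world.\nbye jungle.\n'     > wcdemo/file2.txt
# map (tokenize) | shuffle (sort) | reduce (count), in part-r-00000's word<TAB>count format
cat wcdemo/*.txt | tr -s '[:space:]' '\n' | sort | uniq -c | awk '{print $2 "\t" $1}'
```

Note that, like the example jar, this counts "world." and "world" as different tokens, since splitting happens on whitespace only.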

• Original article: https://www.cnblogs.com/qinqiao/p/local-hadoop-wordcount-base-hdfs.html