• Step-by-step guide to running Hadoop's built-in wordcount example


    1. In the directory where Hadoop is installed, /usr/local, create a folder named input

    root@ubuntu:/usr/local# mkdir input

    2. In the input folder, create two text files, file1.txt and file2.txt. file1.txt contains "hello word"; file2.txt contains "hello hadoop" and "hello mapreduce" on two separate lines.

    root@ubuntu:/usr/local# cd input
    root@ubuntu:/usr/local/input# echo "hello word" > file1.txt
    root@ubuntu:/usr/local/input# echo "hello hadoop" > file2.txt
    root@ubuntu:/usr/local/input# echo "hello mapreduce" >> file2.txt   (note the >>: a single > would overwrite the "hello hadoop" line written above; alternatively, edit file2.txt with gedit)
    root@ubuntu:/usr/local/input# ls
    file1.txt file2.txt

    To display the file contents:

    root@ubuntu:/usr/local/input# more file1.txt
    hello word
    root@ubuntu:/usr/local/input# more file2.txt
    hello hadoop
    hello mapreduce
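    The file-creation step above can be collapsed into one self-contained snippet; using a here-document for file2.txt sidesteps the overwrite pitfall entirely. A sketch, assuming the same /usr/local/input layout as the walkthrough:

    ```shell
    # Create the input folder and both sample files in one go.
    mkdir -p /usr/local/input
    cd /usr/local/input

    # file1.txt: a single line.
    echo "hello word" > file1.txt

    # file2.txt: two lines written at once via a here-document,
    # so no echo can clobber an earlier line.
    cat > file2.txt <<'EOF'
    hello hadoop
    hello mapreduce
    EOF

    # Show what was written.
    more file1.txt
    more file2.txt
    ```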

    3. Create the input folder wc_input on HDFS, and upload the two text files from the local input folder to wc_input on the cluster

    root@ubuntu:/usr/local/hadoop-1.2.1# bin/hadoop fs -mkdir wc_input

    root@ubuntu:/usr/local/hadoop-1.2.1# bin/hadoop fs -put /usr/local/input/file* wc_input

    List the files in wc_input:

    root@ubuntu:/usr/local/hadoop-1.2.1# bin/hadoop fs -ls wc_input
    Found 2 items
    -rw-r--r-- 1 root supergroup 11 2014-03-13 01:19 /user/root/wc_input/file1.txt
    -rw-r--r-- 1 root supergroup 29 2014-03-13 01:19 /user/root/wc_input/file2.txt
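    The 11- and 29-byte sizes in the listing can be sanity-checked against the file contents; a quick local sketch, assuming each line ends in a single newline:

    ```python
    # Verify the byte counts reported by `hadoop fs -ls` from the raw contents.
    files = {
        "file1.txt": "hello word\n",                     # 10 chars + newline = 11 bytes
        "file2.txt": "hello hadoop\nhello mapreduce\n",  # 13 + 16 = 29 bytes
    }

    for name, text in files.items():
        print(name, len(text.encode("utf-8")))
    ```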

    4. Start all the Hadoop daemons and check that they are running:

    root@ubuntu:/# ssh localhost   (this verifies that passwordless login to localhost works; if it does, you will see the messages below. Otherwise you need to set it up first; see http://blog.csdn.net/joe_007/article/details/8298814 for the detailed steps)

    Welcome to Ubuntu 12.04.3 LTS (GNU/Linux 3.2.0-24-generic-pae i686)

    * Documentation: https://help.ubuntu.com/

    Last login: Mon Mar 3 04:44:23 2014 from localhost

    root@ubuntu:~# exit
    logout
    Connection to localhost closed.
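    If the ssh localhost check prompts for a password instead, the usual fix (which the linked post walks through) is to generate a key pair and authorize it for localhost. A hedged sketch of the standard OpenSSH commands:

    ```shell
    # Create ~/.ssh if missing and lock down its permissions.
    mkdir -p ~/.ssh && chmod 700 ~/.ssh

    # Generate an RSA key pair with an empty passphrase,
    # skipping the step if a key already exists.
    [ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

    # Authorize the public key for logins to this machine (including localhost).
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    ```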

    root@ubuntu:/usr/local/hadoop-1.2.1/bin# ./start-all.sh

    starting namenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-namenode-ubuntu.out
    localhost: starting datanode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-ubuntu.out
    localhost: starting secondarynamenode, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-ubuntu.out
    starting jobtracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-jobtracker-ubuntu.out
    localhost: starting tasktracker, logging to /usr/local/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-ubuntu.out

    root@ubuntu:/usr/local/hadoop-1.2.1/bin# jps
    7847 SecondaryNameNode
    4196
    7634 DataNode
    7423 NameNode
    8319 Jps
    7938 JobTracker
    8157 TaskTracker

    Run the wordcount example jar that ships with Hadoop (note: when running the job again, you must first delete the previous run's output folder, because Hadoop refuses to write to an output directory that already exists)

    root@ubuntu:/usr/local/hadoop-1.2.1# bin/hadoop jar ./hadoop-examples-1.2.1.jar wordcount wc_input wc_output
    14/03/13 01:48:40 INFO input.FileInputFormat: Total input paths to process : 2
    14/03/13 01:48:40 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/03/13 01:48:40 WARN snappy.LoadSnappy: Snappy native library not loaded
    14/03/13 01:48:42 INFO mapred.JobClient: Running job: job_201403130031_0001
    14/03/13 01:48:44 INFO mapred.JobClient: map 0% reduce 0%
    14/03/13 01:52:47 INFO mapred.JobClient: map 50% reduce 0%
    14/03/13 01:53:50 INFO mapred.JobClient: map 100% reduce 0%
    14/03/13 01:54:14 INFO mapred.JobClient: map 100% reduce 100%

    ... ...
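    Conceptually, the wordcount job maps each input line to (word, 1) pairs, shuffles the pairs by key, and reduces each group by summing. This is not Hadoop's actual Java implementation, but a minimal Python sketch of the same flow over the two input files above:

    ```python
    from collections import defaultdict

    # The same contents as file1.txt and file2.txt above.
    lines = ["hello word", "hello hadoop", "hello mapreduce"]

    # Map phase: emit (word, 1) for every word on every line.
    pairs = [(word, 1) for line in lines for word in line.split()]

    # Shuffle phase: group the emitted values by key (the word).
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)

    # Reduce phase: sum the counts for each word.
    counts = {word: sum(values) for word, values in grouped.items()}

    for word in sorted(counts):
        print(word, counts[word])
    ```

    Run locally this prints the same four counts that end up in part-r-00000 below.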

    5. Inspect the output folder

    root@ubuntu:/usr/local/hadoop-1.2.1# bin/hadoop fs -ls wc_output
    Found 3 items
    -rw-r--r-- 1 root supergroup 0 2014-03-13 01:54 /user/root/wc_output/_SUCCESS
    drwxr-xr-x - root supergroup 0 2014-03-13 01:48 /user/root/wc_output/_logs
    -rw-r--r-- 1 root supergroup 36 2014-03-13 01:54 /user/root/wc_output/part-r-00000   (the actual results are in part-r-00000)

    6. View the contents of the output file part-r-00000

    root@ubuntu:/usr/local/hadoop-1.2.1# bin/hadoop fs -cat /user/root/wc_output/part-r-00000
    hadoop 1
    hello 3
    mapreduce 1
    word 1

    7. Stop all the daemons

    root@ubuntu:/usr/local/hadoop-1.2.1/bin# ./stop-all.sh
    stopping jobtracker
    localhost: stopping tasktracker
    stopping namenode
    localhost: stopping datanode
    localhost: stopping secondarynamenode

  • Original post: https://www.cnblogs.com/xuepei/p/3599202.html