• Learning from Hadoop's bundled examples


    Learn Hadoop through a simple example: counting how many times each word occurs.

    Environment: OS: CentOS 6.5 x64; software: Hadoop 1.2.1

    1. Create a directory and write some text into text files.

    [huser@master hadoop-1.2.1]$ mkdir input
    [huser@master hadoop-1.2.1]$ echo "hello world" > input/test1.txt
    [huser@master hadoop-1.2.1]$ echo "hello hadoop" > input/test2.txt

    2. Upload the directory to HDFS

    [huser@master hadoop-1.2.1]$ bin/hadoop fs -put input ./in
    
    [huser@master hadoop-1.2.1]$ bin/hadoop fs -ls
    Found 1 items
    drwxr-xr-x - huser supergroup 0 2014-04-16 19:00 /user/huser/in
    
    [huser@master hadoop-1.2.1]$ bin/hadoop fs -ls ./in/*
    -rw-r--r-- 1 huser supergroup 12 2014-04-16 19:00 /user/huser/in/test1.txt
    -rw-r--r-- 1 huser supergroup 13 2014-04-16 19:00 /user/huser/in/test2.txt
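
    The sizes in the listing above (12 and 13 bytes) can be checked locally before uploading. This is a minimal sketch, assuming a POSIX shell; the scratch directory created by mktemp and the variable names are hypothetical and not part of the original walkthrough.

```shell
# Recreate the two sample files in a scratch directory (hypothetical location).
workdir=$(mktemp -d)
mkdir "$workdir/input"
echo "hello world"  > "$workdir/input/test1.txt"
echo "hello hadoop" > "$workdir/input/test2.txt"

# echo appends a newline, so each file is one byte longer than its text.
# tr -d ' ' strips the padding some wc implementations add.
size1=$(wc -c < "$workdir/input/test1.txt" | tr -d ' ')
size2=$(wc -c < "$workdir/input/test2.txt" | tr -d ' ')

echo "test1.txt: $size1 bytes, test2.txt: $size2 bytes"
```

    If the local sizes differ from what fs -ls reports, the files in HDFS are probably not the ones created in step 1.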

    3. Run the bundled example

    [huser@master hadoop-1.2.1]$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount in out
    14/04/16 19:02:53 INFO input.FileInputFormat: Total input paths to process : 2
    14/04/16 19:02:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    14/04/16 19:02:53 WARN snappy.LoadSnappy: Snappy native library not loaded
    14/04/16 19:02:54 INFO mapred.JobClient: Running job: job_201404161850_0001
    14/04/16 19:02:55 INFO mapred.JobClient: map 0% reduce 0%
    14/04/16 19:03:10 INFO mapred.JobClient: map 100% reduce 0%
    14/04/16 19:03:19 INFO mapred.JobClient: map 100% reduce 33%
    14/04/16 19:03:21 INFO mapred.JobClient: map 100% reduce 100%
    14/04/16 19:03:23 INFO mapred.JobClient: Job complete: job_201404161850_0001
    14/04/16 19:03:23 INFO mapred.JobClient: Counters: 30
    14/04/16 19:03:23 INFO mapred.JobClient: Job Counters 
    14/04/16 19:03:23 INFO mapred.JobClient: Launched reduce tasks=1
    14/04/16 19:03:23 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=19368
    14/04/16 19:03:23 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    14/04/16 19:03:23 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    14/04/16 19:03:23 INFO mapred.JobClient: Rack-local map tasks=1
    14/04/16 19:03:23 INFO mapred.JobClient: Launched map tasks=2
    14/04/16 19:03:23 INFO mapred.JobClient: Data-local map tasks=1
    14/04/16 19:03:23 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=11082
    14/04/16 19:03:23 INFO mapred.JobClient: File Output Format Counters 
    14/04/16 19:03:23 INFO mapred.JobClient: Bytes Written=25
    14/04/16 19:03:23 INFO mapred.JobClient: FileSystemCounters
    14/04/16 19:03:23 INFO mapred.JobClient: FILE_BYTES_READ=55
    14/04/16 19:03:23 INFO mapred.JobClient: HDFS_BYTES_READ=239
    14/04/16 19:03:23 INFO mapred.JobClient: FILE_BYTES_WRITTEN=169887
    14/04/16 19:03:23 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=25
    14/04/16 19:03:23 INFO mapred.JobClient: File Input Format Counters 
    14/04/16 19:03:23 INFO mapred.JobClient: Bytes Read=25
    14/04/16 19:03:23 INFO mapred.JobClient: Map-Reduce Framework
    14/04/16 19:03:23 INFO mapred.JobClient: Map output materialized bytes=61
    14/04/16 19:03:23 INFO mapred.JobClient: Map input records=2
    14/04/16 19:03:23 INFO mapred.JobClient: Reduce shuffle bytes=61
    14/04/16 19:03:23 INFO mapred.JobClient: Spilled Records=8
    14/04/16 19:03:23 INFO mapred.JobClient: Map output bytes=41
    14/04/16 19:03:23 INFO mapred.JobClient: Total committed heap usage (bytes)=415633408
    14/04/16 19:03:23 INFO mapred.JobClient: CPU time spent (ms)=4060
    14/04/16 19:03:23 INFO mapred.JobClient: Combine input records=4
    14/04/16 19:03:23 INFO mapred.JobClient: SPLIT_RAW_BYTES=214
    14/04/16 19:03:23 INFO mapred.JobClient: Reduce input records=4
    14/04/16 19:03:23 INFO mapred.JobClient: Reduce input groups=3
    14/04/16 19:03:23 INFO mapred.JobClient: Combine output records=4
    14/04/16 19:03:23 INFO mapred.JobClient: Physical memory (bytes) snapshot=402755584
    14/04/16 19:03:23 INFO mapred.JobClient: Reduce output records=3
    14/04/16 19:03:23 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2173550592
    14/04/16 19:03:23 INFO mapred.JobClient: Map output records=4
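
    The result of this job and a few of its counters can be reproduced without a cluster. The sketch below is a local stand-in for the map, shuffle and reduce phases built from standard Unix tools (tr, sort, uniq, awk). It is not how Hadoop executes the job, and the variable names are hypothetical. It assumes words are separated by single spaces, as in the two sample files.

```shell
# The two input lines from step 1, inlined so the sketch needs no files.
input=$(printf 'hello world\nhello hadoop\n')

# "Map": split each line on spaces, emitting one word per line.
words=$(printf '%s\n' "$input" | tr -s ' ' '\n')

# "Shuffle" and "reduce": sort brings equal words together, uniq -c counts
# each group, and awk reorders the fields to a word<TAB>count layout.
counts=$(printf '%s\n' "$words" | sort | uniq -c | awk '{print $2 "\t" $1}')

# Local analogues of three counters from the job log.
map_records=$(printf '%s\n' "$words" | wc -l | tr -d ' ')
reduce_records=$(printf '%s\n' "$counts" | wc -l | tr -d ' ')
out_bytes=$(printf '%s\n' "$counts" | wc -c | tr -d ' ')

printf '%s\n' "$counts"
echo "map output records=$map_records, reduce output records=$reduce_records, bytes written=$out_bytes"
```

    With the sample input this yields 4 map output records, 3 reduce output records and 25 bytes, in line with Map output records=4, Reduce output records=3 and Bytes Written=25 in the log. Combine output records stays at 4 rather than 3, presumably because each of the two map tasks reads one file and neither file repeats a word, so the combiner has nothing to merge and the two "hello" records only meet in the reducer.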

    4. View the results

    [huser@master hadoop-1.2.1]$ bin/hadoop fs -ls ./out/*
    -rw-r--r-- 1 huser supergroup 0 2014-04-16 19:03 /user/huser/out/_SUCCESS
    drwxr-xr-x - huser supergroup 0 2014-04-16 19:02 /user/huser/out/_logs/history
    -rw-r--r-- 1 huser supergroup 25 2014-04-16 19:03 /user/huser/out/part-r-00000
    
    [huser@master hadoop-1.2.1]$ bin/hadoop fs -cat ./out/part-r-00000
    hadoop 1
    hello 2
    world 1
  • Original article: https://www.cnblogs.com/guarder/p/3702918.html