Learning Hadoop with a simple example: counting how many times each word appears.
Environment: OS: CentOS 6.5 x64; Software: Hadoop 1.2.1
1. Create an input directory and write some text into two files.
[huser@master hadoop-1.2.1]$ mkdir ../input
[huser@master hadoop-1.2.1]$ echo "hello world" > ../input/test1.txt
[huser@master hadoop-1.2.1]$ echo "hello hadoop" > ../input/test2.txt
2. Copy the files into HDFS.
[huser@master hadoop-1.2.1]$ bin/hadoop fs -put ../input ./in
[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls
Found 1 items
drwxr-xr-x   - huser supergroup          0 2014-04-16 19:00 /user/huser/in
[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls ./in/*
-rw-r--r--   1 huser supergroup         12 2014-04-16 19:00 /user/huser/in/test1.txt
-rw-r--r--   1 huser supergroup         13 2014-04-16 19:00 /user/huser/in/test2.txt
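For reference, the same upload can also be done through the HDFS Java API instead of the fs shell. This is only a minimal sketch, assuming the cluster's configuration files (core-site.xml etc.) are on the classpath; the class name PutInput is made up for this example.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: copy the local "input" directory into HDFS as "in",
// mirroring "bin/hadoop fs -put ../input ./in".
public class PutInput {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();   // reads core-site.xml / hdfs-site.xml from the classpath
        FileSystem fs = FileSystem.get(conf);       // connects to the default filesystem (HDFS)
        fs.copyFromLocalFile(new Path("../input"), new Path("in"));
        fs.close();
    }
}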
3. Run the bundled wordcount example.
[huser@master hadoop-1.2.1]$ bin/hadoop jar hadoop-examples-1.2.1.jar wordcount in out
14/04/16 19:02:53 INFO input.FileInputFormat: Total input paths to process : 2
14/04/16 19:02:53 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/16 19:02:53 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/16 19:02:54 INFO mapred.JobClient: Running job: job_201404161850_0001
14/04/16 19:02:55 INFO mapred.JobClient:  map 0% reduce 0%
14/04/16 19:03:10 INFO mapred.JobClient:  map 100% reduce 0%
14/04/16 19:03:19 INFO mapred.JobClient:  map 100% reduce 33%
14/04/16 19:03:21 INFO mapred.JobClient:  map 100% reduce 100%
14/04/16 19:03:23 INFO mapred.JobClient: Job complete: job_201404161850_0001
14/04/16 19:03:23 INFO mapred.JobClient: Counters: 30
14/04/16 19:03:23 INFO mapred.JobClient:   Job Counters
14/04/16 19:03:23 INFO mapred.JobClient:     Launched reduce tasks=1
14/04/16 19:03:23 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=19368
14/04/16 19:03:23 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/04/16 19:03:23 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/04/16 19:03:23 INFO mapred.JobClient:     Rack-local map tasks=1
14/04/16 19:03:23 INFO mapred.JobClient:     Launched map tasks=2
14/04/16 19:03:23 INFO mapred.JobClient:     Data-local map tasks=1
14/04/16 19:03:23 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=11082
14/04/16 19:03:23 INFO mapred.JobClient:   File Output Format Counters
14/04/16 19:03:23 INFO mapred.JobClient:     Bytes Written=25
14/04/16 19:03:23 INFO mapred.JobClient:   FileSystemCounters
14/04/16 19:03:23 INFO mapred.JobClient:     FILE_BYTES_READ=55
14/04/16 19:03:23 INFO mapred.JobClient:     HDFS_BYTES_READ=239
14/04/16 19:03:23 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=169887
14/04/16 19:03:23 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=25
14/04/16 19:03:23 INFO mapred.JobClient:   File Input Format Counters
14/04/16 19:03:23 INFO mapred.JobClient:     Bytes Read=25
14/04/16 19:03:23 INFO mapred.JobClient:   Map-Reduce Framework
14/04/16 19:03:23 INFO mapred.JobClient:     Map output materialized bytes=61
14/04/16 19:03:23 INFO mapred.JobClient:     Map input records=2
14/04/16 19:03:23 INFO mapred.JobClient:     Reduce shuffle bytes=61
14/04/16 19:03:23 INFO mapred.JobClient:     Spilled Records=8
14/04/16 19:03:23 INFO mapred.JobClient:     Map output bytes=41
14/04/16 19:03:23 INFO mapred.JobClient:     Total committed heap usage (bytes)=415633408
14/04/16 19:03:23 INFO mapred.JobClient:     CPU time spent (ms)=4060
14/04/16 19:03:23 INFO mapred.JobClient:     Combine input records=4
14/04/16 19:03:23 INFO mapred.JobClient:     SPLIT_RAW_BYTES=214
14/04/16 19:03:23 INFO mapred.JobClient:     Reduce input records=4
14/04/16 19:03:23 INFO mapred.JobClient:     Reduce input groups=3
14/04/16 19:03:23 INFO mapred.JobClient:     Combine output records=4
14/04/16 19:03:23 INFO mapred.JobClient:     Physical memory (bytes) snapshot=402755584
14/04/16 19:03:23 INFO mapred.JobClient:     Reduce output records=3
14/04/16 19:03:23 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2173550592
14/04/16 19:03:23 INFO mapred.JobClient:     Map output records=4
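The bundled wordcount example is the classic two-stage MapReduce job: the mapper emits a (word, 1) pair for every token, a combiner pre-aggregates on the map side, and the reducer sums the counts per word. A minimal sketch of an equivalent job against the Hadoop 1.x mapreduce API (not the exact source shipped in hadoop-examples-1.2.1.jar) looks roughly like this:

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: split each input line into tokens and emit (word, 1).
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));      // e.g. "in"
        FileOutputFormat.setOutputPath(job, new Path(args[1]));    // e.g. "out" (must not exist yet)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Packaged into a jar, it could be submitted the same way as the bundled example, for instance bin/hadoop jar my-wordcount.jar WordCount in out (the jar name here is hypothetical).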
4. View the results.
[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls ./out/*
-rw-r--r--   1 huser supergroup          0 2014-04-16 19:03 /user/huser/out/_SUCCESS
drwxr-xr-x   - huser supergroup          0 2014-04-16 19:02 /user/huser/out/_logs/history
-rw-r--r--   1 huser supergroup         25 2014-04-16 19:03 /user/huser/out/part-r-00000
[huser@master hadoop-1.2.1]$ bin/hadoop fs -cat ./out/part-r-00000
hadoop	1
hello	2
world	1
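The counts match the input: "hello" appears in both files, while "hadoop" and "world" appear once each. If you would rather read the result programmatically than with fs -cat, a minimal sketch using the HDFS Java API could look like the following (the class name CatResult is made up for illustration, and the output path is assumed to be the one used above):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch: print the reducer output, mirroring "bin/hadoop fs -cat ./out/part-r-00000".
public class CatResult {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(fs.open(new Path("out/part-r-00000"))));
        String line;
        while ((line = reader.readLine()) != null) {
            System.out.println(line);   // each line is "word<TAB>count", e.g. "hello	2"
        }
        reader.close();
        fs.close();
    }
}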