• Hadoop notes: TopK


    lk@lk-virtual-machine:~$ cd hadoop-1.0.1
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin dfs -mkdir input
    bash: ./bin: Is a directory
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop  dfs -mkdir input
    14/05/11 21:12:07 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 0 time(s).
    14/05/11 21:12:08 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 1 time(s).
    14/05/11 21:12:09 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 2 time(s).
    14/05/11 21:12:10 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 3 time(s).
    14/05/11 21:12:11 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 4 time(s).
    14/05/11 21:12:12 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 5 time(s).
    14/05/11 21:12:13 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 6 time(s).
    14/05/11 21:12:14 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 7 time(s).
    14/05/11 21:12:15 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 8 time(s).
    14/05/11 21:12:16 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9000. Already tried 9 time(s).
    Bad connection to FS. command aborted. exception: Call to localhost/127.0.0.1:9000 failed on connection exception: java.net.ConnectException: Connection refused
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop namenode -format
    14/05/11 21:12:48 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = lk-virtual-machine/127.0.1.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.0.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
    ************************************************************/
    14/05/11 21:12:48 INFO util.GSet: VM type       = 32-bit
    14/05/11 21:12:48 INFO util.GSet: 2% max memory = 19.33375 MB
    14/05/11 21:12:48 INFO util.GSet: capacity      = 2^22 = 4194304 entries
    14/05/11 21:12:48 INFO util.GSet: recommended=4194304, actual=4194304
    14/05/11 21:12:50 INFO namenode.FSNamesystem: fsOwner=lk
    14/05/11 21:12:50 INFO namenode.FSNamesystem: supergroup=supergroup
    14/05/11 21:12:50 INFO namenode.FSNamesystem: isPermissionEnabled=true
    14/05/11 21:12:50 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    14/05/11 21:12:50 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    14/05/11 21:12:50 INFO namenode.NameNode: Caching file names occuring more than 10 times
    14/05/11 21:12:50 INFO common.Storage: Image file of size 108 saved in 0 seconds.
    14/05/11 21:12:50 INFO common.Storage: Storage directory /tmp/hadoop-lk/dfs/name has been successfully formatted.
    14/05/11 21:12:50 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at lk-virtual-machine/127.0.1.1
    ************************************************************/
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop namenode -format
    14/05/11 21:13:12 INFO namenode.NameNode: STARTUP_MSG:
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = lk-virtual-machine/127.0.1.1
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.0.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r 1243785; compiled by 'hortonfo' on Tue Feb 14 08:15:38 UTC 2012
    ************************************************************/
    Re-format filesystem in /tmp/hadoop-lk/dfs/name ? (Y or N) n
    Format aborted in /tmp/hadoop-lk/dfs/name
    14/05/11 21:13:21 INFO namenode.NameNode: SHUTDOWN_MSG:
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at lk-virtual-machine/127.0.1.1
    ************************************************************/
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/start-all.sh
    starting namenode, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-namenode-lk-virtual-machine.out
    localhost: starting datanode, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-datanode-lk-virtual-machine.out
    localhost: starting secondarynamenode, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-secondarynamenode-lk-virtual-machine.out
    starting jobtracker, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-jobtracker-lk-virtual-machine.out
    localhost: starting tasktracker, logging to /home/lk/hadoop-1.0.1/libexec/../logs/hadoop-lk-tasktracker-lk-virtual-machine.out
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop dfs -mkdir input
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop dfs -put ~/input/file* input
    lk@lk-virtual-machine:~/hadoop-1.0.1$ javac -classpath hadoop-core-1.0.1.jar:lib/commons-cli-1.2.jar -d WordCount WordCount.java
    javac: file not found: WordCount.java
    Usage: javac <options> <source files>
    use -help for a list of possible options
    lk@lk-virtual-machine:~/hadoop-1.0.1$ javac -classpath hadoop-core-1.0.1.jar:lib/commons-cli-1.2.jar -d WordCount ~/WordCount.java
    lk@lk-virtual-machine:~/hadoop-1.0.1$ jar -cvf wordcount.jar -C WordCount .
    added manifest
    adding: wordcount/(in = 0) (out= 0)(stored 0%)
    adding: wordcount/WordCount$Map.class(in = 1765) (out= 771)(deflated 56%)
    adding: wordcount/WordCount.class(in = 1808) (out= 963)(deflated 46%)
    adding: wordcount/WordCount$Reduce.class(in = 1741) (out= 738)(deflated 57%)
    lk@lk-virtual-machine:~/hadoop-1.0.1$ ./bin/hadoop jar wordcount.jar wordcount input output
    Exception in thread "main" java.lang.ClassNotFoundException: wordcount
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
    lk@lk-virtual-machine:~/hadoop-1.0.1$
    lk@lk-virtual-machine:~/hadoop-1.0.1$ jps
    2598 SecondaryNameNode
    2341 DataNode
    2693 JobTracker
    2950 TaskTracker
    3247 Jps
    2061 NameNode
    lk@lk-virtual-machine:~/hadoop-1.0.1$ cd bin
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Runer.jar wordcount input output
    Exception in thread "main" java.lang.ClassNotFoundException: wordcount
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Runer.jar WordCount input output
    Exception in thread "main" java.lang.NoClassDefFoundError: WordCount (wrong name: org/apache/hadoop/examples/WordCount)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:634)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:277)
        at java.net.URLClassLoader.access$000(URLClassLoader.java:73)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:212)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:314)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:266)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Run.jar wordcount.WordCount  input output
    Exception in thread "main" java.net.UnknownHostException: unknown host: master01
        at org.apache.hadoop.ipc.Client$Connection.<init>(Client.java:214)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1192)
        at org.apache.hadoop.ipc.Client.call(Client.java:1046)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
        at sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:396)
        at org.apache.hadoop.ipc.RPC.getProxy(RPC.java:379)
        at org.apache.hadoop.hdfs.DFSClient.createRPCNamenode(DFSClient.java:119)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:238)
        at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:203)
        at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:89)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1386)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1404)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.addInputPath(FileInputFormat.java:372)
        at wordcount.WordCount.main(WordCount.java:61)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/Ru.jar wordcount.WordCount  input output
    14/05/11 22:43:50 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    ****hdfs://localhost:9000/user/lk/input
    14/05/11 22:43:52 INFO input.FileInputFormat: Total input paths to process : 4
    14/05/11 22:43:56 INFO mapred.JobClient: Running job: job_201405112114_0001
    14/05/11 22:43:57 INFO mapred.JobClient:  map 0% reduce 0%
    14/05/11 22:45:45 INFO mapred.JobClient:  map 50% reduce 0%
    14/05/11 22:46:59 INFO mapred.JobClient:  map 100% reduce 0%
    14/05/11 22:47:02 INFO mapred.JobClient:  map 100% reduce 16%
    14/05/11 22:47:05 INFO mapred.JobClient:  map 100% reduce 100%
    14/05/11 22:47:33 INFO mapred.JobClient: Job complete: job_201405112114_0001
    14/05/11 22:47:34 INFO mapred.JobClient: Counters: 29
    14/05/11 22:47:34 INFO mapred.JobClient:   Job Counters
    14/05/11 22:47:34 INFO mapred.JobClient:     Launched reduce tasks=1
    14/05/11 22:47:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=321173
    14/05/11 22:47:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/05/11 22:47:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/05/11 22:47:34 INFO mapred.JobClient:     Launched map tasks=4
    14/05/11 22:47:34 INFO mapred.JobClient:     Data-local map tasks=2
    14/05/11 22:47:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=84371
    14/05/11 22:47:34 INFO mapred.JobClient:   File Output Format Counters
    14/05/11 22:47:34 INFO mapred.JobClient:     Bytes Written=41
    14/05/11 22:47:34 INFO mapred.JobClient:   FileSystemCounters
    14/05/11 22:47:34 INFO mapred.JobClient:     FILE_BYTES_READ=104
    14/05/11 22:47:34 INFO mapred.JobClient:     HDFS_BYTES_READ=480
    14/05/11 22:47:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=106420
    14/05/11 22:47:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
    14/05/11 22:47:34 INFO mapred.JobClient:   File Input Format Counters
    14/05/11 22:47:34 INFO mapred.JobClient:     Bytes Read=50
    14/05/11 22:47:34 INFO mapred.JobClient:   Map-Reduce Framework
    14/05/11 22:47:34 INFO mapred.JobClient:     Map output materialized bytes=122
    14/05/11 22:47:34 INFO mapred.JobClient:     Map input records=2
    14/05/11 22:47:34 INFO mapred.JobClient:     Reduce shuffle bytes=122
    14/05/11 22:47:34 INFO mapred.JobClient:     Spilled Records=16
    14/05/11 22:47:34 INFO mapred.JobClient:     Map output bytes=82
    14/05/11 22:47:34 INFO mapred.JobClient:     CPU time spent (ms)=20800
    14/05/11 22:47:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=718479360
    14/05/11 22:47:34 INFO mapred.JobClient:     Combine input records=0
    14/05/11 22:47:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=430
    14/05/11 22:47:34 INFO mapred.JobClient:     Reduce input records=8
    14/05/11 22:47:34 INFO mapred.JobClient:     Reduce input groups=5
    14/05/11 22:47:34 INFO mapred.JobClient:     Combine output records=0
    14/05/11 22:47:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=550727680
    14/05/11 22:47:34 INFO mapred.JobClient:     Reduce output records=5
    14/05/11 22:47:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1881079808
    14/05/11 22:47:34 INFO mapred.JobClient:     Map output records=8
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop dfs -ls output
    Found 3 items
    -rw-r--r--   1 lk supergroup          0 2014-05-11 22:47 /user/lk/output/_SUCCESS
    drwxr-xr-x   - lk supergroup          0 2014-05-11 22:43 /user/lk/output/_logs
    -rw-r--r--   1 lk supergroup         41 2014-05-11 22:47 /user/lk/output/part-r-00000
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop dfs -cat output/part-r-00000
    Bye    1
    Goodbye    1
    Hadoop    2
    Hello    2
    World    2
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/topk.jar topk.TopK  input output
    14/05/11 23:00:26 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9000/tmp/hadoop-lk/mapred/staging/lk/.staging/job_201405112114_0002
    14/05/11 23:00:26 ERROR security.UserGroupInformation: PriviledgedActionException as:lk cause:org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists
    Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory output already exists
        at org.apache.hadoop.mapreduce.lib.output.FileOutputFormat.checkOutputSpecs(FileOutputFormat.java:137)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:889)
        at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:850)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:850)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
        at topk.TopK.run(TopK.java:86)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at topk.TopK.main(TopK.java:90)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:616)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop dfs -rmr output
    Deleted hdfs://localhost:9000/user/lk/output
    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$ ./hadoop jar ~/hadoop-1.0.1/topk.jar topk.TopK  input output
    ****hdfs://localhost:9000/user/lk/input
    14/05/11 23:01:28 INFO input.FileInputFormat: Total input paths to process : 4
    14/05/11 23:01:29 INFO mapred.JobClient: Running job: job_201405112114_0003
    14/05/11 23:01:30 INFO mapred.JobClient:  map 0% reduce 0%
    14/05/11 23:02:32 INFO mapred.JobClient:  map 50% reduce 0%
    14/05/11 23:02:34 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000000_0, Status : FAILED
    java.lang.ArrayIndexOutOfBoundsException: 8
        at topk.TopK$MapClass.map(TopK.java:43)
        at topk.TopK$MapClass.map(TopK.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

    14/05/11 23:02:36 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000001_0, Status : FAILED
    java.lang.ArrayIndexOutOfBoundsException: 8
        at topk.TopK$MapClass.map(TopK.java:43)
        at topk.TopK$MapClass.map(TopK.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

    14/05/11 23:02:39 INFO mapred.JobClient:  map 0% reduce 0%
    14/05/11 23:03:02 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000000_1, Status : FAILED
    14/05/11 23:03:02 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000001_1, Status : FAILED
    14/05/11 23:03:27 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000000_2, Status : FAILED
    java.lang.ArrayIndexOutOfBoundsException: 8
        at topk.TopK$MapClass.map(TopK.java:43)
        at topk.TopK$MapClass.map(TopK.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

    14/05/11 23:03:29 INFO mapred.JobClient: Task Id : attempt_201405112114_0003_m_000001_2, Status : FAILED
    java.lang.ArrayIndexOutOfBoundsException: 8
        at topk.TopK$MapClass.map(TopK.java:43)
        at topk.TopK$MapClass.map(TopK.java:1)
        at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:416)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1093)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

    14/05/11 23:03:59 INFO mapred.JobClient: Job complete: job_201405112114_0003
    14/05/11 23:04:00 INFO mapred.JobClient: Counters: 7
    14/05/11 23:04:00 INFO mapred.JobClient:   Job Counters
    14/05/11 23:04:00 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=255192
    14/05/11 23:04:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    14/05/11 23:04:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    14/05/11 23:04:00 INFO mapred.JobClient:     Launched map tasks=8
    14/05/11 23:04:00 INFO mapred.JobClient:     Data-local map tasks=8
    14/05/11 23:04:00 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
    14/05/11 23:04:00 INFO mapred.JobClient:     Failed map tasks=1

    lk@lk-virtual-machine:~/hadoop-1.0.1/bin$
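
The last TopK run fails with `ArrayIndexOutOfBoundsException: 8` at `TopK.java:43`, which is the `Integer.parseInt(str[8])` line in the mapper. A likely cause (an assumption, inferred from the earlier WordCount output of Bye/Goodbye/Hadoop/Hello/World) is that `input` still holds the WordCount sample text: those lines contain no commas, so `split(",")` returns a single element and index 8 does not exist. The plain-Java sketch below reproduces that outside Hadoop and shows a defensive parse; `FieldParseDemo` and `parseField` are hypothetical names, not part of Hadoop or the original code.

```java
import java.util.OptionalInt;

public class FieldParseDemo {

    // Hypothetical helper: returns the integer in column `index` of a comma-separated
    // line, or OptionalInt.empty() if that column is missing or not an integer.
    static OptionalInt parseField(String line, int index) {
        String[] fields = line.split(",", -1); // limit -1 keeps trailing empty fields
        if (fields.length <= index) {
            return OptionalInt.empty();
        }
        try {
            return OptionalInt.of(Integer.parseInt(fields[index].trim()));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        // Assumed WordCount-style input: no commas, so split() returns one element.
        String wordCountLine = "Hello World Bye World";
        String[] fields = wordCountLine.split(",", -1);
        System.out.println(fields.length); // 1
        try {
            System.out.println(Integer.parseInt(fields[8])); // what the mapper does
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index 8 is out of bounds");
        }

        // A made-up record with nine comma-separated fields parses fine.
        System.out.println(parseField(wordCountLine, 8));         // OptionalInt.empty
        System.out.println(parseField("r1,0,0,0,0,0,0,0,42", 8)); // OptionalInt[42]
    }
}
```

Either skip such lines in the mapper or point the job at input that actually has nine comma-separated fields.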



    package topk;
    
    
    
    /**
     * Created with IntelliJ IDEA.
     * User: Isaac Li
     * Date: 12/4/12
     * Time: 5:48 PM
     */
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    import java.io.IOException;
    import java.util.TreeMap;
    
    // Use MapReduce to find the K largest values in a massive data set
    public class TopK extends Configured implements Tool {
    
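        public static class MapClass extends Mapper<LongWritable, Text, NullWritable, Text> {
            public static final int K = 1;
            // Local top K for this map task, keyed by the ranked field.
            // Records with equal values overwrite each other.
            private TreeMap<Integer, Text> fatcats = new TreeMap<Integer, Text>();
    
            @Override
            public void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                // Expects comma-separated records with the ranked value in the ninth field (index 8).
                String[] str = value.toString().split(",", -1);
                if (str.length <= 8) {
                    return; // skip malformed lines instead of throwing ArrayIndexOutOfBoundsException
                }
                int temp;
                try {
                    temp = Integer.parseInt(str[8].trim());
                } catch (NumberFormatException e) {
                    return; // skip lines whose ninth field is not an integer
                }
                // Hadoop reuses the Text instance across map() calls, so store a copy.
                fatcats.put(temp, new Text(value));
                if (fatcats.size() > K)
                    fatcats.remove(fatcats.firstKey());
            }
    
            @Override
            protected void cleanup(Context context) throws IOException, InterruptedException {
                // Emit this task's local top K after all of its records have been processed.
                for (Text text : fatcats.values()) {
                    context.write(NullWritable.get(), text);
                }
            }
        }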
    
        public static class Reduce extends Reducer<NullWritable, Text, NullWritable, Text> {
            public static final int K = 1;
            private TreeMap<Integer, Text> fatcats = new TreeMap<Integer, Text>();
    
            @Override
            public void reduce(NullWritable key, Iterable<Text> values, Context context)
                    throws IOException, InterruptedException {
                for (Text val : values) {
                    // The values are the raw lines emitted by MapClass, so parse them
                    // the same way: comma-separated, ranked value at index 8.
                    String[] v = val.toString().split(",", -1);
                    int weight = Integer.parseInt(v[8].trim());
                    // The values iterator reuses the Text object, so store a copy.
                    fatcats.put(weight, new Text(val));
                    if (fatcats.size() > K)
                        fatcats.remove(fatcats.firstKey());
                }
                // Global top K, in ascending order of the ranked value.
                for (Text text : fatcats.values())
                    context.write(NullWritable.get(), text);
            }
        }
    
        @Override
        public int run(String[] args) throws Exception {
            Configuration conf = getConf();
            Job job = new Job(conf, "TopK");
            job.setJarByClass(TopK.class);
            FileInputFormat.setInputPaths(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.setMapperClass(MapClass.class);
            // job.setCombinerClass(Reduce.class);
            job.setReducerClass(Reduce.class);
            // A single reducer is needed to produce one global top-K list.
            job.setNumReduceTasks(1);
            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);
            job.setOutputKeyClass(NullWritable.class);
            job.setOutputValueClass(Text.class);
            // Return the status to ToolRunner; main() passes it to System.exit.
            return job.waitForCompletion(true) ? 0 : 1;
        }
        public static void main(String[] args) throws Exception {
            int res = ToolRunner.run(new Configuration(), new TopK(), args);
            System.exit(res);
        }
    
    }
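
The job uses the usual top-K pattern: each map task keeps only its K largest records in a `TreeMap` and writes them in `cleanup()`, and a single reducer merges those local winners. The sketch below simulates both phases in plain Java (no Hadoop), using made-up records whose ranked value sits at index 8 as in the mapper; `TopKSimulation` and `topK` are illustrative names, and K = 2 is chosen only to make the merge visible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

public class TopKSimulation {

    // Keep the k records with the largest value at index 8. Like the job's
    // TreeMap<Integer, Text>, records with equal values overwrite each other.
    static TreeMap<Integer, String> topK(List<String> records, int k) {
        TreeMap<Integer, String> best = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split(",", -1);
            if (fields.length <= 8) {
                continue; // skip malformed records
            }
            best.put(Integer.parseInt(fields[8].trim()), record);
            if (best.size() > k) {
                best.remove(best.firstKey()); // evict the current smallest
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int k = 2;
        // Two made-up "input splits"; the ranked value is the last (ninth) field.
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("r1,0,0,0,0,0,0,0,5", "r2,0,0,0,0,0,0,0,17", "Hello World"),
                Arrays.asList("r3,0,0,0,0,0,0,0,9", "r4,0,0,0,0,0,0,0,30", "r5,0,0,0,0,0,0,0,1"));

        // "Map" phase: each split contributes only its local top k (what cleanup() writes).
        List<String> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            shuffled.addAll(topK(split, k).values());
        }

        // "Reduce" phase: a single reducer merges the local winners into the global top k.
        TreeMap<Integer, String> global = topK(shuffled, k);
        System.out.println(global.keySet()); // [17, 30]
    }
}
```

The first split contributes 5 and 17, the second 9 and 30, so the reducer ends with 17 and 30. Because the map is keyed by value, ties are lost; keeping a list per key would preserve them.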


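One more pitfall when buffering records in a `TreeMap`: Hadoop's record readers and the reduce-side values iterator reuse a single `Text` instance and refill it for each record, so putting `value` (or `val`) itself into the map leaves every entry pointing at the same object, whose contents keep changing. Copying with `new Text(value)` avoids that. The sketch below imitates the reuse with a `StringBuilder` standing in for `Text`; `ReuseDemo` and `collect` are illustrative names, and this is not Hadoop code.

```java
import java.util.TreeMap;

public class ReuseDemo {

    // Simulates a reader that refills one mutable buffer for every record,
    // the way Hadoop refills one Text (StringBuilder stands in for Text here).
    static TreeMap<Integer, String> collect(String[] lines, boolean copy) {
        StringBuilder reused = new StringBuilder();
        TreeMap<Integer, StringBuilder> kept = new TreeMap<>();
        for (int i = 0; i < lines.length; i++) {
            reused.setLength(0);
            reused.append(lines[i]);
            // Without a copy, every entry holds the same buffer object.
            kept.put(i, copy ? new StringBuilder(reused) : reused);
        }
        // Snapshot the buffers' final contents as Strings.
        TreeMap<Integer, String> result = new TreeMap<>();
        for (Integer key : kept.keySet()) {
            result.put(key, kept.get(key).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String[] lines = {"first", "second", "third"};
        System.out.println(collect(lines, false)); // {0=third, 1=third, 2=third}
        System.out.println(collect(lines, true));  // {0=first, 1=second, 2=third}
    }
}
```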

  • Original article: https://www.cnblogs.com/bhlsheji/p/4206076.html
Copyright © 2020-2023  润新知