• Running a MapReduce Example


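    I picked a small example from "Hadoop: The Definitive Guide" and ran it on a Hadoop cluster.

    1. Create a Java class and save the source code from the book in it.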

    [huser@master bin]$ vi URLCat.java
    import java.io.InputStream;
    import java.net.URL;
    
    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
    import org.apache.hadoop.io.IOUtils;
    
    public class URLCat {
    
            static {
                    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
            }
    
            public static void main(String[] args) throws Exception {
                    InputStream in = null;
                    try {
                            in = new URL(args[0]).openStream();
                            IOUtils.copyBytes(in, System.out, 4096, false);
                    } finally {
                            IOUtils.closeStream(in);
                    }
            }
    }
    
    ~
    "URLCat.java" [New] 23L, 481C written
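
    The static block in URLCat matters because the JVM accepts a URL stream handler factory only once per process; a later call to URL.setURLStreamHandlerFactory throws java.lang.Error. The sketch below illustrates that behaviour with the JDK alone. The "demo" scheme, the DemoFactory class and the helper names are made up for illustration and are not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.net.URLStreamHandlerFactory;
import java.nio.charset.StandardCharsets;

final class HandlerFactoryDemo {

    /** Hypothetical factory: serves a fixed string for the made-up "demo" scheme. */
    static final class DemoFactory implements URLStreamHandlerFactory {
        @Override
        public URLStreamHandler createURLStreamHandler(String protocol) {
            if (!"demo".equals(protocol)) {
                return null; // let the JVM fall back to its built-in handlers
            }
            return new URLStreamHandler() {
                @Override
                protected URLConnection openConnection(URL url) {
                    return new URLConnection(url) {
                        @Override
                        public void connect() {
                            // nothing to connect to: the content is held in memory
                        }

                        @Override
                        public InputStream getInputStream() {
                            return new ByteArrayInputStream(
                                    "hello demo".getBytes(StandardCharsets.UTF_8));
                        }
                    };
                }
            };
        }
    }

    /** Tries to install the factory; returns false if this JVM already has one. */
    static boolean tryInstall() {
        try {
            URL.setURLStreamHandlerFactory(new DemoFactory());
            return true;
        } catch (Error alreadyDefined) {
            return false;
        }
    }

    /** Reads the whole content behind a URL as UTF-8 text. */
    static String read(String url) {
        try (InputStream in = new URL(url).openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("first install:  " + tryInstall());
        System.out.println("second install: " + tryInstall());
        System.out.println(read("demo://anything"));
    }
}
```

    Once a factory is installed, every later attempt in the same JVM is rejected, which is why the book registers it in a static initializer. It also means this approach cannot be used if another component of the program has already set a factory.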

    2. Compile the Java class.

    [huser@master bin]$ javac URLCat.java 
    URLCat.java:4: error: package org.apache.hadoop.fs does not exist
    import org.apache.hadoop.fs.FsUrlStreamHandlerFactory;
                               ^
    URLCat.java:5: error: package org.apache.hadoop.io does not exist
    import org.apache.hadoop.io.IOUtils;
                               ^
    URLCat.java:10: error: cannot find symbol
                    URL.setURLStreamHandlerFactory(new FsUrlStreamHandlerFactory());
                                                       ^
      symbol:   class FsUrlStreamHandlerFactory
      location: class URLCat
    URLCat.java:17: error: cannot find symbol
                            IOUtils.copyBytes(in, System.out, 4096, false);
                            ^
      symbol:   variable IOUtils
      location: class URLCat
    URLCat.java:19: error: cannot find symbol
                            IOUtils.closeStream(in);
                            ^
      symbol:   variable IOUtils
      location: class URLCat
    5 errors

    The compiler cannot find the Hadoop classes that the program imports. Pass the path to the Hadoop core jar with -classpath.

    [huser@master bin]$ javac -classpath ../hadoop-core-1.2.1.jar URLCat.java 
    [huser@master bin]$ ll
    total 152
    -rwxr-xr-x 1 huser huser 15147 Jul 23  2013 hadoop
    -rwxr-xr-x 1 huser huser  2643 Jul 23  2013 hadoop-config.sh
    -rwxr-xr-x 1 huser huser  5064 Jul 23  2013 hadoop-daemon.sh
    -rwxr-xr-x 1 huser huser  1329 Jul 23  2013 hadoop-daemons.sh
    -rwxr-xr-x 1 huser huser  2810 Jul 23  2013 rcc
    -rwxr-xr-x 1 huser huser  2050 Jul 23  2013 slaves.sh
    -rwxr-xr-x 1 huser huser  1166 Jul 23  2013 start-all.sh
    -rwxr-xr-x 1 huser huser  1065 Jul 23  2013 start-balancer.sh
    -rwxr-xr-x 1 huser huser  1745 Jul 23  2013 start-dfs.sh
    -rwxr-xr-x 1 huser huser  1145 Jul 23  2013 start-jobhistoryserver.sh
    -rwxr-xr-x 1 huser huser  1259 Jul 23  2013 start-mapred.sh
    -rwxr-xr-x 1 huser huser  1119 Jul 23  2013 stop-all.sh
    -rwxr-xr-x 1 huser huser  1116 Jul 23  2013 stop-balancer.sh
    -rwxr-xr-x 1 huser huser  1246 Jul 23  2013 stop-dfs.sh
    -rwxr-xr-x 1 huser huser  1131 Jul 23  2013 stop-jobhistoryserver.sh
    -rwxr-xr-x 1 huser huser  1168 Jul 23  2013 stop-mapred.sh
    -rwxr-xr-x 1 huser huser 63598 Jul 23  2013 task-controller
    -rw-rw-r-- 1 huser huser  1021 Apr 17 23:09 URLCat.class
    -rw-rw-r-- 1 huser huser   481 Apr 17 23:04 URLCat.java

    The compilation succeeded and produced URLCat.class.
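
    The "package does not exist" errors come from a jar missing on the classpath, and the same classes must also be visible when the program runs. As a rough diagnostic, the sketch below uses only the JDK to report whether a class can be located on the current classpath. The class name ClasspathCheck and its method are hypothetical helpers, not part of Hadoop.

```java
final class ClasspathCheck {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError missing) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the two Hadoop classes that URLCat imports.
        String[] names = args.length > 0 ? args : new String[] {
            "org.apache.hadoop.fs.FsUrlStreamHandlerFactory",
            "org.apache.hadoop.io.IOUtils"
        };
        for (String name : names) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

    Running it with and without the jar on the classpath, for example "java -classpath ../hadoop-core-1.2.1.jar:. ClasspathCheck", should show the difference.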

    3. Run the program.

    [huser@master bin]$ ../bin/hadoop URLCat hdfs://master/user/huser/in/test2.txt
    Warning: $HADOOP_HOME is deprecated.
    
    14/04/17 23:34:37 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:38 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:39 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:40 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:41 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:42 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:43 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:44 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:45 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    14/04/17 23:34:46 INFO ipc.Client: Retrying connect to server: master/192.168.1.115:8020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1 SECONDS)
    Exception in thread "main" java.net.ConnectException: Call to master/192.168.1.115:8020 failed on connection exception: java.net.ConnectException: Connection refused
            at org.apache.hadoop.ipc.Client.wrapException(Client.java:1142)
            at org.apache.hadoop.ipc.Client.call(Client.java:1118)
            at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
            at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:606)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
            at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
            at com.sun.proxy.$Proxy1.getProtocolVersion(Unknown Source)
            at org.apache.hadoop.ipc.RPC.checkVersion(RPC.java:422)
            at org.apache.hadoop.hdfs.DFSClient.createNamenode(DFSClient.java:183)
            at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:281)
            at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:245)
            at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:100)
            at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1446)
            at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:67)
            at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1464)
            at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:263)
            at org.apache.hadoop.fs.FsUrlConnection.connect(FsUrlConnection.java:45)
            at org.apache.hadoop.fs.FsUrlConnection.getInputStream(FsUrlConnection.java:56)
            at java.net.URL.openStream(URL.java:1037)
            at URLCat.main(URLCat.java:16)
    Caused by: java.net.ConnectException: Connection refused
            at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
            at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
            at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
            at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
            at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:457)
            at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:583)
            at org.apache.hadoop.ipc.Client$Connection.access$2200(Client.java:205)
            at org.apache.hadoop.ipc.Client.getConnection(Client.java:1249)
            at org.apache.hadoop.ipc.Client.call(Client.java:1093)
            ... 22 more

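    The connection was refused: the URL names no port, so the client tried the NameNode's default port 8020, where nothing was listening. The next step is to check how HDFS is configured.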
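
    The retry messages above show the client contacting master on port 8020 even though the URL contains no port. A minimal sketch of the same reasoning, using only the JDK: java.net.URI reports -1 for a missing port, and a plain TCP connect shows whether anything is listening. The class and method names are hypothetical, and the 8020 fallback is taken from the log above rather than read from Hadoop itself.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

final class NameNodeProbe {

    /** Port the client fell back to in the log above when the URL named none. */
    static final int DEFAULT_PORT = 8020;

    /** Returns the port named in the URL, or DEFAULT_PORT if the URL names none. */
    static int effectivePort(String url) {
        int port = URI.create(url).getPort(); // -1 when the URL has no port
        return port == -1 ? DEFAULT_PORT : port;
    }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException refusedOrUnreachable) {
            return false;
        }
    }

    public static void main(String[] args) {
        String url = args.length > 0 ? args[0] : "hdfs://master/user/huser/in/test2.txt";
        String host = URI.create(url).getHost();
        int port = effectivePort(url);
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 2000));
    }
}
```

    A successful TCP connect only shows that some process is listening on that port; it does not prove that the process is the NameNode.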

    [huser@master conf]$ cat core-site.xml 
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
     <name>fs.default.name</name>
     <value>hdfs://master:9000</value>
    </property>

    The NameNode is configured on port 9000 rather than the default 8020, so the URL must include the port.
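
    Instead of reading the file by eye, the configured address can be extracted programmatically. The sketch below parses a Hadoop-style configuration file with the JDK's XML parser, looks up fs.default.name, and prepends it to an absolute HDFS path. It is an illustration only: the class and method names are hypothetical, and real Hadoop programs would normally let Hadoop's own Configuration class load this file.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class DefaultFsUrl {

    /** Returns the value of the named property, or null if the file does not define it. */
    static String property(InputStream xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element p = (Element) props.item(i);
                String n = p.getElementsByTagName("name").item(0).getTextContent().trim();
                if (name.equals(n)) {
                    return p.getElementsByTagName("value").item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalStateException("cannot read configuration", e);
        }
    }

    /** Joins the file system URI and an absolute path, avoiding a doubled slash. */
    static String fullUrl(String defaultFs, String absolutePath) {
        String base = defaultFs.endsWith("/")
                ? defaultFs.substring(0, defaultFs.length() - 1)
                : defaultFs;
        return base + absolutePath;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to core-site.xml, args[1]: absolute HDFS path
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(fullUrl(property(in, "fs.default.name"), args[1]));
        }
    }
}
```

    With the configuration shown above and the path /user/huser/in/test2.txt, this yields the URL used in the next command.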

    [huser@master bin]$ ../bin/hadoop URLCat hdfs://master:9000/user/huser/in/test2.txt
    Warning: $HADOOP_HOME is deprecated.
    
    hello hadoop

    The program ran successfully and printed the contents of the file.

  • Original article: https://www.cnblogs.com/guarder/p/3721952.html