• Developing Hadoop 2.6.2 with Eclipse on Windows 8


    Recently I have been using Eclipse Mars on Windows to develop against Hadoop 2.6 running on a remote Linux machine. It threw up a lot of problems and was genuinely exhausting, but after persistent effort everything was finally resolved, much to my delight.

    1. Install the JDK

    For Hadoop 2.6 development it is best to use JDK 7; download and install it.

    2. Download Eclipse

    Go to http://www.eclipse.org/downloads/ and download the version you need; here we use the 64-bit Windows build. Simply extract it to a directory, then do some basic setup and choose the JDK your development requires.

    3. Install hadoop-eclipse-plugin 2.6.0

    Go to https://github.com/winghc/hadoop2x-eclipse-plugin and download the zip. Its release directory already contains hadoop-eclipse-plugin-2.6.0.jar, which can be used directly without recompiling; for other versions, consult other documentation. Simply copy hadoop-eclipse-plugin-2.6.0.jar into Eclipse's plugins directory.

    4. Install the Hadoop Windows binaries

    These include hadoop.dll and winutils.exe. Download from: https://github.com/srccodes/hadoop-common-2.2.0-bin

    After downloading and extracting, you also need to configure environment variables: set HADOOP_HOME to the extraction directory, and append %HADOOP_HOME%\bin; to Path.
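
    If you would rather not touch system-wide environment variables, Hadoop also honors the hadoop.home.dir Java system property, which it reads before falling back to HADOOP_HOME. A minimal sketch; the extraction path is an example, not a value from this post:

    import org.apache.hadoop.conf.Configuration;

    public class HadoopHomeSetup {
        public static void main(String[] args) {
            // Must run before the first Hadoop class initializes, because Hadoop
            // reads hadoop.home.dir once, in a static initializer.
            // The path below is an example -- use your own extraction directory.
            System.setProperty("hadoop.home.dir", "D:\\hadoop-common-2.2.0-bin");

            // Any later Hadoop usage will now locate winutils.exe under
            // <hadoop.home.dir>\bin.
            Configuration conf = new Configuration();
            System.out.println("Hadoop configuration loaded: " + conf);
        }
    }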

    5. Enable the Hadoop plugin in Eclipse

    Open Eclipse and set up your workspace, then switch to the Map/Reduce perspective (the original post illustrates these clicks with screenshots that are not reproduced here).

    In the Eclipse workspace, the plugin's markers should now appear at the upper left and at the bottom (the DFS Locations tree and the Map/Reduce Locations view). If they show up, the previous steps were all done correctly. Next, configure the plugin.

    6. Configure the Hadoop plugin

    In the Eclipse menu, choose Window - Preferences to open the settings dialog.

    The Hadoop version used here must match the Hadoop version installed on your Linux machine; during development the plugin fetches the libraries it needs from this directory. Save once the setting is complete.

    Note: use the binary distribution here; its configuration files do not need to be modified.

    Once this is done, you can see your Hadoop directory structure in the upper right of Eclipse.

    7. Everything above was just a light drizzle; the real work starts now: how to actually develop. We will use Hadoop's WordCount as the test case.

    Create a Map/Reduce project

    Set the project name

    Create a class

    Set the class attributes

    Once created, copy the contents of WordCount.java from the hadoop-2.6.2-src\hadoop-mapreduce-project\hadoop-mapreduce-examples\src\main\java\org\apache\hadoop\examples directory into the newly created file.
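
    For reference, the upstream example in the Hadoop 2.6 sources is essentially the following (the package declaration is omitted; adjust it to match the class you just created):

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.GenericOptionsParser;

    public class WordCount {

        // Emits (word, 1) for every token in each input line.
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        // Sums the counts for each word; also used as the combiner.
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {

            private IntWritable result = new IntWritable();

            public void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
            if (otherArgs.length < 2) {
                System.err.println("Usage: wordcount <in> [<in>...] <out>");
                System.exit(2);
            }
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            for (int i = 0; i < otherArgs.length - 1; ++i) {
                FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
            }
            FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }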

    8. Next, create the configuration environment

    In the project, create another Source Folder named resources, and copy your cluster's Hadoop configuration files (from etc/hadoop) into it: log4j.properties, core-site.xml, hdfs-site.xml, mapred-site.xml and yarn-site.xml. Apart from log4j.properties, these files will differ from cluster to cluster, but they must at least include the following.

    core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://192.168.114.128:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/zkpk/hadoop_data/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://192.168.114.128:9000</value>
        </property>
    </configuration>

    hdfs-site.xml

    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/zkpk/hadoop_data/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/zkpk/hadoop_data/dfs/data</value>
        </property>
    </configuration>

    mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
            <description>Required for remote development</description>
        </property>
        <property>
            <name>mapred.remote.os</name>
            <value>Linux</value>
            <description>Required for remote development</description>
        </property>
        <property>
            <name>mapreduce.app-submission.cross-platform</name>
            <value>true</value>
            <description>Required for remote development</description>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>
                /home/zkpk/hadoop-2.6.2/etc/hadoop,
                /home/zkpk/hadoop-2.6.2/share/hadoop/common/*,
                /home/zkpk/hadoop-2.6.2/share/hadoop/common/lib/*,
                /home/zkpk/hadoop-2.6.2/share/hadoop/hdfs/*,
                /home/zkpk/hadoop-2.6.2/share/hadoop/hdfs/lib/*,
                /home/zkpk/hadoop-2.6.2/share/hadoop/mapreduce/*,
                /home/zkpk/hadoop-2.6.2/share/hadoop/mapreduce/lib/*,
                /home/zkpk/hadoop-2.6.2/share/hadoop/yarn/*,
                /home/zkpk/hadoop-2.6.2/share/hadoop/yarn/lib/*
            </value>
            <description>Required for remote development; lists the classpath on the remote (Linux) side</description>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>192.168.114.128:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>192.168.114.128:19888</value>
        </property>
        <property>
            <name>mapred.job.tracker</name>
            <value>192.168.114.128:9001</value>
            <description>Required for remote development</description>
        </property>
    </configuration>

    yarn-site.xml

    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
            <description>Required for remote development</description>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>192.168.114.128:8032</value>
        </property>
    </configuration>

    With all of the above done, the development environment is fully configured. Next, test whether a run succeeds.

    When everything is ready, just click Run.
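
    Note that WordCount takes its input and output locations as program arguments, so set them under Run Configurations - Arguments before running. The HDFS paths below are examples, not values from the original post:

        hdfs://192.168.114.128:9000/user/zkpk/input hdfs://192.168.114.128:9000/user/zkpk/output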

    Check the output; if it looks like the following, congratulations, you have succeeded:

    15/12/18 09:14:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/12/18 09:14:13 INFO client.RMProxy: Connecting to ResourceManager at /192.168.114.128:8032
    15/12/18 09:14:14 WARN mapreduce.JobResourceUploader: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
    15/12/18 09:14:14 INFO input.FileInputFormat: Total input paths to process : 2
    15/12/18 09:14:14 INFO mapreduce.JobSubmitter: number of splits:2
    15/12/18 09:14:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1450342418109_0004
    15/12/18 09:14:14 INFO mapred.YARNRunner: Job jar is not present. Not adding any jar to the list of resources.
    15/12/18 09:14:14 INFO impl.YarnClientImpl: Submitted application application_1450342418109_0004
    15/12/18 09:14:14 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1450342418109_0004/
    15/12/18 09:14:14 INFO mapreduce.Job: Running job: job_1450342418109_0004
    15/12/18 09:14:21 INFO mapreduce.Job: Job job_1450342418109_0004 running in uber mode : false
    15/12/18 09:14:21 INFO mapreduce.Job:  map 0% reduce 0%
    15/12/18 09:14:31 INFO mapreduce.Job:  map 100% reduce 0%
    15/12/18 09:14:39 INFO mapreduce.Job:  map 100% reduce 100%
    15/12/18 09:14:40 INFO mapreduce.Job: Job job_1450342418109_0004 completed successfully
    15/12/18 09:14:40 INFO mapreduce.Job: Counters: 49
    	File System Counters
    		FILE: Number of bytes read=79
    		FILE: Number of bytes written=320590
    		FILE: Number of read operations=0
    		FILE: Number of large read operations=0
    		FILE: Number of write operations=0
    		HDFS: Number of bytes read=268
    		HDFS: Number of bytes written=41
    		HDFS: Number of read operations=9
    		HDFS: Number of large read operations=0
    		HDFS: Number of write operations=2
    	Job Counters 
    		Launched map tasks=2
    		Launched reduce tasks=1
    		Data-local map tasks=2
    		Total time spent by all maps in occupied slots (ms)=16220
    		Total time spent by all reduces in occupied slots (ms)=4519
    		Total time spent by all map tasks (ms)=16220
    		Total time spent by all reduce tasks (ms)=4519
    		Total vcore-seconds taken by all map tasks=16220
    		Total vcore-seconds taken by all reduce tasks=4519
    		Total megabyte-seconds taken by all map tasks=16609280
    		Total megabyte-seconds taken by all reduce tasks=4627456
    	Map-Reduce Framework
    		Map input records=2
    		Map output records=8
    		Map output bytes=82
    		Map output materialized bytes=85
    		Input split bytes=218
    		Combine input records=8
    		Combine output records=6
    		Reduce input groups=5
    		Reduce shuffle bytes=85
    		Reduce input records=6
    		Reduce output records=5
    		Spilled Records=12
    		Shuffled Maps =2
    		Failed Shuffles=0
    		Merged Map outputs=2
    		GC time elapsed (ms)=385
    		CPU time spent (ms)=1660
    		Physical memory (bytes) snapshot=460673024
    		Virtual memory (bytes) snapshot=6179151872
    		Total committed heap usage (bytes)=259063808
    	Shuffle Errors
    		BAD_ID=0
    		CONNECTION=0
    		IO_ERROR=0
    		WRONG_LENGTH=0
    		WRONG_MAP=0
    		WRONG_REDUCE=0
    	File Input Format Counters 
    		Bytes Read=50
    	File Output Format Counters 
    		Bytes Written=41

    Error notes:

    1. WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

    INFO mapreduce.Job: Task Id : attempt_1450852806248_0029_m_000000_0, Status : FAILED
    Error: java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.hadoop.mr.WordCount$TokenizerMapper not found

    This is because no jar was uploaded to the cluster, so the MapReduce tasks fail to find your classes. You need to add the following statement to the program.

    In the main function, after Configuration conf = new Configuration();, add conf.set("mapred.jar", <location of your packaged jar>);

    For example: conf.set("mapred.jar", "E:\\workspace\\WordCountTest\\bin\\org\\apache\\hadoop\\mr\\WordCountTest.jar"); (note that backslashes must be doubled inside a Java string literal).
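
    Put together, the start of main would look roughly like this; the class name and jar path are examples, and the rest of the driver is the unchanged WordCount code:

    import org.apache.hadoop.conf.Configuration;

    public class WordCountTest {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Ship the locally exported jar to the cluster so the remote map
            // and reduce tasks can load TokenizerMapper and IntSumReducer.
            // The path below is an example -- point it at your exported jar.
            conf.set("mapred.jar",
                "E:\\workspace\\WordCountTest\\bin\\org\\apache\\hadoop\\mr\\WordCountTest.jar");
            // ... build and submit the Job with this conf, exactly as in WordCount ...
        }
    }

    Alternatively, call job.setJar(...) with the jar's path after creating the Job; either way, the point is that an actual jar file reaches the cluster, not just your bin directory.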

  • Original post: https://www.cnblogs.com/qpanda/p/5054857.html