• Hadoop 2.2.0 Documentation (translated from the Chinese): MapReduce Next Generation, Setting Up a Single-Node Cluster


    MapReduce tarball

    You should be able to obtain the MapReduce tarball from the release page. If not, you can build a tarball from the source code:

    $ mvn clean install -DskipTests
    $ cd hadoop-mapreduce-project
    $ mvn clean install assembly:assembly -Pnative

    Note: you need protoc 2.5.0 installed.

    To skip the native build of MapReduce, omit the -Pnative argument to Maven.

    The tarball should be available in the target/ directory.
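The protoc requirement noted above can be checked before starting the Maven build. The following is a minimal sketch, not part of the original guide: the helper name `is_protoc_2_5_0` is hypothetical, and it assumes `protoc --version` prints a line of the form `libprotoc 2.5.0`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given `protoc --version` output
# reports exactly version 2.5.0 (assumed output format: "libprotoc 2.5.0").
is_protoc_2_5_0() {
    [ "$1" = "libprotoc 2.5.0" ]
}

if command -v protoc >/dev/null 2>&1; then
    # protoc is on the PATH, so compare its reported version.
    is_protoc_2_5_0 "$(protoc --version)" \
        || echo "warning: protoc is installed but is not version 2.5.0" >&2
else
    echo "warning: protoc not found on PATH" >&2
fi
```

The script only warns and never aborts, so it can be run safely before the build commands.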

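    Setting up the environment

    Assuming you have installed hadoop-common/hadoop-hdfs and exported $HADOOP_COMMON_HOME/$HADOOP_HDFS_HOME, untar the Hadoop MapReduce tarball and set the environment variable $HADOOP_MAPRED_HOME to the directory where you untarred it. Set $HADOOP_YARN_HOME to the same value as $HADOOP_MAPRED_HOME.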
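As a sketch of the step above, the two variables could be exported as follows. The path `/opt/hadoop-2.2.0` is a hypothetical placeholder that the guide does not specify; substitute the directory where you actually untarred the MapReduce tarball.

```shell
#!/bin/sh
# Assumption: /opt/hadoop-2.2.0 is only an illustrative install location.
# An already-exported HADOOP_MAPRED_HOME takes precedence over the placeholder.
export HADOOP_MAPRED_HOME="${HADOOP_MAPRED_HOME:-/opt/hadoop-2.2.0}"

# The guide says YARN uses the same directory as MapReduce.
export HADOOP_YARN_HOME="$HADOOP_MAPRED_HOME"

echo "HADOOP_MAPRED_HOME=$HADOOP_MAPRED_HOME"
echo "HADOOP_YARN_HOME=$HADOOP_YARN_HOME"
```

Putting these lines in a shell profile keeps the values available in later sessions.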

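    Note: the following instructions assume you already have HDFS running.

    Setting up the configuration

    To start the ResourceManager and NodeManager, you must update the configuration. This assumes $HADOOP_CONF_DIR is your configuration directory and already contains the HDFS configuration and core-site.xml. Two more configuration files need to be set up: mapred-site.xml and yarn-site.xml.

    Setting up mapred-site.xml

    Add the following configuration to your mapred-site.xml.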

      <property>
        <name>mapreduce.cluster.temp.dir</name>
        <value></value>
        <description>No description</description>
        <final>true</final>
      </property>
    
      <property>
        <name>mapreduce.cluster.local.dir</name>
        <value></value>
        <description>No description</description>
        <final>true</final>
      </property>

    Setting up yarn-site.xml

    Add the following configuration to your yarn-site.xml.

      <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>host:port</value>
        <description>host is the hostname of the resource manager and 
        port is the port on which the NodeManagers contact the Resource Manager.
        </description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>host:port</value>
        <description>host is the hostname of the resourcemanager and port is the port
        on which the Applications in the cluster talk to the Resource Manager.
        </description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
      </property>
    
      <property>
        <name>yarn.resourcemanager.address</name>
        <value>host:port</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on
        which the clients can talk to the Resource Manager. </description>
      </property>
    
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value></value>
        <description>the local directories used by the nodemanager</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.address</name>
        <value>0.0.0.0:port</value>
        <description>the nodemanagers bind to this port</description>
      </property>  
    
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory on the NodeManager in MB</description>
      </property>
     
      <property>
        <name>yarn.nodemanager.remote-app-log-dir</name>
        <value>/app-logs</value>
        <description>directory on hdfs where the application logs are moved to </description>
      </property>
    
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value></value>
        <description>the directories used by Nodemanagers as log directories</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run </description>
      </property>


    Setting up capacity-scheduler.xml

    Make sure you populate the root queues in capacity-scheduler.xml.

      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>unfunded,default</value>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.capacity</name>
        <value>100</value>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
        <value>50</value>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>50</value>
      </property>
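In the CapacityScheduler, the capacities of the queues directly under a parent are expected to add up to 100. The sketch below checks that for the two queues configured above; the function name `capacities_sum_to_100` is hypothetical, and it assumes integer percentages.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given integer queue capacities
# (percentages of the parent queue) add up to exactly 100.
capacities_sum_to_100() {
    total=0
    for capacity in "$@"; do
        total=$((total + capacity))
    done
    [ "$total" -eq 100 ]
}

# Values from the configuration above: root.unfunded=50, root.default=50.
if capacities_sum_to_100 50 50; then
    echo "child queue capacities sum to 100"
else
    echo "child queue capacities do not sum to 100" >&2
fi
```

The same check applies if you add more queues under root: pass one argument per queue.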

    Running the daemons

    This assumes the environment variables $HADOOP_COMMON_HOME, $HADOOP_HDFS_HOME, $HADOOP_MAPRED_HOME, $HADOOP_YARN_HOME, $JAVA_HOME and $HADOOP_CONF_DIR have been set appropriately. Set $YARN_CONF_DIR to the same value as $HADOOP_CONF_DIR.
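Before starting the daemons, it can help to confirm that none of these variables is missing. This is a minimal sketch under that assumption; the function name `check_hadoop_env` is hypothetical and is not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: prints each required variable that is unset or empty
# and returns non-zero if any is missing.
check_hadoop_env() {
    missing=0
    for name in HADOOP_COMMON_HOME HADOOP_HDFS_HOME HADOOP_MAPRED_HOME \
                HADOOP_YARN_HOME JAVA_HOME HADOOP_CONF_DIR YARN_CONF_DIR; do
        # Indirect lookup: read the value of the variable whose name is in $name.
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "unset: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

check_hadoop_env || echo "set the variables listed above before starting the daemons" >&2
```

The check only tests that each variable is non-empty; it does not verify that the directories exist.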

    Start the ResourceManager and NodeManager as follows:

    $ cd $HADOOP_MAPRED_HOME
    $ sbin/yarn-daemon.sh start resourcemanager
    $ sbin/yarn-daemon.sh start nodemanager
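One way to confirm that both daemons came up is to look for them in the output of the JDK's `jps` tool. This is a sketch rather than part of the original guide: the function name `yarn_daemons_listed` is hypothetical, and it assumes `jps` prints one `<pid> <MainClass>` pair per line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given jps output ($1) contains a line
# ending in " ResourceManager" and a line ending in " NodeManager".
yarn_daemons_listed() {
    for daemon in ResourceManager NodeManager; do
        printf '%s\n' "$1" | grep -q " ${daemon}\$" || return 1
    done
    return 0
}

if command -v jps >/dev/null 2>&1; then
    if yarn_daemons_listed "$(jps)"; then
        echo "ResourceManager and NodeManager are both listed by jps"
    else
        echo "at least one YARN daemon is not listed by jps" >&2
    fi
else
    echo "jps not found on PATH; skipping the check" >&2
fi
```

If a daemon is missing from the list, its log file is the usual place to look for the startup error.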

    You should now be up and running. You can run randomwriter as follows:

    $ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out

    Good luck.

  • Original article: https://www.cnblogs.com/yxwkf/p/5037435.html