Hadoop YARN Installation


    Environment: Linux, 8 GB RAM, 60 GB disk, Hadoop 2.2.0

    To build a Spark cluster on top of YARN, a Hadoop cluster has to be installed first. The detailed steps of this installation are recorded here for easy reference later.

    Preparation

    1. Machine preparation

    Three hosts; the comment after # describes each one's role:
    • 192.168.1.1   #hadoop1 : master
    • 192.168.1.2   #hadoop2 : datanode1
    • 192.168.1.3   #hadoop3 : datanode2

    On hadoop1, run vi /etc/sysconfig/network and set HOSTNAME=hadoop1
    On hadoop2, run vi /etc/sysconfig/network and set HOSTNAME=hadoop2
    On hadoop3, run vi /etc/sysconfig/network and set HOSTNAME=hadoop3

    On all three machines, append the following to /etc/hosts:
    • 192.168.1.1   hadoop1
    • 192.168.1.2   hadoop2
    • 192.168.1.3   hadoop3

    On hadoop1, run hostname hadoop1
    On hadoop2, run hostname hadoop2
    On hadoop3, run hostname hadoop3

    After exiting and reconnecting, the hostname becomes hadoop[1-3]. The advantage is that ssh hadoop2 then automatically resolves to and connects to 192.168.1.2, which is convenient later on.

    This is also how short hostnames are implemented. A quick check of the setup is sketched below.
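
    A minimal check that the hostnames and name resolution took effect (a sketch, run from hadoop1; expected results are noted in the comments):

    $ hostname                 # should print hadoop1
    $ ping -c 1 hadoop2        # should resolve to and reach 192.168.1.2
    $ getent hosts hadoop3     # should print 192.168.1.3   hadoop3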


    2. Directory creation

    $mkdir -p /hadoop/hdfs
    $mkdir -p /hadoop/tmp
    $mkdir -p /hadoop/log
    $mkdir -p /usr/java                               ### Java install path
    $mkdir -p /usr/hadoop                             ### Hadoop install path
    $chmod -R 777 /hadoop

    Adjust the installation paths to suit your own environment.

    Installing Java

    1. Download and install the JDK. JDK 1.7 is recommended.

    This installation uses jdk-7u60-linux-x64.tar.gz, downloaded from:

    http://www.oracle.com/technetwork/java/javase/downloads/index.html

    $tar -zxvf jdk-7u60-linux-x64.tar.gz
    $mv jdk1.7.0_60 java
    
    
    Note: depending on the type of Java package downloaded, the installation differs slightly.
    2. Configure the Java environment
    You can edit /etc/profile, or edit ~/.profile (ksh) or ~/.bash_profile (bash) in your home directory. This installation uses bash, so append the following to .bash_profile:
    export JAVA_HOME=/usr/java
    export CLASSPATH=.:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/lib/dt.jar
    export PATH=$JAVA_HOME/bin:$PATH
    To make the environment take effect immediately, run:
    $source .bash_profile

    3. Verify that Java installed successfully

    $ java -version
    java version "1.7.0_60"
    Java(TM) SE Runtime Environment (build 1.7.0_60-b19)
    Java HotSpot(TM) 64-Bit Server VM (build 24.60-b09, mixed mode)

    Configuring passwordless SSH login

    On hadoop1:

    $ mkdir .ssh
    $ cd .ssh
    $ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/export/home/zilzhang/.ssh/id_rsa): 
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in ~/.ssh/id_rsa.
    Your public key has been saved in ~/.ssh/id_rsa.pub.
    The key fingerprint is:
    b0:76:89:6a:44:8b:cd:fc:23:a4:3f:69:55:3f:83:e3 ...
    $ ls -lrt
    total 2
    -rw-------   1     887 Jun 30 02:10 id_rsa
    -rw-r--r--   1     232 Jun 30 02:10 id_rsa.pub
    $ touch authorized_keys 
    $ cat id_rsa.pub >> authorized_keys
    

    On hadoop2 and hadoop3, generate public/private key pairs in the same way.


    [hadoop2]$ mv id_rsa.pub pub2
    [hadoop3]$ mv id_rsa.pub pub3

    Copy pub2 and pub3 to hadoop1 with scp, then:

    $ cat pub2 >> authorized_keys
    $ cat pub3 >> authorized_keys


    Finally, scp authorized_keys to hadoop2 and hadoop3. After that, passwordless login works; the copy commands might look like the sketch below.
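
    The distribution described above could look roughly like this (a sketch, assuming the keys live in ~/.ssh on every node; the last line fixes permissions that sshd typically requires):

    [hadoop2]$ scp ~/.ssh/pub2 hadoop1:~/.ssh/
    [hadoop3]$ scp ~/.ssh/pub3 hadoop1:~/.ssh/
    [hadoop1]$ scp ~/.ssh/authorized_keys hadoop2:~/.ssh/
    [hadoop1]$ scp ~/.ssh/authorized_keys hadoop3:~/.ssh/
    [all]    $ chmod 700 ~/.ssh && chmod 600 ~/.ssh/authorized_keys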

    In a nutshell: generate a key pair on every node, concatenate all of the public keys into authorized_keys, and distribute authorized_keys to the same directory on every node in the cluster. Each node then holds the public keys of the whole cluster, and all nodes can log in to one another without a password.


    Verify passwordless login. On hadoop1:

    $ ssh haoop1
    ssh: Could not resolve hostname haoop1: Name or service not known
    [zilzhang@hadoop3 hadoop]$ ssh hadoop1
    The authenticity of host 'hadoop1 (192.168.1.1)' can't be established.
    RSA key fingerprint is 18:85:c6:50:0c:15:36:9c:55:34:d7:ab:0e:1c:c7:0f.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'hadoop1' (RSA) to the list of known hosts.
    
            ... (system login banner) ...
    
    [hadoop1 ~]$


    Installing Hadoop

    1. Download and extract (on all nodes)
    $ wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0.tar.gz 
    $ tar -zxvf hadoop-2.2.0.tar.gz
    $ mv hadoop-2.2.0 /usr/hadoop

    The following steps are all performed on hadoop1.
    2. Configure environment variables by appending the following to .bash_profile:

    export HADOOP_HOME=/usr/hadoop
    export HADOOP_MAPRED_HOME=${HADOOP_HOME}
    export HADOOP_COMMON_HOME=${HADOOP_HOME}
    export HADOOP_HDFS_HOME=${HADOOP_HOME}
    export YARN_HOME=${HADOOP_HOME}
    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop
    export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH

    source .bash_profile
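
    A quick sanity check that the variables point at the right places (a sketch, assuming the layout above; hadoop version should report 2.2.0):

    $ echo $HADOOP_HOME        # /usr/hadoop
    $ hadoop version           # Hadoop 2.2.0
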
    3. Append the following to $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
    export JAVA_HOME=/usr/java/java

    4. Add the following to $HADOOP_HOME/etc/hadoop/core-site.xml:

    <property>
     <name>hadoop.tmp.dir</name>
     <value>/hadoop/tmp</value>
     <description>A base for other temporary directories.</description>
    </property>
     <property>
     <name>fs.default.name</name>
     <value>hdfs://192.168.1.1:9000</value>
    </property>
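
    Note: in Hadoop 2.x, fs.default.name is the deprecated name of this property; it still works, but the equivalent newer form would be:

    <property>
     <name>fs.defaultFS</name>
     <value>hdfs://192.168.1.1:9000</value>
    </property>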

    5. Change the contents of $HADOOP_HOME/etc/hadoop/slaves to the datanode addresses (hostnames also work, as noted below):

    192.168.1.2

    192.168.1.3
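
    Since /etc/hosts already maps the names, the slaves file could equivalently list hostnames instead of IP addresses:

    hadoop2
    hadoop3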


    6. Add the following to $HADOOP_HOME/etc/hadoop/hdfs-site.xml:

    <property>
     <name>dfs.replication</name>
     <value>3</value>
    </property>
     
    <property>
     <name>dfs.namenode.name.dir</name>
     <value>file:/hadoop/hdfs/name</value>
     <final>true</final>
    </property>
     
    <property>
     <name>dfs.federation.nameservice.id</name>
     <value>ns1</value>
    </property>
     
    <property>
     <name>dfs.namenode.backup.address.ns1</name>
     <value>192.168.1.1:50100</value>
    </property>
     
    <property>
     <name>dfs.namenode.backup.http-address.ns1</name>
     <value>192.168.1.1:50105</value>
    </property>
     
    <property>
     <name>dfs.federation.nameservices</name>
     <value>ns1</value>
    </property>
     
    <property>
     <name>dfs.namenode.rpc-address.ns1</name>
     <value>192.168.1.1:9000</value>
    </property>
    <property>
     <name>dfs.namenode.rpc-address.ns2</name>
     <value>192.168.1.1:9000</value>
    </property>
     
    <property>
     <name>dfs.namenode.http-address.ns1</name>
     <value>192.168.1.1:23001</value>
    </property>
     
    <property>
     <name>dfs.namenode.http-address.ns2</name>
     <value>192.168.1.1:13001</value>
    </property>
     
    <property>
     <name>dfs.datanode.data.dir</name>
     <value>file:/hadoop/hdfs/data</value>
     <final>true</final>
    </property>
     
    <property>
     <name>dfs.namenode.secondary.http-address.ns1</name>
     <value>192.168.1.1:23002</value>
    </property>
     
    <property>
     <name>dfs.namenode.secondary.http-address.ns2</name>
     <value>192.168.1.1:23002</value>
    </property>
     
    <property>
     <name>dfs.namenode.secondary.http-address.ns1</name>
     <value>192.168.1.1:23003</value>
    </property>
     
    <property>
     <name>dfs.namenode.secondary.http-address.ns2</name>
     <value>192.168.1.1:23003</value>
    </property>


    7. Add the following to $HADOOP_HOME/etc/hadoop/yarn-site.xml:

    <property>
     <name>yarn.resourcemanager.address</name>
     <value>192.168.1.1:18040</value>
    </property>
     
    <property>
     <name>yarn.resourcemanager.scheduler.address</name>
     <value>192.168.1.1:18030</value>
    </property>
     
    <property>
     <name>yarn.resourcemanager.webapp.address</name>
     <value>192.168.1.1:50030</value>
    </property>
     
    <property>
     <name>yarn.resourcemanager.resource-tracker.address</name>
     <value>192.168.1.1:18025</value>
    </property>
     
    <property>
     <name>yarn.resourcemanager.admin.address</name>
     <value>192.168.1.1:18141</value>
    </property>
     
    <property>
     <name>yarn.nodemanager.aux-services</name>
     <value>mapreduce_shuffle</value>
    </property>

    <property> 
        <name>yarn.web-proxy.address</name> 
          <value>hadoop1-9014.lvs01.dev.ebayc3.com:54315</value> 
    </property> 

    8. Add the following to $HADOOP_HOME/etc/hadoop/httpfs-site.xml:

    <property>
     <name>hadoop.proxyuser.root.hosts</name>
     <value>192.168.1.1</value>
    </property>
     <property>
     <name>hadoop.proxyuser.root.groups</name>
     <value>*</value>
    </property>


    9. Add the following to $HADOOP_HOME/etc/hadoop/mapred-site.xml (submit jobs to YARN and configure the job history server):

    <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
       <description>Execution framework set to Hadoop YARN.</description>
    </property>

    <property>
     <name>mapreduce.jobhistory.address</name>
      <value>hadoop1-9014.lvs01.dev.ebayc3.com:10020</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop1-9014.lvs01.dev.ebayc3.com:19888</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.intermediate-done-dir</name>
      <value>/log/tmp</value>
    </property>
    <property>
      <name>mapreduce.jobhistory.done-dir</name>
      <value>/log/history</value>
    </property>

    This tells MapReduce to run jobs on YARN.
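
    A stock Hadoop 2.2.0 distribution usually ships only mapred-site.xml.template; if mapred-site.xml does not exist yet, create it from the template first (a sketch):

    $ cp $HADOOP_HOME/etc/hadoop/mapred-site.xml.template $HADOOP_HOME/etc/hadoop/mapred-site.xml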

    10. Sync the configuration to the other datanodes

    $ scp ~/.bash_profile hadoop2:~/.bash_profile
    $ scp $HADOOP_HOME/etc/hadoop/hadoop-env.sh hadoop2:$HADOOP_HOME/etc/hadoop/
    $ scp $HADOOP_HOME/etc/hadoop/core-site.xml hadoop2:$HADOOP_HOME/etc/hadoop/
    $ scp $HADOOP_HOME/etc/hadoop/slaves hadoop2:$HADOOP_HOME/etc/hadoop/
    $ scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml hadoop2:$HADOOP_HOME/etc/hadoop/
    $ scp $HADOOP_HOME/etc/hadoop/yarn-site.xml hadoop2:$HADOOP_HOME/etc/hadoop/
    $ scp $HADOOP_HOME/etc/hadoop/httpfs-site.xml hadoop2:$HADOOP_HOME/etc/hadoop/
    $ scp $HADOOP_HOME/etc/hadoop/mapred-site.xml hadoop2:$HADOOP_HOME/etc/hadoop/

    Run the same commands with hadoop2 replaced by hadoop3 to sync the configuration to hadoop3 as well; a loop covering both nodes is sketched below.
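
    The whole sync can also be done in one pass (a sketch, assuming identical paths on every node and that $HADOOP_HOME is set in the current shell):

    $ for node in hadoop2 hadoop3; do
          scp ~/.bash_profile $node:~/
          scp $HADOOP_HOME/etc/hadoop/{hadoop-env.sh,core-site.xml,slaves,hdfs-site.xml,yarn-site.xml,httpfs-site.xml,mapred-site.xml} $node:$HADOOP_HOME/etc/hadoop/
      done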

    Starting the Hadoop cluster

    1. Format the namenode

    hadoop namenode -format
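
    Note that hadoop namenode is the deprecated form of this command in Hadoop 2.x; the equivalent newer invocation is:

    hdfs namenode -format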

    2. Start HDFS

    start-dfs.sh

    3. Start YARN

    start-yarn.sh
     
    4. Start the job history server
     
    mr-jobhistory-daemon.sh start historyserver

    5. Start the proxy server and HttpFS
     
    yarn-daemons.sh start proxyserver

    httpfs.sh start
    6. Create the log directories on HDFS
    hadoop fs -mkdir -p /log/tmp
    hadoop fs -mkdir -p /log/history

    Testing the Hadoop cluster

    1. On hadoop1, check that the expected processes are running:
    $ jps
    8606 NameNode
    4640 Bootstrap
    17007 Jps
    16077 ResourceManager
    8781 SecondaryNameNode
    All of these processes must be present.
    2. On hadoop2, check that the processes are running:
    $ jps
    5992 Jps
    5422 NodeManager
    3292 DataNode
    All of these processes must be present.
    3. Run hadoop fs -ls / to check that files can be listed.
    4. Test a Hadoop job:
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output7

    If everything runs normally, the job's progress can be seen on the job monitor page. Input preparation and output inspection are sketched below.
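
    Before running wordcount, /input must exist on HDFS and contain some text, and the output directory must not already exist. A sketch (sample.txt is a hypothetical local file):

    $ hadoop fs -mkdir -p /input
    $ hadoop fs -put sample.txt /input/
    $ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar wordcount /input /output7
    $ hadoop fs -cat /output7/part-r-00000        # one word and its count per line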



    Summary: all sorts of problems can come up during installation; they are not listed one by one here to keep this write-up from getting too long.


