• Setting Up a Distributed Hadoop Environment


    1. Prerequisites

    • Installation overview
    IP              Host Name  Software  Node roles
    192.168.23.128  ae01       JDK 1.7   NameNode, SecondaryNameNode, DataNode, JobTracker, TaskTracker
    192.168.23.129  ae02       JDK 1.7   DataNode, TaskTracker
    192.168.23.130  ae03       JDK 1.7   DataNode, TaskTracker
      If you install on virtual machines, installing samba and smbfs makes it easier to manage files.
    • Operating system: ubuntu-12.04.2-server-amd64
    • Installation directory: /usr/local/ae
    • JDK installation directory: export JAVA_HOME=/usr/local/ae/jdk1.7.0_51
    • Hadoop version: hadoop-1.2.1

    2. Passwordless SSH Login Between Servers

    Log in to each server, install SSH, and generate an SSH key. The steps below cover only ae01; repeat them on ae02 and ae03.

    • Install SSH
      user@ae01:~$ sudo apt-get install openssh-server
    • Generate an SSH key
      user@ae01:~# ssh-keygen -t rsa -P ""
      Generating public/private rsa key pair.
      Enter file in which to save the key (/root/.ssh/id_rsa):
      Created directory '/root/.ssh'.
      Your identification has been saved in /root/.ssh/id_rsa.
      Your public key has been saved in /root/.ssh/id_rsa.pub.
      The key fingerprint is:
      64:3d:a4:70:94:c4:33:64:6b:6b:1c:7c:e9:8f:15:93 user@ae01
      The key's randomart image is:
      +--[ RSA 2048]----+
      |      .=*..      |
      |       === . .   |
      |        Oo= E    |
      |       = = . o   |
      |        S . .    |
      |       .   +     |
      |          . .    |
      |                 |
      |                 |
      +-----------------+
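    • Configure passwordless SSH login
      If you want server a1 to log in to server a2 over SSH without a password, append the public key generated on a1 to the ~/.ssh/authorized_keys file on a2.
      In this setup ae01 is the NameNode and must log in without a password to the DataNode servers (ae02, ae03), so ae01's public key has to be added to the authorized_keys file on both ae02 and ae03.
      Copy ae01's public key under the name id_rsa_ae01.pub:
      user@ae01:~/.ssh$ sudo cp id_rsa.pub id_rsa_ae01.pub

      Copy id_rsa_ae01.pub to server ae02:

      user@ae01:~/.ssh$ scp ./id_rsa_ae01.pub user@192.168.23.129:~/.ssh/

      Log in to ae02 and append id_rsa_ae01.pub to authorized_keys:

      user@ae02:~/.ssh$ cat id_rsa_ae01.pub >> authorized_keys

      Log back in to ae01 and try to reach ae02 without a password:

      user@ae01:~/.ssh$ ssh ae02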
      Welcome to Ubuntu 12.04.2 LTS (GNU/Linux 3.5.0-23-generic x86_64)
      
       * Documentation:  https://help.ubuntu.com/
      
        System information as of Thu Jun 12 12:21:45 CST 2014
      
        System load:  0.0                Processes:           86
        Usage of /:   10.3% of 18.45GB   Users logged in:     1
        Memory usage: 35%                IP address for eth0: 192.168.23.129
        Swap usage:   0%
      
        Graph this data and manage this system at https://landscape.canonical.com/
      
      122 packages can be updated.
      69 updates are security updates.
      
      Last login: Tue Jun 10 20:06:57 2014 from 192.168.23.128

      Repeat the steps above on all machines so that every pair of servers can SSH to each other without a password.
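The copy-and-append steps above can be shortened with ssh-copy-id, which ships with the OpenSSH client on Ubuntu. The following is only a sketch under stated assumptions: the function name `distribute_key`, the `DRY_RUN` switch, and the login name `user` are illustrative and not part of the original procedure. With `DRY_RUN=1` (the default here) the commands are printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: push the local public key to each listed host.
# Assumes the login name "user" and the default key ~/.ssh/id_rsa.pub.
DRY_RUN="${DRY_RUN:-1}"

distribute_key() {
    for host in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            # Only show what would be run.
            echo "ssh-copy-id -i $HOME/.ssh/id_rsa.pub user@$host"
        else
            # Appends the key to ~/.ssh/authorized_keys on the remote host.
            ssh-copy-id -i "$HOME/.ssh/id_rsa.pub" "user@$host"
        fi
    done
}
```

For example, after setting `DRY_RUN=0`, calling `distribute_key ae02 ae03` on ae01 would prompt once for each remote password and install the key; running the same call on ae02 and ae03 with the other host names covers the remaining pairs.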

    3. Installing Hadoop

    • Edit the hosts file and add entries for the three servers
      user@ae01:/usr/local/ae$ sudo vim /etc/hosts
      127.0.0.1         localhost
      192.168.23.128    ae01
      192.168.23.129    ae02
      192.168.23.130    ae03
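A mistyped address in /etc/hosts, such as two host names sharing one IP, is easy to miss and only shows up later as a DataNode that never registers. The sketch below is a hypothetical check (the function name `find_duplicate_ips` is an assumption) that lists every IP mapped to more than one host name in a hosts-style file.

```shell
#!/bin/sh
# Hypothetical check: report IPs that appear on more than one line
# of a hosts-style file, together with the first name on each line.
find_duplicate_ips() {
    # $1: path to the file to inspect (defaults to /etc/hosts)
    awk '
        NF < 2 || $1 ~ /^#/ { next }          # skip blank lines and comments
        { count[$1]++; names[$1] = names[$1] " " $2 }
        END {
            for (ip in count)
                if (count[ip] > 1) print ip ":" names[ip]
        }
    ' "${1:-/etc/hosts}"
}
```

Running `find_duplicate_ips` with no argument inspects /etc/hosts; no output means every listed IP is used once. Some systems deliberately map several names to a loopback address, so a reported line is a prompt to look, not necessarily an error.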
    • Extract Hadoop
      Copy hadoop-1.2.1.tar.gz to /usr/local/ae and extract it:
      user@ae01:/usr/local/ae$ sudo tar -zxvf hadoop-1.2.1.tar.gz
    • Add the Hadoop environment variables
      export HADOOP_HOME=/usr/local/ae/hadoop-1.2.1
      export HADOOP_HOME_WARN_SUPPRESS=1
      export PATH=$PATH:$HADOOP_HOME/bin
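The original does not say which file these exports belong in. One common choice, assumed here, is the login user's ~/.bashrc. The sketch below shows a hypothetical helper, `append_once`, that adds a line to a file only if that exact line is not already there, so the step can be repeated safely.

```shell
#!/bin/sh
# Hypothetical helper: append a line to a file unless the exact
# same line is already present.
append_once() {
    # $1: target file, $2: line to add
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

For instance, `append_once ~/.bashrc 'export HADOOP_HOME=/usr/local/ae/hadoop-1.2.1'` could be called once per export line, followed by `. ~/.bashrc` to load the variables into the current shell.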
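    • Configure Hadoop
      core-site.xml holds the global configuration; hdfs-site.xml and mapred-site.xml hold the HDFS-specific and MapReduce-specific configuration, respectively.

      Edit $HADOOP_HOME/conf/hadoop-env.sh and add JAVA_HOME:

      export JAVA_HOME=/usr/local/ae/jdk1.7.0_51

      Edit $HADOOP_HOME/conf/core-site.xml and add the following properties inside the <configuration> node: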

      <property>
              <name>hadoop.tmp.dir</name>
              <value>/usr/local/ae/storage/hadoop/temp</value>
              <description>A base for other temporary directories.</description>
      </property>
      
      <property>
              <name>fs.default.name</name>
              <value>hdfs://ae01:9000</value>
              <description>
      The name of the default file system.  A URI whose scheme and authority determine the FileSystem implementation.  The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class.  The uri's authority is used to
          determine the host, port, etc. for a filesystem.
              </description>
      </property>
      <property>
              <name>fs.checkpoint.period</name>
              <value>3600</value>
              <description>
      The number of seconds between two periodic checkpoints.
              </description>
      </property>
      
      <property>
              <name>fs.checkpoint.size</name>
              <value>67108864</value>
              <description>
      The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired.
              </description>
      </property>

      Edit $HADOOP_HOME/conf/hdfs-site.xml and add the following properties inside the <configuration> node:

      <property>
              <name>dfs.name.dir</name>
              <value>/usr/local/ae/storage/hadoop/name</value>
              <description>
      Determines where on the local filesystem the DFS name node should store the name table(fsimage).  If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy.
              </description>
      </property>
      
      <property>
              <name>dfs.data.dir</name>
              <value>/usr/local/ae/storage/hadoop/data</value>
              <description>
      Determines where on the local filesystem a DFS data node should store its blocks.  If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.  Directories that do not exist are ignored.
              </description>
      </property>
      
      <property>
              <name>dfs.http.address</name>
              <value>ae01:50070</value>
              <description>
      The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.
              </description>
      </property>
      
      <property>
              <name>dfs.permissions</name>
              <value>false</value>
              <description>
      If "true", enable permission checking in HDFS. If "false", permission checking is turned off,
      but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
              </description>
      </property>
      
      <property>
              <name>dfs.replication</name>
              <value>1</value>
              <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified in create time.
              </description>
      </property>

      Edit $HADOOP_HOME/conf/mapred-site.xml and add the following property inside the <configuration> node:

      <property>
              <name>mapred.job.tracker</name>
              <value>ae01:9001</value>
              <description>
      The host and port that the MapReduce job tracker runs  at.  If "local", then jobs are run in-process as a single map and reduce task.
              </description>
      </property>

      Edit $HADOOP_HOME/conf/masters:

      ae01

      Edit $HADOOP_HOME/conf/slaves:

      ae01
      ae02
      ae03

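      In the configuration above:
      the value of fs.default.name, hdfs://ae01:9000, determines the NameNode
      the value of mapred.job.tracker, ae01:9001, determines the JobTracker
      the contents of the masters file determine the SecondaryNameNode
      the contents of the slaves file determine the DataNodes and TaskTrackers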
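To confirm which host each role resolves to before starting the cluster, the values can be read back out of the site files. The sketch below is a hypothetical helper (`get_hadoop_prop` is not a Hadoop command) and assumes that `<name>` and `<value>` each sit on their own line, as in the snippets above; it is not a general XML parser.

```shell
#!/bin/sh
# Hypothetical helper: print the <value> that follows a given <name>
# in a Hadoop *-site.xml file laid out one element per line.
get_hadoop_prop() {
    # $1: config file, $2: property name
    awk -v want="$2" '
        index($0, "<name>" want "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")      # drop everything up to the opening tag
            sub(/<\/value>.*/, "")    # drop the closing tag and what follows
            print
            exit
        }
    ' "$1"
}
```

With the configuration described here, `get_hadoop_prop "$HADOOP_HOME/conf/core-site.xml" fs.default.name` would be expected to print hdfs://ae01:9000, and the same call against mapred-site.xml with mapred.job.tracker would print ae01:9001.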

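      Create the directory /usr/local/ae/storage/hadoop and give the hadoop folder sufficient permissions:

      user@ae01:/usr/local/ae$ sudo mkdir -p ./storage/hadoop
      user@ae01:/usr/local/ae$ sudo chmod 777 ./storage/hadoop/

      Copy the configured Hadoop to ae02 and ae03, and create the directory /usr/local/ae/storage/hadoop on both of them.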
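The original gives no command for the copy step. One possible approach, sketched below, uses scp and ssh over the passwordless login set up earlier. Everything here is an assumption: the helper names `run` and `sync_hadoop`, the login name `user`, and in particular that `user` may write to /usr/local/ae on the workers (if not, the remote commands would need sudo). With `DRY_RUN=1` (the default here) the commands are only printed.

```shell
#!/bin/sh
# Hypothetical helper: copy the configured Hadoop tree to each worker
# and create the storage directory there.
HADOOP_DIR=/usr/local/ae/hadoop-1.2.1
STORAGE_DIR=/usr/local/ae/storage/hadoop
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

sync_hadoop() {
    for host in "$@"; do
        run scp -r "$HADOOP_DIR" "user@$host:/usr/local/ae/"
        run ssh "user@$host" "mkdir -p $STORAGE_DIR && chmod 777 $STORAGE_DIR"
    done
}
```

Calling `sync_hadoop ae02 ae03` first in dry-run mode makes it easy to review the exact commands before setting `DRY_RUN=0` and running them for real.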

    • Format and start Hadoop

      Log in to ae01 and format the NameNode:

      user@ae01:~$ hadoop namenode -format

      Start Hadoop:

      user@ae01:~$ start-all.sh

      Use jps to check the Java processes on each node:
      ae01

      user@ae01:/usr/local/ae$ jps
      26239 JobTracker
      26158 SecondaryNameNode
      36052 Jps
      26468 TaskTracker
      25687 NameNode
      25926 DataNode

      ae02

      user@ae02:~$ jps
      25021 Jps
      18999 TaskTracker
      18791 DataNode

      ae03

      user@ae03:~$ jps
      3901 DataNode
      9485 Jps
      4106 TaskTracker
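Comparing the jps listings by eye works for three nodes; a small filter can make the check repeatable. The sketch below is a hypothetical helper (`missing_daemons` is an assumed name) that reads jps output on standard input and prints each expected daemon that is not running. It matches the process name exactly, so SecondaryNameNode is not mistaken for NameNode.

```shell
#!/bin/sh
# Hypothetical check: read `jps` output on stdin and print every
# expected daemon name (given as arguments) that is absent.
missing_daemons() {
    jps_out="$(cat)"
    for d in "$@"; do
        # The second column of jps output is the process name.
        printf '%s\n' "$jps_out" \
            | awk -v d="$d" '$2 == d { ok = 1 } END { exit !ok }' \
            || echo "$d"
    done
}
```

On a worker such as ae02, `jps | missing_daemons DataNode TaskTracker` prints nothing when both daemons are up. On ae01 the expected list would also include NameNode, SecondaryNameNode, and JobTracker.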
  • Original article: https://www.cnblogs.com/tannerBG/p/4271831.html