Hadoop Study Notes: Cluster Setup


    I. Environment List

    OS: CentOS 6.5, 64-bit

    JDK: jdk1.7.0_71

    Hadoop: community release 2.7.2 (hadoop-2.7.2-src.tar.gz)

    Hostname   IP                Roles                                          User
    master1    192.168.204.202   NameNode; SecondaryNameNode; ResourceManager   hadoop
    slave1     192.168.204.203   DataNode; NodeManager                          hadoop
    slave2     192.168.204.204   DataNode; NodeManager                          hadoop

    II. Operating System Preparation

    1. Set the hostname

    vi  /etc/sysconfig/network

    2. Disable the firewall

    chkconfig iptables off

    service iptables stop

    3. Disable SELinux

    vi /etc/sysconfig/selinux

    SELINUX=disabled
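    The change above only takes effect after a reboot. To switch SELinux to permissive mode for the current session as well (so the setup can continue without rebooting immediately), the following command can be used:

    setenforce 0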

    Verify the settings:

    [root@cloud001 Desktop]# hostname

    [root@cloud001 Desktop]# ifconfig

    [root@cloud001 Desktop]# service iptables status

    [root@cloud001 Desktop]# sestatus

    4. Install the JDK

    Configure the environment variables:

    [root@master1 hadoopsolf]# vim /etc/profile

    JAVA_HOME=/usr/java/jdk1.7.0_71    # adjust to your actual JDK path

    CLASSPATH=.:$JAVA_HOME/lib/tools.jar

    PATH=$JAVA_HOME/bin:$PATH

    export JAVA_HOME CLASSPATH PATH

    [root@master1 hadoopsolf]# source /etc/profile
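    A quick sanity check that the JDK is picked up from the new PATH (both commands should report version 1.7.0_71):

    [root@master1 hadoopsolf]# java -version
    [root@master1 hadoopsolf]# javac -version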

    III. Hadoop 2.x Build Environment Preparation

    1. Download the latest release from http://apache.claz.org/hadoop/common/

    2. Prepare the build environment

    tar -zxvf hadoop-2.7.2-src.tar.gz, which produces the hadoop-2.7.2-src directory.

    Enter the hadoop-2.7.2-src directory and read BUILDING.txt:

    cd  hadoop-2.7.2-src

    vim  BUILDING.txt

    It lists the libraries and tools required for the build.

    3. JDK

    Install the JDK, then add the JDK environment variables to /etc/profile:

    export  JAVA_HOME=/usr/java/jdk1.7.0_71

    export  CLASSPATH=.:$JAVA_HOME/jre/lib/rt.jar:$JAVA_HOME/lib/tools.jar

    export  JRE_HOME=/usr/java/jdk1.7.0_71

    export  PATH=$PATH:$JRE_HOME/bin

    source  /etc/profile

    Run javac -version to confirm the installation.

    4. Install the required libraries

    yum -y install svn ncurses-devel gcc*

    yum -y install lzo-devel zlib-devel autoconf automake libtool cmake openssl-devel

    5. Install protobuf-2.5.0.tar.gz

    tar -zxvf protobuf-2.5.0.tar.gz, then enter protobuf-2.5.0 and run in order:

    cd  protobuf-2.5.0

    ./configure

    make

    make install

    Check the version: [root@master1 protobuf-2.5.0]# protoc --version
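    If the installation succeeded, protoc --version should report libprotoc 2.5.0. Because protobuf installs under /usr/local by default, a missing-shared-library error from protoc can usually be resolved by adding /usr/local/lib to the dynamic loader path (the file name protobuf.conf below is an arbitrary choice):

    echo "/usr/local/lib" > /etc/ld.so.conf.d/protobuf.conf
    ldconfig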

    6. Install Maven

    Download from http://maven.apache.org/download.cgi

    tar -zxvf apache-maven-3.3.9-bin.tar.gz -C /hadoopsolf

    Then add the environment variables to /etc/profile:

    export MAVEN_HOME=/hadoopsolf/apache-maven-3.3.9

    export MAVEN_OPTS="-Xms256m -Xmx512m"

    export PATH=$PATH:$MAVEN_HOME/bin

    source /etc/profile

    Check the version: [root@master1 protobuf-2.5.0]# mvn -version

    7. Install Ant

    Download from http://ant.apache.org/bindownload.cgi

    tar -zxvf apache-ant-1.9.6-bin.tar.gz -C /hadoopsolf

    Then add the environment variables to /etc/profile:

    export ANT_HOME=/hadoopsolf/apache-ant-1.9.6

    export PATH=$ANT_HOME/bin:$PATH

    source /etc/profile

    Verify: [root@master1 protobuf-2.5.0]# ant -version

    8. Install FindBugs

    Download from http://findbugs.sourceforge.net/downloads.html and extract it to /hadoopsolf (e.g. tar -zxvf findbugs-3.0.1.tar.gz -C /hadoopsolf)

    vim /etc/profile and append at the end:

    export FINDBUGS_HOME=/hadoopsolf/findbugs-3.0.1

    export PATH=$PATH:$FINDBUGS_HOME/bin

    source /etc/profile

    Verify: [root@master1 protobuf-2.5.0]# findbugs -version

    9. Build Hadoop 2.x

    In the hadoop-2.7.2-src directory, run:

    mvn clean package -Pdist,native -DskipTests -Dtar

    or

    mvn package -Pdist,native -DskipTests -Dtar

    Keep the network connection stable; the build takes a long time.
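    If the build finishes successfully, the binary distribution tarball is normally produced under hadoop-dist/target inside the source tree, for example:

    ls hadoop-2.7.2-src/hadoop-dist/target/hadoop-2.7.2.tar.gz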

    IV. Install and Configure Hadoop 2.x

    1. Configure passwordless SSH login

    1.1 Configure the hosts file and hostname on each host

    vi  /etc/hosts

    192.168.204.202  master1

    192.168.204.203   slave1

    192.168.204.204   slave2

    vi  /etc/sysconfig/network

    NETWORKING=yes

    HOSTNAME=master1          # use slave1 / slave2 on the respective hosts

    1.2 Assign a static IP to each host and make sure they can ping each other; a sample interface configuration is sketched below.
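    A minimal static-IP sketch for CentOS 6 (shown for master1; the interface name eth0, netmask, and gateway are assumptions that must match your own network), followed by a network restart:

    vi /etc/sysconfig/network-scripts/ifcfg-eth0
    DEVICE=eth0
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.204.202
    NETMASK=255.255.255.0
    GATEWAY=192.168.204.1

    service network restart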

    1.3 Configure passwordless login from master1 to the two slaves (slave1 and slave2); follow the steps in order.

    (1) First check whether SSH is already installed on each machine: [root@master1 Desktop]# rpm -qa | grep ssh

    Already installed:

    openssh-askpass-5.3p1-94.el6.x86_64

    libssh2-1.4.2-1.el6.x86_64

    openssh-5.3p1-94.el6.x86_64

    openssh-server-5.3p1-94.el6.x86_64

    openssh-clients-5.3p1-94.el6.x86_64

    If it is not installed: yum -y install openssh-server openssh-clients

    (2) On the master1 host:

    As the hadoop user, run ssh-keygen -t rsa and press Enter through all prompts.

    [hadoop@master1 ~]$ cd  /home/hadoop/.ssh/

    [hadoop@master1 .ssh]$ ls

    id_rsa  id_rsa.pub

    [hadoop@master1 .ssh]$ cat id_rsa.pub >> authorized_keys

    As root: [root@master1 Desktop]# chmod 600 /home/hadoop/.ssh/authorized_keys

    Verify as the hadoop user:

    [hadoop@master1 ~]$ ssh  master1

    Last login: Mon Feb 22 22:23:16 2016 from master1

    [hadoop@master1 ~]$

    On each slave, as the hadoop user: mkdir -p /home/hadoop/.ssh

    On master1, as the hadoop user, copy the key to the slaves:

    [hadoop@master1 ~]$ scp /home/hadoop/.ssh/authorized_keys hadoop@slave1:/home/hadoop/.ssh/

    [hadoop@master1 ~]$ scp /home/hadoop/.ssh/authorized_keys hadoop@slave2:/home/hadoop/.ssh/

    On each slave, as root: chmod 600 /home/hadoop/.ssh/authorized_keys

    Verify from master1 as the hadoop user:

    [hadoop@master1 ~]$ ssh  slave1

    [hadoop@master1 ~]$ ssh  slave2
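    As an alternative to copying authorized_keys by hand, ssh-copy-id (part of openssh-clients) achieves the same result: it creates ~/.ssh on the remote host and appends the local public key there.

    [hadoop@master1 ~]$ ssh-copy-id hadoop@slave1
    [hadoop@master1 ~]$ ssh-copy-id hadoop@slave2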

    2. Install Hadoop 2.x

    Perform the following on master1 as the hadoop user.

    2.1 Extract the compiled Hadoop 2.x to a directory

    (any directory will do; here /home/hadoop is used)

    [hadoop@master1 ~]$ tar  -zxvf  /hadoopsolf/hadoop-2.7.2.tar.gz  -C  /home/hadoop

    Configure the Hadoop 2.x environment variables by editing ~/.bash_profile:

    vi  /home/hadoop/.bash_profile

    export HADOOP_HOME=/home/hadoop/hadoop-2.7.2

    export HADOOP_MAPRED_HOME=${HADOOP_HOME}

    export HADOOP_COMMON_HOME=${HADOOP_HOME}

    export HADOOP_HDFS_HOME=${HADOOP_HOME}

    export YARN_HOME=${HADOOP_HOME}

    export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

    export HDFS_CONF_DIR=${HADOOP_HOME}/etc/hadoop

    export YARN_CONF_DIR=${HADOOP_HOME}/etc/hadoop

    export HADOOP_LOG_DIR=${HADOOP_HOME}/logs

    export HADOOP_PID_DIR=/var/hadoop/pids

    --- Note

    (as root, create /var/hadoop/pids and give the hadoop user ownership:

    mkdir -p /var/hadoop/pids

    chown -R hadoop:hadoop /var/hadoop/pids

    )

    export PATH=$PATH:$HADOOP_HOME/bin

    export JAVA_HOME=/usr/java/jdk1.7.0_71

    export CLASSPATH=.:$JAVA_HOME/lib/tools.jar

    export  PATH=$JAVA_HOME/bin:$PATH
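    After saving the file, reload it and confirm that the hadoop command resolves from the new PATH (the version string should read 2.7.2):

    [hadoop@master1 ~]$ source ~/.bash_profile
    [hadoop@master1 ~]$ hadoop version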

    2.2 Create the Hadoop base directories

    cd  /home/hadoop/hadoop-2.7.2

    $ mkdir -p dfs/name

    $ mkdir -p dfs/data

    $ mkdir -p tmp

    $ cd  etc/hadoop

    2.3 Edit the Hadoop configuration files

    The files to configure are core-site.xml, hdfs-site.xml, mapred-site.xml, yarn-site.xml, slaves, hadoop-env.sh and yarn-env.sh, all located under hadoop-2.7.2/etc/hadoop. Each <property> block below goes inside the <configuration> element of its file. The required settings are:

    core-site.xml:

    <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.2/tmp</value>
    <description>A base for other temporary directories.</description>
    </property>

    <!-- RPC endpoint that accepts client connections and serves file system metadata -->
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master1:9000</value>
    </property>

    <property>
    <name>io.file.buffer.size</name>
    <value>131702</value>
    </property>

    hdfs-site.xml:

    <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master1:9001</value>
    </property>

    <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.2/dfs/name</value>
    </property>

    <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/hadoop/hadoop-2.7.2/dfs/data</value>
    </property>

    <!-- keep 2 replicas of each block -->
    <property>
    <name>dfs.replication</name>
    <value>2</value>
    </property>

    <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
    </property>

    mapred-site.xml (first create it: cp mapred-site.xml.template mapred-site.xml):

    <!-- run MapReduce on YARN -->
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>

    <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master1:10020</value>
    </property>

    <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master1:19888</value>
    </property>

    yarn-site.xml:

    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>

    <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>

    <property>
    <name>yarn.resourcemanager.address</name>
    <value>master1:8032</value>
    </property>

    <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master1:8030</value>
    </property>

    <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master1:8031</value>
    </property>

    <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master1:8033</value>
    </property>

    <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master1:8088</value>
    </property>

    vi slaves (list the DataNode/NodeManager hosts, one per line):

    slave1

    slave2

    vi  hadoop-env.sh

    export JAVA_HOME=/usr/java/jdk1.7.0_71

    vi  yarn-env.sh

    export JAVA_HOME=/usr/java/jdk1.7.0_71

    2.4 Copy to the slaves

    Copy the Hadoop directory from master1 to each slave:

    scp -r /home/hadoop/hadoop-2.7.2  slave1:/home/hadoop

    scp -r /home/hadoop/hadoop-2.7.2  slave2:/home/hadoop
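    The slave daemons read JAVA_HOME from hadoop-env.sh, but if the same shell environment is wanted on the slaves as well, the profile can optionally be copied too:

    scp /home/hadoop/.bash_profile slave1:/home/hadoop/
    scp /home/hadoop/.bash_profile slave2:/home/hadoop/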

    3. Start Hadoop 2.x

    Perform the following on master1.

    3.1 Format HDFS

    bin/hdfs  namenode  -format
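    The format command prints a lot of output; a line similar to the following near the end indicates success (the path comes from dfs.namenode.name.dir):

    ... common.Storage: Storage directory /home/hadoop/hadoop-2.7.2/dfs/name has been successfully formatted.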

    3.2 Start the NameNode and DataNode daemons

    sbin/start-dfs.sh (starts the NameNode, SecondaryNameNode and DataNodes)

    [hadoop@master1 sbin]$ ./start-dfs.sh

    Starting namenodes on [master1]

    master1: Error: JAVA_HOME is not set and could not be found.

    slave2: Error: JAVA_HOME is not set and could not be found.

    slave1: Error: JAVA_HOME is not set and could not be found.

    Starting secondary namenodes [master1]

    master1: Error: JAVA_HOME is not set and could not be found.

    Fix:

    vi /home/hadoop/hadoop-2.7.2/libexec/hadoop-config.sh
    Add: export JAVA_HOME=/usr/java/jdk1.7.0_71

    At this point the processes running on master1 are: NameNode, SecondaryNameNode

    and on each slave: DataNode

    ./sbin/start-yarn.sh (starts the ResourceManager and NodeManagers)

    Now the processes running on master1 are: NameNode, SecondaryNameNode, ResourceManager

    and on each slave: DataNode, NodeManager

    Check the processes:

    [hadoop@master1 hadoop-2.7.2]$ jps

    8176 Jps

    4356 ResourceManager

    6277 NameNode

    6429 SecondaryNameNode

    3.3 Basic status checks

    Help:

    [hadoop@master1 hadoop-2.7.2]$ ./bin/hdfs -help

    [hadoop@master1 hadoop-2.7.2]$ ./bin/hdfs dfs -help

    [hadoop@master1 bin]$ hdfs dfsadmin -help

    Cluster status: ./bin/hdfs dfsadmin -report

    File and block report: ./bin/hdfs fsck / -files -blocks

    HDFS cluster status can also be checked from the web console: http://master1:50070 (hdfs-site.xml)

    The ResourceManager runs on the master node; YARN web UI: http://master1:8088 (yarn-site.xml)

    NodeManagers run on the slave nodes, e.g. slave1: http://slave1:8042/

    JobHistory Server (start it first with mr-jobhistory-daemon.sh start historyserver), web UI: http://master1:19888/jobhistory

    SecondaryNameNode web UI: http://master1:9001
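    A simple end-to-end smoke test is to run one of the MapReduce examples bundled with the distribution (the pi arguments, 2 maps and 10 samples per map, are arbitrary small values):

    [hadoop@master1 hadoop-2.7.2]$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 2 10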

     
     