• Installing Hadoop 2.6 with Docker


    Part 1: Build the images
    1: Build a CentOS base image with SSH support
    Note: the root user's password in this image is root.
    mkdir centos-ssh-root
    cd centos-ssh-root
    vi Dockerfile

    FROM centos
    MAINTAINER jieranli <jieran.li@thomsonreuters.com>
    RUN yum install -y openssh-server sudo
    RUN sed -i 's/UsePAM yes/UsePAM no/g' /etc/ssh/sshd_config
    RUN yum install -y openssh-clients
    RUN echo "root:root" | chpasswd
    RUN echo "root ALL=(ALL) ALL" >> /etc/sudoers
    RUN ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key
    RUN ssh-keygen -t rsa -f /etc/ssh/ssh_host_rsa_key
    RUN mkdir /var/run/sshd
    EXPOSE 22
    CMD ["/usr/sbin/sshd", "-D"]

    Build command:
    docker build -t centos-ssh-root:v1.0 .

    List the image you just built:
    docker images
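
    As an optional sanity check (not part of the original walkthrough), you can start a throwaway container from this image and confirm that sshd answers. The container name ssh-test and the host port 10022 are arbitrary choices for this sketch; the image's sshd should accept root/root as set in the Dockerfile.

    docker run -d --name ssh-test -p 10022:22 centos-ssh-root:v1.0   # start sshd in the background
    ssh -p 10022 root@127.0.0.1 'hostname'                           # log in with password root
    docker rm -f ssh-test                                            # remove the test container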

    2: Build an image with JDK 8, based on centos-ssh-root
    mkdir centos-ssh-root-jdk
    cd centos-ssh-root-jdk
    cp ../jdk-8u181-linux-x64.tar.gz .
    vi Dockerfile

    FROM centos-ssh-root:v1.0
    ADD jdk-8u181-linux-x64.tar.gz /usr/local/
    RUN mv /usr/local/jdk1.8.0_181 /usr/local/jdk1.8
    ENV JAVA_HOME /usr/local/jdk1.8
    ENV PATH $JAVA_HOME/bin:$PATH

    Build command:
    docker build -t centos-ssh-root-jdk:v2.0 .

    List the newly built image:
    docker images
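
    To confirm the JDK landed where the Dockerfile expects (an optional check, not in the original steps):

    docker run --rm centos-ssh-root-jdk:v2.0 java -version            # should report java version 1.8.0_181
    docker run --rm centos-ssh-root-jdk:v2.0 sh -c 'echo $JAVA_HOME'  # should print /usr/local/jdk1.8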

    3: Build an image with Hadoop, based on the JDK image
    mkdir centos-ssh-root-jdk-hadoop
    cd centos-ssh-root-jdk-hadoop
    cp ../hadoop-2.6.0-cdh5.5.2.tar.gz .
    vi Dockerfile

    FROM centos-ssh-root-jdk:v2.0
    ADD hadoop-2.6.0-cdh5.5.2.tar.gz /usr/local
    RUN mv /usr/local/hadoop-2.6.0-cdh5.5.2 /usr/local/hadoop
    ENV HADOOP_HOME /usr/local/hadoop
    ENV PATH $HADOOP_HOME/bin:$PATH

    Build command:
    docker build -t hadoop:v3.0 .
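
    Likewise, a quick way to confirm the Hadoop distribution unpacked correctly and is on the PATH (an optional check):

    docker run --rm hadoop:v3.0 hadoop version    # should report Hadoop 2.6.0-cdh5.5.2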


    Part 2: Build the Hadoop distributed cluster
    1: Cluster plan
    We will build a three-node cluster with one master and two slaves.
    Master node: hadoop0, IP: 10.35.22.11
    Slave node 1: hadoop1, IP: 10.35.22.12
    Slave node 2: hadoop2, IP: 10.35.22.15

    Because a Docker container's IP changes every time it restarts, we need to give each container a fixed IP. We use pipework to assign fixed IPs to the Docker containers.
    2: Start three containers as hadoop0, hadoop1, and hadoop2
    Run the commands below on the host. They set the hostname and container name, and for hadoop0 they also publish ports 50070 and 8088 (plus a few other Hadoop ports) to the host.

    docker run --name hadoop0 --hostname hadoop0 -d -P -p 50070:50070 -p 9000:9000 -p 50090:50090 -p 10020:10020 -p 19888:19888 -p 8088:8088 hadoop:v3.0

    docker run --name hadoop1 --hostname hadoop1 -d -P hadoop:v3.0

    docker run --name hadoop2 --hostname hadoop2 -d -P hadoop:v3.0

    Use docker ps to check that the three containers just started are running.
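
    If you only want the names, status, and port mappings, docker ps accepts a format string (a convenience sketch):

    docker ps --format "table {{.Names}}\t{{.Status}}\t{{.Ports}}"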

    3: Assign fixed IPs to the three containers
    docker run -itd --name hadoop hadoop:v3.0 /bin/bash   # create a container
    docker exec -it hadoop /bin/bash                      # attach to a running container
    1: Download pipework
    Download URL: https://github.com/jpetazzo/pipework.git
    2: Upload the downloaded zip to the host server, unzip it, and rename it

    docker cp pipework-master.zip hadoop:/work/pipework-master.zip
    unzip pipework-master.zip
    mv pipework-master pipework
    cp -rp pipework/pipework /usr/local/bin/


    3: Install bridge-utils

    yum -y install bridge-utils
    brctl show
    4: Create the network bridge

    # if a leftover br0 bridge exists, detach its container interface and delete it first
    brctl delif br0 veth1pl24213
    sudo brctl delbr br0
    # create br1, bring it up, and give it the host-side gateway address for the 10.35.22.0/24 subnet
    sudo brctl addbr br1
    sudo ip link set dev br1 up
    sudo ip addr add 10.35.22.1/24 dev br1


    5: Assign a fixed IP to each container (on the br1 bridge created above)

    pipework br1 hadoop0 10.35.22.11/24
    pipework br1 hadoop1 10.35.22.12/24
    pipework br1 hadoop2 10.35.22.15/24


    Verify the setup by pinging each of the three IPs from the host; if all of them respond, the network is working.
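
    A small loop like the following (just a convenience sketch) checks all three addresses in one go:

    for ip in 10.35.22.11 10.35.22.12 10.35.22.15; do
        ping -c 1 -W 1 $ip > /dev/null && echo "$ip is reachable" || echo "$ip is NOT reachable"
    done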


    4: Configure the Hadoop cluster
    First connect to hadoop0 with the command:

    docker exec -it hadoop0 /bin/bash

    The steps below walk through the configuration of the Hadoop cluster.
    1: Map hostnames to IPs. In each of the three containers, edit /etc/hosts
    and add the following entries:

    10.35.22.11 hadoop0
    10.35.22.12 hadoop1
    10.35.22.15 hadoop2


    2: Set up passwordless SSH login
    Run the following on hadoop0:

    cd ~
    mkdir .ssh
    cd .ssh
    ssh-keygen -t rsa    # press Enter at every prompt
    ssh-copy-id -i localhost
    ssh-copy-id -i hadoop0
    ssh-copy-id -i hadoop1
    ssh-copy-id -i hadoop2

    Run the following on hadoop1:
    docker exec -it hadoop1 /bin/bash
    cd ~
    cd .ssh
    ssh-keygen -t rsa    # press Enter at every prompt
    ssh-copy-id -i localhost
    ssh-copy-id -i hadoop0
    ssh-copy-id -i hadoop1
    ssh-copy-id -i hadoop2
    Run the following on hadoop2:
    docker exec -it hadoop2 /bin/bash
    cd ~
    cd .ssh
    ssh-keygen -t rsa    # press Enter at every prompt
    ssh-copy-id -i localhost
    ssh-copy-id -i hadoop0
    ssh-copy-id -i hadoop1
    ssh-copy-id -i hadoop2
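
    To confirm the key exchange worked (an optional check, assuming the hostnames above resolve), each node should be able to run a remote command on the others without a password prompt, for example from hadoop0:

    for host in hadoop0 hadoop1 hadoop2; do
        ssh -o BatchMode=yes $host hostname    # fails instead of prompting if a key is missing
    done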


    3: Edit the Hadoop configuration files on hadoop0
    First set the environment variables: vi ~/.bash_profile


    export JAVA_HOME=/usr/local/jdk1.8
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH

    HADOOP_HOME=/usr/local/hadoop
    HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    PATH=$HADOOP_HOME/bin:$PATH
    export HADOOP_HOME HADOOP_CONF_DIR PATH
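
    After saving, reload the profile and make sure the variables are visible (a quick check, not in the original text):

    source ~/.bash_profile
    echo $JAVA_HOME $HADOOP_HOME
    hadoop version    # should resolve via $HADOOP_HOME/bin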

    Go to the /usr/local/hadoop/etc/hadoop directory
    and edit the configuration files there: hadoop-env.sh, core-site.xml, hdfs-site.xml, yarn-site.xml, and mapred-site.xml.
    (1) hadoop-env.sh

    export JAVA_HOME=/usr/local/jdk1.8
    (2) core-site.xml

    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop0:9000</value>
      </property>
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>131072</value>
        <description>Size of read/write buffer used in SequenceFiles.</description>
      </property>
      <property>
        <name>fs.trash.interval</name>
        <value>1440</value>
      </property>
    </configuration>
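
    Once this file is saved (and the environment from .bash_profile is loaded), you can confirm that Hadoop picks up the settings (an optional check):

    hdfs getconf -confKey fs.defaultFS        # should print hdfs://hadoop0:9000
    hdfs getconf -confKey fs.trash.interval   # should print 1440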

    (3) hdfs-site.xml
    First create the local directories it references (run from /usr/local/hadoop):

    mkdir -p dfs/name
    mkdir -p dfs/data
    mkdir -p dfs/namesecondary
    <configuration>
      <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop0:50090</value>
        <description>The secondary namenode http server address and port.</description>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///usr/local/hadoop/dfs/name</value>
        <description>Path on the local filesystem where the NameNode stores the namespace and transactions logs persistently.</description>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///usr/local/hadoop/dfs/data</value>
        <description>Comma separated list of paths on the local filesystem of a DataNode where it should store its blocks.</description>
      </property>
      <property>
        <name>dfs.namenode.checkpoint.dir</name>
        <value>file:///usr/local/hadoop/dfs/namesecondary</value>
        <description>Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy.</description>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>2</value>
      </property>
    </configuration>

    (4) yarn-site.xml

    <configuration>
      <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop0</value>
        <description>The hostname of the RM.</description>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
        <description>Shuffle service that needs to be set for Map Reduce applications.</description>
      </property>
      <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
      </property>
    </configuration>

    (5) Rename the template file: mv mapred-site.xml.template mapred-site.xml
    vi mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
        <description>The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.</description>
      </property>
      <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop0:10020</value>
        <description>MapReduce JobHistory Server IPC host:port</description>
      </property>
      <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop0:19888</value>
        <description>MapReduce JobHistory Server Web UI host:port</description>
      </property>
    </configuration>


    (6) Format the NameNode
    Go to the /usr/local/hadoop directory and run the format command:

    bin/hdfs namenode -format

    The format operation must not be run repeatedly. If you really need to reformat, add the -force flag.
    (7) Start Hadoop in pseudo-distributed mode

    Command: sbin/start-all.sh

    During the first startup you will need to type yes to confirm the SSH host keys.

    Use jps to check whether the processes started correctly. If you can see the following processes, the pseudo-distributed startup succeeded:

    [root@hadoop0 hadoop]# jps
    3267 SecondaryNameNode
    3003 NameNode
    3664 Jps
    3397 ResourceManager
    3090 DataNode
    3487 NodeManager

    (8) Stop pseudo-distributed Hadoop

    Command: sbin/stop-all.sh

    (9) Specify the ResourceManager hostname by editing yarn-site.xml:

    <property>
      <description>The hostname of the RM.</description>
      <name>yarn.resourcemanager.hostname</name>
      <value>hadoop0</value>
    </property>

    (10) Edit the Hadoop configuration file etc/hadoop/slaves on hadoop0.
    Delete all of its existing content and replace it with:

    hadoop1
    hadoop2

    (11) On hadoop0, copy the Hadoop directory to the slave nodes (-q suppresses the progress display):

    scp -rq /usr/local/hadoop hadoop1:/usr/local
    scp -rq /usr/local/hadoop hadoop2:/usr/local
    (12) Start the distributed Hadoop cluster services

    sbin/start-all.sh

    (13) Verify that the cluster is working
    First check the processes.
    hadoop0 should have the following processes:

    [root@hadoop0 hadoop]# jps
    4643 Jps
    4073 NameNode
    4216 SecondaryNameNode
    4381 ResourceManager

    hadoop1 should have the following processes:

    [root@hadoop1 hadoop]# jps
    715 NodeManager
    849 Jps
    645 DataNode

    hadoop2 should have the following processes:

    [root@hadoop2 hadoop]# jps
    456 NodeManager
    589 Jps
    388 DataNode
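
    Beyond jps, the standard HDFS and YARN reports give a quick cluster-level view (optional checks, run from hadoop0):

    hdfs dfsadmin -report    # should list two live DataNodes
    yarn node -list          # should list two running NodeManagers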


    Run a quick wordcount job to test the cluster. Upload a test file with hadoop fs -put (or hdfs dfs -put), then submit the example job:

    hdfs dfs -put aa.txt /
    cd /usr/local/hadoop/share/hadoop/mapreduce
    hadoop jar hadoop-mapreduce-examples-2.6.0-cdh5.5.2.jar wordcount /aa.txt /out
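
    Once the job finishes, the output can be inspected directly from HDFS (the /out path matches the job above; the reducer output file is typically named part-r-00000):

    hdfs dfs -ls /out
    hdfs dfs -cat /out/part-r-00000    # word counts for aa.txt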


    Access the cluster's web UIs through a browser.
    Because ports 50070 and 8088 (among others) were mapped to the corresponding host ports when the hadoop0 container was started:

    adb9eba7142b crxy/centos-ssh-root-jdk-hadoop "/usr/sbin/sshd -D" About an hour ago Up About an hour 0.0.0.0:8088->8088/tcp, 0.0.0.0:50070->50070/tcp, 0.0.0.0:32770->22/tcp hadoop0
    the Hadoop services inside the container can be reached directly through the host.
    The host IP is 10.35.22.92.

    http://10.35.22.92:50070/
    http://10.35.22.92:19888/
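
    From the host (or any machine that can reach 10.35.22.92), a quick way to confirm the UIs respond without opening a browser (a hedged check; the JobHistory UI on 19888 only answers once the history server has been started):

    curl -sI http://10.35.22.92:50070/ | head -n 1    # NameNode web UI
    curl -sI http://10.35.22.92:8088/  | head -n 1    # YARN ResourceManager web UI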


    Part 3: Restarting cluster nodes
    Stop the three containers by running the following commands on the host:

    docker stop hadoop0
    docker stop hadoop1
    docker stop hadoop2

    When the containers stop, the fixed IPs assigned earlier are lost, so they must be set again before the containers are used.
    First start the three stopped containers:

    docker start hadoop0
    docker start hadoop1
    docker start hadoop2

    Then run the following commands on the host to reassign the fixed IPs:

    pipework br1 hadoop0 10.35.22.11/24
    pipework br1 hadoop1 10.35.22.12/24
    pipework br1 hadoop2 10.35.22.15/24


    The hostname-to-IP mappings inside the containers also have to be reconfigured, and writing them by hand each time is tedious.
    So write a script, runhosts.sh:

    #!/bin/bash
    echo "10.35.22.11 hadoop0" >> /etc/hosts
    echo "10.35.22.12 hadoop1" >> /etc/hosts
    echo "10.35.22.15 hadoop2" >> /etc/hosts


    Make the script executable: chmod +x runhosts.sh
    Copy it to every node and run it on each of them:

    scp runhosts.sh hadoop1:~
    scp runhosts.sh hadoop2:~
    Run the script with ./runhosts.sh
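
    If passwordless SSH is already in place, the copy-and-run step can be done in one loop from hadoop0 (a convenience sketch):

    for host in hadoop1 hadoop2; do
        scp runhosts.sh $host:~ && ssh $host ./runhosts.sh
    done
    ./runhosts.sh    # run it on hadoop0 as well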

    Check the /etc/hosts file to confirm the entries were added successfully.

    Note: some Docker versions do not automatically generate mappings like the ones below in the hosts file, which is why we assign fixed IPs to the containers and set up the hostname-to-IP mappings manually.

    172.17.0.25 hadoop0
    172.17.0.25 hadoop0.bridge
    172.17.0.26 hadoop1
    172.17.0.26 hadoop1.bridge
    172.17.0.27 hadoop2
    172.17.0.27 hadoop2.bridge


    Start the Hadoop cluster:

    sbin/start-all.sh
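
    Since every restart repeats the same three steps (start the containers, reattach the pipework IPs, rewrite /etc/hosts), the whole procedure above can be collected into one helper script on the host. This is only a hedged sketch; the script name restart-cluster.sh is an assumption, and it relies on the br1 bridge and the IPs chosen earlier in this guide.

    #!/bin/bash
    # restart-cluster.sh - run on the Docker host (assumed helper, not from the original article)
    set -e

    # 1. start the stopped containers
    for c in hadoop0 hadoop1 hadoop2; do
        docker start $c
    done

    # 2. reattach the fixed IPs on bridge br1
    pipework br1 hadoop0 10.35.22.11/24
    pipework br1 hadoop1 10.35.22.12/24
    pipework br1 hadoop2 10.35.22.15/24

    # 3. rewrite the hostname-to-IP mappings inside each container
    for c in hadoop0 hadoop1 hadoop2; do
        docker exec $c bash -c \
            'echo "10.35.22.11 hadoop0" >> /etc/hosts;
             echo "10.35.22.12 hadoop1" >> /etc/hosts;
             echo "10.35.22.15 hadoop2" >> /etc/hosts'
    done

    echo "Containers restarted; start Hadoop from hadoop0 with sbin/start-all.sh"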
