Installing a Hadoop 2.7.4 cluster on Ubuntu 16 Desktop




    Reference 1: http://blog.csdn.net/mark_lq/article/details/53384358


    Reference 2: http://blog.csdn.net/quiet_girl/article/details/74352190



        Note: my Ubuntu system already had TensorFlow, Python 2.7, SyntaxNet and so on installed (it was not a fresh system), and the Hadoop cluster was installed on top of that. The installation hit plenty of pitfalls. The biggest was passwordless SSH login; the next was that a hadoop account has to be created before installing. I followed reference 1, and when startup failed I turned to reference 2. I am not saying anyone's documentation is badly written: everyone's system environment is different, and failing to start is no reason to blame the docs. Back to the point, let's begin:

    This was originally a closing note at the end, but on reflection I put it at the beginning:

    For the three systems, I recommend installing one machine, walking through all the steps below, and only then cloning it. After cloning, what remains is to modify /etc/hostname and /etc/hosts on each clone, and to run the following on master:

    sudo scp ./authorized_keys  hadoop@slaver1:~/.ssh

    sudo scp ./authorized_keys  hadoop@slaver2:~/.ssh

    Finally, format HDFS and start the cluster on master.


    1. System environment

    Ubuntu 16.04
    vmware 12.5.2 build-4638234
    hadoop 2.7.4
    java 1.8.0_131

    master:192.168.93.140
    slaver1:192.168.93.141
    slaver2:192.168.93.142

    2. Deployment steps

    2.1 Basic Requirements

    1. Add a hadoop user and add it to sudoers

    sudo adduser hadoop
    sudo vim /etc/sudoers

    Note: /etc/sudoers is read-only, so force the save in vim with :wq!

    添加如下:

    # User privilege specification
    root    ALL=(ALL:ALL) ALL
    hadoop    ALL=(ALL:ALL) ALL
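
    Alternatively, on Ubuntu the user can simply be added to the sudo group instead of editing /etc/sudoers; this is standard Ubuntu usage and is not from the original post:

    sudo adduser hadoop sudo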

    2. Switch to the hadoop user

    su hadoop

    Note: on the desktop edition, switch users via the settings icon in the top-right corner rather than with su hadoop.

    3. Change the hostname in /etc/hostname to master

    sudo vim /etc/hostname
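
    Alternatively, on Ubuntu 16.04 (systemd) the hostname can be set without editing the file; this hostnamectl command is standard usage and is not from the original post:

    sudo hostnamectl set-hostname master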

    4. Edit /etc/hosts

    127.0.0.1   localhost
    127.0.1.1   localhost.localdomain   localhost
     
    # The following lines are desirable for IPv6 capable hosts
    ::1     localhost ip6-localhost ip6-loopback
    ff02::1 ip6-allnodes
    ff02::2 ip6-allrouters
     
    # hadoop nodes
    192.168.93.140  master
    192.168.93.141  slaver1
    192.168.93.142  slaver2
    5. Install and configure the Java environment

    Note: for the JDK I used OpenJDK, which installs by default to /usr/lib/jvm/java-1.8.0-openjdk-amd64. The reference docs instead download JDK 1.8 and unpack it to /usr/local (so that all users can use it); JAVA_HOME below points at the OpenJDK path, so adjust it if your JDK lives elsewhere.
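    If OpenJDK 8 is not installed yet, it can be pulled from the Ubuntu repositories first. A minimal sketch (standard apt usage on Ubuntu 16.04, not from the original post):

    sudo apt-get update
    sudo apt-get install -y openjdk-8-jdk

    With the JDK in place, add the following to /etc/profile and apply it: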

    # set jdk classpath
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
    export JRE_HOME=$JAVA_HOME/jre
     
    export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH
    export CLASSPATH=$CLASSPATH:.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib
    source /etc/profile
     

    Verify that the JDK is installed and configured correctly:

    hadoop@master:~$ java -version
    openjdk version "1.8.0_131"
    OpenJDK Runtime Environment (build 1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11)
    OpenJDK 64-Bit Server VM (build 25.131-b11, mixed mode)
     

    6. Install openssh-server and generate SSH keys

    sudo apt-get install ssh

    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
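
    A common reason passwordless login still prompts for a password is overly permissive permissions on ~/.ssh; a sketch of the usual fix (not from the original post):

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys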

     

    7. slaver1 and slaver2 can be created by cloning the master VM; after cloning, remember to change /etc/hostname to slaver1 and slaver2 respectively.

    8. Configure passwordless SSH access from the master node to slaver1 and slaver2

    Copy the generated authorized_keys file to the .ssh directory on slaver1 and slaver2:

    sudo scp ./authorized_keys  hadoop@slaver1:~/.ssh

    sudo scp ./authorized_keys  hadoop@slaver2:~/.ssh
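
    Alternatively, ssh-copy-id (shipped with the OpenSSH client tools) copies the key and fixes permissions in one step; it is not used in the original post:

    ssh-copy-id hadoop@slaver1
    ssh-copy-id hadoop@slaver2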

    Test passwordless access from master to slaver1 and slaver2:

    ssh slaver1
    ssh slaver2
     

    Output:

    hadoop@master:/usr/lib/jvm/java-1.8.0-openjdk-amd64$ ssh slaver1
    Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-35-generic x86_64)
     
     * Documentation:  https://help.ubuntu.com
     * Management:     https://landscape.canonical.com
     * Support:        https://ubuntu.com/advantage
     
    50 packages can be updated.
    0 updates are security updates.
     
    Last login: Tue Sep 19 01:20:08 2017 from 192.168.93.140
    hadoop@slaver1:~$

    2.2 Hadoop 2.7 Cluster Setup

    Note: first create a software directory under the hadoop user's home, cd into it, and download hadoop-2.7.4.tar.gz:

    wget http://mirrors.tuna.tsinghua.edu.cn/apache/hadoop/common/hadoop-2.7.4/hadoop-2.7.4.tar.gz

    1. Unpack the downloaded hadoop-2.7.4.tar.gz under the hadoop user's directory.
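
    The unpack command itself is not shown in the original post; a minimal sketch using standard tar flags:

    cd ~/software
    tar -zxvf hadoop-2.7.4.tar.gz

    After unpacking, the directory listing looks like this: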

    hadoop@master:~/software$ ll
    total 260460
    drwxrwxr-x  4 hadoop hadoop      4096 Sep 19 03:43 ./
    drwxr-xr-x 19 hadoop hadoop      4096 Sep 19 02:37 ../
    drwxrwxr-x  3 hadoop hadoop      4096 Sep 19 03:43 hadoop-2.7.0/
    drwxr-xr-x 11 hadoop hadoop      4096 Sep 19 02:35 hadoop-2.7.4/
    -rw-rw-r--  1 hadoop hadoop 266688029 Aug  6 01:15 hadoop-2.7.4.tar.gz

    2. Configure the Hadoop environment variables

    sudo vim /etc/profile
     

    Add the following:

    # set hadoop classpath
    export HADOOP_HOME=/home/hadoop/software/hadoop-2.7.4
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native
    export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
    export HADOOP_PREFIX=$HADOOP_HOME
    export CLASSPATH=$CLASSPATH:.:$HADOOP_HOME/bin

    Apply the changes: source /etc/profile
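
    A quick sanity check that the variables took effect (hadoop version is a standard Hadoop CLI command; this check is not in the original post):

    $HADOOP_HOME/bin/hadoop version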

    3. Edit the Hadoop configuration files under etc/hadoop, mainly core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.

    • core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
        <!-- master is the hostname configured in /etc/hosts -->
            <value>hdfs://master:9000/</value>
        </property>
    </configuration>
     
     
    • hdfs-site.xml
     <configuration>
            <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/hadoop/software/hadoop-2.7.0/dfs/namenode</value>
            </property>
            <property>
                    <name>dfs.datanode.data.dir</name>
                    <value>/home/hadoop/software/hadoop-2.7.0/dfs/datanode</value>
            </property>
            <property>
                    <name>dfs.replication</name>
                    <value>1</value>
            </property>
            <property>
                    <name>dfs.namenode.secondary.http-address</name>
                    <value>master:9001</value>
            </property>
    </configuration>
     
    Note: mapred-site.xml does not exist under etc/hadoop by default; create it from the template:
    sudo cp mapred-site.xml.template mapred-site.xml
    • mapred-site.xml
    <configuration>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
        <property>
          <name>mapreduce.jobhistory.address</name>
          <value>master:10020</value>
        </property>
        <property>
          <name>mapreduce.jobhistory.webapp.address</name>
          <value>master:19888</value>
        </property>
    </configuration>
     
     
    • yarn-site.xml
    <configuration>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>                                                             
          <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
          <name>yarn.resourcemanager.address</name>
            <value>master:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>master:8030</value>
        </property>
        <property>
          <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>master:8031</value>
        </property>
        <property>
          <name>yarn.resourcemanager.admin.address</name>
            <value>master:8033</value>
        </property>
        <property>
          <name>yarn.resourcemanager.webapp.address</name>
            <value>master:8088</value>
        </property>
    </configuration>
     

    4. Edit the env scripts: add JAVA_HOME to hadoop-env.sh, mapred-env.sh, and yarn-env.sh:

    # The java implementation to use.
    export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64
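
    After editing, a quick check that all three scripts under etc/hadoop now carry the explicit path (a sketch, not from the original post):

    cd ~/software/hadoop-2.7.4/etc/hadoop
    grep -n 'export JAVA_HOME' hadoop-env.sh mapred-env.sh yarn-env.sh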
     

    5. Configure the masters and slaves files under etc/hadoop:

    Only the slaves file exists by default, so make a copy for masters:

    cp slaves masters

    The masters file should contain only master.
    The slaves file should list slaver1 and slaver2, one hostname per line.
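
    For reference, this is what the two files should look like (a sketch based on the hostnames used above, not a transcript from the original post):

    hadoop@master:~/software/hadoop-2.7.4/etc/hadoop$ cat masters
    master
    hadoop@master:~/software/hadoop-2.7.4/etc/hadoop$ cat slaves
    slaver1
    slaver2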

    6. Copy the entire hadoop-2.7.4 directory to the same location on slaver1 and slaver2:

    scp -r hadoop-2.7.4/ hadoop@slaver1:~/software

    scp -r hadoop-2.7.4/ hadoop@slaver2:~/software

    2.3 Start Hadoop cluster from master

    1. Format the filesystem for the first time with bin/hdfs namenode -format

    hadoop@master:~/software/hadoop-2.7.4/bin$ ./hdfs namenode -format

    Output:

    17/09/19 03:43:18 INFO common.Storage: Storage directory /home/hadoop/software/hadoop-2.7.0/dfs/namenode has been successfully formatted.

    17/09/19 03:43:18 INFO namenode.FSImageFormatProtobuf: Saving image file /home/hadoop/software/hadoop-2.7.0/dfs/namenode/current/fsimage.ckpt_0000000000000000000 using no compression

    17/09/19 03:43:18 INFO namenode.FSImageFormatProtobuf: Image file /home/hadoop/software/hadoop-2.7.0/dfs/namenode/current/fsimage.ckpt_0000000000000000000 of size 322 bytes saved in 0 seconds.

    17/09/19 03:43:18 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

    17/09/19 03:43:18 INFO util.ExitUtil: Exiting with status 0

    17/09/19 03:43:18 INFO namenode.NameNode: SHUTDOWN_MSG:

    /************************************************************

    SHUTDOWN_MSG: Shutting down NameNode at master/192.168.93.140

    ************************************************************/

    2. Start the Hadoop cluster with start-all.sh

    hadoop@master:~/software/hadoop-2.7.4/sbin$ ./start-all.sh

    This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh

    Starting namenodes on [master]

    master: starting namenode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-namenode-master.out

    slaver2: starting datanode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-datanode-slaver2.out

    slaver1: starting datanode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-datanode-slaver1.out

    Starting secondary namenodes [master]

    master: starting secondarynamenode, logging to /home/hadoop/software/hadoop-2.7.4/logs/hadoop-hadoop-secondarynamenode-master.out

    starting yarn daemons

    starting resourcemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-resourcemanager-master.out

    slaver1: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-nodemanager-slaver1.out

    slaver2: starting nodemanager, logging to /home/hadoop/software/hadoop-2.7.4/logs/yarn-hadoop-nodemanager-slaver2.out

    hadoop@master:~/software/hadoop-2.7.4/sbin$

    jps lists the running Java processes. On master you should see NameNode, SecondaryNameNode, and ResourceManager (see the jps output in section 2.4 below); on slaver1 and slaver2, DataNode and NodeManager.

    View HDFS in a browser: http://192.168.93.140:50070

    Overview 'master:9000' (active)

    Started:   Tue Sep 19 03:43:41 PDT 2017

    Version:   2.7.4, rcd915e1e8d9d0131462a0b7301586c175728a282

    Compiled:  2017-08-01T00:29Z by kshvachk from branch-2.7.4

    Cluster ID:   CID-232a6925-5563-411b-9eb9-828aa4623ed0

    Block Pool ID:    BP-2106543616-192.168.93.140-1505817798572

    Summary

    Security is off.

    Safemode is off.

    1 files and directories, 0 blocks = 1 total filesystem object(s).

    Heap Memory used 44.74 MB of 221 MB Heap Memory. Max Heap Memory is 889 MB.

    Non Heap Memory used 45.64 MB of 46.5 MB Commited Non Heap Memory. Max Non Heap Memory is -1 B.

    View the YARN/MapReduce web UI in a browser: http://192.168.93.140:8088

    Note: if HDFS or MapReduce will not start properly after hdfs namenode -format and start-all.sh (on the master node or the slave nodes), delete the dfs, logs, and tmp directories on both the master and slave nodes, run hdfs namenode -format again, and then rerun start-all.sh.
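
    A cleanup sketch for that case, assuming the paths configured above (the default hadoop.tmp.dir is /tmp/hadoop-<username>); run it on master and on each slave before reformatting. It is not from the original post:

    rm -rf ~/software/hadoop-2.7.0/dfs/namenode/* ~/software/hadoop-2.7.0/dfs/datanode/*
    rm -rf ~/software/hadoop-2.7.4/logs/*
    rm -rf /tmp/hadoop-hadoop*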

    2.4 Stop Hadoop cluster from master

    hadoop@master:~/software/hadoop-2.7.4/sbin$ jps

    9318 ResourceManager

    8840 NameNode

    9098 SecondaryNameNode

    15628 Jps

    hadoop@master:~/software/hadoop-2.7.4/sbin$ ./stop-all.sh

    This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.sh

    Stopping namenodes on [master]

    master: stopping namenode

    slaver1: stopping datanode

    slaver2: stopping datanode

    Stopping secondary namenodes [master]

    master: stopping secondarynamenode

    stopping yarn daemons

    stopping resourcemanager

    slaver1: stopping nodemanager

    slaver2: stopping nodemanager

    no proxyserver to stop

    hadoop@master:~/software/hadoop-2.7.4/sbin$

     

    (Hadoop and HBase compatibility table omitted.)

  • Original post: https://www.cnblogs.com/herosoft/p/8134146.html