• Hadoop 2.7.7: Installing and Configuring a Distributed Cluster


    Environment preparation

    Four servers:

    OS          Role    Hostname           IP address
    CentOS 7.4  Master  hadoop-master-001  10.0.15.100
    CentOS 7.4  Slave   hadoop-data-001    10.0.15.99
    CentOS 7.4  Slave   hadoop-data-002    10.0.15.98
    CentOS 7.4  Slave   hadoop-data-003    10.0.15.97

    Steps common to all four nodes

    Create the operating user (as root)
        groupadd hduser
        useradd hduser -g hduser
    
    Switch to that user and configure the Java environment variables
    The author uses JDK 1.8 here
    JAVA_HOME=~/jdk1.8.0_151
    PATH=$PATH:$JAVA_HOME/bin
    export JAVA_HOME
    export PATH
    
    Configure /etc/hosts
    10.0.15.100 hadoop-master-001
    10.0.15.99 hadoop-data-001
    10.0.15.98 hadoop-data-002
    10.0.15.97 hadoop-data-003
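
    A hostname that maps to the wrong address in /etc/hosts is a typical source of connection errors later on, so it may be worth checking the file on every node. The sketch below is one possible way to do that; `check_hosts` is an illustrative helper name, not part of Hadoop, and the expected entries are simply the four lines listed above.

```shell
#!/usr/bin/env bash
# check_hosts FILE: print every expected "IP hostname" pair that FILE lacks.
# Returns non-zero if at least one pair is missing.
check_hosts() {
    local file="$1" missing=0 ip name
    while read -r ip name; do
        # A pair counts as present when a line starts with the IP
        # and lists the hostname as one of its following fields.
        if ! awk -v ip="$ip" -v name="$name" '
                $1 == ip { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                END { exit !found }' "$file"; then
            echo "missing: $ip $name"
            missing=1
        fi
    done <<'EOF'
10.0.15.100 hadoop-master-001
10.0.15.99 hadoop-data-001
10.0.15.98 hadoop-data-002
10.0.15.97 hadoop-data-003
EOF
    return "$missing"
}
```

    Calling `check_hosts /etc/hosts` on a node prints nothing and returns 0 when all four entries are present.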
    
    Set up passwordless SSH
      This is widely documented online, so it is not repeated here
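
    For readers who want a starting point anyway, here is a minimal sketch of one common approach, run as hduser on the master with the standard OpenSSH tools `ssh-keygen` and `ssh-copy-id`. The `DRY_RUN` switch and the `run` helper are illustrative additions, not something the original setup used; by default the script only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Hostnames taken from the /etc/hosts entries in this article.
NODES="hadoop-master-001 hadoop-data-001 hadoop-data-002 hadoop-data-003"
DRY_RUN="${DRY_RUN:-1}"   # illustrative switch: set DRY_RUN=0 to really execute

# run CMD...: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

setup_passwordless_ssh() {
    local key="$HOME/.ssh/id_rsa" node
    # Create a key pair with an empty passphrase unless one already exists.
    [ -f "$key" ] || run ssh-keygen -t rsa -N "" -f "$key"
    # Install the public key on every node, the master included.
    # ssh-copy-id typically prompts for hduser's password on each node.
    for node in $NODES; do
        run ssh-copy-id -i "$key.pub" "hduser@$node"
    done
}

setup_passwordless_ssh
```

    Afterwards, `ssh hduser@hadoop-data-001 hostname` from the master should return without asking for a password.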

    Installation (all nodes, master and slaves alike)

    Download and install Hadoop

    wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz
    tar -zxvf  hadoop-2.7.7.tar.gz

    Change ownership and move it into place

    chown -R hduser:hduser hadoop-2.7.7
    mv hadoop-2.7.7 /usr/local/hadoop

    Switch user and configure the environment variables

    su - hduser
    vim ~/.bashrc
    
    # Variables to add
    export JAVA_HOME=/home/hduser/jdk1.8.0_151
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_COMMON_HOME=$HADOOP_HOME
    export HADOOP_HDFS_HOME=$HADOOP_HOME
    export YARN_HOME=$HADOOP_HOME
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
    export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
    export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH
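
    After reloading the file with `source ~/.bashrc`, a quick sanity check can catch a mistyped path before any Hadoop command is run. The function below is a sketch; `check_env` is an illustrative name, and it only tests that the two home directories contain the expected launchers.

```shell
#!/usr/bin/env bash
# check_env: verify that JAVA_HOME and HADOOP_HOME are set and contain
# the executables this guide relies on. Returns non-zero on any problem.
check_env() {
    local status=0
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME does not contain bin/java: '${JAVA_HOME:-}'"
        status=1
    fi
    if [ ! -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
        echo "HADOOP_HOME does not contain bin/hadoop: '${HADOOP_HOME:-}'"
        status=1
    fi
    return "$status"
}
```

    If `check_env` succeeds, `hadoop version` should then report the installed release.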

    Modify the configuration files on the Master (in /usr/local/hadoop/etc/hadoop)

    vim hadoop-env.sh
    /**/
        # Set the Java path
        export JAVA_HOME=/home/hduser/jdk1.8.0_151
    /**/
    vim core-site.xml
    /**/
    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop-master-001:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop_data/hadoop_tmp</value>
        </property>
    </configuration>
    /**/
    vim hdfs-site.xml
    /**/
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/data/hadoop_data/hdfs/namenode</value>        <!-- create this directory; it holds the NameNode metadata -->
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/data/hadoop_data/hdfs/datanode</value>        <!-- create this directory; it holds the DataNode data -->
        </property>
    </configuration>
    /**/
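
    The local paths named in hadoop.tmp.dir, dfs.namenode.name.dir and dfs.datanode.data.dir are expected to exist and be writable by hduser. A minimal sketch for creating them, assuming the paths configured above; `make_hadoop_dirs` and its optional base-directory argument are illustrative.

```shell
#!/usr/bin/env bash
# make_hadoop_dirs [BASE]: create the local directories referenced in
# core-site.xml and hdfs-site.xml. BASE defaults to /data/hadoop_data.
make_hadoop_dirs() {
    local base="${1:-/data/hadoop_data}"
    mkdir -p "$base/hadoop_tmp" "$base/hdfs/namenode" "$base/hdfs/datanode"
}
```

    Run as root, this would be followed by `chown -R hduser:hduser /data/hadoop_data` so that the Hadoop daemons can write there.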
    vim mapred-site.xml
    /**/
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    /**/
    vim yarn-site.xml
    /**/
    <configuration>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop-master-001</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>hadoop-master-001:8050</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>hadoop-master-001:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>hadoop-master-001:8025</value>
        </property>
        <!-- When running PySpark on Hadoop YARN, omitting the following two properties causes errors -->
        <property>
            <name>yarn.nodemanager.pmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
    </configuration>
    /**/
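
    Two details are left implicit in the steps above and are stated here as assumptions about a stock Hadoop 2.7.x tarball: mapred-site.xml does not exist out of the box and is normally created from mapred-site.xml.template, and start-all.sh reads the worker hostnames from the slaves file in the same configuration directory. The sketch below covers both; `prepare_conf_dir` is an illustrative name, and the hostnames are the ones from /etc/hosts.

```shell
#!/usr/bin/env bash
# prepare_conf_dir DIR: DIR is the Hadoop configuration directory,
# assumed to be $HADOOP_HOME/etc/hadoop for a 2.7.x tarball install.
prepare_conf_dir() {
    local conf="$1"
    # Create mapred-site.xml from the shipped template if it is not there yet.
    if [ ! -f "$conf/mapred-site.xml" ] && [ -f "$conf/mapred-site.xml.template" ]; then
        cp "$conf/mapred-site.xml.template" "$conf/mapred-site.xml"
    fi
    # List the worker nodes, one hostname per line.
    printf '%s\n' hadoop-data-001 hadoop-data-002 hadoop-data-003 > "$conf/slaves"
}
```

    On the master this would be called as `prepare_conf_dir "$HADOOP_HOME/etc/hadoop"` before editing mapred-site.xml.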

    Modify the configuration files on the Slaves (in /usr/local/hadoop/etc/hadoop)

    vim hadoop-env.sh
    /**/
        # Set the Java path
        export JAVA_HOME=/home/hduser/jdk1.8.0_151
    /**/
    vim core-site.xml
    /**/
    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://hadoop-master-001:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/data/hadoop_data/hadoop_tmp</value>
        </property>
    </configuration>
    /**/
    vim hdfs-site.xml
    /**/
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/home/data/hadoop_data/hdfs/datanode</value>
        </property>
    </configuration>
    /**/
    vim mapred-site.xml
    /**/
    <configuration>
        <property>
            <name>mapred.job.tracker</name>
            <value>hadoop-master-001:54311</value>
        </property>
    </configuration>
    /**/
    vim yarn-site.xml
    /**/
    <configuration>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>hadoop-master-001:8050</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>hadoop-master-001:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>hadoop-master-001:8025</value>
        </property>
        <!-- When running PySpark on Hadoop YARN, omitting the following two properties causes errors -->
        <property>
            <name>yarn.nodemanager.pmem-check-enabled</name>
            <value>false</value>
        </property>
        <property>
            <name>yarn.nodemanager.vmem-check-enabled</name>
            <value>false</value>
        </property>
    </configuration>
    /**/

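    Other steps (all nodes, master and slaves alike)

    # Suppressing the NativeCodeLoader WARNING printed when running hadoop commands
    Open log4j.properties (vim log4j.properties) and add the following line
        log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR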

     Startup

    Once installation and configuration are complete, return to the master node and format the NameNode
    cd /data/hadoop_data/hdfs/namenode
    hdfs namenode -format
    
    Then run on the master node
    start-all.sh         # start the cluster
    stop-all.sh         # stop the cluster
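
    To confirm that startup worked, `jps` (shipped with the JDK) lists the running Java processes. With the configuration in this article one would expect NameNode, SecondaryNameNode and ResourceManager on the master, and DataNode and NodeManager on each slave. The helper below is an illustrative sketch that compares `jps` output against such a list; `missing_daemons` is not a Hadoop command.

```shell
#!/usr/bin/env bash
# missing_daemons NAME...: read `jps` output on stdin and print every
# expected daemon name that does not appear in it.
missing_daemons() {
    local running name
    # jps prints "<pid> <class name>" per line; keep only the names.
    running=$(awk '{print $2}')
    for name in "$@"; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode.
        printf '%s\n' "$running" | grep -qx "$name" || echo "$name"
    done
}
```

    On the master: `jps | missing_daemons NameNode SecondaryNameNode ResourceManager`. On a slave: `jps | missing_daemons DataNode NodeManager`. Empty output means every expected daemon was found.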

     Troubleshooting

    Listing HDFS files from a Hadoop data node fails with:
    ls: No Route to Host from  hadoop-data-002/10.0.15.98 to hadoop-master-001:9000 failed on socket timeout exception: java.net.NoRouteToHostException: 没有到主机的路由; For more details see:  http://wiki.apache.org/hadoop/NoRouteToHost
    
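    To diagnose, telnet from the data node to port 9000 on the NameNode
    The usual cause is a hostname in /etc/hosts that does not match its IP address, or a firewall that has not opened the port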
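
    If telnet is not installed, bash's built-in /dev/tcp redirection can perform the same reachability test. The following is a sketch under that assumption; `port_open` is an illustrative helper name, and the firewall commands in the comments assume the master runs firewalld, the CentOS 7 default.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT
# can be opened within 3 seconds, fail otherwise.
port_open() {
    local host="$1" port="$2"
    timeout 3 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# On a data node (not executed here):
#   port_open hadoop-master-001 9000 && echo reachable || echo unreachable
#
# If the firewall on the master turns out to be the cause, firewalld can
# open the port (run as root on the master):
#   firewall-cmd --permanent --add-port=9000/tcp
#   firewall-cmd --reload
```

    If the port is unreachable even with the firewall open, the /etc/hosts entries on both machines are the next thing to compare.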

    Further reading

    Installing a Spark cluster and integrating it with the Hadoop cluster

  • Original article: https://www.cnblogs.com/charles1ee/p/11239375.html