CentOS 7: Setting Up a Hadoop 3.1 Pseudo-Distributed Environment


        This tutorial uses JDK 1.8.0 and Hadoop 3.1.1.

    Related resources: https://pan.baidu.com/s/1EhkiCXidke-iN6kU3yuMJQ (extraction code: p0bl)

    1. Install a virtual machine

      You can use either VMware or VirtualBox.

      This tutorial is based on VMware.

    2. Install the operating system

      Download whichever release you prefer from the official CentOS website.

      This tutorial is based on CentOS 7 x86_64-Minimal-1804.

    3. Check whether SSH is installed (CentOS 7 ships with OpenSSH even in a minimal install, so this step can normally be skipped)

      rpm -qa | grep ssh

      If SSH is already installed, move on to the next step; if not, install it first (a minimal sketch is shown below).
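
      If OpenSSH is missing, a minimal, hedged sketch of installing and enabling it from the stock CentOS repositories is:

      yum install -y openssh-server openssh-clients
      systemctl enable sshd
      systemctl start sshd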

    4. Configure SSH for passwordless login

      1. Start the sshd service

      systemctl start sshd.service

      2. Change into the ~/.ssh directory

      cd ~/.ssh

       If the directory does not exist, create it by logging in as root over SSH once:

      ssh root@localhost

       Type yes when prompted and enter the local root password.

      3. Inside the .ssh directory, run

      ssh-keygen -t rsa

       Press Enter at every prompt to accept the defaults.

      4. Authorize the key for passwordless login:

      cat id_rsa.pub >> authorized_keys

      5. Fix the file permissions

      chmod 644 authorized_keys

      6. Verify passwordless login

      ssh root@localhost

       If you can log in without being asked for a password, the setup succeeded.
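
       If login still prompts for a password, overly permissive directory permissions are a common cause; sshd expects the .ssh directory to be accessible only by its owner, so a hedged fix is:

      chmod 700 ~/.ssh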

    5. Upload the JDK and configure environment variables

      Use a tool such as Xftp or WinSCP to upload the archive to /usr/local/java on the CentOS 7 machine.

      Change into the directory and extract the archive:

      cd /usr/local/java
      tar -zxvf jdk-8u191-linux-x64.tar.gz

      Set the environment variables:

      vim ~/.bashrc

      Append the following at the end of the file:

      export JAVA_HOME=/usr/local/java/jdk1.8.0_191
      export PATH=$JAVA_HOME/bin:$PATH

      Apply the configuration:

      source ~/.bashrc
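
      To confirm the JDK is on the PATH, the following should report a 1.8.0_191 version string:

      java -version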

    6. Upload Hadoop and configure environment variables

     1. System environment variables

      Use a tool such as Xftp or WinSCP to upload the archive to /usr/local/hadoop on the CentOS 7 machine.

      Change into the directory and extract the archive:

      cd /usr/local/hadoop
      tar -zxvf hadoop-3.1.1.tar.gz

      Set the environment variables:

      vim ~/.bashrc

      Append the following at the end of the file:

      export HADOOP_HOME=/usr/local/hadoop/hadoop-3.1.1
      export HADOOP_INSTALL=$HADOOP_HOME
      export HADOOP_MAPRED_HOME=$HADOOP_HOME
      export HADOOP_COMMON_HOME=$HADOOP_HOME
      export HADOOP_HDFS_HOME=$HADOOP_HOME
      export YARN_HOME=$HADOOP_HOME
      export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
      export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

      Apply the configuration:

      source ~/.bashrc
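
      To confirm the Hadoop binaries are on the PATH, the following should print Hadoop 3.1.1:

      hadoop version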

     2. Preparation

      Create the directory that Hadoop will use for its working data:

      mkdir -p /usr/local/hadoop/tmp

      Create the directory where the NameNode stores its name table:

      mkdir -p /usr/local/hadoop/tmp/dfs/name

      Create the directory where the DataNode stores data blocks:

      mkdir -p /usr/local/hadoop/tmp/dfs/data

     3. Edit core-site.xml under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

      By default Hadoop stores its data under /tmp, which is cleared whenever the system reboots, so we point Hadoop at a dedicated directory instead. We also configure the default file system and the host on which the NameNode process runs.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <!-- Directory where Hadoop stores its runtime files -->
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/hadoop/tmp</value>
            <description>A base for other temporary directories.</description>
        </property>
        <property>
            <!-- Address of the HDFS NameNode -->
            <name>fs.defaultFS</name>
            <value>hdfs://127.0.0.1:9000</value>
        </property>
    </configuration>

     4. Edit hdfs-site.xml under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

      This file holds HDFS-specific settings. The default block replication factor must be changed: HDFS keeps 3 replicas of each block by default, but in pseudo-distributed mode there is only one DataNode, so the replication factor has to be set to 1 or Hadoop will report errors.

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <!-- Number of replicas kept for each HDFS block (default is 3) -->
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <!-- Directory where the NameNode stores its name table -->
            <name>dfs.namenode.name.dir</name>
            <value>file:/usr/local/hadoop/tmp/dfs/name</value>
        </property>
        <property>
            <!-- Directory where the DataNode stores data blocks -->
            <name>dfs.datanode.data.dir</name>
            <value>file:/usr/local/hadoop/tmp/dfs/data</value>
        </property>
        <property>
            <!-- Address and port of the HDFS web UI -->
            <name>dfs.http.address</name>
            <value>0.0.0.0:50070</value>
        </property>
    </configuration>

     5. Edit mapred-site.xml under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <!-- Run the MapReduce framework on YARN -->
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

     6. Edit yarn-site.xml under /usr/local/hadoop/hadoop-3.1.1/etc/hadoop

    <?xml version="1.0"?>
    <!--
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    
    http://www.apache.org/licenses/LICENSE-2.0
    
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    <!-- Site specific YARN configuration properties -->
        <property>
            <!-- Auxiliary shuffle service needed to run MapReduce on YARN -->
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>

      7. Format the NameNode (this only needs to be done once)

       hadoop namenode -format
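
       In Hadoop 3 the hdfs script is the preferred entry point; the command above still works but prints a deprecation warning, so the equivalent command is:

       hdfs namenode -format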

      8. Start Hadoop

      start-all.sh
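
      start-all.sh still works in Hadoop 3 but prints a deprecation notice; the equivalent is to start HDFS and YARN separately:

      start-dfs.sh
      start-yarn.sh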

      9. Check the running processes to verify the startup

      jps

      If the five daemons NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager are all listed, the startup succeeded.
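
      A successful startup produces jps output along these lines (the process IDs will differ):

      12321 NameNode
      12498 DataNode
      12705 SecondaryNameNode
      12967 ResourceManager
      13102 NodeManager
      13420 Jps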

    7. Troubleshooting

     1. A warning during initialization

      WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

      Solutions (either of the two works)

      1. Add the following to ~/.bashrc:

      export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_COMMON_LIB_NATIVE_DIR"

        Apply the configuration:

      source ~/.bashrc

      2. Or edit core-site.xml and add:

    <property>
        <name>hadoop.native.lib</name>
        <value>false</value>
    </property>

     2. Errors during startup

      Error 1

    Starting namenodes on [localhost]
    ERROR: Attempting to operate on hdfs namenode as root
    ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
    Starting datanodes
    ERROR: Attempting to operate on hdfs datanode as root
    ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
    Starting secondary namenodes [localhost.localdomain]
    ERROR: Attempting to operate on hdfs secondarynamenode as root
    ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

      Solution

      The error is caused by missing user definitions, so edit both the start and stop scripts:

         start-dfs.sh and stop-dfs.sh under /usr/local/hadoop/hadoop-3.1.1/sbin

      At the top of each file, just below the #!/usr/bin/env bash line, add:

    HDFS_DATANODE_USER=root 
    HADOOP_SECURE_DN_USER=hdfs 
    HDFS_NAMENODE_USER=root 
    HDFS_SECONDARYNAMENODE_USER=root 

      Error 2

    Starting resourcemanager 
    ERROR: Attempting to launch yarn resourcemanager as root 
    ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting launch. 
    Starting nodemanagers 
    ERROR: Attempting to launch yarn nodemanager as root 
    ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting launch. 

       Solution

         The error is caused by missing user definitions, so edit both the start and stop scripts:

          start-yarn.sh and stop-yarn.sh under /usr/local/hadoop/hadoop-3.1.1/sbin

       At the top of each file, just below the #!/usr/bin/env bash line, add:

    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root

      Error 3

    WARNING: HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER. Using value of HADOOP_SECURE_DN_USER.

      Solution

      In start-dfs.sh and stop-dfs.sh under /usr/local/hadoop/hadoop-3.1.1/sbin, change

    HDFS_DATANODE_USER=root  
    HADOOP_SECURE_DN_USER=hdfs  
    HDFS_NAMENODE_USER=root  
    HDFS_SECONDARYNAMENODE_USER=root 

      to

    HDFS_DATANODE_USER=root  
    HDFS_DATANODE_SECURE_USER=hdfs  
    HDFS_NAMENODE_USER=root  
    HDFS_SECONDARYNAMENODE_USER=root 

    8. Hadoop is now installed

      HDFS web UI: http://192.168.0.3:50070 (replace 192.168.0.3 with your VM's IP address)

      ResourceManager web UI: http://192.168.0.3:8088
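
      A quick way to check from the VM itself that both web interfaces respond is:

      curl -I http://localhost:50070     # HDFS NameNode web UI
      curl -I http://localhost:8088      # YARN ResourceManager web UI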
