• Hadoop Installation Guide


    I. Preparation

      This preparation must be done on all three machines. Start by creating one virtual machine in VMware; this VM is the master, and it will later be cloned into two slaves.

        Click OK, then power on the virtual machine and start the CentOS installation.


        In the installer's partitioning step, add a /boot partition of 1 GB with the ext4 file system.


        Then add a swap partition; note that swap should be twice the size of memory, and its file system type is swap.


        Then click Done.


        Wait for the installation to finish, then click Reboot.


        At this point the OS installation is complete. Next, set up the network.


        After clicking OK, go back in and check the gateway address.


        Click Cancel, and note down this gateway address; it is used as GATEWAY below.

        1. First, change the hostname:

    [root@localhost ~]# hostnamectl set-hostname wangmaster 
    [root@localhost ~]# hostname wangmaster 
    [root@localhost ~]# exit

        2. Log back in, then configure the network interface:

    [root@wangmaster ~]# vi /etc/sysconfig/network-scripts/ifcfg-eno16777736  
    TYPE=Ethernet 
    BOOTPROTO=static 
    DEFROUTE=yes 
    PEERDNS=yes 
    PEERROUTES=yes 
    IPV4_FAILURE_FATAL=no 
    IPV6INIT=yes 
    IPV6_AUTOCONF=yes 
    IPV6_DEFROUTE=yes 
    IPV6_PEERDNS=yes 
    IPV6_PEERROUTES=yes 
    IPV6_FAILURE_FATAL=no 
    NAME=eno16777736 
    DEVICE=eno16777736 
    ONBOOT=yes  //enable the interface on boot 
    IPADDR=192.168.225.100  //the IP address 
    NETMASK=255.255.255.0  //the netmask 
    GATEWAY=192.168.225.2  //the gateway address you noted down earlier 
    DNS1=114.114.114.114 //DNS server 
    DNS2=114.114.114.115   //backup DNS server 
    [root@wangmaster ~]# systemctl restart network.service //restart the network
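
        To confirm the static address is active before moving on, a quick check (a sketch using the interface name and addresses configured above):

    [root@wangmaster ~]# ip addr show eno16777736  //should list inet 192.168.225.100/24 
    [root@wangmaster ~]# ping -c 3 192.168.225.2   //the gateway should answer 
    [root@wangmaster ~]# ping -c 3 114.114.114.114 //checks that an external address is reachable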

        3. Set up a network YUM repository

        Log in with a remote terminal tool (Xshell works).

        Click the file-transfer button to open Xftp and transfer files.

        Upload the repo file saved earlier into the /etc/yum.repos.d directory.
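
        Alternatively, the repo file can be fetched directly on the VM instead of using Xftp; a sketch, assuming curl is present on the minimal install and that the 163 mirror still serves the file at this path:

    [root@wangmaster ~]# curl -o /etc/yum.repos.d/CentOS7-Base-163.repo http://mirrors.163.com/.help/CentOS7-Base-163.repo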


    [root@wangmaster ~]# cd /etc/yum.repos.d/ 
    [root@wangmaster yum.repos.d]# ls
    CentOS7-Base-163.repo  CentOS-Debuginfo.repo  CentOS-Sources.repo 
    CentOS-Base.repo       CentOS-fasttrack.repo  CentOS-Vault.repo 
    CentOS-CR.repo         CentOS-Media.repo 
    [root@wangmaster yum.repos.d]# mv CentOS-Base.repo CentOS-Base.repo.bak //disable the original repo 
    [root@wangmaster yum.repos.d]# yum clean all  //clear the yum cache 
    Loaded plugins: fastestmirror 
    Cleaning repos: base extras updates 
    Cleaning up everything 
    [root@wangmaster yum.repos.d]# yum repolist  //rebuild the repo metadata 
    Loaded plugins: fastestmirror 
    base                                                     | 3.6 kB     00:00      
    extras                                                   | 3.4 kB     00:00      
    updates                                                  | 3.4 kB     00:00      
    (1/4): base/7/x86_64/group_gz                              | 155 kB   00:00      
    (2/4): extras/7/x86_64/primary_db                          | 139 kB   00:00      
    (3/4): base/7/x86_64/primary_db                            | 5.6 MB   00:09      
    (4/4): updates/7/x86_64/primary_db                         | 3.9 MB   00:11      
    Determining fastest mirrors 
    repo id                        repo name                                 status 
    base/7/x86_64                  CentOS-7 - Base - 163.com                 9,363 
    extras/7/x86_64                CentOS-7 - Extras - 163.com               311 
    updates/7/x86_64               CentOS-7 - Updates - 163.com              1,126 
    repolist: 10,800 
    [root@wangmaster yum.repos.d]# 
    [root@wangmaster yum.repos.d]# yum install -y vim //install the vim editor 

        4. Disable SELinux

    [root@wangmaster yum.repos.d]# vim /etc/selinux/config  
    # This file controls the state of SELinux on the system. 
    # SELINUX= can take one of these three values: 
    #     enforcing - SELinux security policy is enforced. 
    #     permissive - SELinux prints warnings instead of enforcing. 
    #     disabled - No SELinux policy is loaded. 
    SELINUX=disabled 
    # SELINUXTYPE= can take one of three two values: 
    #     targeted - Targeted processes are protected, 
    #     minimum - Modification of targeted policy. Only selected processes are protected.  
    #     mls - Multi Level Security protection. 
    SELINUXTYPE=targeted
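
        The config file change only applies after a reboot; to drop SELinux to permissive mode in the current session as well, you can additionally run:

    [root@wangmaster yum.repos.d]# setenforce 0 
    [root@wangmaster yum.repos.d]# getenforce 
    Permissive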

        5. Stop the firewall

    [root@wangmaster ~]# systemctl stop firewalld.service  //stop the firewall 
    [root@wangmaster ~]# systemctl disable firewalld.service  //disable the firewall at boot 
    [root@wangmaster ~]# systemctl status firewalld  //check the firewall status

        6. The cluster is planned as 3 virtual machines: master, slave1, and slave2. Add them to the /etc/hosts file:

    [root@wangmaster ~]# vim /etc/hosts 
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4 
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6 
     
    192.168.225.100 wangmaster 
    192.168.225.101 wangslave1 
    192.168.225.102 wangslave2 
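
        A quick check that the names resolve on the master (the two slave entries will only answer ping once the clones from step 9 are running):

    [root@wangmaster ~]# ping -c 1 wangmaster  //should resolve to 192.168.225.100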

        Then reboot the virtual machine (this is required, because the SELinux change only takes effect after a reboot).

        (Note: make the same changes on all three virtual machines, adjusting the IP addresses to your actual environment.)

        7. Install the following tools online:

    [root@wangmaster ~]# yum install -y wget 
    [root@wangmaster ~]# yum install -y net-tools 

        8. Create the working directory

    [root@wangmaster ~]# mkdir /opt/bigdata

        9. Copy the JDK into the /opt/bigdata directory on 192.168.225.100

    [root@wangmaster bigdata]# ls 
    hadoop-2.7.3.tar.gz  jdk1.8.tar.gz 
    The required packages have already been uploaded into bigdata here.
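
        If the packages are not on the VM yet, they can also be pushed from whichever machine holds them; a sketch, assuming scp is available there and the two archives sit in the current directory:

    scp hadoop-2.7.3.tar.gz jdk1.8.tar.gz root@192.168.225.100:/opt/bigdata/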

        10. Create the hadoop user on the master

    [root@wangmaster bigdata]# useradd hadoop 
    [root@wangmaster bigdata]# id hadoop 
    uid=1000(hadoop) gid=1000(hadoop) groups=1000(hadoop) 
    [root@wangmaster ~]# passwd hadoop 
    Changing password for user hadoop. (I set the password to 123456; it has to be typed twice.) 
    New password: 
    BAD PASSWORD: The password is shorter than 8 characters 
    Retype new password: 
    passwd: all authentication tokens updated successfully. 
    [root@wangmaster ~]# 

        11. Make the user a sudoer. As root, edit /etc/sudoers as follows:

    [root@wangmaster bigdata]# vim /etc/sudoers 
    ## Allow root to run any commands anywhere 
    root    ALL=(ALL)       ALL 
    hadoop  ALL=(ALL)       ALL 
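
        A quick way to confirm the sudoers entry works (run as the hadoop user; sudo asks for hadoop's own password the first time):

    [root@wangmaster bigdata]# su - hadoop 
    [hadoop@wangmaster ~]$ sudo whoami 
    root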

        12. Change the permissions on the /opt/bigdata directory

    [root@wangmaster ~]# chmod -R 777 /opt/bigdata 
    [root@wangmaster ~]# chown -R hadoop.hadoop /opt/bigdata 
    [root@wangmaster ~]# ll /opt 
    total 4 
    drwxrwxrwx. 2 hadoop hadoop 4096 Apr  9 05:28 bigdata 

        13. Install the JDK runtime

    [hadoop@wangmaster bigdata]$ tar -zxvf jdk1.8.tar.gz  
    [hadoop@wangmaster bigdata]$ mv /opt/bigdata/jdk1.8 /opt/bigdata/  //adjust the source path to wherever tar actually extracted jdk1.8 
    [hadoop@wangmaster bigdata]$ ls 
    hadoop-2.7.3.tar.gz  jdk1.8  jdk1.8.tar.gz  opt 

        14. Edit /etc/profile to configure the Java environment:

    [root@wangmaster ~]# vim /etc/profile 
    #java configuration 
    JAVA_HOME=/opt/bigdata/jdk1.8 
    JAVA_BIN=/opt/bigdata/jdk1.8/bin 
    PATH=$PATH:$JAVA_HOME/bin 
    CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JAVA_HOME/jre/lib/rt.jar 
    export JAVA_HOME 
    export JAVA_BIN  
    export PATH  
    export CLASSPATH 
    [hadoop@wangmaster ~]$ source /etc/profile 
    [hadoop@wangmaster bigdata]$ java -version 
    java version "1.8.0_111" 
    Java(TM) SE Runtime Environment (build 1.8.0_111-b14) 
    Java HotSpot(TM) 64-Bit Server VM (build 25.111-b14, mixed mode) 
    [hadoop@wangmaster bigdata]$ javac -version 
    javac 1.8.0_111 

        (Note: JAVA_HOME above is the path where you installed the JDK.)

        15. Install Hadoop

    [hadoop@wangmaster bigdata]$ tar -zxvf hadoop-2.7.3.tar.gz  
    [hadoop@wangmaster bigdata]$ ll 
    total 386500 
    drwxr-xr-x. 9 hadoop hadoop      4096 Aug 18  2016 hadoop-2.7.3 
    -rwxrwxrwx. 1 hadoop hadoop 214092195 Mar 13 19:16 hadoop-2.7.3.tar.gz 
    drwxrwxrwx. 8 hadoop hadoop      4096 Mar 13 00:14 jdk1.8 
    -rwxrwxrwx. 1 hadoop hadoop 181668321 Mar 22 23:31 jdk1.8.tar.gz

        16. Inside the Hadoop directory, create a tmp directory and set its permissions to 777 (the dfs/name and dfs/data directories are created here as well):

    [hadoop@wangmaster bigdata]$ cd hadoop-2.7.3  
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir tmp 
    [hadoop@wangmaster hadoop-2.7.3]$ chmod 777 tmp 
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir dfs 
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir dfs/name 
    [hadoop@wangmaster hadoop-2.7.3]$ mkdir dfs/data

    II. Hadoop Installation and Configuration 

        Hadoop is the cornerstone of the big-data ecosystem, so we begin with its installation and configuration. 

        1. Enter the installation directory 

    [hadoop@wangmaster ~]$ cd /opt/bigdata/hadoop-2.7.3

        2. Environment configuration 

    [hadoop@wangmaster hadoop-2.7.3]$ cd etc/hadoop 
    [hadoop@wangmaster hadoop]$ vim yarn-env.sh 
    # some Java parameters 
    export JAVA_HOME=/opt/bigdata/jdk1.8 
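
        yarn-env.sh only covers the YARN scripts; the HDFS daemons read hadoop-env.sh, which ships with export JAVA_HOME=${JAVA_HOME}. If JAVA_HOME is not exported in the daemons' environment, it is safer to hard-code it there as well (same value as above):

    [hadoop@wangmaster hadoop]$ vim hadoop-env.sh 
    # The java implementation to use. 
    export JAVA_HOME=/opt/bigdata/jdk1.8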

        3. core-site.xml configuration

    [hadoop@wangmaster hadoop]$ vim core-site.xml  
    <configuration> 
    <property> 
        <name>hadoop.tmp.dir</name> 
        <value>/opt/bigdata/hadoop-2.7.3/tmp</value> 
    </property> 
    <property> 
        <name>fs.default.name</name> 
        <value>hdfs://wangmaster:9000</value> 
    </property> 
    <property> 
        <name>hadoop.proxyuser.hadoop.hosts</name> 
        <value>*</value> 
    </property> 
    <property> 
        <name>hadoop.proxyuser.hadoop.groups</name> 
        <value>*</value> 
    </property> 
    </configuration> 

        4. hdfs-site.xml configuration 

    [hadoop@wangmaster hadoop]$ vim hdfs-site.xml 
    <configuration> 
    <property> 
        <name>dfs.replication</name>   
        <value>3</value> 
    </property> 
    <property> 
        <name>dfs.namenode.name.dir</name> 
        <value>/opt/bigdata/hadoop-2.7.3/dfs/name</value> 
    </property> 
    <property> 
        <name>dfs.datanode.data.dir</name> 
        <value>/opt/bigdata/hadoop-2.7.3/dfs/data</value> 
    </property> 
    <property> 
        <name>dfs.web.ugi</name> 
        <value>hdfs,hadoop</value> 
    </property> 
    <property> 
        <name>dfs.permissions</name> 
        <value>false</value> 
    </property> 
    </configuration>

        5. yarn-site.xml configuration 

    [hadoop@wangmaster hadoop]$ vim yarn-site.xml 
    <configuration> 
     
    <!-- Site specific YARN configuration properties --> 
    <property> 
        <name>yarn.resourcemanager.hostname</name> 
        <value>wangmaster</value> 
    </property> 
     
    <property> 
        <name>yarn.resourcemanager.webapp.address</name> 
        <value>wangmaster:8088</value> 
    </property> 
     
    <property> 
        <name>yarn.resourcemanager.scheduler.address</name> 
        <value>wangmaster:8081</value> 
    </property>
    <property> 
        <name>yarn.resourcemanager.resource-tracker.address</name> 
        <value>wangmaster:8082</value> 
    </property> 
     
    <property> 
        <name>yarn.nodemanager.aux-services</name> 
        <value>mapreduce_shuffle</value> 
    </property> 
    <property> 
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name> 
        <value>org.apache.hadoop.mapred.ShuffleHandler</value> 
    </property> 
    <property> 
        <name>yarn.web-proxy.address</name> 
        <value>wangmaster:54315</value> 
    </property> 
     
    </configuration> 

        6. mapred-site.xml configuration
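
        Note: a fresh Hadoop 2.7.3 unpack ships only mapred-site.xml.template in etc/hadoop; if mapred-site.xml does not exist yet, create it from the template before editing:

    [hadoop@wangmaster hadoop]$ cp mapred-site.xml.template mapred-site.xml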

    [hadoop@wangmaster hadoop]$ vim mapred-site.xml 
    <configuration> 
    <property> 
        <name>mapreduce.framework.name</name> 
        <value>yarn</value> 
    </property> 
    <property> 
        <name>mapred.job.tracker</name>   
        <value>wangmaster:9001</value> 
    </property> 
    <property>    
          <name>mapreduce.jobhistory.address</name>    
          <value>wangmaster:10020</value>    
    </property> 
    </configuration> 

        7. slaves configuration (master, slave1, and slave2 all act as DataNodes) 

    [hadoop@wangmaster hadoop]$ vim slaves  
    wangmaster 
    wangslave1 
    wangslave2

        8. Configure the system environment 

    [root@wangmaster bin]# vim /etc/profile  
    Append these two lines at the end: 
    export HADOOP_HOME=/opt/bigdata/hadoop-2.7.3 
    export PATH=$HADOOP_HOME/bin:$PATH 

        Apply the change:

    [hadoop@wangmaster hadoop]$ source /etc/profile 
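
        A quick check that the PATH change works from any directory (the first line of the output names the release):

    [hadoop@wangmaster hadoop]$ hadoop version 
    Hadoop 2.7.3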

        9. Clone the virtual machine into wangslave1 and wangslave2.

        Rename the CentOS 64-bit Minimal VM to wangmaster, then clone wangmaster to create the wangslave1 and wangslave2 nodes.

        First, shut down wangmaster.

        A cloned virtual machine cannot be used as-is; the following changes are needed (in VMware, regenerate the MAC address of the clone's network adapter first):

     [root@wangmaster ~]# vim /etc/sysconfig/network-scripts/ifcfg-eno16777736 
    HWADDR=00:50:56:36:BF:60  //change this MAC address to the one just generated for the clone 
    TYPE="Ethernet" 
    BOOTPROTO="static" 
    DEFROUTE="yes" 
    PEERDNS="yes" 
    PEERROUTES="yes" 
    IPV4_FAILURE_FATAL="no" 
    IPV6INIT="yes" 
    IPV6_AUTOCONF="yes" 
    IPV6_DEFROUTE="yes" 
    IPV6_PEERDNS="yes" 
    IPV6_PEERROUTES="yes" 
    IPV6_FAILURE_FATAL="no" 
    NAME="eno16777736" 
    DEVICE="eno16777736" 
    ONBOOT="yes" 
    IPADDR=192.168.225.101 //change this to wangslave1's IP, 192.168.225.101 
    NETMASK=255.255.255.0 
    GATEWAY=192.168.225.2 
    DNS1=114.114.114.114 
    DNS2=114.114.114.115 
    [root@wangmaster ~]# systemctl restart network.service //restart the network

        Then change the hostname: 

    [root@wangmaster ~]# hostnamectl set-hostname wangslave1 
    [root@wangmaster ~]# hostname wangslave1 
    [root@wangmaster ~]# exit 
    Log back in. 
    Test by pinging itself: 
    [root@wangslave1 ~]# ping wangslave1 
    PING wangslave1 (192.168.225.101) 56(84) bytes of data. 
    64 bytes from wangslave1 (192.168.225.101): icmp_seq=1 ttl=64 time=0.012 ms

        Following the same steps, clone and configure wangslave2.

        10. Set up passwordless SSH between master and slaves, as the hadoop user 

        The master sends its key to itself, slave1, and slave2: 

    [hadoop@wangmaster ~]$ ssh-keygen 
    [hadoop@wangmaster ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangslave1 
    [hadoop@wangmaster ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangslave2 
    [hadoop@wangmaster ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangmaster 
    Once this is done, you should normally be able to log in to this machine without a password, i.e. ssh localhost no longer prompts for one.
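
        To verify the copies, the following should print the remote hostnames without asking for a password:

    [hadoop@wangmaster ~]$ ssh wangslave1 hostname 
    wangslave1 
    [hadoop@wangmaster ~]$ ssh wangslave2 hostname 
    wangslave2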

        Then slave1 sends its key to the master: 

    [hadoop@wangslave1 ~]$ ssh-keygen 
    [hadoop@wangslave1 ~]$ ssh-copy-id -i .ssh/id_rsa.pub hadoop@wangmaster

        Then slave2 sends its key to the master (same steps, omitted).

        11. Format HDFS before the first start: 

    [hadoop@wangmaster ~]$ hdfs namenode -format
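
        If the format succeeds, the NameNode metadata directory configured in hdfs-site.xml gets populated; a quick check (the listing typically looks like this, exact file names may differ):

    [hadoop@wangmaster ~]$ ls /opt/bigdata/hadoop-2.7.3/dfs/name/current 
    VERSION  fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid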

        Start the components. 

    [hadoop@wangmaster ~]$ cd /opt/bigdata/hadoop-2.7.3/sbin

        Start everything: 

    [hadoop@wangmaster sbin]$ ./start-all.sh  
    Verify: 
    [hadoop@wangmaster sbin]$ jps 
    1666 DataNode 
    2099 NodeManager 
    2377 Jps 
    1853 SecondaryNameNode 
    1998 ResourceManager 
    1567 NameNode 
    [hadoop@wangslave1 ~]$ jps 
    1349 NodeManager 
    1452 Jps 
    1245 DataNode 
    [hadoop@wangslave2 ~]$ jps 
    1907 Jps 
    1703 DataNode 
    1807 NodeManager 

        Open http://192.168.225.100:50070 in a browser to check whether startup succeeded.
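
        The DataNode count can also be checked from the shell; with all three nodes up, the report should show three live DataNodes (output trimmed to the relevant line):

    [hadoop@wangmaster ~]$ hdfs dfsadmin -report | grep "Live datanodes" 
    Live datanodes (3):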

        12. Stop everything

    [hadoop@wangmaster sbin]$ ./stop-all.sh

    III. Experiments

        1. Basic usage

    [hadoop@wangmaster ~]$ hadoop fs -mkdir /wang 
    [hadoop@wangmaster ~]$ cd /opt/bigdata/hadoop-2.7.3 
    [hadoop@wangmaster hadoop-2.7.3]$ hadoop fs -put LICENSE.txt  /wang 
    [hadoop@wangmaster ~]$ hadoop fs -ls /wang

        2. Run the wordcount example (optional). It is a word-counting program: it counts how many times each word appears in the text files placed in the input folder.

    [hadoop@wangmaster hadoop-2.7.3]$ hadoop fs -ls /wang  //list the input 
    Found 1 items 
    -rw-r--r--   3 hadoop supergroup      84854 2017-04-09 07:34 /wang/LICENSE.txt  
    [hadoop@wangmaster hadoop-2.7.3]$ cd /opt/bigdata/hadoop-2.7.3/share/hadoop/mapreduce 
    [hadoop@wangmaster mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /wang /output  //run the job 
    [hadoop@wangmaster mapreduce]$ hadoop fs -ls /output  //list the job output files 
    Found 2 items 
    -rw-r--r--   3 hadoop supergroup          0 2017-04-09 07:38 /output/_SUCCESS 
    -rw-r--r--   3 hadoop supergroup      22002 2017-04-09 07:38 /output/part-r-00000 
    [hadoop@wangmaster mapreduce]$ hadoop fs -cat /output/part-r-00000  //view the result 
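
        To rerun the example, note that MapReduce refuses to write to an existing output directory, so remove /output first:

    [hadoop@wangmaster mapreduce]$ hadoop fs -rm -r /output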