• Hadoop 1.0 Cluster Setup


    VirtualBox virtual machine

    Download

    Download the base installer for your operating system
    Download the extension pack (same file for all operating systems)

    http://www.oracle.com/technetwork/cn/server-storage/virtualbox/downloads/index.html

    Install the base package

    Just follow the installer prompts

    Install the extension pack

    1 Install the base package first

    2 Install the extension pack

    Open VirtualBox -> File -> Preferences -> Extensions -> click the + on the right -> select the downloaded extension pack file and follow the prompts


    Installing CentOS 7 in VirtualBox

    Download the minimal ISO

    http://isoredirect.centos.org/centos/7/isos/x86_64/CentOS-7-x86_64-Minimal-1804.iso

    Pick a nearby mirror from the mirror list; I used the NetEase (163) mirror

    VM resource plan

    CPU: 2 cores, RAM: 1 GB, swap: 2 GB, disk: 40 GB (dynamically allocated, not fixed-size)

    Create the VM

    New -> follow the prompts

    Install CentOS 7

    Attach the OS ISO as an optical disk

    Select the VM you just created -> Settings -> Storage -> click the + and choose the CentOS 7 ISO file

    Install the operating system

    Start the VM and the graphical installer appears; follow the prompts through the various settings. For disk partitioning I chose automatic.
    Once everything is configured, click Install and wait a while; on my machine it took ten-odd minutes :)

    Configure VM networking

    NAT network mode is usually the best choice

    Install required packages

    The minimal ISO omits even some basic packages, so install them:

    yum install gcc wget lrzsz vim

    Problems

    1 Under NAT networking the VM can ping the host, but the host cannot ping the VM

    None of the following fixed it (PS: it used to work):

    1. Disabling the firewall on both the host and the VM
    2. Reinstalling VirtualBox and the VM

    After several hours I still had not found the cause, so I switched to bridged mode for now and will revisit it later
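    If you want to stay on NAT, a common workaround is not to ping the guest at all but to forward a host port to the guest's SSH port with VBoxManage. This is only a sketch; the VM name "master" and port 2222 are assumptions, so substitute your own:

```shell
# Forward host port 2222 to guest port 22 on the first NAT adapter
# (run while the VM is powered off; "master" is a placeholder VM name)
VBoxManage modifyvm "master" --natpf1 "guestssh,tcp,,2222,,22"

# The host can then reach the guest through the forwarded port
ssh -p 2222 root@127.0.0.1
```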


    Hadoop cluster setup

    Cluster plan

    3 virtual machines:
    1 master, IP 192.168.1.15
    2 slaves: slave1 IP 192.168.1.16, slave2 IP 192.168.1.17

    In a real deployment, give the namenode extra memory and the datanodes extra disk space

    Steps on the master VM

    Install Java

    Download

    I chose Java 8:
    http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

    Unpack

    tar -xvzf jdk-8u181-linux-x64.tar.gz
    

    Set the environment variables

    export JAVA_HOME=/usr/local/src/jdk1.8.0_181
    export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin
    

    Apply them

    source ~/.bashrc
    
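    For `source ~/.bashrc` to have any effect, the exports have to be in that file first; a minimal sketch (the JDK path matches the archive unpacked above):

```shell
# Append the Java variables to ~/.bashrc so every new shell picks them up
cat >> ~/.bashrc <<'EOF'
export JAVA_HOME=/usr/local/src/jdk1.8.0_181
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin
EOF
source ~/.bashrc

# Verify the JDK is now on PATH
java -version
```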

    Questions

    1. What do x86 and x64 mean?
      x86: 32-bit; x64: 64-bit

    2. tar.gz or rpm?
    A matter of preference; I chose tar.gz, which means setting up the Java environment variables yourself

    Installing Hadoop 1.2.1

    Unpack

    [root@localhost src]# tar -xvzf hadoop-1.2.1-bin.tar.gz
    

    Create a tmp directory

    [root@localhost src]# cd hadoop-1.2.1
    [root@localhost hadoop-1.2.1]# mkdir tmp
    

    Configuration

    Enter the conf directory

    [root@localhost hadoop-1.2.1]# cd conf
    [root@localhost conf]# pwd
    /usr/local/src/hadoop-1.2.1/conf
    
    1. Configure masters
    [root@localhost conf]# vim masters
    
    master
    
    2. Configure slaves
    [root@localhost conf]# vim slaves
    
    slave1
    slave2
    
    3. Configure core-site.xml
    vim core-site.xml
    
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/local/src/hadoop-1.2.1/tmp</value>
    </property>
    <property>
            <name>fs.default.name</name>
            <value>hdfs://192.168.1.15:9000</value>
            </property>
    </configuration>
    
    4. Configure mapred-site.xml
    [root@localhost conf]# vim mapred-site.xml
    
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                    <name>mapred.job.tracker</name>
                    <value>192.168.1.15:9001</value>
            </property>
    </configuration>
    
    5. Configure hdfs-site.xml (note: with only 2 datanodes, a replication factor of 3 leaves blocks under-replicated; 2 would match this cluster)
    [root@localhost conf]# vim hdfs-site.xml
    
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
            <property>
                    <name>dfs.replication</name>
                    <value>3</value>
            </property>
    </configuration>
    

    6. Configure hadoop-env.sh

    [root@localhost conf]# vim hadoop-env.sh
    
    # append
    export JAVA_HOME=/usr/local/src/jdk1.8.0_181
    
    7. Configure hosts
    [root@localhost conf]# vim /etc/hosts
    192.168.1.15 master
    192.168.1.16 slave1
    192.168.1.17 slave2
    
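    A quick sanity check that the entries took effect, since sshd and Hadoop will resolve these names constantly (a sketch; getent reads the same resolver the daemons use):

```shell
# Each name should print its IP from /etc/hosts on every node
for h in master slave1 slave2; do
    getent hosts "$h"
done
```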
    8. Configure the hostname
    [root@localhost conf]# hostnamectl set-hostname master
    [root@localhost conf]# hostnamectl status
       Static hostname: master
             Icon name: computer-vm
               Chassis: vm
            Machine ID: 8751162d551a426393cd5e5c2fadf3d3
               Boot ID: 4d3093f75e514da399ff522bea8b420f
        Virtualization: kvm
      Operating System: CentOS Linux 7 (Core)
           CPE OS Name: cpe:/o:centos:centos:7
                Kernel: Linux 3.10.0-862.el7.x86_64
          Architecture: x86-64
    

    Steps on the slave1 VM

    Create it

    Clone it from master (shut the master VM down first)

    Select the master VM -> right-click -> Clone -> set the VM name (check "Reinitialize the MAC address of all network cards") -> follow the remaining prompts

    Set the hostname

    hostnamectl set-hostname slave1
    

    Steps on the slave2 VM

    Same as slave1, except the hostname is set to slave2

    Establish trust between the VMs for passwordless login

    1. Generate an RSA key pair on each of the three machines

    # master
    [wadeyu@master ~]$ su root
    Password: 
    [root@master wadeyu]# ssh-keygen
    
    # slave1
    [wadeyu@slave1 ~]$ su root
    Password: 
    [root@slave1 wadeyu]# ssh-keygen
    
    # slave2
    [wadeyu@slave2 ~]$ su root
    Password: 
    [root@slave2 wadeyu]# ssh-keygen
    
    

    2. Collect the public keys in ~/.ssh/authorized_keys

    # on master
    
    [root@master wadeyu]# cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
    
    Append slave1's and slave2's public keys to the file
    [root@master wadeyu]# scp slave1:~/.ssh/id_rsa.pub ~/slave1_id_rsa.pub
    [root@master wadeyu]# scp slave2:~/.ssh/id_rsa.pub ~/slave2_id_rsa.pub
    [root@master wadeyu]# cat ~/slave1_id_rsa.pub >> ~/.ssh/authorized_keys 
    [root@master wadeyu]# cat ~/slave2_id_rsa.pub >> ~/.ssh/authorized_keys
    
    Copy ~/.ssh/authorized_keys to slave1 and slave2
    [root@master wadeyu]# scp ~/.ssh/authorized_keys slave1:~/.ssh
    root@slave1's password: 
    authorized_keys                                                                                              100% 1179   458.2KB/s   00:00    
    [root@master wadeyu]# scp ~/.ssh/authorized_keys slave2:~/.ssh
    root@slave2's password: 
    authorized_keys 
    
    
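    sshd silently ignores authorized_keys when its permissions are too open, so it is worth tightening them on every node and then confirming the logins really are password-free (a sketch):

```shell
# On every node: sshd requires strict permissions on ~/.ssh
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys

# From master: BatchMode makes ssh fail instead of prompting,
# so each command should print the remote hostname without a password
for h in master slave1 slave2; do
    ssh -o BatchMode=yes "$h" hostname
done
```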

    Other steps (on every VM)

    To keep system configuration from interfering with the cluster, disable the firewall and SELinux in this learning environment

    1. Disable the firewall

    [root@master wadeyu]# systemctl stop firewalld
    [root@master wadeyu]# systemctl status firewalld
    ● firewalld.service - firewalld - dynamic firewall daemon
       Loaded: loaded (/usr/lib/systemd/system/firewalld.service; enabled; vendor preset: enabled)
       Active: inactive (dead) since Sat 2018-09-01 11:26:29 CST; 5s ago
         Docs: man:firewalld(1)
      Process: 635 ExecStart=/usr/sbin/firewalld --nofork --nopid $FIREWALLD_ARGS (code=exited, status=0/SUCCESS)
     Main PID: 635 (code=exited, status=0/SUCCESS)
    
    Sep 01 10:23:16 master systemd[1]: Starting firewalld - dynamic firewall daemon...
    Sep 01 10:23:18 master systemd[1]: Started firewalld - dynamic firewall daemon.
    Sep 01 11:26:21 master systemd[1]: Stopping firewalld - dynamic firewall daemon...
    Sep 01 11:26:29 master systemd[1]: Stopped firewalld - dynamic firewall daemon.
    
    2. Disable SELinux
    [root@master wadeyu]# getenforce
    Enforcing
    [root@master wadeyu]# setenforce 0
    [root@master wadeyu]# getenforce
    Permissive
    
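    Both changes above last only until the next reboot. To make them stick, the service can be disabled and the SELinux config file edited; a sketch:

```shell
# Keep firewalld from starting again on boot
systemctl disable firewalld

# setenforce 0 is temporary; persist the change in the SELinux config file
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
grep '^SELINUX=' /etc/selinux/config   # should now show SELINUX=disabled
```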

    Start the cluster

    On the master node, enter the hadoop/bin directory

    1. The first start requires formatting the namenode
    [root@master wadeyu]# cd /usr/local/src/hadoop-1.2.1/bin
    [root@master bin]# ./hadoop namenode -format
    18/09/01 11:37:07 INFO namenode.NameNode: STARTUP_MSG: 
    /************************************************************
    STARTUP_MSG: Starting NameNode
    STARTUP_MSG:   host = master/192.168.1.15
    STARTUP_MSG:   args = [-format]
    STARTUP_MSG:   version = 1.2.1
    STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152; compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
    STARTUP_MSG:   java = 1.8.0_181
    ************************************************************/
    18/09/01 11:37:08 INFO util.GSet: Computing capacity for map BlocksMap
    18/09/01 11:37:08 INFO util.GSet: VM type       = 64-bit
    18/09/01 11:37:08 INFO util.GSet: 2.0% max memory = 1013645312
    18/09/01 11:37:08 INFO util.GSet: capacity      = 2^21 = 2097152 entries
    18/09/01 11:37:08 INFO util.GSet: recommended=2097152, actual=2097152
    18/09/01 11:37:08 INFO namenode.FSNamesystem: fsOwner=root
    18/09/01 11:37:08 INFO namenode.FSNamesystem: supergroup=supergroup
    18/09/01 11:37:08 INFO namenode.FSNamesystem: isPermissionEnabled=true
    18/09/01 11:37:08 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
    18/09/01 11:37:08 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
    18/09/01 11:37:08 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
    18/09/01 11:37:08 INFO namenode.NameNode: Caching file names occuring more than 10 times 
    18/09/01 11:37:09 INFO common.Storage: Image file /usr/local/src/hadoop-1.2.1/tmp/dfs/name/current/fsimage of size 110 bytes saved in 0 seconds.
    18/09/01 11:37:09 INFO namenode.FSEditLog: closing edit log: position=4, editlog=/usr/local/src/hadoop-1.2.1/tmp/dfs/name/current/edits
    18/09/01 11:37:09 INFO namenode.FSEditLog: close success: truncate to 4, editlog=/usr/local/src/hadoop-1.2.1/tmp/dfs/name/current/edits
    18/09/01 11:37:09 INFO common.Storage: Storage directory /usr/local/src/hadoop-1.2.1/tmp/dfs/name has been successfully formatted.
    18/09/01 11:37:09 INFO namenode.NameNode: SHUTDOWN_MSG: 
    /************************************************************
    SHUTDOWN_MSG: Shutting down NameNode at master/192.168.1.15
    ************************************************************/
    

    2. Start all nodes

    [root@master bin]# ./start-all.sh 
    starting namenode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-wadeyu-namenode-master.out
    slave2: starting datanode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave2.out
    slave1: starting datanode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-datanode-slave1.out
    The authenticity of host 'master (192.168.1.15)' can't be established.
    ECDSA key fingerprint is SHA256:8DvdHBlcz1qInlLa9k2iYyd4Ip7auPhcb0mjHbEwZmo.
    ECDSA key fingerprint is MD5:9e:33:01:d2:fb:9c:dc:4f:40:30:90:fe:37:6e:1f:33.
    Are you sure you want to continue connecting (yes/no)? yes
    master: Warning: Permanently added 'master,192.168.1.15' (ECDSA) to the list of known hosts.
    master: starting secondarynamenode, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-secondarynamenode-master.out
    starting jobtracker, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-wadeyu-jobtracker-master.out
    slave1: starting tasktracker, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave1.out
    slave2: starting tasktracker, logging to /usr/local/src/hadoop-1.2.1/libexec/../logs/hadoop-root-tasktracker-slave2.out
    

    3. Check the cluster status

    # master
    [root@master bin]# jps
    2116 JobTracker
    2232 Jps
    1883 NameNode
    2044 SecondaryNameNode
    
    # slave1
    [root@master bin]# ssh slave1
    Last login: Sat Sep  1 11:20:05 2018 from slave2
    [root@slave1 ~]# jps
    3936 Jps
    1617 TaskTracker
    1538 DataNode
    
    # slave2
    [root@slave1 ~]# exit
    logout
    Connection to slave1 closed.
    [root@master bin]# ssh slave2
    Last login: Sat Sep  1 11:20:24 2018 from slave1
    [root@slave2 ~]# jps
    3774 TaskTracker
    3695 DataNode
    3871 Jps
    
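    Besides jps, the Hadoop 1.x daemons expose web UIs that give another quick health check; 50070 and 50030 are the 1.x defaults for the NameNode and JobTracker, and the curl probes below are only a sketch:

```shell
# NameNode web UI (live/dead datanodes, capacity); 200 means the daemon is up
curl -s -o /dev/null -w '%{http_code}\n' http://master:50070/
# JobTracker web UI (tasktrackers, job list)
curl -s -o /dev/null -w '%{http_code}\n' http://master:50030/
```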

    4. HDFS file operation examples

    # list /
    [root@master bin]# ./hadoop fs -ls /
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2018-09-01 11:38 /usr
    
    # upload a file
    [root@master bin]# ./hadoop fs -put /etc/passwd /
    [root@master bin]# ./hadoop fs -ls /
    Found 2 items
    -rw-r--r--   3 root supergroup        847 2018-09-01 11:44 /passwd
    drwxr-xr-x   - root supergroup          0 2018-09-01 11:38 /usr
    
    # view the file contents
    [root@master bin]# ./hadoop fs -cat /passwd
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    sync:x:5:0:sync:/sbin:/bin/sync
    shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
    halt:x:7:0:halt:/sbin:/sbin/halt
    mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
    operator:x:11:0:operator:/root:/sbin/nologin
    games:x:12:100:games:/usr/games:/sbin/nologin
    ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
    nobody:x:99:99:Nobody:/:/sbin/nologin
    systemd-network:x:192:192:systemd Network Management:/:/sbin/nologin
    dbus:x:81:81:System message bus:/:/sbin/nologin
    polkitd:x:999:998:User for polkitd:/:/sbin/nologin
    sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
    postfix:x:89:89::/var/spool/postfix:/sbin/nologin
    wadeyu:x:1000:1000:wadeyu:/home/wadeyu:/bin/bash
    
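    A few more everyday operations in the same vein, for reference (the paths here are only examples):

```shell
# Make a directory, download the uploaded file back, then remove it from HDFS
./hadoop fs -mkdir /input
./hadoop fs -get /passwd /tmp/passwd.copy
./hadoop fs -rm /passwd
./hadoop fs -ls /
```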

    Notes

    1. The VMs use bridged networking; I bound each VM's MAC address to an IP on the router, so the VMs get stable addresses without static IP configuration

    References

    [0] Badou Academy internal study materials

  • Original post: https://www.cnblogs.com/wadeyu/p/9621503.html