• Setting up a Hadoop 2.4.1 pseudo-distributed environment


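      Note: every installation step is performed as a regular (non-root) user, so that user must first be allowed to run commands with sudo.

    0. For the preliminary network configuration of the virtual machine, see: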

      http://www.cnblogs.com/qlqwjy/p/7783253.html

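    1. Allow the hadoop user to run commands with sudo

    visudo
    or
     vim /etc/sudoers

    Add an entry for the hadoop user directly below the existing root entry, granting it the (ALL) ALL rights that sudo -l reports below.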
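The entry itself is not reproduced in the text. Since `sudo -l` reports `(ALL) ALL` for the hadoop user, it is presumably the standard full-access rule. A minimal sketch under that assumption (the helper name `sudoers_rule` is hypothetical):

```shell
# Hypothetical helper: print a full-access sudoers rule for the given user.
# Assumption: the rule mirrors the stock "root ALL=(ALL) ALL" entry.
sudoers_rule() {
    printf '%s    ALL=(ALL)    ALL\n' "$1"
}

# Example: show the line to paste into the editor opened by visudo.
sudoers_rule hadoop
```

Editing through `visudo` is safer than opening /etc/sudoers directly, because `visudo` checks the syntax before it saves the file.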

    Log in as the hadoop user and list the commands it may run:

    [hadoop@localhost java]$ sudo -l  # list the commands the current user may run through sudo
    Matching Defaults entries for hadoop on this host:
        requiretty, !visiblepw, always_set_home, env_reset, env_keep="COLORS DISPLAY HOSTNAME HISTSIZE INPUTRC KDEDIR
        LS_COLORS", env_keep+="MAIL PS1 PS2 QTDIR USERNAME LANG LC_ADDRESS LC_CTYPE", env_keep+="LC_COLLATE
        LC_IDENTIFICATION LC_MEASUREMENT LC_MESSAGES", env_keep+="LC_MONETARY LC_NAME LC_NUMERIC LC_PAPER LC_TELEPHONE",
        env_keep+="LC_TIME LC_ALL LANGUAGE LINGUAS _XKB_CHARSET XAUTHORITY", secure_path=/sbin:/bin:/usr/sbin:/usr/bin
    
    User hadoop may run the following commands on this host:
        (ALL) ALL

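     ------------------------Install the Hadoop runtime environment (switch to the hadoop user)----------------------

      I upload all files over sftp. Installing Git is a convenient way to get ssh and sftp clients, since it bundles them. Check whether your Linux is 32-bit or 64-bit: I first installed a 64-bit JDK, but the system turned out to be 32-bit, so that JDK would not run.

    Check the word size:

    uname -a
    or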
    getconf LONG_BIT
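The result decides which JDK archive to download. As a sketch, the hypothetical helper below maps the `getconf LONG_BIT` value to the suffix used in JDK 7 Linux archive names (`i586` for 32-bit, as in the file used in this guide, and `x64` for 64-bit):

```shell
# Hypothetical helper: map the output of `getconf LONG_BIT` to the
# architecture suffix used in JDK 7 Linux archive names.
jdk_arch_suffix() {
    case "$1" in
        32) echo "i586" ;;
        64) echo "x64" ;;
        *)  echo "unsupported word size: $1" >&2; return 1 ;;
    esac
}

# Example: name of the archive that matches this machine.
echo "jdk-7u65-linux-$(jdk_arch_suffix "$(getconf LONG_BIT)").tar.gz"
```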

    1. Install the JDK

    (1) Upload the archive to the server, then extract it

    sudo tar -zxvf ./jdk-7u65-linux-i586.tar.gz 

    (2) Check the installation directory:

    [hadoop@localhost jdk1.7.0_65]$ pwd
    /opt/java/jdk1.7.0_65

    (3) Configure the environment variables:

    [hadoop@localhost jdk1.7.0_65]$ tail -4 ~/.bashrc 
    export JAVA_HOME=/opt/java/jdk1.7.0_65
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:${PATH}

    Reload the environment variables:

    [hadoop@localhost jdk1.7.0_65]$ source ~/.bashrc 
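One side effect of this setup is that every `source ~/.bashrc` prepends the JDK directory to PATH again. This is harmless but clutters the variable. A small sketch of a guard (the helper name `path_prepend` is hypothetical):

```shell
# Hypothetical helper: print the PATH-like string $2 with directory $1
# prepended, unless $1 is already one of its entries.
path_prepend() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;
        *)        printf '%s\n' "$1${2:+:$2}" ;;
    esac
}

# Example with literal values.
path_prepend /opt/java/jdk1.7.0_65/bin /usr/bin:/bin
```

In ~/.bashrc this could be used as `export PATH=$(path_prepend "${JAVA_HOME}/bin" "$PATH")`, with the function defined earlier in the same file.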

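     (4) Test with java or javac. The first command below mistypes -version as -vsersion, which the JVM rejects; javac -version confirms the installation: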

    [hadoop@localhost jdk1.7.0_65]$ java -vsersion
    Unrecognized option: -vsersion
    Error: Could not create the Java Virtual Machine.
    Error: A fatal exception has occurred. Program will exit.
    [hadoop@localhost jdk1.7.0_65]$ javac -version
    javac 1.7.0_65

    2. Install Hadoop 2.4.1

    (1) Upload the archive to the server

    sftp> put hadoop-2.4.1.tar.gz

    (2) Extract it

    sudo tar -zxvf ./hadoop-2.4.1.tar.gz

    (3) List the extracted directory:

    [hadoop@localhost hadoop-2.4.1]$ ls
    bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share

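      The Java JARs are stored under share. It also contains a docs directory, which is not needed at runtime and can be deleted.

      bin holds the executables.

      etc holds the Hadoop configuration files.

      lib and libexec hold the native libraries and the helper scripts used by the other commands.

      sbin holds the administrative scripts that start and stop the daemons.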

     

    (4) Edit the configuration files. In Hadoop 2.x they are in $HADOOP_HOME/etc/hadoop

    • Edit hadoop-env.sh (set the JDK environment variable)

    # line 27

    export JAVA_HOME=/opt/java/jdk1.7.0_65
    • Edit core-site.xml
            <!-- The default file system URI, i.e. the address of the HDFS master (NameNode) -->
            <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
            </property>
            <!-- Directory where Hadoop stores the files it generates at runtime -->
            <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/hadoop/hadoop-2.4.1/data/</value>
            </property>
    • Edit hdfs-site.xml (it overrides the defaults in hdfs-default.xml)
            <!-- Number of replicas kept for each HDFS block -->
            <property>
                <name>dfs.replication</name>
                <value>1</value>
            </property>
    • Edit mapred-site.xml (MapReduce)

    First rename mapred-site.xml.template to mapred-site.xml; otherwise Hadoop will not read it.

    [hadoop@localhost hadoop]$ sudo mv ./mapred-site.xml.template ./mapred-site.xml

    Then edit it:

            <!-- Run MapReduce on YARN -->
            <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
            </property>

    • Edit yarn-site.xml (YARN)
            <!-- Address of the YARN master (ResourceManager) -->
            <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>localhost</value>
            </property>
            <!-- How reducers fetch the map output -->
            <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
            </property>
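Each fragment above belongs inside the `<configuration>` root element of its file. As a sketch, the helpers below print a complete core-site.xml with the two values used in this guide. `site_property` and `site_file` are hypothetical names, not Hadoop tools:

```shell
# Hypothetical helper: print one <property> element.
site_property() {
    printf '    <property>\n'
    printf '        <name>%s</name>\n' "$1"
    printf '        <value>%s</value>\n' "$2"
    printf '    </property>\n'
}

# Hypothetical helper: wrap the properties read from stdin in the
# <configuration> root element that every *-site.xml file needs.
site_file() {
    printf '<?xml version="1.0"?>\n<configuration>\n'
    cat
    printf '</configuration>\n'
}

# Example: print a complete core-site.xml to stdout.
{
    site_property fs.defaultFS hdfs://localhost:9000
    site_property hadoop.tmp.dir /opt/hadoop/hadoop-2.4.1/data/
} | site_file
```

Redirecting the output to a scratch file first and comparing it with the existing file is safer than overwriting the live configuration directly.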

    (5) Turn off the Linux firewall:

    [root@localhost ~]# service iptables stop  # stop the firewall
    iptables: Flushing firewall rules: [  OK  ]
    iptables: Setting chains to policy ACCEPT: filter [  OK  ]
    iptables: Unloading modules: [  OK  ]
    [root@localhost ~]# ls
    anaconda-ks.cfg  install.log  install.log.syslog
    [root@localhost ~]# service iptables status  # check the iptables status
    iptables: Firewall is not running.

    3. Start and test Hadoop

    (1) Preparation

    • First add Hadoop to the environment variables so that its commands can be run from any directory:
    export JAVA_HOME=/opt/java/jdk1.7.0_65
    export HADOOP_HOME=/opt/hadoop/hadoop-2.4.1
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:${PATH}:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
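    • Format the NameNode (this initializes it)
    hdfs namenode -format    # the older form, hadoop namenode -format, still works in 2.x but is deprecated

    After the command runs, the directory dfs/name/current/ exists under the configured hadoop.tmp.dir and contains four files: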

    [root@localhost data]# ll ./dfs/name/current/
    total 16
    -rw-r--r--. 1 root root 351 Apr 11 02:51 fsimage_0000000000000000000
    -rw-r--r--. 1 root root  62 Apr 11 02:51 fsimage_0000000000000000000.md5
    -rw-r--r--. 1 root root   2 Apr 11 02:51 seen_txid
    -rw-r--r--. 1 root root 202 Apr 11 02:51 VERSION

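    (2) Start Hadoop (setting up SSH key-based login is recommended, otherwise the password is requested several times; you can also write your own shell script that calls the HDFS and YARN start scripts)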
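A minimal sketch of the key-based login setup, assuming OpenSSH and an RSA key without a passphrase. The helper name `ensure_local_ssh_key` is hypothetical:

```shell
# Hypothetical helper: create an RSA key pair without a passphrase (if none
# exists yet) and authorize it for logins to this same machine.
# $1 is the SSH directory; it defaults to ~/.ssh.
ensure_local_ssh_key() (
    ssh_dir=${1:-$HOME/.ssh}
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || exit 1
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N '' -f "$ssh_dir/id_rsa" || exit 1
    fi
    # Append the public key only once, so reruns do not add duplicates.
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys" 2>/dev/null; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
)
```

After running `ensure_local_ssh_key` as the user that starts Hadoop, `ssh localhost` should no longer ask for a password, provided the SSH server allows public-key authentication.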

    • Start HDFS

    Start HDFS first, from the Hadoop installation directory /opt/hadoop/hadoop-2.4.1:

    sbin/start-dfs.sh

    Verify that the daemons started:

    [root@localhost sbin]# jps
    664 SecondaryNameNode
    803 Jps
    500 DataNode
    422 NameNode

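    Explanation: start-dfs.sh starts the NameNode on the host named in fs.defaultFS and a DataNode on every host listed in the slaves file under etc/hadoop in the installation directory. Here that file lists only localhost.

    For a real multi-node cluster, add the other nodes to this file:

    [root@localhost hadoop]# cat ./slaves 
    localhost
    • Start YARN
    [root@localhost sbin]# ./start-yarn.sh

    Check again: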

    [root@localhost sbin]# jps
    1154 NodeManager
    882 ResourceManager
    664 SecondaryNameNode
    500 DataNode
    1257 Jps
    422 NameNode
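The two start scripts and the `jps` check can be combined into one small script. A sketch, with hypothetical function names `start_hadoop` and `missing_daemons`:

```shell
# Hypothetical wrapper: start HDFS first, then YARN.
# $1 is the Hadoop installation directory (defaults to $HADOOP_HOME).
start_hadoop() (
    home=${1:-$HADOOP_HOME}
    "$home/sbin/start-dfs.sh" && "$home/sbin/start-yarn.sh"
)

# Hypothetical check: read `jps` output on stdin and print every expected
# daemon that is not listed. The exit status is 0 only when none is missing.
missing_daemons() {
    awk '
        { seen[$2] = 1 }
        END {
            n = split("NameNode DataNode SecondaryNameNode ResourceManager NodeManager", want, " ")
            missing = 0
            for (i = 1; i <= n; i++)
                if (!(want[i] in seen)) { print want[i]; missing = 1 }
            exit missing
        }'
}
```

A typical call would be `start_hadoop && jps | missing_daemons`. The check compares whole process names, so SecondaryNameNode is not mistaken for NameNode.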

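    (3) Test the HDFS and YARN services started above

    http://192.168.2.136:50070 (HDFS web UI)
    http://192.168.2.136:8088 (YARN ResourceManager web UI, which lists the MapReduce jobs)

    • Test HDFS

    HDFS files can also be browsed through the web UI.

     First, upload a file: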

    [root@localhost ~]# ll
    total 60
    -rw-------. 1 root root  2388 Sep  9  2013 anaconda-ks.cfg
    -rw-r--r--. 1 root root 37667 Sep  9  2013 install.log
    -rw-r--r--. 1 root root  9154 Sep  9  2013 install.log.syslog
    [root@localhost ~]# hadoop fs -put install.log hdfs://localhost:9000/  # upload install.log from the current directory to the HDFS root

     The file now appears in the web UI, and clicking it offers a download.

    Next, delete the local install.log and download it again from HDFS:

    [root@localhost ~]# rm -rf ./install.log  # delete the local copy
    [root@localhost ~]# ls
    anaconda-ks.cfg  install.log.syslog

    [root@localhost ~]# hadoop fs -get hdfs://localhost:9000/install.log  # download the file from HDFS
    [root@localhost ~]# ls  
    anaconda-ks.cfg  install.log  install.log.syslog
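    • Test MapReduce

    We have not written a MapReduce program of our own, so we test with the examples bundled with Hadoop: one that estimates pi and one that counts word occurrences.

    Change to Hadoop's mapreduce directory: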

    [root@localhost mapreduce]# pwd
    /opt/hadoop/hadoop-2.4.1/share/hadoop/mapreduce

    Example 1: a MapReduce job that estimates pi

    [root@localhost mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar pi 5 5  # run the pi estimator with 5 map tasks and 5 samples per map
    Number of Maps  = 5
    Samples per Map = 5
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2
    Wrote input for Map #3
    Wrote input for Map #4
    Starting Job
    18/04/11 03:54:52 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
    18/04/11 03:54:53 INFO input.FileInputFormat: Total input paths to process : 5
    18/04/11 03:54:53 INFO mapreduce.JobSubmitter: number of splits:5
    18/04/11 03:54:54 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523441540916_0001
    18/04/11 03:54:56 INFO impl.YarnClientImpl: Submitted application application_1523441540916_0001
    18/04/11 03:54:56 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1523441540916_0001/
    18/04/11 03:54:56 INFO mapreduce.Job: Running job: job_1523441540916_0001
    18/04/11 03:55:26 INFO mapreduce.Job: Job job_1523441540916_0001 running in uber mode : false
    18/04/11 03:55:26 INFO mapreduce.Job:  map 0% reduce 0%
    18/04/11 03:57:27 INFO mapreduce.Job:  map 40% reduce 0%
    18/04/11 03:57:31 INFO mapreduce.Job:  map 80% reduce 0%
    18/04/11 03:57:32 INFO mapreduce.Job:  map 100% reduce 0%
    18/04/11 03:57:57 INFO mapreduce.Job:  map 100% reduce 100%
    18/04/11 03:57:58 INFO mapreduce.Job: Job job_1523441540916_0001 completed successfully
    18/04/11 03:58:00 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=116
                    FILE: Number of bytes written=559767
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=1315
                    HDFS: Number of bytes written=215
                    HDFS: Number of read operations=23
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=3
            Job Counters 
                    Launched map tasks=5
                    Launched reduce tasks=1
                    Data-local map tasks=5
                    Total time spent by all maps in occupied slots (ms)=633857
                    Total time spent by all reduces in occupied slots (ms)=17751
                    Total time spent by all map tasks (ms)=633857
                    Total time spent by all reduce tasks (ms)=17751
                    Total vcore-seconds taken by all map tasks=633857
                    Total vcore-seconds taken by all reduce tasks=17751
                    Total megabyte-seconds taken by all map tasks=649069568
                    Total megabyte-seconds taken by all reduce tasks=18177024
            Map-Reduce Framework
                    Map input records=5
                    Map output records=10
                    Map output bytes=90
                    Map output materialized bytes=140
                    Input split bytes=725
                    Combine input records=0
                    Combine output records=0
                    Reduce input groups=2
                    Reduce shuffle bytes=140
                    Reduce input records=10
                    Reduce output records=0
                    Spilled Records=20
                    Shuffled Maps =5
                    Failed Shuffles=0
                    Merged Map outputs=5
                    GC time elapsed (ms)=21046
                    CPU time spent (ms)=17350
                    Physical memory (bytes) snapshot=619728896
                    Virtual memory (bytes) snapshot=2174615552
                    Total committed heap usage (bytes)=622153728
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=590
            File Output Format Counters 
                    Bytes Written=97
    Job Finished in 188.318 seconds
    Estimated value of Pi is 3.68000000000000000000  # the result
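The value 3.68 is far from 3.14159 because 5 maps with 5 samples each give only 25 sample points, so the estimate can only be a multiple of 4/25 = 0.16 (here 4 × 23/25). The sketch below illustrates the idea locally with awk. It assumes the bundled example is a quasi-Monte Carlo estimator that takes points from a Halton sequence with bases 2 and 3 and counts how many fall inside the circle inscribed in the unit square. `estimate_pi` is a hypothetical name and the code is not taken from Hadoop:

```shell
# Hypothetical local sketch of a quasi-Monte Carlo pi estimate.
# $1 = number of maps, $2 = samples per map; only their product matters here.
estimate_pi() (
    total=$(( $1 * $2 ))
    awk -v n="$total" '
        # Radical inverse of i in the given base (one Halton coordinate).
        function halton(i, base,    f, r) {
            f = 1; r = 0
            while (i > 0) {
                f /= base
                r += f * (i % base)
                i = int(i / base)
            }
            return r
        }
        BEGIN {
            inside = 0
            for (i = 1; i <= n; i++) {
                x = halton(i, 2) - 0.5
                y = halton(i, 3) - 0.5
                # The circle of radius 0.5 is centered in the unit square.
                if (x * x + y * y <= 0.25) inside++
            }
            # Circle area / square area = pi / 4.
            printf "%.6f\n", 4 * inside / n
        }'
)
```

Calling it with larger arguments, for example `estimate_pi 10 1000`, uses more points and moves the estimate closer to pi. The same holds for the Hadoop job, at the cost of a longer run.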

    Example 2: a wordcount MapReduce job (given English text, it counts how often each word occurs)

    (1) Create an English text file

    [root@localhost mapreduce]# cat ./test.txt 
    hello lll
    hello kkk
    hello meinv
    hello 

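    (2) Upload the file to HDFS so that the job can read it

    First create the directories in HDFS (there are two ways to write the path):

    [root@localhost mapreduce]# hadoop fs -mkdir hdfs://localhost:9000/wordcount  # first form: full URI
    [root@localhost mapreduce]# hadoop fs -mkdir /wordcount/input          # second form: / is the HDFS root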
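Both forms address the same namespace, because a path without a scheme is resolved against fs.defaultFS (hdfs://localhost:9000 in this setup). The hypothetical helper below only illustrates that mapping for absolute paths. Relative paths, which HDFS resolves against the user's home directory, are not handled:

```shell
# Hypothetical helper: show the full URI that an absolute HDFS path stands
# for, given the fs.defaultFS value from core-site.xml.
qualify_hdfs_path() (
    default_fs=${1%/}        # drop a trailing slash, if any
    case "$2" in
        hdfs://*) printf '%s\n' "$2" ;;                      # already a full URI
        /*)       printf '%s%s\n' "$default_fs" "$2" ;;      # absolute path
        *)        echo "expected an absolute path or hdfs:// URI: $2" >&2; exit 1 ;;
    esac
)

# Example: the second mkdir form written out in full.
qualify_hdfs_path hdfs://localhost:9000 /wordcount/input
```

The second mkdir command works because /wordcount was created just before it. `hadoop fs -mkdir -p /wordcount/input` creates both levels in one step.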

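    The directories are now visible in the HDFS web UI (tmp and user were created by the previous job).

     Next, upload the text file to /wordcount/input/ in HDFS:

    [root@localhost mapreduce]# hadoop fs -put test.txt /wordcount/input

    Check the directory in the web UI.

    Run the wordcount program (MapReduce jobs are slow to start because several processes have to be launched):

    The job counts the words in every file under /wordcount/input in HDFS and writes the result to /wordcount/output (/ is the HDFS root; the output directory must not exist before the job runs):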

    [root@localhost mapreduce]# hadoop jar hadoop-mapreduce-examples-2.4.1.jar wordcount /wordcount/input /wordcount/output
    18/04/11 04:09:58 INFO client.RMProxy: Connecting to ResourceManager at localhost/127.0.0.1:8032
    18/04/11 04:10:00 INFO input.FileInputFormat: Total input paths to process : 1
    18/04/11 04:10:00 INFO mapreduce.JobSubmitter: number of splits:1
    18/04/11 04:10:01 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1523441540916_0002
    18/04/11 04:10:02 INFO impl.YarnClientImpl: Submitted application application_1523441540916_0002
    18/04/11 04:10:02 INFO mapreduce.Job: The url to track the job: http://localhost:8088/proxy/application_1523441540916_0002/
    18/04/11 04:10:02 INFO mapreduce.Job: Running job: job_1523441540916_0002
    18/04/11 04:10:22 INFO mapreduce.Job: Job job_1523441540916_0002 running in uber mode : false
    18/04/11 04:10:22 INFO mapreduce.Job:  map 0% reduce 0%
    18/04/11 04:10:36 INFO mapreduce.Job:  map 100% reduce 0%
    18/04/11 04:10:48 INFO mapreduce.Job:  map 100% reduce 100%
    18/04/11 04:10:49 INFO mapreduce.Job: Job job_1523441540916_0002 completed successfully
    18/04/11 04:10:50 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=50
                    FILE: Number of bytes written=185961
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=150
                    HDFS: Number of bytes written=28
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=11652
                    Total time spent by all reduces in occupied slots (ms)=9304
                    Total time spent by all map tasks (ms)=11652
                    Total time spent by all reduce tasks (ms)=9304
                    Total vcore-seconds taken by all map tasks=11652
                    Total vcore-seconds taken by all reduce tasks=9304
                    Total megabyte-seconds taken by all map tasks=11931648
                    Total megabyte-seconds taken by all reduce tasks=9527296
            Map-Reduce Framework
                    Map input records=4
                    Map output records=7
                    Map output bytes=66
                    Map output materialized bytes=50
                    Input split bytes=111
                    Combine input records=7
                    Combine output records=4
                    Reduce input groups=4
                    Reduce shuffle bytes=50
                    Reduce input records=4
                    Reduce output records=4
                    Spilled Records=8
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=609
                    CPU time spent (ms)=3400
                    Physical memory (bytes) snapshot=218648576
                    Virtual memory (bytes) snapshot=725839872
                    Total committed heap usage (bytes)=137433088
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=39
            File Output Format Counter

     List the files under /wordcount/output in HDFS:

    [root@localhost mapreduce]# hadoop fs -ls /wordcount/output  # list the directory
    Found 2 items
    -rw-r--r--   1 root supergroup          0 2018-04-11 04:10 /wordcount/output/_SUCCESS
    -rw-r--r--   1 root supergroup         28 2018-04-11 04:10 /wordcount/output/part-r-00000

     View the result file:

    [root@localhost mapreduce]# hadoop fs -cat /wordcount/output/part-r-00000
    hello   4
    kkk     1
    lll     1
    meinv   1

     The result file can also be downloaded and viewed from the web UI.
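For an input this small, the job's output can be cross-checked locally. The sketch below is not how Hadoop computes the result. It only repeats the same counting on one machine, with a hypothetical function name `local_wordcount`:

```shell
# Local cross-check: count whitespace-separated words read from stdin and
# print "word<TAB>count" lines sorted by word.
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' |
        LC_ALL=C sort
}

# Example: the same four lines as test.txt.
printf 'hello lll\nhello kkk\nhello meinv\nhello \n' | local_wordcount
```

Running `local_wordcount < test.txt` in the directory that holds the file should list the same four words and counts as part-r-00000.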

  • Original post: https://www.cnblogs.com/qlqwjy/p/8794995.html