• Hadoop Cluster Environment Setup


    Physical machines

    192.168.1.200 puroc-centos
    192.168.1.201 puroc-centos2

    Note: each physical machine needs at least 2 GB of memory.

    Deployment plan:

    192.168.1.200 runs the NameNode, SecondaryNameNode, ResourceManager, NodeManager, and a DataNode.

    192.168.1.201 runs a DataNode and a NodeManager.

    Step 1: Configure the physical machines

    Configure SSH mutual trust

    Once SSH mutual trust is configured, the hosts can ssh to each other without entering a username or password. The full procedure is not covered here.
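
    For reference, a minimal sketch of one common way to set this up (assuming the root account is used on both hosts, matching the paths later in this guide); run on each host:

    ssh-keygen -t rsa                 # accept the defaults; creates ~/.ssh/id_rsa and id_rsa.pub
    ssh-copy-id root@puroc-centos     # copy the public key to every host, including this one
    ssh-copy-id root@puroc-centos2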

    /etc/hosts

    # In a real cluster, comment out the 127.0.0.1 localhost line; in a pseudo-distributed setup, keep it.

    #127.0.0.1   localhost

    192.168.1.200 puroc-centos

    192.168.1.201 puroc-centos2

    Restart the servers after modifying /etc/hosts.

    Turn off the firewall and disable it from starting at boot:

    service iptables stop
    
    chkconfig iptables off
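
    The commands above apply to CentOS 6. If the hosts run CentOS 7 or later, firewalld replaces iptables and the equivalent is:

    systemctl stop firewalld
    systemctl disable firewalld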

    Step 2: Download Hadoop

    Download page: http://hadoop.apache.org/releases.html

    The version used here is 2.7.1; upload hadoop-2.7.1.tar.gz to the server.
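
    If you would rather fetch the tarball directly on the server, a sketch (the URL is the standard Apache archive path for 2.7.1; the target directory matches the paths used in the configuration below):

    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
    mkdir -p /root/pud/hadoop
    tar -xzf hadoop-2.7.1.tar.gz -C /root/pud/hadoop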

    Step 3: Configure Hadoop

    Configure on 192.168.1.200 (the NameNode). The following files, all under etc/hadoop/ in the Hadoop installation directory, need to be modified.

    core-site.xml

    <configuration>
        <property>
            <!-- Default file system URI; clients and daemons contact the NameNode here -->
            <name>fs.defaultFS</name>
            <value>hdfs://puroc-centos:9000</value>
        </property>
        <property>
            <!-- Base directory for Hadoop's temporary files; a plain local path, no file: scheme -->
            <name>hadoop.tmp.dir</name>
            <value>/root/pud/hadoop/hadoop-2.7.1/tmp</value>
        </property>
        <property>
            <!-- Read/write buffer size in bytes (128 KB) -->
            <name>io.file.buffer.size</name>
            <value>131072</value>
        </property>
    </configuration>

    hdfs-site.xml

    <configuration>
        <property>
            <!-- Where the NameNode stores the namespace metadata -->
            <name>dfs.namenode.name.dir</name>
            <value>file:/root/pud/hadoop/hadoop-2.7.1/hdfs/name</value>
        </property>
        <property>
            <!-- Where each DataNode stores block data -->
            <name>dfs.datanode.data.dir</name>
            <value>file:/root/pud/hadoop/hadoop-2.7.1/hdfs/data</value>
        </property>
        <property>
            <!-- Replicas per block; 2 matches the two DataNodes -->
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <!-- SecondaryNameNode HTTP address -->
            <name>dfs.namenode.secondary.http-address</name>
            <value>puroc-centos:9001</value>
        </property>
        <property>
            <!-- Enable the WebHDFS REST API -->
            <name>dfs.webhdfs.enabled</name>
            <value>true</value>
        </property>
    </configuration>
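
    The local directories referenced in core-site.xml and hdfs-site.xml should exist on each host. Hadoop can usually create them on its own, but creating them up front (with paths matching the values above) avoids permission surprises:

    mkdir -p /root/pud/hadoop/hadoop-2.7.1/tmp
    mkdir -p /root/pud/hadoop/hadoop-2.7.1/hdfs/name
    mkdir -p /root/pud/hadoop/hadoop-2.7.1/hdfs/data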

    mapred-site.xml
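
    In a stock 2.7.1 tarball this file does not exist yet; create it from the bundled template before editing (run from the Hadoop installation directory):

    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml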

    <configuration>
        <property>
            <!-- Run MapReduce jobs on YARN -->
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <!-- JobHistory server RPC address -->
            <name>mapreduce.jobhistory.address</name>
            <value>puroc-centos:10020</value>
        </property>
        <property>
            <!-- JobHistory server web UI -->
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>puroc-centos:19888</value>
        </property>
    </configuration>

    yarn-site.xml

    <configuration>
        <property>
            <!-- Auxiliary service that serves map output to reducers -->
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>puroc-centos:8032</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>puroc-centos:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>puroc-centos:8031</value>
        </property>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>puroc-centos:8033</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>puroc-centos:8088</value>
        </property>
        <property>
            <!-- Memory (MB) this NodeManager may hand out to containers; fits the 2 GB hosts -->
            <name>yarn.nodemanager.resource.memory-mb</name>
            <value>2000</value>
        </property>
    </configuration>

    hadoop-env.sh and yarn-env.sh

    Set JAVA_HOME in both files.
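
    For example (the JDK path here is just an assumption; point it at whatever JDK is installed on your hosts):

    # in etc/hadoop/hadoop-env.sh and etc/hadoop/yarn-env.sh
    export JAVA_HOME=/usr/java/jdk1.7.0_79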

    slaves

    Add the hostname of every server that should run a DataNode to this file, one per line. Since the deployment plan puts a DataNode on both hosts, and the 127.0.0.1 localhost line may be commented out in a real cluster, use the real hostnames:

    puroc-centos
    puroc-centos2

    /etc/hosts

    Add both servers' IPs and hostnames to this file on every host, as shown in Step 1.

    Step 4: Copy Hadoop to the other servers

    Copy the Hadoop directory configured above to the other servers.
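
    A minimal sketch, assuming the same install path on every host and the SSH trust configured in Step 1:

    scp -r /root/pud/hadoop/hadoop-2.7.1 root@puroc-centos2:/root/pud/hadoop/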

    Step 5: Format the NameNode

    Run the following from the Hadoop installation directory on the NameNode server; the output should include a "successfully formatted" message:

    bin/hdfs namenode -format

    Step 6: Start and stop Hadoop

    Run the following commands on the NameNode (in Hadoop 2.x these wrapper scripts are deprecated in favor of start-dfs.sh and start-yarn.sh, but they still work):

    # Running this on the namenode also starts the daemons on all the datanodes
    sbin/start-all.sh
    # Stop all daemons
    sbin/stop-all.sh

    # After startup, the jps command lists the running Java processes
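
    Based on the deployment plan above, jps should report roughly the following processes (plus Jps itself; the PIDs will differ):

    # On 192.168.1.200:
    NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager
    # On 192.168.1.201:
    DataNode, NodeManager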

    Step 7: Monitoring

    Two web UIs are available:

    http://192.168.1.200:50070 (HDFS NameNode web UI)

    http://192.168.1.200:8088 (YARN ResourceManager web UI)

    Step 8: Verification

    Run the bundled wordcount MapReduce example to verify that the Hadoop environment was set up successfully.

    # Create the directory /test on HDFS
    hdfs dfs -mkdir /test
    # Upload README.txt (shipped in the Hadoop installation directory) to /test
    hdfs dfs -put README.txt /test
    # Run the example MapReduce job
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test output
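
    When the job completes, the word counts can be read back from HDFS; "output" is relative to the user's HDFS home directory (here /user/root/output):

    hdfs dfs -cat output/*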
    
    A successful run produces output like the following:
    [root@puroc-centos hadoop-2.7.1]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test output
    15/10/23 23:23:35 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/10/23 23:23:35 INFO client.RMProxy: Connecting to ResourceManager at puroc-centos/192.168.1.200:8032
    15/10/23 23:23:36 INFO input.FileInputFormat: Total input paths to process : 1
    15/10/23 23:23:36 INFO mapreduce.JobSubmitter: number of splits:1
    15/10/23 23:23:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445667722544_0001
    15/10/23 23:23:37 INFO impl.YarnClientImpl: Submitted application application_1445667722544_0001
    15/10/23 23:23:37 INFO mapreduce.Job: The url to track the job: http://puroc-centos:8088/proxy/application_1445667722544_0001/
    15/10/23 23:23:37 INFO mapreduce.Job: Running job: job_1445667722544_0001
    15/10/23 23:23:46 INFO mapreduce.Job: Job job_1445667722544_0001 running in uber mode : false
    15/10/23 23:23:46 INFO mapreduce.Job:  map 0% reduce 0%
    15/10/23 23:23:55 INFO mapreduce.Job:  map 100% reduce 0%
    15/10/23 23:24:02 INFO mapreduce.Job:  map 100% reduce 100%
    15/10/23 23:24:02 INFO mapreduce.Job: Job job_1445667722544_0001 completed successfully
    15/10/23 23:24:02 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=1836
                    FILE: Number of bytes written=234741
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=1471
                    HDFS: Number of bytes written=1306
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=5468
                    Total time spent by all reduces in occupied slots (ms)=4150
                    Total time spent by all map tasks (ms)=5468
                    Total time spent by all reduce tasks (ms)=4150
                    Total vcore-seconds taken by all map tasks=5468
                    Total vcore-seconds taken by all reduce tasks=4150
                    Total megabyte-seconds taken by all map tasks=5599232
                    Total megabyte-seconds taken by all reduce tasks=4249600
            Map-Reduce Framework
                    Map input records=31
                    Map output records=179
                    Map output bytes=2055
                    Map output materialized bytes=1836
                    Input split bytes=105
                    Combine input records=179
                    Combine output records=131
                    Reduce input groups=131
                    Reduce shuffle bytes=1836
                    Reduce input records=131
                    Reduce output records=131
                    Spilled Records=262
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=139
                    CPU time spent (ms)=1230
                    Physical memory (bytes) snapshot=304541696
                    Virtual memory (bytes) snapshot=4119244800
                    Total committed heap usage (bytes)=182194176
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=1366
            File Output Format Counters 
                    Bytes Written=1306

    While running this MapReduce job, you may hit the following problems:

    1. Connection exceptions

    Check that all daemons started correctly and that the firewall is off.

    2. The log stays stuck at map 0% reduce 0%

    Check that all daemons started correctly and that /etc/hosts is configured correctly on every host.

    3. Other exceptions

    Check that all daemons started correctly, and inspect each daemon's log (under logs/ in the Hadoop installation directory) for errors or exceptions.

     
