• Hadoop cluster setup: basic configuration and testing


    1. First, prepare 4 machines with internet access, with hostnames shizhan01, shizhan02, shizhan03, and shizhan04

    2. Map hostnames to IPs: vim /etc/hosts, as follows

    This must be configured on every machine:

    192.168.137.200 shizhan01
    192.168.137.201 shizhan02
    192.168.137.202 shizhan03
    192.168.137.203 shizhan04
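    The four mappings above can be written and sanity-checked with a short script. A sketch: it targets a temp file so it can run without root, but on the real nodes the target is /etc/hosts.

```shell
# Sketch: write the cluster mappings and verify each hostname appears exactly once.
# Uses a temp file so it runs unprivileged; on real nodes the file is /etc/hosts.
HOSTS_FILE=/tmp/hosts.demo
cat > "$HOSTS_FILE" <<'EOF'
192.168.137.200 shizhan01
192.168.137.201 shizhan02
192.168.137.202 shizhan03
192.168.137.203 shizhan04
EOF
for h in shizhan01 shizhan02 shizhan03 shizhan04; do
  # grep -c counts matching lines; expect exactly 1 per host
  n=$(grep -c " $h\$" "$HOSTS_FILE")
  echo "$h: $n mapping(s)"
done
```

    On a live node, `ping shizhan02` (and the other names) is the final confirmation that the mappings resolve.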

    3. Disable the firewall (the commands below are for CentOS 6 / iptables)

    #Check the firewall status
    service iptables status
    #Stop the firewall
    service iptables stop
    #Check whether the firewall starts on boot
    chkconfig iptables --list
    #Disable firewall autostart on boot
    chkconfig iptables off
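    These commands must be repeated on all four nodes, so a loop over ssh saves typing. A dry-run sketch that only prints the commands (assumes passwordless ssh as the hadoop user; remove the echo once you trust the list):

```shell
# Print (dry-run) the firewall-disable command for every node; drop `echo` to execute.
for n in shizhan01 shizhan02 shizhan03 shizhan04; do
  echo "ssh hadoop@$n 'sudo service iptables stop && sudo chkconfig iptables off'"
done
```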

    4. Give the hadoop user root privileges

    Edit /etc/sudoers (vim /etc/sudoers, or better, visudo) and add the second of the two lines below:

    root ALL=(ALL) ALL
    hadoop ALL=(ALL) ALL

    5. Reboot Linux

    reboot

    6. Install the JDK

    6.1 Upload: press Alt+P to open an sftp window, then run put d:\xxx\yyll\jdk-7u_65-i585.tar.gz (the Windows path)

    Extract the JDK
    #Create the directory
    mkdir /home/hadoop/app
    #Extract
    tar -zxvf jdk-7u55-linux-i586.tar.gz -C /home/hadoop/app
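    The -C flag tells tar to extract into the given directory instead of the current one. A tiny local demo of the same pattern (all paths here are throwaway demo paths, not the real JDK):

```shell
# Build a small tarball and extract it into a target dir with -C,
# mirroring the JDK extraction above.
mkdir -p /tmp/tar_demo/src /tmp/tar_demo/app
echo hello > /tmp/tar_demo/src/file.txt
tar -czf /tmp/tar_demo/pkg.tar.gz -C /tmp/tar_demo/src file.txt
tar -zxf /tmp/tar_demo/pkg.tar.gz -C /tmp/tar_demo/app
cat /tmp/tar_demo/app/file.txt
```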

    7. Add Java to the environment variables

    vim /etc/profile
    #Append at the end of the file; JAVA_HOME must match the directory the JDK actually extracted to
    export JAVA_HOME=/home/hadoop/app/jdk-7u_65-i585
    export PATH=$PATH:$JAVA_HOME/bin

    Reload the environment configuration
    source /etc/profile
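    A quick way to confirm the export lines took effect is to check that $JAVA_HOME/bin ended up on PATH. A sketch reproducing the two exports (the JAVA_HOME value is the one used in this guide; on a real node, java -version is the ultimate check):

```shell
# Reproduce the two export lines and verify the bin dir landed on PATH.
export JAVA_HOME=/home/hadoop/app/jdk-7u_65-i585
export PATH=$PATH:$JAVA_HOME/bin
case ":$PATH:" in
  *":$JAVA_HOME/bin:"*) echo "JAVA_HOME/bin is on PATH" ;;
  *)                    echo "PATH is missing JAVA_HOME/bin" ;;
esac
```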

    8. Install Hadoop 2.6.x

    First upload the Hadoop tarball to /home/hadoop/ on the server.
    Note: in Hadoop 2.x the configuration files live under $HADOOP_HOME/etc/hadoop

    A distributed setup requires editing 5 configuration files.
    8.1 Configure Hadoop
    First file: hadoop-env.sh
    vim hadoop-env.sh
    #around line 27
    export JAVA_HOME=/home/hadoop/app/jdk1.7.0_25

    Second file: core-site.xml

    <!-- Default filesystem URI, i.e. the address of the HDFS boss (NameNode) -->
    <property>
    <name>fs.defaultFS</name>
    <value>hdfs://shizhan01:9000</value>
    </property>
    <!-- Directory where Hadoop stores its runtime files -->
    <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/hadoop/hdpdata</value>
    </property>

    Third file: hdfs-site.xml
    <!-- Number of HDFS replicas -->
    <property>
    <name>dfs.replication</name>
    <value>2</value>
    </property>

    <!-- Web UI of the SecondaryNameNode once started: http://shizhan02:50090 -->
    <property>
    <name>dfs.secondary.http.address</name>
    <value>shizhan02:50090</value>
    </property>




    Fourth file: mapred-site.xml
    mv mapred-site.xml.template mapred-site.xml
    vim mapred-site.xml
    <!-- Run MapReduce on YARN -->
    <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    </property>

    Fifth file: yarn-site.xml
    <!-- Address of the YARN boss (ResourceManager) -->
    <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>shizhan01</value>
    </property>

    <!-- Auxiliary service through which reducers fetch map output -->
    <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
    </property>
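    After editing the five files on shizhan01, the same configuration must reach the other three nodes. A dry-run sketch that prints the scp commands (paths and hostnames are the ones used above; remove the echo to actually copy, assuming passwordless ssh):

```shell
# Print (dry-run) the scp commands that would push the config dir to the workers.
CONF_DIR=/home/hadoop/app/hadoop-2.6.4/etc/hadoop
for n in shizhan02 shizhan03 shizhan04; do
  echo "scp -r $CONF_DIR/ hadoop@$n:$CONF_DIR/"
done
```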

    8.2 Add Hadoop to the environment variables

    vim /etc/profile
    export JAVA_HOME=/home/hadoop/app/jdk1.7.0_25
    export HADOOP_HOME=/home/hadoop/app/hadoop-2.6.4
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

    source /etc/profile

    8.3 Format the NameNode (this initializes it)
    hdfs namenode -format 

    8.4 Start Hadoop
    HDFS first:
    start-dfs.sh

    Then start YARN (I forgot this step in my own run; the errors below cover it):
    start-yarn.sh



    8.5 Verify the startup
    Check with the jps command.

    On the NameNode machine:
    4809 ResourceManager
    4670 SecondaryNameNode
    4487 NameNode
    7075 Jps

    On a DataNode machine:
    3542 Jps
    2779 NodeManager  (missing if YARN was not started)
    2665 DataNode

     

    http://shizhan01:50070 (HDFS web UI)
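    The jps check can be scripted: given jps output, confirm each expected daemon is present. A sketch using the sample output above hard-coded (on a live node you would set jps_out="$(jps)" instead):

```shell
# Verify the expected daemons appear in jps output (sample hard-coded here).
jps_out='4809 ResourceManager
4670 SecondaryNameNode
4487 NameNode
7075 Jps'
missing=""
for d in NameNode SecondaryNameNode ResourceManager; do
  # each jps line is "<pid> <DaemonName>", so match " <name>" at end of line
  echo "$jps_out" | grep -q " $d\$" || missing="$missing $d"
done
if [ -z "$missing" ]; then echo "all expected daemons up"; else echo "missing:$missing"; fi
```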

    9. Run wordcount

    Create an input directory on HDFS with hadoop fs -mkdir -p /wordcount/input, then upload a few text files into it with hadoop fs -put

    To run wordcount, cd to /home/hadoop/app/hadoop-2.6.4/share/hadoop/mapreduce

    Run: hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wordcount/input /wordcount/output

    The following error was reported:

    18/07/22 01:00:46 INFO client.RMProxy: Connecting to ResourceManager at shizhan01/192.168.137.200:8032
    18/07/22 01:00:47 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    18/07/22 01:00:48 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    18/07/22 01:00:50 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    18/07/22 01:00:51 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    18/07/22 01:00:52 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
    18/07/22 01:00:53 INFO ipc.Client: Retrying connect to server: shizhan01/192.168.137.200:8032. Already tried 5 time(s); retry

    Cause: YARN was not running. Starting it with start-yarn.sh fixed this, but then the following error appeared:

    Application application_1532192694875_0001 failed 2 times due to Error launching appattempt_1532192694875_0001_000002. Got exception: org.apache.hadoop.yarn.exceptions.YarnException: Unauthorized request to start container.
    This token is expired. current time is 1532352626322 found 1532193319198
    Note: System times on machines may be out of sync. Check system time and time zones.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.instantiateException(SerializedExceptionPBImpl.java:168)
    at org.apache.hadoop.yarn.api.records.impl.pb.SerializedExceptionPBImpl.deSerialize(SerializedExceptionPBImpl.java:106)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:123)
    at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:251)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:724)

    Cause: the cluster clocks were not synchronized, i.e. the NameNode and DataNode machines disagreed on the time.

    Fix

        Synchronize the DataNodes with the NameNode by running the following two commands on every server
        (this uses the Asia/Shanghai time zone):

        1) cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

        2) ntpdate pool.ntp.org
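    The token-expired message actually quantifies the drift: it reports the ResourceManager's current time and the token's timestamp, both in milliseconds, and the gap shows how far apart the clocks were:

```shell
# Compute the clock drift from the two millisecond timestamps in the error above.
t_current=1532352626322   # "current time is ..." from the error
t_token=1532193319198     # "found ..." from the error
drift_s=$(( (t_current - t_token) / 1000 ))
echo "drift: ${drift_s}s (about $(( drift_s / 3600 )) hours)"
# prints: drift: 159307s (about 44 hours)
```

    A drift of roughly 44 hours is far beyond the container-token lifetime, which is why YARN refused to start the container.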

    Run the job again:

    hadoop jar hadoop-mapreduce-examples-2.6.4.jar wordcount /wordcount/input /wordcount/output

    On success the output looks like:

    18/07/28 14:52:26 INFO client.RMProxy: Connecting to ResourceManager at shizhan01/192.168.137.200:8032
    18/07/28 14:52:27 INFO input.FileInputFormat: Total input paths to process : 2
    18/07/28 14:52:27 INFO mapreduce.JobSubmitter: number of splits:2
    18/07/28 14:52:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532760644619_0001
    18/07/28 14:52:28 INFO impl.YarnClientImpl: Submitted application application_1532760644619_0001
    18/07/28 14:52:28 INFO mapreduce.Job: The url to track the job: http://shizhan01:8088/proxy/application_1532760644619_0001/
    18/07/28 14:52:28 INFO mapreduce.Job: Running job: job_1532760644619_0001
    18/07/28 14:52:36 INFO mapreduce.Job: Job job_1532760644619_0001 running in uber mode : false
    18/07/28 14:52:36 INFO mapreduce.Job: map 0% reduce 0%
    18/07/28 14:52:54 INFO mapreduce.Job: map 50% reduce 0%
    18/07/28 14:52:57 INFO mapreduce.Job: map 100% reduce 0%
    18/07/28 14:53:03 INFO mapreduce.Job: map 100% reduce 100%
    18/07/28 14:53:03 INFO mapreduce.Job: Job job_1532760644619_0001 completed successfully
    18/07/28 14:53:03 INFO mapreduce.Job: Counters: 49
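    On the cluster, the counts land in /wordcount/output and can be read with hadoop fs -cat /wordcount/output/part-r-00000. What the job computes can be sketched locally with plain shell tools on a sample file (demo paths only):

```shell
# Local sketch of wordcount: split on spaces, count occurrences per word,
# print "word<TAB>count" like the MapReduce job's output.
printf 'hello world\nhello hadoop\n' > /tmp/wc_input.txt
tr ' ' '\n' < /tmp/wc_input.txt | sort | uniq -c | awk '{print $2"\t"$1}'
```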

  • Original post: https://www.cnblogs.com/wuyl/p/9382208.html