

                    Hadoop Basics - Configuring the History Server

                                        Author: Yin Zhengjie

    Copyright notice: This is an original work. Reproduction without permission is prohibited and will be pursued legally.

       Hadoop ships with a history server through which you can review MapReduce jobs that have already finished: how many map tasks and reduce tasks a job used, when it was submitted, when it started, when it completed, and so on. By default the history server is not running; it can be started with the script Hadoop provides for this purpose (mr-jobhistory-daemon.sh).
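
       The basic start/stop usage of that script looks like this (a minimal sketch; it assumes Hadoop's sbin directory is on the PATH, as it appears to be on this cluster):

    mr-jobhistory-daemon.sh start historyserver     #Starts the JobHistoryServer daemon on the local node
    mr-jobhistory-daemon.sh stop historyserver      #Stops it again
    jps | grep JobHistoryServer                     #Quick check for the running process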

    I. Running a MapReduce program on YARN

    1>. Start the cluster

    [yinzhengjie@s101 ~]$ xcall.sh jps
    ============= s101 jps ============
    3043 ResourceManager
    2507 NameNode
    3389 Jps
    2814 DFSZKFailoverController
    Command executed successfully
    ============= s102 jps ============
    2417 DataNode
    2484 JournalNode
    2664 NodeManager
    2828 Jps
    2335 QuorumPeerMain
    Command executed successfully
    ============= s103 jps ============
    2421 DataNode
    2488 JournalNode
    2666 NodeManager
    2333 QuorumPeerMain
    2830 Jps
    Command executed successfully
    ============= s104 jps ============
    2657 NodeManager
    2818 Jps
    2328 QuorumPeerMain
    2410 DataNode
    2477 JournalNode
    Command executed successfully
    ============= s105 jps ============
    2688 Jps
    2355 NameNode
    2424 DFSZKFailoverController
    Command executed successfully
    [yinzhengjie@s101 ~]$ 
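
      The startup procedure itself is not shown here. As a rough sketch, assuming ZooKeeper and the standard Hadoop start scripts are used on this HA cluster (the author may use custom wrapper scripts instead), it would look roughly like this:

    zkServer.sh start      #On each ZooKeeper node (s102-s104), brings up QuorumPeerMain
    start-dfs.sh           #On s101, brings up the NameNodes, DataNodes, JournalNodes and DFSZKFailoverControllers
    start-yarn.sh          #On s101, brings up the ResourceManager and the NodeManagers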

    2>. Run a MapReduce program on YARN
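
      The wordcount job below reads its input from /yinzhengjie/data/ on HDFS. How the input was staged is not shown; roughly, it would have been uploaded along these lines (the local file name words.txt is only an illustration):

    hdfs dfs -mkdir -p /yinzhengjie/data             #Create the input directory on HDFS
    hdfs dfs -put ~/words.txt /yinzhengjie/data/     #Upload a local text file as the wordcount input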

    [yinzhengjie@s101 ~]$ hadoop jar /soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /yinzhengjie/data/ /yinzhengjie/data/output
    18/08/21 07:37:35 INFO client.RMProxy: Connecting to ResourceManager at s101/172.30.1.101:8032
    18/08/21 07:37:37 INFO input.FileInputFormat: Total input paths to process : 1
    18/08/21 07:37:37 INFO mapreduce.JobSubmitter: number of splits:1
    18/08/21 07:37:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534851274873_0001
    18/08/21 07:37:37 INFO impl.YarnClientImpl: Submitted application application_1534851274873_0001
    18/08/21 07:37:37 INFO mapreduce.Job: The url to track the job: http://s101:8088/proxy/application_1534851274873_0001/
    18/08/21 07:37:37 INFO mapreduce.Job: Running job: job_1534851274873_0001
    18/08/21 07:37:55 INFO mapreduce.Job: Job job_1534851274873_0001 running in uber mode : false
    18/08/21 07:37:55 INFO mapreduce.Job:  map 0% reduce 0%
    18/08/21 07:38:13 INFO mapreduce.Job:  map 100% reduce 0%
    18/08/21 07:38:31 INFO mapreduce.Job:  map 100% reduce 100%
    18/08/21 07:38:32 INFO mapreduce.Job: Job job_1534851274873_0001 completed successfully
    18/08/21 07:38:32 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=4469
            FILE: Number of bytes written=249719
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=3925
            HDFS: Number of bytes written=3315
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=15295
            Total time spent by all reduces in occupied slots (ms)=15161
            Total time spent by all map tasks (ms)=15295
            Total time spent by all reduce tasks (ms)=15161
            Total vcore-milliseconds taken by all map tasks=15295
            Total vcore-milliseconds taken by all reduce tasks=15161
            Total megabyte-milliseconds taken by all map tasks=15662080
            Total megabyte-milliseconds taken by all reduce tasks=15524864
        Map-Reduce Framework
            Map input records=104
            Map output records=497
            Map output bytes=5733
            Map output materialized bytes=4469
            Input split bytes=108
            Combine input records=497
            Combine output records=288
            Reduce input groups=288
            Reduce shuffle bytes=4469
            Reduce input records=288
            Reduce output records=288
            Spilled Records=576
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=163
            CPU time spent (ms)=1430
            Physical memory (bytes) snapshot=439443456
            Virtual memory (bytes) snapshot=4216639488
            Total committed heap usage (bytes)=286785536
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=3817
        File Output Format Counters 
            Bytes Written=3315
    [yinzhengjie@s101 ~]$ 

    3>. Use the web UI to check whether output data was produced in HDFS
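
      Besides the NameNode web UI, the output can also be verified from the command line; a quick sketch:

    hdfs dfs -ls /yinzhengjie/data/output                          #A _SUCCESS marker and a part-r-00000 file should be present
    hdfs dfs -cat /yinzhengjie/data/output/part-r-00000 | head     #Peek at the first few word counts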

    4>. Check the job record in YARN
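
      The ResourceManager web UI at http://s101:8088 shows the application record; the same information can also be pulled with the yarn CLI, for example:

    yarn application -list -appStates FINISHED                   #Lists finished applications, including application_1534851274873_0001
    yarn application -status application_1534851274873_0001      #Shows the final state and the tracking URL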

    5>. Try to view the job history and find that it cannot be accessed

      The History link on the YARN web UI points at the JobHistory server's web port, but since the history server has not been configured or started yet, the page is unreachable. The next section fixes this.
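
      From the shell the cause is easy to confirm (a sketch; the exact error text may differ):

    jps | grep JobHistoryServer      #Prints nothing: the history server process is not running yet
    curl http://s101:19888           #Connection refused, since nothing is listening on the history web port yet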

    II. Configuring the YARN history server

    1>. Modify the "mapred-site.xml" configuration file

    [yinzhengjie@s101 ~]$ more /soft/hadoop/etc/hadoop/mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>

        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>s101:10020</value>
        </property>

        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>s101:19888</value>
        </property>

        <property>
            <name>mapreduce.jobhistory.done-dir</name>
            <value>${yarn.app.mapreduce.am.staging-dir}/done</value>
        </property>

        <property>
            <name>mapreduce.jobhistory.intermediate-done-dir</name>
            <value>${yarn.app.mapreduce.am.staging-dir}/done_intermediate</value>
        </property>

        <property>
            <name>yarn.app.mapreduce.am.staging-dir</name>
            <value>/yinzhengjie/logs/hdfs/history</value>
        </property>

    </configuration>

    <!--
    Purpose of the mapred-site.xml configuration file:
        #MapReduce-related settings, such as the default number of reduce tasks and the default
         upper and lower limits on task memory. Parameters defined here override the defaults
         in mapred-default.xml.

    Purpose of the mapreduce.framework.name parameter:
        #Specifies the MapReduce execution framework. There are three choices: local (run locally),
         classic (the first-generation Hadoop execution framework), and yarn (the second-generation
         framework). Here we use yarn, the current framework.

    Purpose of the mapreduce.jobhistory.address parameter:
        #Specifies the address (host:port) of the job history server.

    Purpose of the mapreduce.jobhistory.webapp.address parameter:
        #Specifies the web access port of the history server.

    Purpose of the mapreduce.jobhistory.done-dir parameter:
        #Specifies where the records of completed Hadoop jobs are stored.

    Purpose of the mapreduce.jobhistory.intermediate-done-dir parameter:
        #Specifies where the records of running Hadoop jobs are stored.

    Purpose of the yarn.app.mapreduce.am.staging-dir parameter:
        #Specifies the staging directory for the application ID, the required jar files, and so on.
    -->
    [yinzhengjie@s101 ~]$ 
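
      So that every node works from the same MapReduce settings, the edited file would typically be synced to the rest of the cluster as well; a sketch using scp (the author presumably uses a cluster-wide sync script for this):

    for host in s102 s103 s104 s105; do
        scp /soft/hadoop/etc/hadoop/mapred-site.xml ${host}:/soft/hadoop/etc/hadoop/    #Push the edited config to every other node
    done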

    2>. Start the history server

    [yinzhengjie@s101 ~]$ hdfs dfs -mkdir /yinzhengjie/logs/hdfs/history      #Create the directory for storing the history logs
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ mr-jobhistory-daemon.sh start historyserver      #Start the history server
    starting historyserver, logging to /soft/hadoop-2.7.3/logs/mapred-yinzhengjie-historyserver-s101.out
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ jps
    3043 ResourceManager
    4009 JobHistoryServer        #Note: this is the history server process
    2507 NameNode
    4045 Jps
    2814 DFSZKFailoverController
    [yinzhengjie@s101 ~]$ 
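
      To confirm that the two ports configured in mapred-site.xml are actually listening, something like this works (a sketch; on older systems netstat -lntp can replace ss):

    ss -lntp | grep -E '10020|19888'     #10020 is the history server RPC port, 19888 its web UI port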

    3>. Run the MapReduce program on YARN again

    [yinzhengjie@s101 ~]$ hdfs dfs -rm -R /yinzhengjie/data/output        #Delete the previous output directory
    18/08/21 08:43:34 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /yinzhengjie/data/output
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ hadoop jar /soft/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /yinzhengjie/data/input  /yinzhengjie/data/output
    18/08/21 08:44:58 INFO client.RMProxy: Connecting to ResourceManager at s101/172.30.1.101:8032
    18/08/21 08:44:58 INFO input.FileInputFormat: Total input paths to process : 1
    18/08/21 08:44:58 INFO mapreduce.JobSubmitter: number of splits:1
    18/08/21 08:44:58 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1534851274873_0002
    18/08/21 08:44:59 INFO impl.YarnClientImpl: Submitted application application_1534851274873_0002
    18/08/21 08:44:59 INFO mapreduce.Job: The url to track the job: http://s101:8088/proxy/application_1534851274873_0002/
    18/08/21 08:44:59 INFO mapreduce.Job: Running job: job_1534851274873_0002
    18/08/21 08:45:15 INFO mapreduce.Job: Job job_1534851274873_0002 running in uber mode : false
    18/08/21 08:45:15 INFO mapreduce.Job:  map 0% reduce 0%
    18/08/21 08:45:30 INFO mapreduce.Job:  map 100% reduce 0%
    18/08/21 08:45:45 INFO mapreduce.Job:  map 100% reduce 100%
    18/08/21 08:45:45 INFO mapreduce.Job: Job job_1534851274873_0002 completed successfully
    18/08/21 08:45:46 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=4469
            FILE: Number of bytes written=249693
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=3931
            HDFS: Number of bytes written=3315
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=12763
            Total time spent by all reduces in occupied slots (ms)=12963
            Total time spent by all map tasks (ms)=12763
            Total time spent by all reduce tasks (ms)=12963
            Total vcore-milliseconds taken by all map tasks=12763
            Total vcore-milliseconds taken by all reduce tasks=12963
            Total megabyte-milliseconds taken by all map tasks=13069312
            Total megabyte-milliseconds taken by all reduce tasks=13274112
        Map-Reduce Framework
            Map input records=104
            Map output records=497
            Map output bytes=5733
            Map output materialized bytes=4469
            Input split bytes=114
            Combine input records=497
            Combine output records=288
            Reduce input groups=288
            Reduce shuffle bytes=4469
            Reduce input records=288
            Reduce output records=288
            Spilled Records=576
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=139
            CPU time spent (ms)=1610
            Physical memory (bytes) snapshot=439873536
            Virtual memory (bytes) snapshot=4216696832
            Total committed heap usage (bytes)=281018368
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=3817
        File Output Format Counters 
            Bytes Written=3315
    [yinzhengjie@s101 ~]$ 
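
      With the history server in place, details of a finished job can now also be retrieved from the command line; a sketch (the job ID is the one from the run above):

    mapred job -status job_1534851274873_0002      #Once the AM has exited, this information is answered by the JobHistoryServer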

    4>. Use the web UI to check whether data was produced in HDFS
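
      On the HDFS side, the history server moves a finished job's files into the done directory configured earlier; a quick CLI check (a sketch):

    hdfs dfs -ls -R /yinzhengjie/logs/hdfs/history/done      #The completed job's .jhist and job configuration files should show up under here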

    5>. View completed jobs in the YARN web UI

    6>. View the job history records
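
      The history records are served at the web address set in mapred-site.xml (http://s101:19888/jobhistory); it can be opened in a browser, or probed from the shell as a sanity check:

    curl -s -o /dev/null -w '%{http_code}\n' http://s101:19888/jobhistory      #A 200 response means the history web UI is reachable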

     

    7>. Configure log aggregation

      For details, see: https://www.cnblogs.com/yinzhengjie/p/9471921.html
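
      For reference, log aggregation is switched on in yarn-site.xml; a minimal sketch of the relevant properties (the values are illustrative, see the linked post for the actual configuration used; the NodeManagers need a restart afterwards):

    <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>                        <!-- Ship container logs to HDFS when an application finishes -->
    </property>

    <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>604800</value>                      <!-- Keep aggregated logs for 7 days -->
    </property>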
