• Hadoop JobHistory


    hadoop jobhistory记录下已运行完的MapReduce作业信息并存放在指定的HDFS目录下,默认情况下是没有启动的,需要配置完后手工启动服务。

    mapred-site.xml添加如下配置

    <property>
      <name>mapreduce.jobhistory.address</name>
      <value>hadoop000:10020</value>
      <description>MapReduce JobHistory Server IPC host:port</description>
    </property>
    
    <property>
      <name>mapreduce.jobhistory.webapp.address</name>
      <value>hadoop000:19888</value>
      <description>MapReduce JobHistory Server Web UI host:port</description>
    </property>
    
    <property>
        <name>mapreduce.jobhistory.done-dir</name>
        <value>/history/done</value>
    </property>
    
    <property>
        <name>mapreduce.jobhistory.intermediate-done-dir</name>
        <value>/history/done_intermediate</value></property>

    启动history-server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh start historyserver

    停止history-server:

    $HADOOP_HOME/sbin/mr-jobhistory-daemon.sh stop historyserver

    history-server启动之后,可以通过浏览器访问WEBUI: hadoop000:19888

    在hdfs上会生成两个目录

    hadoop fs -ls /history
    drwxrwx
    --- - spark supergroup 0 2014-10-11 15:11 /history/done drwxrwxrwt - spark supergroup 0 2014-10-11 15:16 /history/done_intermediate

    mapreduce.jobhistory.done-dir(/history/done): Directory where history files are managed by the MR JobHistory Server(已完成作业信息)
    mapreduce.jobhistory.intermediate-done-dir(/history/done_intermediate): Directory where history files are written by MapReduce jobs.(正在运行作业信息)

    测试:

    通过hive查询city表观察hdfs文件目录和hadoop000:19888

    hive> select id, name from city;

    观察hdfs文件目录:

    1)历史作业记录是按照年/月/日的形式分别存放在相应的目录(/history/done/2014/10/11/000000);

    2)每个作业有2个不同的后缀名的记录:jhist和xml

    hadoop fs -ls /history/done/2014/10/11/000000
    -rwxrwx--- 1 spark supergroup 22572 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002-1413012208648-spark-select+id%2C+name+from+city%28Stage%2D1%29-1413012224777-1-0-SUCCEEDED-root.spark-1413012216261.jhist -rwxrwx--- 1 spark supergroup 160149 2014-10-11 15:23 /history/done/2014/10/11/000000/job_1413011730351_0002_conf.xml

    观察WEBUI: hadoop000:19888

    在WEBUI中展现了每个job使用的Map/Reduce的数量、作业提交时间、作业启动时间、作业完成时间、Job ID、提交人User、队列等信息;

    点击【job_1413011730351_0002】弹出页面显示类似信息:Aggregation is not enabled. Try the nodemanager at ......

    解决方法: yarn-site.xml添加如下配置

    <property>  
        <name>yarn.log-aggregation-enable</name>  
        <value>true</value>  
    </property> 

    重启yarn即可。

    参考CDH文档:http://archive.cloudera.com/cdh5/cdh/5/hadoop-2.3.0-cdh5.0.0/hadoop-project-dist/hadoop-common/ClusterSetup.html

  • 相关阅读:
    mysql 查询某年某月数据~(如果数据表用时间戳)
    mongo_4.25 find() hasNext() next()
    在YII框架中有2中方法创建对象:
    bootsrap[$data]
    date
    cookie
    JavaScript shell, 可以用到 JS 的特性, forEach
    在 Yii框架中使用session 的笔记:
    mysql查询今天、昨天、7天、近30天、本月、上一月 数据
    Python 自定义异常练习
  • 原文地址:https://www.cnblogs.com/luogankun/p/4019303.html
Copyright © 2020-2023  润新知