• Getting started with Oozie


    Reposted from: http://blackproof.iteye.com/blog/1928122

    Oozie overview: what Oozie can do

    Oozie format: how to write Oozie jobs

    Running Oozie: how to run Oozie jobs

    Oozie overview:

    Oozie is a workflow scheduler built on Hadoop. Scheduling flows are written in XML, and it can schedule MapReduce, Pig, Hive, shell, Java (jar) jobs, and more.

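    Its main features are:

    Workflow: runs the flow's nodes in sequence, and supports fork (split into several parallel branches) and join (merge those branches back into one)

    Coordinator: triggers workflows on a schedule

    Bundle job: groups multiple coordinators so they can be managed together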
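    A minimal sketch of fork and join, assuming workflow schema 0.4: two fs actions that create directories run in parallel, and the join waits for both branches before the workflow ends. The node names and HDFS paths are hypothetical.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="fork-join-example">
    <start to="split"/>
    <!-- fork: every listed path starts running in parallel -->
    <fork name="split">
        <path start="make-dir-a"/>
        <path start="make-dir-b"/>
    </fork>
    <action name="make-dir-a">
        <fs>
            <mkdir path="${nameNode}/tmp/fork-join-example/a"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <action name="make-dir-b">
        <fs>
            <mkdir path="${nameNode}/tmp/fork-join-example/b"/>
        </fs>
        <ok to="merge"/>
        <error to="fail"/>
    </action>
    <!-- join: waits until every forked path has arrived, then continues -->
    <join name="merge" to="end"/>
    <kill name="fail">
        <message>Failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

    Both branches must lead to the same join; a failure in either one goes to the kill node instead.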

    Oozie format:

    Writing an Oozie job requires two files: job.properties and workflow.xml (or coordinator.xml / bundle.xml for coordinator and bundle jobs).

    I. job.properties defines the environment variables (job parameters)

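    Property                    Value                               Description
    nameNode                    hdfs://xxx5:8020                    HDFS (NameNode) address
    jobTracker                  xxx5:8034                           JobTracker address
    queueName                   default                             Queue the jobs are submitted to
    examplesRoot                examples                            Root (global) directory of the examples
    oozie.use.system.libpath    true                                Whether to load the Oozie system share lib
    oozie.libpath               share/lib/user                      Path of the user's own lib directory
    oozie.wf.application.path   ${nameNode}/user/${user.name}/...   HDFS path of the Oozie workflow application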

    Note: the property that points to the application path depends on the job type:

    workflow:oozie.wf.application.path

    coordinator:oozie.coord.application.path

    bundle:oozie.bundle.application.path

    II. XML

     1.workflow:

    <workflow-app xmlns="uri:oozie:workflow:0.2" name="wf-example1">
        <start to="pig-node"/>
        <action name="pig-node">
            <pig>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <!-- Delete the output directory first so that reruns do not fail -->
                <prepare>
                    <delete path="hdfs://xxx5/user/hadoop/appresult"/>
                </prepare>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>default</value>
                    </property>
                    <property>
                        <name>mapred.compress.map.output</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>mapreduce.fileoutputcommitter.marksuccessfuljobs</name>
                        <value>false</value>
                    </property>
                </configuration>
                <!-- Pig script in the workflow directory, plus a parameter passed to it -->
                <script>test.pig</script>
                <param>filepath=${filepath}</param>
            </pig>
            <!-- On success go to "end", on failure go to the "fail" kill node -->
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
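    The pig action passes a file-path variable to test.pig, but the job.properties list in section I does not define one. If your Oozie supports workflow schema 0.4 or later (an assumption; the example above uses 0.2), a <parameters> section can declare the variables the workflow expects: a property with a <value> gets that default, and one without a value must be supplied at submission time or the submission is rejected. A trimmed sketch; the default path /user/hadoop/input is hypothetical. Inside test.pig the parameter is referenced as $filepath.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="wf-example1">
    <parameters>
        <!-- No value: must come from job.properties (or -D at submission) -->
        <property><name>jobTracker</name></property>
        <property><name>nameNode</name></property>
        <!-- Hypothetical default, used when job.properties does not set filepath -->
        <property>
            <name>filepath</name>
            <value>/user/hadoop/input</value>
        </property>
    </parameters>
    <start to="pig-node"/>
    <action name="pig-node">
        <pig>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>test.pig</script>
            <param>filepath=${filepath}</param>
        </pig>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Pig failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```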

    2.coordinator

    <coordinator-app name="cron-coord" frequency="${coord:hours(6)}" start="${start}" end="${end}"
                     timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
        <action>
            <workflow>
                <!-- HDFS path of the workflow application to run every 6 hours -->
                <app-path>${nameNode}/user/${coord:user()}/${examplesRoot}/wpath</app-path>
                <!-- Properties passed down to the workflow as variables -->
                <configuration>
                    <property>
                        <name>jobTracker</name>
                        <value>${jobTracker}</value>
                    </property>
                    <property>
                        <name>nameNode</name>
                        <value>${nameNode}</value>
                    </property>
                    <property>
                        <name>queueName</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
            </workflow>
        </action>
    </coordinator-app>

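    Note: the coordinator above uses UTC, which is 8 hours behind Beijing time (UTC+8), so subtract 8 hours from the desired Beijing execution time when setting start and end.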
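    A worked example of the 8-hour shift: to run once a day at 00:00 Beijing time starting on 2013-08-02, subtract 8 hours, which gives 16:00 UTC on 2013-08-01. The dates, the coordinator name and the daily frequency are hypothetical; the app-path follows the cron-coord example above.

```xml
<!-- Desired: daily at 00:00 Beijing time (UTC+8), from 2013-08-02.
     00:00 minus 8 hours = 16:00 on the previous day, written in UTC ("Z"). -->
<coordinator-app name="daily-coord" frequency="${coord:days(1)}"
                 start="2013-08-01T16:00Z" end="2013-09-01T16:00Z"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <app-path>${nameNode}/user/${coord:user()}/${examplesRoot}/wpath</app-path>
        </workflow>
    </action>
</coordinator-app>
```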

    The following coordinator passes a value to its workflow, with the time zone set to Asia (Asia/Shanghai):

    <coordinator-app name="gwk-hour-log-coord" frequency="${coord:hours(1)}" start="${hourStart}" end="${hourEnd}" timezone="Asia/Shanghai"
                     xmlns="uri:oozie:coordinator:0.2">
        <action>
            <workflow>
                <app-path>${workflowHourLogAppUri}/gwk-workflow.xml</app-path>
                <configuration>
                    <!-- Nominal time minus one hour, formatted as yyyyMMddHH;
                         the workflow reads it as ${yyyymmddhh} -->
                    <property>
                        <name>yyyymmddhh</name>
                        <value>${coord:formatTime(coord:dateOffset(coord:nominalTime(),-1,'HOUR'), 'yyyyMMddHH')}</value>
                    </property>
                </configuration>
            </workflow>
        </action>
    </coordinator-app>
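    On the workflow side, the coordinator's yyyymmddhh property is read like any other variable. A sketch of what gwk-workflow.xml might contain, assuming jobTracker and nameNode are also available to the workflow (for example passed through the coordinator's <configuration> as in the cron-coord example); the workflow name, action name and the hour_log.q script are hypothetical.

```xml
<workflow-app xmlns="uri:oozie:workflow:0.4" name="gwk-hour-log-wf">
    <start to="hour-log-hive"/>
    <action name="hour-log-hive">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <script>hour_log.q</script>
            <!-- Value set by the coordinator; hour_log.q uses it as ${yyyymmddhh} -->
            <param>yyyymmddhh=${yyyymmddhh}</param>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```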

      

    3.bundle

    <bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>
        <controls>
            <kick-off-time>${kickOffTime}</kick-off-time>
        </controls>
        <coordinator name='coordJobFromBundle1'>
            <app-path>${appPath}</app-path>
            <configuration>
                <property>
                    <name>startTime1</name>
                    <value>${START_TIME}</value>
                </property>
                <property>
                    <name>endTime1</name>
                    <value>${END_TIME}</value>
                </property>
            </configuration>
        </coordinator>
        <coordinator name='coordJobFromBundle2'>
            <app-path>${appPath2}</app-path>
            <configuration>
                <property>
                    <name>startTime2</name>
                    <value>${START_TIME2}</value>
                </property>
                <property>
                    <name>endTime2</name>
                    <value>${END_TIME2}</value>
                </property>
            </configuration>
        </coordinator>
    </bundle-app>
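    Inside a bundle, each <configuration> property becomes a variable in the coordinator it launches. A sketch of what the coordinator at ${appPath} might look like, using the startTime1/endTime1 values passed by coordJobFromBundle1; the coordinator name, the daily frequency and the wfAppPath variable are hypothetical.

```xml
<coordinator-app name="coord-from-bundle1" frequency="${coord:days(1)}"
                 start="${startTime1}" end="${endTime1}"
                 timezone="UTC" xmlns="uri:oozie:coordinator:0.2">
    <action>
        <workflow>
            <!-- Hypothetical variable holding the workflow's HDFS path -->
            <app-path>${wfAppPath}</app-path>
        </workflow>
    </action>
</coordinator-app>
```

    The bundle itself is submitted like the other job types, with oozie.bundle.application.path in job.properties pointing at the bundle definition.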

    oozie hive

    <action name="hive-app">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- Hive client configuration shipped with the workflow -->
            <job-xml>hive-site.xml</job-xml>
            <script>hivescript.q</script>
            <!-- Each param is usable inside hivescript.q as ${name} -->
            <param>yyyymmdd=${yyyymmdd}</param>
            <param>yesterday=${yesterday}</param>
            <param>lastmonth=${lastmonth}</param>
        </hive>
        <ok to="result-stat-join"/>
        <error to="fail"/>
    </action>
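    The overview lists shell among the job types Oozie can schedule, and a shell action follows the same pattern as the hive action above. A minimal sketch, assuming shell-action schema 0.2; the action name shell-app, the script run.sh (placed in the workflow application directory) and the ok/error targets are hypothetical.

```xml
<action name="shell-app">
    <shell xmlns="uri:oozie:shell-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <!-- Command to run on a cluster node -->
        <exec>run.sh</exec>
        <!-- Passed to run.sh as its first argument ($1) -->
        <argument>${yyyymmdd}</argument>
        <!-- Ship run.sh from the workflow directory into the task's working directory -->
        <file>run.sh#run.sh</file>
    </shell>
    <ok to="end"/>
    <error to="fail"/>
</action>
```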

      

    Running Oozie

    Start a job:

    oozie job -oozie http://xxx5:11000/oozie -config job.properties -run

    Kill a job:

    oozie job -oozie http://localhost:8080/oozie -kill 14-20090525161321-oozie-joe

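    Note: killing a job sometimes fails with a permission error. The fix is to configure proxy-user (impersonation) settings: the hadoop.proxyuser.oozie.* entries go in Hadoop's core-site.xml, and the oozie.service.ProxyUserService.* entries go in oozie-site.xml:

    hadoop.proxyuser.oozie.groups *

    hadoop.proxyuser.oozie.hosts *

    oozie.service.ProxyUserService.proxyuser.hadoop.hosts *

    oozie.service.ProxyUserService.proxyuser.hadoop.groups *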
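    Written out as XML, the four settings might look like the sketch below. It assumes, as the property names imply, that the Oozie server runs as the user oozie and that the user hadoop should be allowed to act on behalf of other users in Oozie. The Hadoop-side entries are read from core-site.xml, and the Oozie-side entries use the oozie.service.ProxyUserService prefix in oozie-site.xml. The value * allows any host and any group, which is convenient for testing but broad for a shared cluster; Hadoop typically needs a restart (or a refresh of its superuser group configuration) before core-site.xml proxy-user changes take effect.

```xml
<!-- core-site.xml (Hadoop), inside <configuration>:
     let the "oozie" service user impersonate other users -->
<property>
    <name>hadoop.proxyuser.oozie.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.oozie.groups</name>
    <value>*</value>
</property>

<!-- oozie-site.xml, inside <configuration>:
     let the "hadoop" user act on behalf of other users in Oozie -->
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hadoop.hosts</name>
    <value>*</value>
</property>
<property>
    <name>oozie.service.ProxyUserService.proxyuser.hadoop.groups</name>
    <value>*</value>
</property>
```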

  • Original article: https://www.cnblogs.com/cxzdy/p/5513320.html