• Oozie notes (compiled from several sources and kept for my own reference)


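    Oozie overview:

      Oozie is a Hadoop-based workflow engine, also described as a scheduler. Workflows are written in XML and can schedule MapReduce, Pig, Hive, shell, Java (jar), Spark and other jobs. It is most useful when data has to go through a chain of operations: instead of writing glue code, you define each action and link the actions into a workflow, which Oozie then runs automatically. This is very handy for big-data analysis work. (Everything below is based on Oozie 4.1.0.)

     Oozie has a few main concepts:

      workflow: a flow of nodes executed in order, with support for fork (split into several parallel nodes) and join (merge several nodes back into one).

      coordinator: groups one or more workflows, can feed the output of earlier workflows into a later one, and defines the conditions that trigger a workflow, which is how scheduled runs are done.

      bundle: an abstraction over a set of coordinators; one bundle can bind several coordinators.

      job.properties: defines the variables (environment) for a job.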

    Oozie installation: omitted.

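    Lifecycle:

    A workflow job in Oozie can be in one of the following states:

    State: Meaning

    PREP: A workflow job is in PREP when it is first created: the job is defined but not yet running.

    RUNNING: A created workflow job enters RUNNING when it starts executing. It stays in RUNNING until it reaches its end state, ends in error, or is suspended.

    SUSPENDED: A RUNNING workflow job can be suspended; it then stays in SUSPENDED until it is resumed or killed.

    SUCCEEDED: When a RUNNING workflow job reaches the end node, it moves to the final SUCCEEDED state.

    KILLED: When a workflow job in the PREP (created), RUNNING or SUSPENDED state is killed, it ends in KILLED.

    FAILED: When a workflow job terminates because of an unexpected error, it ends in FAILED.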

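    The states are linked by transitions (an event can move a workflow job from one state to another). The valid transitions are:

    From state -> To states

    (not started) -> PREP

    PREP -> RUNNING, KILLED

    RUNNING -> SUSPENDED, SUCCEEDED, KILLED, FAILED

    SUSPENDED -> RUNNING, KILLED

    With these transitions in mind, you can control a running workflow job more flexibly as needs arise.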

     

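    Oozie file formats:

    1.workflow:

     Oozie defines an XML-based language, hPDL (Hadoop Process Definition Language), to describe the DAG of a workflow. A workflow is made of control flow nodes and action nodes.

    Control flow nodes define where the flow starts and ends (start, end) and control its execution path (decision, fork, join and so on). Action nodes include Hadoop jobs, SSH, HTTP, email and Oozie sub-workflows.

    An action node defines a basic unit of work.

    Syntax: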

     

    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    
      ...
    
        <start to="[NODE-NAME]"/>
    
       <action name="[NODE-NAME]">
    
              ....
         <ok to="[NODE-NAME]"/>
    
            <error to="[NODE-NAME]"/>
    
        </action> 
       <kill name="[NODE-NAME]"> <message>[MESSAGE-TO-LOG]</message> </kill>  
    
       <end name="[NODE-NAME]"/>
    
    </workflow-app>
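
    The skeleton above leaves out the decision control node mentioned earlier. A minimal sketch of one, assuming a hypothetical ${outputDir} parameter that holds a full HDFS URI; the node names are made up for illustration, and fs:exists is one of Oozie's built-in HDFS EL functions:

    ```xml
    <workflow-app name="decision-sketch" xmlns="uri:oozie:workflow:0.5">
        <start to="check-output"/>
        <!-- Take the cleanup branch only when the output directory already exists -->
        <decision name="check-output">
            <switch>
                <case to="clean-output">${fs:exists(outputDir)}</case>
                <default to="end"/>
            </switch>
        </decision>
        <!-- fs action: remove the stale output directory -->
        <action name="clean-output">
            <fs>
                <delete path="${outputDir}"/>
            </fs>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Cleanup failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    ```

    The first case whose predicate evaluates to true wins; default is taken when none does.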

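    1.1 Map-Reduce Action

    The map-reduce action starts a MapReduce job from within the workflow job, and the MapReduce job can be configured in detail. Child elements of map-reduce configure further aspects of the job, such as streaming, pipes, file and archive.

    Syntax: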

    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    
        ...
    
        <action name="[NODE-NAME]">
    
            <map-reduce>
    
                <job-tracker>[JOB-TRACKER]</job-tracker>
    
                <name-node>[NAME-NODE]</name-node>
    
                <prepare>
    
                    <delete path="[PATH]"/>
    
                    ...
    
                    <mkdir path="[PATH]"/>
    
                    ...
    
                </prepare>
    
                <streaming>
    
                    <mapper>[MAPPER-PROCESS]</mapper>
    
                    <reducer>[REDUCER-PROCESS]</reducer>
    
                    <record-reader>[RECORD-READER-CLASS]</record-reader>
    
                    <record-reader-mapping>[NAME=VALUE]</record-reader-mapping>
    
                    ...
    
                    <env>[NAME=VALUE]</env>
    
                    ...
    
                </streaming>
    
                <!-- Either streaming or pipes can be specified for an action, not both -->
    
                <pipes>
    
                    <map>[MAPPER]</map>
    
                    <reduce>[REDUCER]</reduce>
    
                    <inputformat>[INPUTFORMAT]</inputformat>
    
                    <partitioner>[PARTITIONER]</partitioner>
    
                    <writer>[OUTPUTFORMAT]</writer>
    
                    <program>[EXECUTABLE]</program>
    
                </pipes>
    
                <job-xml>[JOB-XML-FILE]</job-xml>
    
                <configuration>
    
                    <property>
    
                        <name>[PROPERTY-NAME]</name>
    
                        <value>[PROPERTY-VALUE]</value>
    
                    </property>
    
                    ...
    
                </configuration>
    
                <file>[FILE-PATH]</file>
    
                ...
    
                <archive>[FILE-PATH]</archive>
    
                ...
    
            </map-reduce>

            <ok to="[NODE-NAME]"/>
    
            <error to="[NODE-NAME]"/>
    
        </action>
    
        ...
    
    </workflow-app>
    
    Example from the official documentation:
    
    <workflow-app name="foo-wf" xmlns="uri:oozie:workflow:0.1">
    
        ...
    
        <action name="myfirstHadoopJob">
    
            <map-reduce>
    
                <job-tracker>foo:8021</job-tracker>
    
                <name-node>bar:8020</name-node>
    
                <prepare>
    
                    <delete path="hdfs://foo:8020/usr/tucu/output-data"/>
    
                </prepare>
    
                <job-xml>/myfirstjob.xml</job-xml>
    
                <configuration>
    
                    <property>
    
                        <name>mapred.input.dir</name>
    
                        <value>/usr/tucu/input-data</value>
    
                    </property>
    
                    <property>
    
                        <name>mapred.output.dir</name>
    
                        <value>/usr/tucu/output-data</value>
    
                    </property>
    
                    <property>
    
                        <name>mapred.reduce.tasks</name>
    
                        <value>${firstJobReducers}</value>
    
                    </property>
    
                    <property>
    
                        <name>oozie.action.external.stats.write</name>
    
                        <value>true</value>
    
                    </property>
    
                </configuration>
    
            </map-reduce>
    
            <ok to="myNextAction"/>
    
            <error to="errorCleanup"/>
    
        </action>
    
        ...
    
    </workflow-app>
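
    The streaming child element from the syntax above is not used in the official example. A minimal sketch of a streaming action, assuming ${jobTracker}, ${nameNode}, ${inputDir} and ${outputDir} are defined in job.properties (hypothetical names, with ${outputDir} a full HDFS URI); /bin/cat and /usr/bin/wc are placeholder programs standing in for a real mapper and reducer:

    ```xml
    <workflow-app name="streaming-sketch" xmlns="uri:oozie:workflow:0.1">
        <start to="streaming-node"/>
        <action name="streaming-node">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <!-- Remove old output so the job can be rerun -->
                <prepare>
                    <delete path="${outputDir}"/>
                </prepare>
                <!-- Mapper and reducer are ordinary executables that read stdin and write stdout -->
                <streaming>
                    <mapper>/bin/cat</mapper>
                    <reducer>/usr/bin/wc</reducer>
                </streaming>
                <configuration>
                    <property>
                        <name>mapred.input.dir</name>
                        <value>${inputDir}</value>
                    </property>
                    <property>
                        <name>mapred.output.dir</name>
                        <value>${outputDir}</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Streaming job failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    ```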

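    1.2 Ssh Action

    This action logs in to a host over ssh and runs a set of shell commands there.

    Note: SSH actions are used with Oozie schema 0.1; they were removed in Oozie schema 0.2.

           The ssh action starts a shell command on a remote host as a remote secure shell running in the background. The workflow job waits for the remote shell command to finish before moving on to the next action. The shell command must exist on the remote machine and must be reachable through the command path.

    Syntax: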

    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    
        ...
    
        <action name="[NODE-NAME]">
    
            <ssh>
    
                <host>[USER]@[HOST]</host>
    
                <command>[SHELL]</command>
    
                <args>[ARGUMENTS]</args>
    
                ...
    
                <capture-output/>
    
            </ssh>
    
            <ok to="[NODE-NAME]"/>
    
            <error to="[NODE-NAME]"/>
    
        </action>
    
        ...
    
    </workflow-app>

    Example from the official documentation:

    <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    
        ...
    
        <action name="myssjob">
    
            <ssh>
    
                <host>foo@bar.com</host>
    
                <command>uploaddata</command>
    
                <args>jdbc:derby://bar.com:1527/myDB</args>
    
                <args>hdfs://foobar.com:8020/usr/tucu/myData</args>
    
            </ssh>
    
            <ok to="myotherjob"/>
    
            <error to="errorcleanup"/>
    
        </action>
    
        ...
    
    </workflow-app>

     

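    1.3 Java Action

    Oozie supports a Java action, which runs the public static void main(String[] args) method of the Java class specified in the workflow. The class is executed on the Hadoop cluster as a map-reduce job with a single mapper task.

    The workflow job waits for the Java program to finish before moving on to the next action, so several actions can be chained to call several classes in turn. When the Java class exits normally, the flow takes the ok transition; when it throws an exception, the flow takes the error transition.

    Syntax: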

    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">

        ...

        <action name="[NODE-NAME]">

            <java>

                <job-tracker>[JOB-TRACKER]</job-tracker>

                <name-node>[NAME-NODE]</name-node>

                <prepare>

                   <delete path="[PATH]"/>

                   ...

                   <mkdir path="[PATH]"/>

                   ...

                </prepare>

                <job-xml>[JOB-XML]</job-xml>

                <configuration>

                    <property>

                        <name>[PROPERTY-NAME]</name>

                        <value>[PROPERTY-VALUE]</value>

                    </property>

                    ...

                </configuration>

                <main-class>[MAIN-CLASS]</main-class>

                <java-opts>[JAVA-STARTUP-OPTS]</java-opts>

                <arg>ARGUMENT</arg>

                ...

                <file>[FILE-PATH]</file>

                ...

                <archive>[FILE-PATH]</archive>

                ...

                <capture-output />

            </java>

            <ok to="[NODE-NAME]"/>

            <error to="[NODE-NAME]"/>

        </action>

        ...

    </workflow-app>

    Example from the official documentation:

    <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">

        ...

        <action name="myfirstjavajob">

            <java>

                <job-tracker>foo:8021</job-tracker>

                <name-node>bar:8020</name-node>

                <prepare>

                    <delete path="${jobOutput}"/>

                </prepare>

                <configuration>

                    <property>

                        <name>mapred.queue.name</name>

                        <value>default</value>

                    </property>

                </configuration>

                <main-class>org.apache.oozie.MyFirstMainClass</main-class>

                <java-opts>-Dblah</java-opts>

                <arg>argument1</arg>

                <arg>argument2</arg>

            </java>

            <ok to="myotherjob"/>

            <error to="errorcleanup"/>

        </action>

        ...

    </workflow-app>

    1.4 shell action

    The shell action runs a shell command; the arguments the command needs are supplied through the action's configuration.

    Syntax:

    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.4">

     

     ...

     

     <action name="[NODE-NAME]">

     

         <shell xmlns="uri:oozie:shell-action:0.2">

     

             <job-tracker>[JOB-TRACKER]</job-tracker>

     

             <name-node>[NAME-NODE]</name-node>

     

             <prepare>

     

                 <delete path="[PATH]" />

     

                 ...

     

                 <mkdir path="[PATH]" />

     

                 ...

     

             </prepare>

     

             <configuration>

     

                 <property>

     

                     <name>[PROPERTY-NAME]</name>

     

                     <value>[PROPERTY-VALUE]</value>

     

                 </property>

     

                 ...

     

             </configuration>

     

             <exec>[SHELL-COMMAND]</exec>

     

             <argument>[ARGUMENT-VALUE]</argument>

     

             <capture-output />

     

         </shell>

     

         <ok to="[NODE-NAME]" />

     

         <error to="[NODE-NAME]" />

     

     </action>

     ...

     

    </workflow-app>
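
    Unlike the other actions, this section has no worked example. A minimal sketch, assuming ${jobTracker}, ${nameNode} and ${queueName} come from job.properties; the node names are hypothetical. The command prints a key=value line, which capture-output makes available to later nodes through wf:actionData:

    ```xml
    <workflow-app name="shell-sketch" xmlns="uri:oozie:workflow:0.4">
        <start to="shell-node"/>
        <action name="shell-node">
            <shell xmlns="uri:oozie:shell-action:0.2">
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <configuration>
                    <property>
                        <name>mapred.job.queue.name</name>
                        <value>${queueName}</value>
                    </property>
                </configuration>
                <!-- Runs: echo "my_output=Hello Oozie" -->
                <exec>echo</exec>
                <argument>my_output=Hello Oozie</argument>
                <!-- Parse stdout as key=value pairs and keep them as action data -->
                <capture-output/>
            </shell>
            <ok to="check-output"/>
            <error to="fail"/>
        </action>
        <!-- Read the captured value back in a decision node -->
        <decision name="check-output">
            <switch>
                <case to="end">${wf:actionData('shell-node')['my_output'] eq 'Hello Oozie'}</case>
                <default to="fail-output"/>
            </switch>
        </decision>
        <kill name="fail">
            <message>Shell action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <kill name="fail-output">
            <message>Incorrect output, expected [Hello Oozie] but was [${wf:actionData('shell-node')['my_output']}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    ```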

    1.5 Spark action

        Oozie supports a Spark action, although the support is not particularly good. Submitting a Spark job requires the spark-assembly jar to be loaded.

    Syntax:

    <workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.3">

        ...

        <action name="[NODE-NAME]">

            <spark xmlns="uri:oozie:spark-action:0.1">

                <job-tracker>[JOB-TRACKER]</job-tracker>

                <name-node>[NAME-NODE]</name-node>

                <prepare>

                   <delete path="[PATH]"/>

                   ...

                   <mkdir path="[PATH]"/>

                   ...

                </prepare>

                <job-xml>[SPARK SETTINGS FILE]</job-xml>

                <configuration>

                    <property>

                        <name>[PROPERTY-NAME]</name>

                        <value>[PROPERTY-VALUE]</value>

                    </property>

                    ...

                </configuration>

                <master>[SPARK MASTER URL]</master>

                <mode>[SPARK MODE]</mode>

                <name>[SPARK JOB NAME]</name>

                <class>[SPARK MAIN CLASS]</class>

                <jar>[SPARK DEPENDENCIES JAR / PYTHON FILE]</jar>

                <spark-opts>[SPARK-OPTIONS]</spark-opts>

                <arg>[ARG-VALUE]</arg>

                    ...

                <arg>[ARG-VALUE]</arg>

                ...

            </spark>

            <ok to="[NODE-NAME]"/>

            <error to="[NODE-NAME]"/>

        </action>

        ...

    </workflow-app>

    Example from the official documentation:

    <workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">

        ...

        <action name="myfirstsparkjob">

            <spark xmlns="uri:oozie:spark-action:0.1">

                <job-tracker>foo:8021</job-tracker>

                <name-node>bar:8020</name-node>

                <prepare>

                    <delete path="${jobOutput}"/>

                </prepare>

                <configuration>

                    <property>

                        <name>mapred.compress.map.output</name>

                        <value>true</value>

                    </property>

                </configuration>

                <master>local[*]</master>

                <mode>client</mode>

                <name>Spark Example</name>

                <class>org.apache.spark.examples.mllib.JavaALS</class>

                <jar>/lib/spark-examples_2.10-1.1.0.jar</jar>

                <spark-opts>--executor-memory 20G --num-executors 50

                 --conf spark.executor.extraJavaOptions="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"</spark-opts>

                <arg>inputpath=hdfs://localhost/input/file.txt</arg>

                <arg>value=2</arg>

            </spark>

            <ok to="myotherjob"/>

            <error to="errorcleanup"/>

        </action>

        ...

    </workflow-app>

    2.coordinator.xml

    Syntax:

     

    <coordinator-app name="[NAME]" frequency="[FREQUENCY]"

                        start="[DATETIME]" end="[DATETIME]" timezone="[TIMEZONE]"

                        xmlns="uri:oozie:coordinator:0.1">   

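    <!-- frequency: how often to run; values under five minutes require a change to the Oozie configuration. start, end: start and end times; to use Beijing time, change the Oozie configuration file as well and adjust the time format. -->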

     

          <controls>

            <timeout>[TIME_PERIOD]</timeout>

            <concurrency>[CONCURRENCY]</concurrency>

            <execution>[EXECUTION_STRATEGY]</execution>

          </controls>


          <datasets>    

            <include>[SHARED_DATASETS]</include>

            ...

            <!-- Synchronous datasets: directories where the data is produced -->

            <dataset name="[NAME]" frequency="[FREQUENCY]"

                     initial-instance="[DATETIME]" timezone="[TIMEZONE]">

              <uri-template>[URI_TEMPLATE]</uri-template>

            </dataset>

            ...


          </datasets>

          <input-events>    <!-- conditions on input data that trigger the action -->

            <data-in name="[NAME]" dataset="[DATASET]">

              <instance>[INSTANCE]</instance>

              ...

            </data-in>

            ...

            <data-in name="[NAME]" dataset="[DATASET]">

              <start-instance>[INSTANCE]</start-instance>

              <end-instance>[INSTANCE]</end-instance>

            </data-in>

            ...

          </input-events>

          <output-events>

             <data-out name="[NAME]" dataset="[DATASET]">

               <instance>[INSTANCE]</instance>

             </data-out>

             ...

          </output-events>

          <action>

            <workflow>

              <app-path>[WF-APPLICATION-PATH]</app-path>    <!-- HDFS directory that holds workflow.xml -->

              <configuration>

                <property>    <!-- parameters passed to the workflow -->

                  <name>[PROPERTY-NAME]</name>

                  <value>[PROPERTY-VALUE]</value>

                </property>

                ...

             </configuration>

           </workflow>

          </action>

       </coordinator-app>

     

    Example from the official documentation:

     

    <coordinator-app name="hello-coord" frequency="${coord:days(1)}"

                        start="2009-01-02T08:00Z" end="2010-01-02T08:00Z"

                        timezone="America/Los_Angeles"

                        xmlns="uri:oozie:coordinator:0.1">

          <datasets>

            <dataset name="logs" frequency="${coord:days(1)}"

                     initial-instance="2009-01-02T08:00Z" timezone="America/Los_Angeles">

              <uri-template>hdfs://bar:8020/app/logs/${YEAR}${MONTH}/${DAY}/data</uri-template>

            </dataset>

            <dataset name="siteAccessStats" frequency="${coord:days(1)}"

                     initial-instance="2009-01-02T08:00Z" timezone="America/Los_Angeles">

              <uri-template>hdfs://bar:8020/app/stats/${YEAR}/${MONTH}/${DAY}/data</uri-template>

            </dataset>

          </datasets>

          <input-events>    

            <data-in name="input" dataset="logs">

              <instance>2009-01-02T08:00Z</instance>

            </data-in>

          </input-events>

          <output-events>

             <data-out name="output" dataset="siteAccessStats">

               <instance>2009-01-02T08:00Z</instance>

             </data-out>

          </output-events>

          <action>

            <workflow>

              <app-path>hdfs://bar:8020/usr/joe/logsprocessor-wf</app-path>   

              <configuration>

                <property>   

                  <name>wfInput</name>

                  <value>${coord:dataIn('input')}</value>

                </property>

                <property>

                  <name>wfOutput</name>

                  <value>${coord:dataOut('output')}</value>

                </property>

             </configuration>

           </workflow>

          </action>

       </coordinator-app>
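
    The example above pins its input and output to one absolute instance. A sketch of the same input-events and output-events written relative to each run with coord:current, assuming the logs and siteAccessStats datasets defined above. coord:current(0) is the dataset instance for the current action's nominal time, and negative offsets reach back to earlier instances:

    ```xml
    <input-events>
        <!-- The last 7 daily instances of logs, ending with the current one -->
        <data-in name="input" dataset="logs">
            <start-instance>${coord:current(-6)}</start-instance>
            <end-instance>${coord:current(0)}</end-instance>
        </data-in>
    </input-events>
    <output-events>
        <!-- One output instance per run -->
        <data-out name="output" dataset="siteAccessStats">
            <instance>${coord:current(0)}</instance>
        </data-out>
    </output-events>
    ```

    With a range like this, the action waits until every instance in the range is available before the workflow starts.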

     

     

    3.bundle.xml

     

    Syntax:

     

    <bundle-app name='[NAME]'  xmlns='uri:oozie:bundle:0.1'>

      <controls>

           <kick-off-time>[DATETIME]</kick-off-time>    <!-- time at which the bundle starts -->

      </controls>

       <coordinator name='[NAME]' >

           <app-path>[COORD-APPLICATION-PATH]</app-path>    <!-- directory that holds coordinator.xml -->

              <configuration>                 <!-- parameters passed to the coordinator application -->

                <property>

                  <name>[PROPERTY-NAME]</name>  

                  <value>[PROPERTY-VALUE]</value>

                </property>

                ...

             </configuration>

       </coordinator>

       ...

    </bundle-app> 

     

    Example from the official documentation (binding two coordinators):

     

    <bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance' xmlns='uri:oozie:bundle:0.1'>

      <controls>

           <kick-off-time>${kickOffTime}</kick-off-time>

      </controls>

       <coordinator name='coordJobFromBundle1' >

           <app-path>${appPath}</app-path>

           <configuration>

             <property>

                  <name>startTime1</name>

                  <value>${START_TIME}</value>

              </property>

             <property>

                  <name>endTime1</name>

                  <value>${END_TIME}</value>

              </property>

          </configuration>

       </coordinator>

       <coordinator name='coordJobFromBundle2' >

           <app-path>${appPath2}</app-path>

           <configuration>

             <property>

                  <name>startTime2</name>

                  <value>${START_TIME2}</value>

              </property>

             <property>

                  <name>endTime2</name>

                  <value>${END_TIME2}</value>

              </property>

          </configuration>

       </coordinator>

    </bundle-app>

     

    4.job.properties:

     

    Property                     Example value                        Description

    nameNode                     hdfs://xxx:8020                      HDFS address

    jobTracker                   xxx5:8034                            JobTracker address

    queueName                    default                              queue the job is submitted to

    examplesRoot                 examples                             root directory

    oozie.use.system.libpath     true                                 whether to load the system lib

    oozie.libpath                share/lib/user                       user lib

    oozie.wf.application.path    ${nameNode}/user/${user.name}/...    HDFS location of the Oozie application

     

    The application path property depends on the job type:

    workflow: oozie.wf.application.path

    coordinator: oozie.coord.application.path

    bundle: oozie.bundle.application.path

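    Using Oozie:

    An Oozie job needs two things: job.properties and workflow.xml (plus coordinator.xml and bundle.xml when needed).

    To run a job automatically on a schedule, write a coordinator.xml.

    To bind several coordinators together, write a bundle.xml.

    Oozie example:

    A (simplified) example from our own work, using the spark action:

    bundle.xml: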

     

    <bundle-app name='APPNAME' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'

    xmlns='uri:oozie:bundle:0.2'>

        <coordinator name='coordJobFromBundle1' >

           <app-path>${appPath}</app-path>  

       </coordinator>

       <coordinator name='coordJobFromBundle2' >

           <app-path>${appPath2}</app-path>

       </coordinator>

     

    </bundle-app>

     

    coordinator.xml:

     

    <coordinator-app name="cron-coord" frequency="${coord:minutes(6)}" start="${start}"

    end="${end}" timezone="Asia/Shanghai" xmlns="uri:oozie:coordinator:0.2">

        <action>

            <workflow>

                <app-path>${workflowAppUri}</app-path>

                <configuration>

                    <property>

                        <name>jobTracker</name>

                        <value>${jobTracker}</value>

                    </property>

                    <property>

                        <name>nameNode</name>

                        <value>${nameNode}</value>

                    </property>

                    <property>

                        <name>queueName</name>

                        <value>${queueName}</value>

                    </property>

                    <property>

                        <name>mainClass</name>

                        <value>com.ocn.itv.rinse.ErrorCollectRinse</value>

                    </property>

                    <property>

                        <name>mainClass2</name>

                        <value>com.ocn.itv.rinse.UserCollectRinse</value>

                    </property>

                    <property>

                        <name>jarName</name>

                        <value>ocn-itv-spark-3.0.3-rc1.jar</value>

                    </property>

                </configuration>

            </workflow>

        </action>

    </coordinator-app>

     

    workflow.xml:

     

    <workflow-app  name="spark-example1" xmlns="uri:oozie:workflow:0.5"> 

        <start to="forking"/>

        <fork name="forking">

            <path start="firstparalleljob"/>

            <path start="secondparalleljob"/>

        </fork>   

        <action name="firstparalleljob">

            <spark xmlns="uri:oozie:spark-action:0.2"> 

                <job-tracker>${jobTracker}</job-tracker> 

                <name-node>${nameNode}</name-node>

                <configuration> 

                    <property> 

                        <name>mapred.job.queue.name</name> 

                        <value>${queueName}</value> 

                    </property>                 

                </configuration>           

                <master>yarn-cluster</master>

                <mode>cluster</mode>

                <name>Spark Example</name>

                <class>${mainClass}</class>           

                <jar>${jarName}</jar>

                <spark-opts>${sparkopts}</spark-opts>

                <arg>${input}</arg>           

            </spark >  

            <ok to="joining"/>

            <error to="fail"/>   

        </action>

        <action name="secondparalleljob">

             <spark xmlns="uri:oozie:spark-action:0.2"> 

                <job-tracker>${jobTracker}</job-tracker> 

                <name-node>${nameNode}</name-node>

                <configuration> 

                    <property> 

                        <name>mapred.job.queue.name</name> 

                        <value>${queueName}</value> 

                    </property>                 

                </configuration>           

                <master>yarn-cluster</master>

                <mode>cluster</mode>

                <name>Spark Example2</name>

                <class>${mainClass2}</class>           

                <jar>${jarName}</jar>

                <spark-opts>${sparkopts}</spark-opts>

                <arg>${input}</arg>           

            </spark > 

            <ok to="joining"/>

            <error to="fail"/>   

        </action>  

        <join name="joining" to="end"/>

          <kill name="fail"> 

           <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 

        </kill> 

       <end name="end"/> 

    </workflow-app>

     

    job.properties

     

    # Comments sit on their own lines: in a properties file, a # after a value is read as part of the value.

    # HDFS address and port
    nameNode=hdfs://hgdp-001:8020

    # ResourceManager address and port
    jobTracker=hgdp-001:8032

    # queue the job is submitted to
    queueName=default

    # input argument
    input=2017-05-09

    # custom directory
    hdfspath=user/root

    # custom root directory
    examplesRoot=ocn-itv-oozie

    # whether to use the system lib
    oozie.use.system.libpath=True

    # Spark options
    sparkopts=--executor-memory 1G

    # coordinator start and end times
    start=2017-09-04T00:05+0800
    end=2017-09-04T00:36+0800
    start2=2017-09-01T00:06+0800
    end2=2017-09-04T00:36+0800

    # user lib (where the jars are stored)
    oozie.libpath=${nameNode}/${hdfspath}/${examplesRoot}/lib/

    # directories holding the workflow.xml files that the coordinators schedule
    workflowAppUri=${nameNode}/${hdfspath}/${examplesRoot}/wf/spark/fork/
    workflowAppUri2=${nameNode}/${hdfspath}/${examplesRoot}/wf/spark/single/

    # directories holding the coordinator.xml files that the bundle calls
    appPath=${nameNode}/${hdfspath}/${examplesRoot}/cd/single/
    appPath2=${nameNode}/${hdfspath}/${examplesRoot}/cd/single1/

    # directory holding bundle.xml (one bundle calls several coordinators)
    oozie.bundle.application.path=${nameNode}/${hdfspath}/${examplesRoot}/bd/bd1/

     

     

     

    Finally, run it:

     

      Start the job: oozie job -oozie http://192.168.2.11:11000/oozie -config job.properties -run    (replace 192.168.2.11 with the address of your Oozie server)

     

     

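     Things to watch out for:

    I.Time zone configuration for the coordinator

    Cloudera Oozie uses UTC by default, so when writing an Oozie job you have to subtract 8 hours from the intended (Beijing) run time, which is inconvenient. The time zone can be changed as follows.

    1.Add the following property to the Oozie configuration file: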

    <property>

     <name>oozie.processing.timezone</name>

     <value>GMT+0800</value>

    </property>

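    2.If you use Hue, open the Oozie web UI, choose Settings, and select CST (Asia/Shanghai) under Timezone.

    3.Set the timezone attribute in the coordinator to timezone="Asia/Shanghai".

    4.Change the time format to carry the offset, for example: 2017-09-05T15:16+0800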
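
    With these settings in place, a coordinator header can carry local times directly. A sketch, assuming oozie.processing.timezone is set to GMT+0800 as in step 1; the name, the dates and ${workflowAppUri} are illustrative:

    ```xml
    <!-- start and end use the +0800 offset form instead of the trailing Z -->
    <coordinator-app name="tz-sketch" frequency="${coord:days(1)}"
                     start="2017-09-05T15:16+0800" end="2017-09-12T15:16+0800"
                     timezone="Asia/Shanghai" xmlns="uri:oozie:coordinator:0.2">
        <action>
            <workflow>
                <app-path>${workflowAppUri}</app-path>
            </workflow>
        </action>
    </coordinator-app>
    ```

    With the default UTC processing time zone, the same start instant is written 2017-09-05T07:16Z.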

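    II.oozie.xx.application.path

    Only one oozie.xx.application.path property may appear in job.properties:

    workflow: oozie.wf.application.path

    coordinator: oozie.coord.application.path

    bundle: oozie.bundle.application.path

    III.Naming and file locations

    When the application path points to a directory, Oozie looks for files named exactly workflow.xml, coordinator.xml and bundle.xml, so keep those names and put the files in an HDFS directory. job.properties can have any name and stays on the local machine.

    IV.Actions in workflow.xml:

    Several actions can be written so that they run one after another, as in this example: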

     

     

    <workflow-app  name="java-example1" xmlns="uri:oozie:workflow:0.5"> 

        <start to="java-Action"/> 

        <action name="java-Action">

         ....

            <ok to="java-Action2"/>

            <error to="fail"/>   

        </action>

        <action name="java-Action2">

           ....

            <ok to="end"/>

            <error to="fail"/>   

        </action>  

          <kill name="fail"> 

           <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 

        </kill> 

       <end name="end"/> 

    </workflow-app>
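
    The error transition does not have to point straight at the kill node. A sketch that sends a notification first, assuming the Oozie server has SMTP settings configured for the email action; the address and the notify node name are hypothetical:

    ```xml
    <workflow-app name="notify-sketch" xmlns="uri:oozie:workflow:0.5">
        <start to="java-Action"/>
        <action name="java-Action">
            <!-- action body elided, as in the example above -->
            <ok to="end"/>
            <error to="notify"/>
        </action>
        <!-- Send a mail, then fail the workflow whether or not the mail went out -->
        <action name="notify">
            <email xmlns="uri:oozie:email-action:0.1">
                <to>ops@example.com</to>
                <subject>Workflow ${wf:id()} failed</subject>
                <body>Failed node: ${wf:lastErrorNode()}</body>
            </email>
            <ok to="fail"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>
    ```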

     

     

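    Several jobs can also run concurrently by adding fork and join nodes: the fork node splits the flow into parallel paths and the join node merges them again. fork and join nodes must come in pairs, and a join can only merge paths that come from the same fork. Example: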

     

    <workflow-app  name="java-example1" xmlns="uri:oozie:workflow:0.5"> 

        <start to="forking"/>

        <fork name="forking">

            <path start="firstparalleljob"/>

            <path start="secondparalleljob"/>

        </fork>   

        <action name="firstparalleljob">

                .....

            <ok to="joining"/>

            <error to="fail"/>    

        </action>

        <action name="secondparalleljob">

                ....

            <ok to="joining"/>

            <error to="fail"/>   

        </action>  

        <join name="joining" to="end"/>

          <kill name="fail"> 

           <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message> 

        </kill> 

       <end name="end"/> 

    </workflow-app>

     

  • Original article: https://www.cnblogs.com/jjSmileEveryDay/p/7520334.html