Oozie can run multiple tasks in parallel using fork and join nodes. The two always appear together; neither can be used without the other.
Syntax:
<workflow-app name="[WF-DEF-NAME]" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="[FORK-NODE-NAME]">
        <path start="[NODE-NAME]" />
        ...
        <path start="[NODE-NAME]" />
    </fork>
    ...
    <join name="[JOIN-NODE-NAME]" to="[NODE-NAME]" />
    ...
</workflow-app>
The example from the official documentation:
<workflow-app name="sample-wf" xmlns="uri:oozie:workflow:0.1">
    ...
    <fork name="forking">
        <path start="firstparalleljob"/>
        <path start="secondparalleljob"/>
    </fork>
    <!-- The action name must match the path start above exactly -->
    <action name="firstparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job1.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <action name="secondparalleljob">
        <map-reduce>
            <job-tracker>foo:8021</job-tracker>
            <name-node>bar:8020</name-node>
            <job-xml>job2.xml</job-xml>
        </map-reduce>
        <ok to="joining"/>
        <error to="kill"/>
    </action>
    <join name="joining" to="nextaction"/>
    ...
</workflow-app>
One I wrote at work:
<workflow-app name="java-example1" xmlns="uri:oozie:workflow:0.5">
    <start to="forking"/>
    <fork name="forking">
        <path start="firstparalleljob"/>
        <path start="secondparalleljob"/>
    </fork>
    <action name="firstparalleljob">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- Runs: java -cp test.jar test1.OzzieTest1
                 (with -jar, java ignores -cp and runs only the manifest main class) -->
            <exec>java</exec>
            <argument>-cp</argument>
            <argument>test.jar</argument>
            <argument>test1.OzzieTest1</argument>
        </shell>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <action name="secondparalleljob">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <!-- Runs: java -cp test.jar test1.OzzieTest -->
            <exec>java</exec>
            <argument>-cp</argument>
            <argument>test.jar</argument>
            <argument>test1.OzzieTest</argument>
        </shell>
        <ok to="joining"/>
        <error to="fail"/>
    </action>
    <!-- Both paths end here; the workflow continues only after both arrive -->
    <join name="joining" to="end"/>
    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
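A fork node splits one path of execution into multiple parallel paths, and a join node merges them back: it waits until every parallel path has arrived before the workflow continues. Fork and join nodes must appear in pairs. The paths merged by a join node must all be children of the same fork.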
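
To illustrate the pairing rule, here is a minimal sketch of a workflow in which one fork path runs two actions in sequence while the other runs a single action, and both paths end at the same join. The node names (`prepare-a`, `finish-a`, `prepare-b`), the `/tmp/fork-demo` paths, and the choice of `fs` actions are assumptions made for illustration; only `${nameNode}` is taken from the workflow above.

```xml
<workflow-app name="fork-join-sketch" xmlns="uri:oozie:workflow:0.5">
    <start to="forking"/>

    <!-- One fork starts two parallel paths (hypothetical node names) -->
    <fork name="forking">
        <path start="prepare-a"/>
        <path start="prepare-b"/>
    </fork>

    <!-- Path A: two actions in sequence; only the last one points at the join -->
    <action name="prepare-a">
        <fs>
            <mkdir path="${nameNode}/tmp/fork-demo/a"/>
        </fs>
        <ok to="finish-a"/>
        <error to="fail"/>
    </action>
    <action name="finish-a">
        <fs>
            <mkdir path="${nameNode}/tmp/fork-demo/a/done"/>
        </fs>
        <ok to="joining"/>
        <error to="fail"/>
    </action>

    <!-- Path B: a single action that goes straight to the same join -->
    <action name="prepare-b">
        <fs>
            <mkdir path="${nameNode}/tmp/fork-demo/b"/>
        </fs>
        <ok to="joining"/>
        <error to="fail"/>
    </action>

    <!-- The join belongs to "forking": every path it merges started there -->
    <join name="joining" to="end"/>

    <kill name="fail">
        <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A path may contain any number of nodes, as long as every path started by `forking` eventually reaches `joining`. Sending `finish-a` or `prepare-b` to a different join would violate the same-fork rule described above.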