1.7-1.12 MapReduce Workflow


    I. Running the Example MapReduce Workflow

    1. Prepare the examples

    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# pwd
    /opt/cdh-5.3.6/oozie-4.0.0-cdh5.3.6
    
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# tar zxf oozie-examples.tar.gz    // this tarball ships with the Oozie distribution
    
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# cd examples/
    
    [root@hadoop-senior examples]# ls
    apps  input-data  src

    2. Upload the examples directory to HDFS

    ## upload
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put examples examples
    
    
    ## verify
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# bin/hdfs dfs -ls /user/root |grep examples
    drwxr-xr-x   - root supergroup          0 2019-05-10 14:01 /user/root/examples

    3. Edit the configuration

    ## first start YARN and the history server
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# sbin/yarn-daemon.sh start resourcemanager
    
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# sbin/yarn-daemon.sh start nodemanager
    
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# sbin/mr-jobhistory-daemon.sh start historyserver
     
    
    ## inspect the directory layout of examples on HDFS
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# bin/hdfs dfs -ls /user/root/examples/apps/map-reduce
    Found 5 items
    -rw-r--r--   1 root supergroup       1028 2019-05-10 14:01 /user/root/examples/apps/map-reduce/job-with-config-class.properties
    -rw-r--r--   1 root supergroup       1012 2019-05-10 14:01 /user/root/examples/apps/map-reduce/job.properties
    drwxr-xr-x   - root supergroup          0 2019-05-10 14:01 /user/root/examples/apps/map-reduce/lib
    -rw-r--r--   1 root supergroup       2274 2019-05-10 14:01 /user/root/examples/apps/map-reduce/workflow-with-config-class.xml
    -rw-r--r--   1 root supergroup       2559 2019-05-10 14:01 /user/root/examples/apps/map-reduce/workflow.xml
    
    Note: workflow.xml must reside on HDFS; job.properties only needs to exist locally, since the Oozie CLI reads it from the local filesystem.
    
    
    
    
    #### edit job.properties
    
    nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
    jobTracker=hadoop-senior.ibeifeng.com:8032
    queueName=default
    examplesRoot=examples
    
    oozie.wf.application.path=${nameNode}/user/${user.name}/${examplesRoot}/apps/map-reduce/workflow.xml
    outputDir=map-reduce
    
    
    ## refresh the copy on HDFS (likely optional, since the CLI reads the local job.properties)
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -rm  examples/apps/map-reduce/job.properties
    
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put examples/apps/map-reduce/job.properties examples/apps/map-reduce/

    4. Run the job

    ## list the available Oozie CLI commands
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# bin/oozie help
    
    
    ## run a MapReduce job
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# bin/oozie job -oozie http://localhost:11000/oozie -config examples/apps/map-reduce/job.properties -run
    job: 0000000-190510134749297-oozie-root-W
    
    
    ## check the job output on HDFS
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# bin/hdfs dfs -ls /user/root/examples/output-data/map-reduce
    Found 2 items
    -rw-r--r--   1 root supergroup          0 2019-05-10 16:27 /user/root/examples/output-data/map-reduce/_SUCCESS
    -rw-r--r--   1 root supergroup       1547 2019-05-10 16:27 /user/root/examples/output-data/map-reduce/part-00000
    
    Oozie launches the action through a MapReduce job of its own (the launcher), so the run shows up both in the YARN web UI and in the Oozie web UI.
    
    
    ## check the job status from the command line
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# bin/oozie job -oozie http://localhost:11000/oozie -info 0000000-190510134749297-oozie-root-W

    II. Custom Workflows

    1. About workflows

    Oozie (the name means "elephant herder") is a workflow engine for managing Hadoop jobs (MapReduce, Spark, Pig, and Hive are supported), chaining them together as a DAG (directed acyclic graph).

    An Oozie pipeline consists of coordinators and workflows. A workflow is a DAG describing the order in which tasks execute, while a coordinator triggers workflows on a schedule, acting as the workflow's timer; its trigger conditions fall into two categories (a coordinator sketch follows this list):
         1.  generation of a data file
         2.  a time condition
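
    For illustration, a minimal time-triggered coordinator could look like the sketch below; the app name, dates, and app-path are hypothetical, not taken from this tutorial:

    <!-- Hypothetical coordinator.xml: runs the referenced workflow once a day. -->
    <coordinator-app name="daily-wordcount-coord" frequency="${coord:days(1)}"
                     start="2019-05-10T00:00Z" end="2019-06-10T00:00Z" timezone="UTC"
                     xmlns="uri:oozie:coordinator:0.2">
        <action>
            <workflow>
                <!-- HDFS directory (or file) holding the workflow definition -->
                <app-path>${nameNode}/user/${user.name}/examples/apps/map-reduce</app-path>
            </workflow>
        </action>
    </coordinator-app>

    A coordinator job is submitted the same way as a workflow job, except that job.properties sets oozie.coord.application.path instead of oozie.wf.application.path.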

    The workflow definition language is XML-based and is called hPDL (Hadoop Process Definition Language).
    
    Workflow nodes:
        Control Flow Nodes
        Action Nodes
    
    
    Control flow nodes define where a flow starts and ends (start, end) and steer its execution path (Execution Path), e.g. decision, fork, and join;
    action nodes cover Hadoop jobs, SSH, HTTP, eMail, and Oozie sub-workflows. A fork/join sketch follows below.
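
    A minimal sketch of fork/join control nodes, with hypothetical node names:

    <!-- fork starts two actions in parallel; join waits until both reach it -->
    <fork name="parallel-steps">
        <path start="step-a"/>
        <path start="step-b"/>
    </fork>
    <!-- ... actions step-a and step-b are defined here, each with <ok to="joining"/> ... -->
    <join name="joining" to="end"/>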
    
    Node names and transitions must conform to the pattern [a-zA-Z][-_a-zA-Z0-9]*, up to 20 characters long.


    start—>action—(ok)-->end

    start—>action—(error)-->kill

    2. Workflow Action Nodes

    Key properties of action nodes (a skeleton follows this list):

    Action computation/processing is always remote: the real work runs on the cluster, not inside the Oozie server.

    Actions are asynchronous: Oozie starts the job, then detects completion via callbacks and polling.

    Actions have 2 transitions, ok and error.

    Action recovery: once an action has started, Oozie provides recovery, e.g. retries after transient failures.
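
    A generic action skeleton, sketching the two transitions (node names hypothetical):

    <action name="my-action">
        <!-- action body: map-reduce, hive, shell, ssh, email, ... -->
        <ok to="next-node"/>     <!-- taken when the action succeeds -->
        <error to="fail"/>       <!-- taken when it fails; usually points at a kill node -->
    </action>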

    III. MapReduce Action

    1. Workflow structure

    An Oozie workflow is made up of three parts: job.properties, workflow.xml, and a lib directory (dependency jars).
    The job.properties file holds settings such as nameNode, jobTracker, queueName, oozieAppsRoot, oozieDataRoot, oozie.wf.application.path, inputDir, and outputDir;
    its key point is pointing at the HDFS location of workflow.xml.
    
    ##############
    job.properties
    
    key point: points at the HDFS location of workflow.xml
    
    workflow.xml (this file must reside on HDFS)
    contains:
      *start
      *action 
        *MapReduce, Hive, Sqoop, Shell
        ok
        error
      *kill
      *end
    
    lib directory (this directory must reside on HDFS)

    the dependency jars

    2. MapReduce action

    A map-reduce action can be configured to perform filesystem cleanup and directory creation before the map/reduce job is launched; note that the MapReduce output directory must not already exist (a prepare sketch follows below).

    The workflow job waits until the Hadoop map/reduce job completes before continuing to the next action in the workflow's execution path.

    The counters and the exit status of the Hadoop job (FAILED, KILLED, or SUCCEEDED) must be available to the workflow job once the Hadoop job ends.

    The map-reduce action has to be configured with all the necessary Hadoop JobConf properties to run the Hadoop map/reduce job.
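
    For instance, the prepare element (used later in this tutorial's workflow.xml) deletes the output directory before the job launches; afterwards, counters of a finished node can be read via the hadoop:counters EL function. The node and counter names below are illustrative:

    <prepare>
        <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
    </prepare>

    <!-- e.g. elsewhere in the workflow, after the node has finished: -->
    <!-- ${hadoop:counters('mr-node-wordcount')['FileSystemCounters']['FILE_BYTES_READ']} -->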

    IV. MapReduce Action with the New API

    1. Prepare the directories

    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# mkdir -p oozie-apps/mr-wordcount-wf/lib
    
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# ls oozie-apps/mr-wordcount-wf/
    job.properties  lib  workflow.xml    // job.properties and workflow.xml can be copied from elsewhere and then edited

    2. job.properties

    nameNode=hdfs://hadoop-senior.ibeifeng.com:8020
    jobTracker=hadoop-senior.ibeifeng.com:8032
    queueName=default
    oozieAppsRoot=user/root/oozie-apps
    oozieDataRoot=user/root/oozie/datas
    
    oozie.wf.application.path=${nameNode}/${oozieAppsRoot}/mr-wordcount-wf/workflow.xml
    
    inputDir=mr-wordcount-wf/input
    outputDir=mr-wordcount-wf/output

    3. workflow.xml

    <workflow-app xmlns="uri:oozie:workflow:0.5" name="mr-wordcount-wf">
        <start to="mr-node-wordcount"/>
        <action name="mr-node-wordcount">
            <map-reduce>
                <job-tracker>${jobTracker}</job-tracker>
                <name-node>${nameNode}</name-node>
                <prepare>
                    <delete path="${nameNode}/${oozieDataRoot}/${outputDir}"/>
                </prepare>
                <configuration>
                    <property>
                        <name>mapred.mapper.new-api</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>mapred.reducer.new-api</name>
                        <value>true</value>
                    </property>
                    <property>
                        <name>mapreduce.job.queuename</name>
                        <value>${queueName}</value>
                    </property>
                    <property>
                        <name>mapreduce.job.map.class</name>
                        <value>com.ibeifeng.hadoop.senior.mapreduce.WordCount$WordCountMapper</value>
                    </property>
                    <property>
                        <name>mapreduce.job.reduce.class</name>
                        <value>com.ibeifeng.hadoop.senior.mapreduce.WordCount$WordCountReducer</value>
                    </property>
                    
                    <property>
                        <name>mapreduce.map.output.key.class</name>
                        <value>org.apache.hadoop.io.Text</value>
                    </property>    
                    <property>
                        <name>mapreduce.map.output.value.class</name>
                        <value>org.apache.hadoop.io.IntWritable</value>
                    </property>    
                    <property>
                        <name>mapreduce.job.output.key.class</name>
                        <value>org.apache.hadoop.io.Text</value>
                    </property>
                    <property>
                        <name>mapreduce.job.output.value.class</name>
                        <value>org.apache.hadoop.io.IntWritable</value>
                    </property>
                    <property>
                        <name>mapreduce.input.fileinputformat.inputdir</name>
                        <value>${nameNode}/${oozieDataRoot}/${inputDir}</value>
                    </property>
                    <property>
                        <name>mapreduce.output.fileoutputformat.outputdir</name>
                        <value>${nameNode}/${oozieDataRoot}/${outputDir}</value>
                    </property>
                </configuration>
            </map-reduce>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Map/Reduce failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
        </kill>
        <end name="end"/>
    </workflow-app>

    4. Create the HDFS directories and input data, then run

    ## create the input directory and upload data
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# bin/hdfs dfs -mkdir -p /user/root/oozie/datas/mr-wordcount-wf/input
    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# bin/hdfs dfs -put /opt/datas/wc.input /user/root/oozie/datas/mr-wordcount-wf/input
    
    
    ## upload the oozie-apps directory to HDFS
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# /opt/cdh-5.3.6/hadoop-2.5.0-cdh5.3.6/bin/hdfs dfs -put oozie-apps/ oozie-apps
    
    
    ## run the Oozie job
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# export OOZIE_URL=http://hadoop-senior.ibeifeng.com:11000/oozie/
    [root@hadoop-senior oozie-4.0.0-cdh5.3.6]# bin/oozie job -config oozie-apps/mr-wordcount-wf/job.properties -run
    
    The job is now visible in both the Oozie and YARN web UIs.
     
    ## on success, inspect the output

    [root@hadoop-senior hadoop-2.5.0-cdh5.3.6]# bin/hdfs dfs -text /user/root/oozie/datas/mr-wordcount-wf/output/part-r-00000
    hadoop    4
    hdfs    1
    hive    1
    hue    1
    mapreduce    1

    V. Workflow Authoring Essentials

    How to define a workflow:
        *job.properties
            key point: points at the HDFS location of workflow.xml
        *workflow.xml
            the definition file
            an XML file
            contains:
                *start
                *action
                    MapReduce, Hive, Sqoop, Shell
                    *ok
                    *fail
                *kill
                *end

        *lib directory
            the dependency jars
    
    
    
    Writing workflow.xml:
        *control flow nodes
        *action nodes
    
    
    
    MapReduce Action:
        how to schedule a MapReduce program with Oozie
        key point:
        the [Driver] part of the original Java MapReduce program
                 ||
        becomes the <configuration> properties (see the sketch below)
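
    As a sketch of that mapping, each Driver call corresponds to one property; the Java calls in the comments are the usual org.apache.hadoop.mapreduce.Job API:

    <configuration>
        <!-- job.setMapperClass(WordCountMapper.class) -->
        <property>
            <name>mapreduce.job.map.class</name>
            <value>com.ibeifeng.hadoop.senior.mapreduce.WordCount$WordCountMapper</value>
        </property>
        <!-- job.setMapOutputKeyClass(Text.class) -->
        <property>
            <name>mapreduce.map.output.key.class</name>
            <value>org.apache.hadoop.io.Text</value>
        </property>
        <!-- FileInputFormat.addInputPath(job, new Path(...)) -->
        <property>
            <name>mapreduce.input.fileinputformat.inputdir</name>
            <value>${nameNode}/${oozieDataRoot}/${inputDir}</value>
        </property>
    </configuration>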
     
    ## configuration to enable the new API
    <property>
      <name>mapred.mapper.new-api</name>
      <value>true</value>
    </property>
    <property>
      <name>mapred.reducer.new-api</name>
      <value>true</value>
    </property>
    