• 【原创】大数据基础之Oozie(1)简介、源代码解析


    Oozie4.3

    一 简介

     

    1 官网

    http://oozie.apache.org/

    Apache Oozie Workflow Scheduler for Hadoop

    Hadoop生态的工作流调度器

    Overview

    Oozie is a workflow scheduler system to manage Apache Hadoop jobs.

    Oozie Workflow jobs are Directed Acyclical Graphs (DAGs) of actions.

    Oozie Coordinator jobs are recurrent Oozie Workflow jobs triggered by time (frequency) and data availability.

    Oozie is integrated with the rest of the Hadoop stack supporting several types of Hadoop jobs out of the box (such as Java map-reduce, Streaming map-reduce, Pig, Hive, Sqoop and Distcp) as well as system specific jobs (such as Java programs and shell scripts).

    Oozie is a scalable, reliable and extensible system.

     

    2 部署

     

    3 数据库表结构

     

    wf_jobs:工作流实例

    wf_actions:任务实例

    coord_jobs:调度实例

    coord_actions:调度任务实例

    4 概念

    l  Control Node:工作流的开始、结束以及决定Workflow的执行路径的节点(start、end、kill、decision、fork/join)

    l  Action Node:工作流执行的计算任务,支持的类型包括(HDFS、MapReduce、Java、Shell、SSH、Pig、Hive、E-Mail、Sub-Workflow、Sqoop、Distcp),即任务

    l  Workflow:由Control Node以及一系列Action Node组成的工作流,即工作流

    l  Coordinator:根据指定Cron信息触发workflow,即调度

    l  Bundle:按照组的方式批量管理Coordinator任务,实现集中的启停

    二 代码解析

    1 启动过程

    加载配置的所有service:

    ServicesLoader.contextInitialized

             Services.init

                      Services.loadServices (oozie.services, oozie.services.ext)

    Service结构:

    Service

             org.apache.oozie.service.SchedulerService,

             org.apache.oozie.service.InstrumentationService,

             org.apache.oozie.service.MemoryLocksService,

             org.apache.oozie.service.UUIDService,

             org.apache.oozie.service.ELService,

             org.apache.oozie.service.AuthorizationService,

             org.apache.oozie.service.UserGroupInformationService,

             org.apache.oozie.service.HadoopAccessorService,

             org.apache.oozie.service.JobsConcurrencyService,

             org.apache.oozie.service.URIHandlerService,

             org.apache.oozie.service.DagXLogInfoService,

             org.apache.oozie.service.SchemaService,

             org.apache.oozie.service.LiteWorkflowAppService,

             org.apache.oozie.service.JPAService,

             org.apache.oozie.service.StoreService,

             org.apache.oozie.service.SLAStoreService,

             org.apache.oozie.service.DBLiteWorkflowStoreService,

             org.apache.oozie.service.CallbackService,

             org.apache.oozie.service.ActionService,

             org.apache.oozie.service.ShareLibService,

             org.apache.oozie.service.CallableQueueService,

             org.apache.oozie.service.ActionCheckerService,

             org.apache.oozie.service.RecoveryService,

             org.apache.oozie.service.PurgeService,

             org.apache.oozie.service.CoordinatorEngineService,

             org.apache.oozie.service.BundleEngineService,

             org.apache.oozie.service.DagEngineService,

             org.apache.oozie.service.CoordMaterializeTriggerService,

             org.apache.oozie.service.StatusTransitService,

             org.apache.oozie.service.PauseTransitService,

             org.apache.oozie.service.GroupsService,

             org.apache.oozie.service.ProxyUserService,

             org.apache.oozie.service.XLogStreamingService,

             org.apache.oozie.service.JvmPauseMonitorService,

             org.apache.oozie.service.SparkConfigurationService

    2 核心引擎

    BaseEngine

             DAGEngine (负责workflow执行

             CoordinatorEngine 负责coordinator执行

             BundleEngine 负责bundle执行

    3 workflow提交执行过程

    DAGEngine.submitJob| submitJobFromCoordinator (提交workflow)

             SubmitXCommand.call

                      execute

                              LiteWorkflowAppService.parseDef (解析得到WorkflowApp)

                                       LiteWorkflowLib.parseDef

                                                LiteWorkflowAppParser.validateAndParse

                                                         parse

                              WorkflowLib.createInstance (创建WorkflowInstance)

                              BatchQueryExecutor.executeBatchInsertUpdateDelete (保存WorkflowJobBean 到wf_jobs)

             StartXCommand.call

                      SignalXCommand.call

                              execute

                                       WorkflowInstance.start

                                                LiteWorkflowInstance.start

                                                         signal

                                                                 NodeHandler.enter

                                                                          ActionNodeHandler.enter

                                                                                   start

                                                                                            LiteWorkflowStoreService.liteExecute (添加WorkflowActionBean到ACTIONS_TO_START)

                                       WorkflowStoreService.getActionsToStart (从ACTIONS_TO_START取Action)

                                                ActionStartXCommand.call

                                                         ActionExecutor.start

                                                         WorkflowNotificationXCommand.call

                                                BatchQueryExecutor.executeBatchInsertUpdateDelete (保存WorkflowActionBean到wf_actions)

    ActionExecutor.start是异步的,还需要检查Action执行状态来推进流程,有两种情况:

    一种是Oozie Server正常运行:利用JobEndNotification

    CallbackServlet.doGet

             DagEngine.processCallback

                      CompletedActionXCommand.call

                              ActionCheckXCommand.call

                                       ActionExecutor.check

    ActionEndXCommand.call

                                                SignalXCommand.call

    一种是Oozie Server重启:利用ActionCheckerService

    ActionCheckerService.init

             ActionCheckRunnable.run

                      runWFActionCheck (GET_RUNNING_ACTIONS, oozie.service.ActionCheckerService.action.check.delay=600)

                              ActionCheckXCommand.call

                                        ActionExecutor.check

                                       ActionEndXCommand.call

                                                SignalXCommand.call

                      runCoordActionCheck

    4 coordinator提交执行过程

    CoordinatorEngine.submitJob(提交coordinator)

             CoordSubmitXCommand.call

                      submit

                              submitJob

                                       storeToDB

                                                CoordJobQueryExecutor.insert (保存CoordinatorJobBean到coord_jobs)

                                       queueMaterializeTransitionXCommand

                                                CoordMaterializeTransitionXCommand.call

                                                         execute

                                                                  materialize

                                                                          materializeActions

                                                                                   CoordCommandUtils.materializeOneInstance(创建CoordinatorActionBean)

                                                                                    storeToDB

                                                                 performWrites

                                                                          BatchQueryExecutor.executeBatchInsertUpdateDelete(保存CoordinatorActionBean到coord_actions)

                                                                          CoordActionInputCheckXCommand.call

                                                                                   CoordActionReadyXCommand.call

                                                                                            CoordActionStartXCommand.call

                                                                                                    DAGEngine.submitJobFromCoordinator

    定时任务触发Materialize:

    CoordMaterializeTriggerService.init

             CoordMaterializeTriggerRunnable.run

                      CoordMaterializeTriggerService.runCoordJobMatLookup

                              materializeCoordJobs (GET_COORD_JOBS_OLDER_FOR_MATERIALIZATION)

                                       CoordMaterializeTransitionXCommand.call

    5 分布式

    有些内部任务只能启动一个,单server环境Oozie中通过MemoryLocksService来保证,多server环境Oozie通过ZKLocksService来保证,要开启ZK,需要开启一些service:

    org.apache.oozie.service.ZKLocksService,

    org.apache.oozie.service.ZKXLogStreamingService,

    org.apache.oozie.service.ZKJobsConcurrencyService,

    org.apache.oozie.service.ZKUUIDService

    同时需要配置oozie.zookeeper.connection.string

    6 任务执行过程

    ActionExecutor是任务执行的核心抽象基类,所有的具体任务都是这个类的子类

    ActionExecutor

             JavaActionExecutor

             SshActionExecutor

             FsActionExecutor

             SubWorkflowActionExecutor

    其中JavaActionExecutor是最重要的一个子类,很多其他的任务都是这个类的子类(比如HiveActionExecutor、SparkActionExecutor等)

    JavaActionExecutor.start

             prepareActionDir

             submitLauncher

                      JobClient.getJob

                      injectLauncherCallback

                              ActionExecutor.Context.getCallbackUrl

                                       job.end.notification.url

                      createLauncherConf

                              LauncherMapperHelper.setupLauncherInfo

                      JobClient.submitJob

             check

    JavaActionExecutor执行时会提交一个map任务到yarn,即LauncherMapper,

    LauncherMapper.map

             LauncherMain.main

    LauncherMain是具体任务的执行类

    LauncherMain

             JavaMain

             HiveMain

             Hive2Main

             SparkMain

             ShellMain

             SqoopMain

  • 相关阅读:
    HHUOJ 1321
    数据结构应用
    数据结构应用
    数据结构与算法分析
    数据结构与算法分析
    CSS -- 字体样式
    CSS -- 选择器
    CSS
    HTML -- 表单元素2
    HTML -- 表单元素1
  • 原文地址:https://www.cnblogs.com/barneywill/p/9895091.html
Copyright © 2020-2023  润新知