• Source-Level Analysis of TaskTracker Task Initialization and Task Launch


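      In "Source-level analysis of job initialization by the listener, JobTracker's response to TaskTracker heartbeats, and task assignment by the scheduler" we examined how the TaskTracker sends heartbeats. This section looks at what the TaskTracker does after it receives the JobTracker's response.

      The transmitHeartBeat method in TaskTracker calls JobTracker.heartbeat to obtain the heartbeat response, a HeartbeatResponse, and returns it to TaskTracker.offerService(). The HeartbeatResponse carries the following important information:

      (1) It may contain one cleanup task or one setup task; a single heartbeat can carry only one task of this kind. The map cleanup is considered first, then the map setup, then the reduce cleanup, then the reduce setup;

      (2) The MapTasks assigned by the scheduler (there may be several, but at most one non-local map; as soon as such a map is assigned, map assignment stops and the map list is returned) or a ReduceTask (at most one per heartbeat);

      (3) Tasks on this TaskTracker that should be killed;

      (4) Jobs on this TaskTracker that should be killed;

      (5) Tasks on this TaskTracker that are allowed to save (commit) their data;

      (6) The interval until the next heartbeat;

      (7) If the JobTracker has restarted, the list of jobs that need to be recovered;

      (8) Alternatively, the response may contain only the reinitialization command ReinitTrackerAction. If this is not the TaskTracker's first heartbeat to the JobTracker, the JobTracker has not restarted, and the JobTracker has no record of this TaskTracker's previous heartbeat, something may be seriously wrong, so the TaskTracker is told to reinitialize.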
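      The priority order in item (1) can be illustrated with a minimal sketch. It is not Hadoop code: the class SetupCleanupPicker and its pick method are hypothetical names, and tasks are represented as plain strings purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not Hadoop code: chooses at most one setup/cleanup
// task per heartbeat, in the priority order described in item (1).
class SetupCleanupPicker {
    // Each argument is a candidate task name, or null if none is pending.
    static String pick(String mapCleanup, String mapSetup,
                       String reduceCleanup, String reduceSetup) {
        List<String> byPriority =
            Arrays.asList(mapCleanup, mapSetup, reduceCleanup, reduceSetup);
        for (String candidate : byPriority) {
            if (candidate != null) {
                return candidate; // first match wins: one task per heartbeat
            }
        }
        return null; // no setup/cleanup task to hand out
    }

    public static void main(String[] args) {
        // The map setup outranks the reduce cleanup.
        System.out.println(pick(null, "map-setup", "reduce-cleanup", null));
    }
}
```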

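      TaskTracker.offerService() is a while loop that repeatedly waits out the heartbeat interval, sends a heartbeat, receives the response, and processes the tasks in it. After a HeartbeatResponse arrives:

      1. Get the list of jobs to recover (if the response has one), reset the state of each job, and then roll back every running Reduce Task that is in the SHUFFLE phase by putting it into shouldReset;

      2. Call getActions() on the HeartbeatResponse to obtain all commands sent by the JobTracker, as a TaskTrackerAction array: TaskTrackerAction[] actions = heartbeatResponse.getActions().

      3. If actions is a reinitialization command, return State.STALE to run() immediately. This breaks out of the inner while loop; the outer while loop continues, calls initialize() to reinitialize, and runs offerService() again.

      4. Reset the heartbeat interval: heartbeatInterval = heartbeatResponse.getHeartbeatInterval().

      5. Set justStarted and justInited to false, indicating that the service has started and is connected to the JobTracker.

      6. Iterate over the actions array:

      (1) For a LaunchTaskAction, call addToTaskQueue((LaunchTaskAction)action) to put the action on the task queue, i.e. the execution queue of a TaskLauncher thread. Depending on the type of the LaunchTaskAction, addToTaskQueue hands the action to mapLauncher or reduceLauncher. Both are instances of TaskLauncher (which extends Thread) and are created in initialize(). The launcher's own addToTaskQueue(action) adds the action to the list List<TaskInProgress> tasksToLaunch; note that this TaskInProgress is TaskTracker.TaskInProgress, not the TaskInProgress class of the mapred package. The run method of TaskLauncher keeps watching tasksToLaunch. As soon as a new task appears it takes the first one and waits until there are enough free slots to run it. It also checks (via canBeLaunched()) that the task's run state is one of UNASSIGNED, FAILED_UNCLEAN or KILLED_UNCLEAN, which is required for it to run. The task is finally started through startNewTask(tip).

      (2) For a CommitTaskAction, the task ID is recorded with commitResponses.add(commitAction.getTaskID()). Committing means moving the final result from a temporary directory to the final directory after the data has been processed. Only tasks that write their output directly to HDFS go through this step, and there are only two kinds: reduce tasks and the map tasks of map-only jobs. Every task, whether a map task, reduce task, setup task, job cleanup task or task cleanup task, calls done(umbilical, reporter) when it finishes; through several layers of calls this method ends up checking commitResponses while it waits for the JobTracker's commit command.

      (3) Every other action, i.e. killing a task or a job, is added directly with tasksToCleanup.put(action). The taskCleanupThread thread keeps watching the tasksToCleanup queue and takes one TaskTrackerAction from it at a time. If the action is a KillJobAction, it calls purgeJob((KillJobAction) action), which looks up the RunningJob in runningJobs, deletes all files belonging to the job if file cleanup is allowed, and clears all tasks of that RunningJob. If the action is a KillTaskAction, it calls processKillTaskAction((KillTaskAction) action), which gets the TaskInProgress from tasks, finds the RunningJob in runningJobs, and removes the task from the RunningJob's task list.

      7. markUnresponsiveTasks(): kill tasks that have not reported progress for a certain period.

      8. killOverflowingTasks(): when the free disk space drops below mapred.local.dir.minspacekill (default 0), find a suitable task and kill it to free space.

      9. Cleanup and recovery are done at this point, so if acceptNewTasks==false and this TaskTracker is idle, set acceptNewTasks=true so that it can accept new tasks again.

      10. checkJettyPort(server.getPort()): the official explanation is that this is a precaution, because in some cases the Jetty port obtained is inconsistent. If the port number is less than 0, shuttingDown is set to true, which makes both loops in run() and the while loop in offerService() exit, so main() ends and the TaskTracker shuts down.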
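      The routing of actions in step 6 can be sketched as below. This is only an illustration under simplifying assumptions: the ActionType enum and the three lists are hypothetical stand-ins for the TaskTrackerAction subclasses and for the TaskTracker's real queues, and IDs are plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hadoop code: routes heartbeat actions into three
// collections, mirroring the three branches of step 6.
class ActionDispatchSketch {
    enum ActionType { LAUNCH_TASK, COMMIT_TASK, KILL_TASK, KILL_JOB }

    final List<String> tasksToLaunch = new ArrayList<>();   // launcher queue
    final List<String> commitResponses = new ArrayList<>(); // tasks allowed to commit
    final List<String> tasksToCleanup = new ArrayList<>();  // kill-task and kill-job actions

    void dispatch(ActionType type, String id) {
        switch (type) {
            case LAUNCH_TASK:
                tasksToLaunch.add(id);
                break;
            case COMMIT_TASK:
                commitResponses.add(id);
                break;
            default: // everything else is a kill action
                tasksToCleanup.add(id);
                break;
        }
    }

    public static void main(String[] args) {
        ActionDispatchSketch tracker = new ActionDispatchSketch();
        tracker.dispatch(ActionType.LAUNCH_TASK, "attempt_1");
        tracker.dispatch(ActionType.KILL_JOB, "job_1");
        System.out.println(tracker.tasksToLaunch + " " + tracker.tasksToCleanup);
    }
}
```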

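      Step 6 above introduced the various kinds of tasks; map tasks and reduce tasks are both started through startNewTask(tip). For each TaskTracker.TaskInProgress this method starts a separate thread. The thread's run method mainly performs the calls below; if anything fails along the way, the exception handler kills the tip and cleans up the related data: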

    RunningJob rjob = localizeJob(tip);
    tip.getTask().setJobFile(rjob.getLocalizedJobConf().toString());
    // Localization is done. Neither rjob.jobConf nor rjob.ugi can be null
    launchTaskForJob(tip, new JobConf(rjob.getJobConf()), rjob); // run the task

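      (1) localizeJob(tip) makes sure the job is localized first: the first tip of a job localizes the job, and later tips only localize the task. It calls initializeJob(t, rjob, ttAddr) to localize the job, which downloads the JobToken and job.xml from HDFS to the local disk and then leaves the rest to TaskController.initializeJob. The default controller is DefaultTaskController, whose initializeJob creates some local directories, downloads job.jar, and creates job-acls.xml to store the job's access control information. Apart from job initialization, this method does very little task initialization.

      (2) launchTaskForJob(tip, new JobConf(rjob.getJobConf()), rjob) runs the task. It calls launchTask() of TaskTracker.TaskInProgress to start the Task. If the task's state is one of UNASSIGNED, FAILED_UNCLEAN or KILLED_UNCLEAN, it calls localizeTask(task) to set up some task configuration and then creates a TaskRunner: a MapTaskRunner for a map task, a ReduceTaskRunner for a reduce task. In both cases the task is ultimately started by the run() method of the parent class TaskRunner. The TaskRunner is then started. TaskRunner is a thread class, and its run() method is as follows: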

    @Override
    public final void run() {
      String errorInfo = "Child Error";
      try {

        //before preparing the job localize 
        //all the archives
        TaskAttemptID taskid = t.getTaskID();
        final LocalDirAllocator lDirAlloc = new LocalDirAllocator("mapred.local.dir");
        //simply get the location of the workDir and pass it to the child. The
        //child will do the actual dir creation
        final File workDir =
        new File(new Path(localdirs[rand.nextInt(localdirs.length)], 
            TaskTracker.getTaskWorkDir(t.getUser(), taskid.getJobID().toString(), 
            taskid.toString(),
            t.isTaskCleanupTask())).toString());

        String user = tip.getUGI().getUserName();

        // Set up the child task's configuration. After this call, no localization
        // of files should happen in the TaskTracker's process space. Any changes to
        // the conf object after this will NOT be reflected to the child.
        // setupChildTaskConfiguration(lDirAlloc);

        if (!prepare()) {
          return;
        }

        // Accumulates class paths for child.
        List<String> classPaths = getClassPaths(conf, workDir,
                                                taskDistributedCacheManager);

        long logSize = TaskLog.getTaskLogLength(conf);

        //  Build exec child JVM args.
        Vector<String> vargs = getVMArgs(taskid, workDir, classPaths, logSize);

        tracker.addToMemoryManager(t.getTaskID(), t.isMapTask(), conf);

        // set memory limit using ulimit if feasible and necessary ...
        String setup = getVMSetupCmd();
        // Set up the redirection of the task's stdout and stderr streams
        File[] logFiles = prepareLogFiles(taskid, t.isTaskCleanupTask());
        File stdout = logFiles[0];
        File stderr = logFiles[1];
        tracker.getTaskTrackerInstrumentation().reportTaskLaunch(taskid, stdout,
                   stderr);

        Map<String, String> env = new HashMap<String, String>();
        errorInfo = getVMEnvironment(errorInfo, user, workDir, conf, env, taskid,
                                     logSize);

        // flatten the env as a set of export commands
        List <String> setupCmds = new ArrayList<String>();
        for(Entry<String, String> entry : env.entrySet()) {
          StringBuffer sb = new StringBuffer();
          sb.append("export ");
          sb.append(entry.getKey());
          sb.append("=\"");
          sb.append(entry.getValue());
          sb.append("\"");
          setupCmds.add(sb.toString());
        }
        setupCmds.add(setup);

        launchJvmAndWait(setupCmds, vargs, stdout, stderr, logSize, workDir);
        tracker.getTaskTrackerInstrumentation().reportTaskEnd(t.getTaskID());
        if (exitCodeSet) {
          if (!killed && exitCode != 0) {
            if (exitCode == 65) {
              tracker.getTaskTrackerInstrumentation().taskFailedPing(t.getTaskID());
            }
            throw new IOException("Task process exit with nonzero status of " +
                exitCode + ".");
          }
        }
      } catch (FSError e) {
        LOG.fatal("FSError", e);
        try {
          tracker.fsErrorInternal(t.getTaskID(), e.getMessage());
        } catch (IOException ie) {
          LOG.fatal(t.getTaskID()+" reporting FSError", ie);
        }
      } catch (Throwable throwable) {
        LOG.warn(t.getTaskID() + " : " + errorInfo, throwable);
        Throwable causeThrowable = new Throwable(errorInfo, throwable);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        causeThrowable.printStackTrace(new PrintStream(baos));
        try {
          tracker.reportDiagnosticInfoInternal(t.getTaskID(), baos.toString());
        } catch (IOException e) {
          LOG.warn(t.getTaskID()+" Reporting Diagnostics", e);
        }
      } finally {

        // It is safe to call TaskTracker.TaskInProgress.reportTaskFinished with
        // *false* since the task has either
        // a) SUCCEEDED - which means commit has been done
        // b) FAILED - which means we do not need to commit
        tip.reportTaskFinished(false);
      }
    }

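      The run method mainly does preparation: it obtains the JVM arguments through getVMArgs, obtains the environment variables through getVMEnvironment and combines them into the setup commands setupCmds, and finally hands everything to the jvmManager object via launchJvmAndWait(setupCmds, vargs, stdout, stderr, logSize, workDir) to start a JVM.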
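      The way run() flattens the environment map into shell export commands can be tried out in isolation. The sketch below is a simplified stand-in rather than the Hadoop implementation: ExportCommandsSketch and toExportCommands are hypothetical names, and, like the loop it imitates, it does not escape quotes inside values.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: turns an environment map into a list
// of `export KEY="value"` shell commands.
class ExportCommandsSketch {
    static List<String> toExportCommands(Map<String, String> env) {
        List<String> cmds = new ArrayList<>();
        for (Map.Entry<String, String> e : env.entrySet()) {
            // Values are wrapped in double quotes; embedded quotes are not escaped.
            cmds.add("export " + e.getKey() + "=\"" + e.getValue() + "\"");
        }
        return cmds;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("HADOOP_WORK_DIR", "/tmp/work"); // illustrative value
        for (String cmd : toExportCommands(env)) {
            System.out.println(cmd);
        }
    }
}
```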

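      JvmManager manages all JVMs in use on the TaskTracker, including starting, stopping and killing them. Map and reduce tasks generally use different amounts of resources, so JvmManager uses mapJvmManager and reduceJvmManager to manage the JVMs of the two task types separately. The following must hold:

      A. For each task type, the number of slots in use must not exceed the TaskTracker's maximum number of slots for that type;

      B. Each JVM runs only one task at a time;

      C. A JVM can be reused, but only a limited number of times and only by tasks of the same type from the same job.

      launchJvmAndWait calls jvmManager.launchJvm(this, jvmManager.constructJvmEnv(setup, vargs, stdout,stderr, logSize, workDir, conf)) to start the task. Depending on the task type, this method picks reapJvm(t, env) of mapJvmManager or reduceJvmManager to start the JVM; both managers use the same method, shown below: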

    private synchronized void reapJvm( 
        TaskRunner t, JvmEnv env) throws IOException, InterruptedException {
      if (t.getTaskInProgress().wasKilled()) {
        //the task was killed in-flight
        //no need to do the rest of the operations
        return;
      }
      boolean spawnNewJvm = false;
      JobID jobId = t.getTask().getJobID();
      //Check whether there is a free slot to start a new JVM.
      //,or, Kill a (idle) JVM and launch a new one
      //When this method is called, we *must* 
      // (1) spawn a new JVM (if we are below the max) 
      // (2) find an idle JVM (that belongs to the same job), or,
      // (3) kill an idle JVM (from a different job) 
      // (the order of return is in the order above)
      int numJvmsSpawned = jvmIdToRunner.size();
      JvmRunner runnerToKill = null;
      if (numJvmsSpawned >= maxJvms) {
        //go through the list of JVMs for all jobs.
        Iterator<Map.Entry<JVMId, JvmRunner>> jvmIter = 
          jvmIdToRunner.entrySet().iterator();

        while (jvmIter.hasNext()) {
          JvmRunner jvmRunner = jvmIter.next().getValue();
          JobID jId = jvmRunner.jvmId.getJobId();
          //look for a free JVM for this job; if one exists then just break
          if (jId.equals(jobId) && !jvmRunner.isBusy() && !jvmRunner.ranAll()){
            setRunningTaskForJvm(jvmRunner.jvmId, t); //reserve the JVM
            LOG.info("No new JVM spawned for jobId/taskid: " + 
                     jobId+"/"+t.getTask().getTaskID() +
                     ". Attempting to reuse: " + jvmRunner.jvmId);
            return;
          }
          //Cases when a JVM is killed: 
          // (1) the JVM under consideration belongs to the same job 
          //     (passed in the argument). In this case, kill only when
          //     the JVM ran all the tasks it was scheduled to run (in terms
          //     of count).
          // (2) the JVM under consideration belongs to a different job and is
          //     currently not busy
          //But in both the above cases, we see if we can assign the current
          //task to an idle JVM (hence we continue the loop even on a match)
          if ((jId.equals(jobId) && jvmRunner.ranAll()) ||
              (!jId.equals(jobId) && !jvmRunner.isBusy())) {
            runnerToKill = jvmRunner;
            spawnNewJvm = true;
          }
        }
      } else {
        spawnNewJvm = true;
      }

      if (spawnNewJvm) {
        if (runnerToKill != null) {
          LOG.info("Killing JVM: " + runnerToKill.jvmId);
          killJvmRunner(runnerToKill);
        }
        spawnNewJvm(jobId, env, t);  // the Child process is launched here
        return;
      }
      //*MUST* never reach this
      LOG.fatal("Inconsistent state!!! " +
              "JVM Manager reached an unstable state " +
            "while reaping a JVM for task: " + t.getTask().getTaskID()+
            " " + getDetails() + ". Aborting. ");
      System.exit(-1);
    }

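      A. First check whether the number of started JVMs is below the slot limit for the task type (map or reduce). If so, start a new JVM directly; otherwise go to B;

      B. Go through all started JVMs (jvmIdToRunner) looking for one that (1) is currently idle (!jvmRunner.isBusy()); (2) has not reached its reuse limit (!jvmRunner.ranAll()); and (3) belongs to the same job as the task to be started (jId.equals(jobId)). Such a JVM can be reused directly without starting a new one, and it is reserved with setRunningTaskForJvm(jvmRunner.jvmId, t).

      C. Otherwise, look among all started JVMs on this TaskTracker for one that satisfies either of: (1) it belongs to the same job as the new task and has reached its reuse limit; (2) it is currently idle but belongs to a different job. That JVM is killed with killJvmRunner(runnerToKill) and a new JVM is started.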
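      The A/B/C decision can be condensed into a small pure function, which makes the ordering of the checks easier to see. This is a sketch under simplifying assumptions, not the Hadoop implementation: ReapJvmSketch, Jvm and Decision are hypothetical names, and a JVM is reduced to three fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Hadoop code: reproduces the decision order of
// reapJvm without any of its side effects.
class ReapJvmSketch {
    enum Decision { SPAWN_NEW, REUSE, KILL_AND_SPAWN, INCONSISTENT }

    static class Jvm {
        final String jobId;
        final boolean busy;   // currently running a task
        final boolean ranAll; // reuse limit reached
        Jvm(String jobId, boolean busy, boolean ranAll) {
            this.jobId = jobId;
            this.busy = busy;
            this.ranAll = ranAll;
        }
    }

    static Decision decide(List<Jvm> jvms, int maxJvms, String jobId) {
        if (jvms.size() < maxJvms) {
            return Decision.SPAWN_NEW;                 // case A
        }
        boolean canKill = false;
        for (Jvm jvm : jvms) {
            boolean sameJob = jvm.jobId.equals(jobId);
            if (sameJob && !jvm.busy && !jvm.ranAll) {
                return Decision.REUSE;                 // case B
            }
            if ((sameJob && jvm.ranAll) || (!sameJob && !jvm.busy)) {
                canKill = true;                        // case C candidate
            }
        }
        return canKill ? Decision.KILL_AND_SPAWN : Decision.INCONSISTENT;
    }

    public static void main(String[] args) {
        List<Jvm> jvms = new ArrayList<>();
        jvms.add(new Jvm("job_2", false, false)); // idle JVM of another job
        System.out.println(decide(jvms, 1, "job_1"));
    }
}
```

      A reusable JVM found later in the list still wins over an earlier kill candidate, because the loop keeps scanning after it has noted a candidate.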

      spawnNewJvm(jobId, env, t) creates a JvmRunner thread, adds it to jvmIdToRunner, calls setRunningTaskForJvm to update some data structures, and starts the JvmRunner. Its run method simply calls runChild(env), shown below:

    public void runChild(JvmEnv env) throws IOException, InterruptedException{
      int exitCode = 0;
      try {
        env.vargs.add(Integer.toString(jvmId.getId()));
        TaskRunner runner = jvmToRunningTask.get(jvmId);
        if (runner != null) {
          Task task = runner.getTask();
          //Launch the task controller to run task JVM
          String user = task.getUser();
          TaskAttemptID taskAttemptId = task.getTaskID();
          String taskAttemptIdStr = task.isTaskCleanupTask() ? 
              (taskAttemptId.toString() + TaskTracker.TASK_CLEANUP_SUFFIX) :
                taskAttemptId.toString(); 
          // DefaultTaskController by default: this call runs the task
          exitCode = tracker.getTaskController().launchTask(user,
              jvmId.jobId.toString(), taskAttemptIdStr, env.setup,
              env.vargs, env.workDir, env.stdout.toString(),
              env.stderr.toString());
        }
      } catch (IOException ioe) {
        // do nothing
        // error and output are appropriately redirected
      } finally { // handle the exit code
        // although the process has exited before we get here,
        // make sure the entire process group has also been killed.
        kill();
        updateOnJvmExit(jvmId, exitCode);
        LOG.info("JVM : " + jvmId + " exited with exit code " + exitCode
            + ". Number of tasks it ran: " + numTasksRan);
        deleteWorkDir(tracker, firstTask);
      }
    }

      The most important call is tracker.getTaskController().launchTask. Its code (for the default DefaultTaskController) is:

    /**
     * Create all of the directories for the task and launches the child jvm.
     * @param user the user name
     * @param attemptId the attempt id
     * @throws IOException
     */
    @Override
    public int launchTask(String user, 
                                    String jobId,
                                    String attemptId,
                                    List<String> setup,
                                    List<String> jvmArguments,
                                    File currentWorkDirectory,
                                    String stdout,
                                    String stderr) throws IOException {
      ShellCommandExecutor shExec = null;
      try {                    
        FileSystem localFs = FileSystem.getLocal(getConf());

        //create the attempt dirs
        new Localizer(localFs, 
            getConf().getStrings(JobConf.MAPRED_LOCAL_DIR_PROPERTY)).
            initializeAttemptDirs(user, jobId, attemptId);

        // create the working-directory of the task 
        if (!currentWorkDirectory.mkdir()) {
          throw new IOException("Mkdirs failed to create " 
                      + currentWorkDirectory.toString());
        }

        //mkdir the loglocation
        String logLocation = TaskLog.getAttemptDir(jobId, attemptId).toString();
        if (!localFs.mkdirs(new Path(logLocation))) {
          throw new IOException("Mkdirs failed to create " 
                     + logLocation);
        }
        //read the configuration for the job
        FileSystem rawFs = FileSystem.getLocal(getConf()).getRaw();
        long logSize = 0; //TODO MAPREDUCE-1100
        // get the JVM command line.
        String cmdLine = 
          TaskLog.buildCommandLine(setup, jvmArguments,
              new File(stdout), new File(stderr), logSize, true);

        // write the command to a file in the
        // task specific cache directory
        // TODO copy to user dir
        Path p = new Path(allocator.getLocalPathForWrite(
            TaskTracker.getPrivateDirTaskScriptLocation(user, jobId, attemptId),
            getConf()), COMMAND_FILE);        // the "taskjvm.sh" file

        // write the command into "taskjvm.sh"; p is the file path
        String commandFile = writeCommand(cmdLine, rawFs, p);
        rawFs.setPermission(p, TaskController.TASK_LAUNCH_SCRIPT_PERMISSION);
        shExec = new ShellCommandExecutor(new String[]{
            "bash", "-c", commandFile},
            currentWorkDirectory);
        shExec.execute();
      } catch (Exception e) {
        if (shExec == null) {
          return -1;
        }
        int exitCode = shExec.getExitCode();
        LOG.warn("Exit code from task is : " + exitCode);
        LOG.info("Output from DefaultTaskController's launchTask follows:");
        logOutput(shExec.getOutput());
        return exitCode;
      }
      return 0;
    }
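      The final step of the listing above, writing a command line into a script file and running that file with bash -c, can be tried in isolation with the plain JDK. The sketch below is an assumption-laden stand-in and not the Hadoop code: ScriptLaunchSketch and runViaScript are hypothetical names, it uses ProcessBuilder directly instead of ShellCommandExecutor, and it assumes a Unix-like system with bash on the PATH and a temporary directory that permits execution.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch, not Hadoop code: write a command to a script file,
// run the file through `bash -c`, and return the exit code.
class ScriptLaunchSketch {
    static int runViaScript(String commandLine)
            throws IOException, InterruptedException {
        Path script = Files.createTempFile("taskjvm", ".sh");
        try {
            Files.write(script,
                (commandLine + "\n").getBytes(StandardCharsets.UTF_8));
            // bash -c <path> executes the file, so it must be executable.
            script.toFile().setExecutable(true);
            Process p = new ProcessBuilder("bash", "-c", script.toString())
                .inheritIO() // let the child write to our stdout/stderr
                .start();
            return p.waitFor(); // block until the child exits
        } finally {
            Files.deleteIfExists(script);
        }
    }

    public static void main(String[] args) throws Exception {
        int exitCode = runViaScript("echo launched from script");
        System.out.println("exit code: " + exitCode);
    }
}
```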

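      launchTask first creates the task's working directory on disk, then writes the task launch command into the shell script "taskjvm.sh", and builds a ShellCommandExecutor whose execute() method runs the command "bash -c taskjvm.sh" through ProcessBuilder. This starts a JVM to run the task. The script ultimately launches the class org.apache.hadoop.mapred.Child, which runs the task. Its main method is fairly long: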

    // The actual map and reduce tasks run inside the Child process;
    // the main logic of Child.main is as follows
    public static void main(String[] args) throws Throwable {
      LOG.debug("Child starting");
      // create the RPC client and start the log sync thread
      final JobConf defaultConf = new JobConf();
      String host = args[0];
      int port = Integer.parseInt(args[1]);
      final InetSocketAddress address = NetUtils.makeSocketAddr(host, port);
      final TaskAttemptID firstTaskid = TaskAttemptID.forName(args[2]);
      final String logLocation = args[3];
      final int SLEEP_LONGER_COUNT = 5;
      int jvmIdInt = Integer.parseInt(args[4]);
      JVMId jvmId = new JVMId(firstTaskid.getJobID(),firstTaskid.isMap(),jvmIdInt);
      String prefix = firstTaskid.isMap() ? "MapTask" : "ReduceTask";

      cwd = System.getenv().get(TaskRunner.HADOOP_WORK_DIR);
      if (cwd == null) {
        throw new IOException("Environment variable " + 
                               TaskRunner.HADOOP_WORK_DIR + " is not set");
      }

      // file name is passed thru env
      String jobTokenFile = 
        System.getenv().get(UserGroupInformation.HADOOP_TOKEN_FILE_LOCATION);
      Credentials credentials = 
        TokenCache.loadTokens(jobTokenFile, defaultConf);
      LOG.debug("loading token. # keys =" +credentials.numberOfSecretKeys() + 
          "; from file=" + jobTokenFile);

      Token<JobTokenIdentifier> jt = TokenCache.getJobToken(credentials);
      SecurityUtil.setTokenService(jt, address);
      UserGroupInformation current = UserGroupInformation.getCurrentUser();
      current.addToken(jt);

      UserGroupInformation taskOwner 
       = UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());
      taskOwner.addToken(jt);

      // Set the credentials
      defaultConf.setCredentials(credentials);

      final TaskUmbilicalProtocol umbilical = 
        taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {
          @Override
          public TaskUmbilicalProtocol run() throws Exception {
            return (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,
                TaskUmbilicalProtocol.versionID,
                address,
                defaultConf);
          }
      });

      int numTasksToExecute = -1; //-1 signifies "no limit"
      int numTasksExecuted = 0;
      Runtime.getRuntime().addShutdownHook(new Thread() {
        public void run() {
          try {
            if (taskid != null) {
              TaskLog.syncLogs
                (logLocation, taskid, isCleanup, currentJobSegmented);
            }
          } catch (Throwable throwable) {
          }
        }
      });
      Thread t = new Thread() {
        public void run() {
          //every so often wake up and syncLogs so that we can track
          //logs of the currently running task
          while (true) {
            try {
              Thread.sleep(5000);
              if (taskid != null) {
                TaskLog.syncLogs
                  (logLocation, taskid, isCleanup, currentJobSegmented);
              }
            } catch (InterruptedException ie) {
            } catch (IOException iee) {
              LOG.error("Error in syncLogs: " + iee);
              System.exit(-1);
            }
          }
        }
      };
      t.setName("Thread for syncLogs");
      t.setDaemon(true);
      t.start();

      String pid = "";
      if (!Shell.WINDOWS) {
        pid = System.getenv().get("JVM_PID");
      }
      JvmContext context = new JvmContext(jvmId, pid);
      int idleLoopCount = 0;
      Task task = null;

      UserGroupInformation childUGI = null;

      final JvmContext jvmContext = context;
      try {
        while (true) { // keep asking the TaskTracker for a new task
          taskid = null;
          currentJobSegmented = true;
          // obtain a JvmTask object from the TaskTracker over RPC
          JvmTask myTask = umbilical.getTask(context); // fetch a new task
          if (myTask.shouldDie()) { // the job this JVM belongs to no longer exists or was killed
            break;
          } else {
            if (myTask.getTask() == null) {    // no new task for now
              taskid = null;
              currentJobSegmented = true;
              // wait a while, then ask the TaskTracker again
              if (++idleLoopCount >= SLEEP_LONGER_COUNT) {
                //we sleep for a bigger interval when we don't receive
                //tasks for a while
                Thread.sleep(1500);
              } else {
                Thread.sleep(500);
              }
              continue;
            }
          }
          // got a new task: localize it
          idleLoopCount = 0;
          task = myTask.getTask();
          task.setJvmContext(jvmContext);
          taskid = task.getTaskID();

          // Create the JobConf and determine if this job gets segmented task logs
          final JobConf job = new JobConf(task.getJobFile());
          currentJobSegmented = logIsSegmented(job);

          isCleanup = task.isTaskCleanupTask();
          // reset the statistics for the task
          FileSystem.clearStatistics();

          // Set credentials
          job.setCredentials(defaultConf.getCredentials());
          //forcefully turn off caching for localfs. All cached FileSystems
          //are closed during the JVM shutdown. We do certain
          //localfs operations in the shutdown hook, and we don't
          //want the localfs to be "closed"
          job.setBoolean("fs.file.impl.disable.cache", false);

          // set the jobTokenFile into task
          task.setJobTokenSecret(JobTokenSecretManager.
              createSecretKey(jt.getPassword()));

          // setup the child's mapred-local-dir. The child is now sandboxed and
          // can only see files down and under attemtdir only.
          TaskRunner.setupChildMapredLocalDirs(task, job);

          // setup the child's attempt directories
          localizeTask(task, job, logLocation);

          //setupWorkDir actually sets up the symlinks for the distributed
          //cache. After a task exits we wipe the workdir clean, and hence
          //the symlinks have to be rebuilt.
          TaskRunner.setupWorkDir(job, new File(cwd));

          //create the index file so that the log files 
          //are viewable immediately
          TaskLog.syncLogs
            (logLocation, taskid, isCleanup, logIsSegmented(job));

          numTasksToExecute = job.getNumTasksToExecutePerJvm();
          assert(numTasksToExecute != 0);

          task.setConf(job);

          // Initiate Java VM metrics
          initMetrics(prefix, jvmId.toString(), job.getSessionId());

          LOG.debug("Creating remote user to execute task: " + job.get("user.name"));
          childUGI = UserGroupInformation.createRemoteUser(job.get("user.name"));
          // Add tokens to new user so that it may execute its task correctly.
          for(Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {
            childUGI.addToken(token);
          }

          // Create a final reference to the task for the doAs block
          final Task taskFinal = task;
          childUGI.doAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
              try {
                // use job-specified working directory
                FileSystem.get(job).setWorkingDirectory(job.getWorkingDirectory());
                taskFinal.run(job, umbilical);        // run the task
              } finally {
                TaskLog.syncLogs
                  (logLocation, taskid, isCleanup, logIsSegmented(job));
                TaskLogsTruncater trunc = new TaskLogsTruncater(defaultConf);
                trunc.truncateLogs(new JVMInfo(
                    TaskLog.getAttemptDir(taskFinal.getTaskID(),
                      taskFinal.isTaskCleanupTask()), Arrays.asList(taskFinal)));
              }

              return null;
            }
          });
          // exit once the JVM has been reused the maximum number of times
          if (numTasksToExecute > 0 && ++numTasksExecuted == numTasksToExecute) {
            break;
          }
        }
      } catch (FSError e) {
        LOG.fatal("FSError from child", e);
        umbilical.fsError(taskid, e.getMessage(), jvmContext);
      } catch (Exception exception) {
        LOG.warn("Error running child", exception);
        try {
          if (task != null) {
            // do cleanup for the task
            if(childUGI == null) {
              task.taskCleanup(umbilical);
            } else {
              final Task taskFinal = task;
              childUGI.doAs(new PrivilegedExceptionAction<Object>() {
                @Override
                public Object run() throws Exception {
                  taskFinal.taskCleanup(umbilical);
                  return null;
                }
              });
            }
          }
        } catch (Exception e) {
          LOG.info("Error cleaning up", e);
        }
        // Report back any failures, for diagnostic purposes
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        exception.printStackTrace(new PrintStream(baos));
        if (taskid != null) {
          umbilical.reportDiagnosticInfo(taskid, baos.toString(), jvmContext);
        }
      } catch (Throwable throwable) {
        LOG.fatal("Error running child : "
                  + StringUtils.stringifyException(throwable));
        if (taskid != null) {
          Throwable tCause = throwable.getCause();
          String cause = tCause == null 
                         ? throwable.getMessage() 
                         : StringUtils.stringifyException(tCause);
          umbilical.fatalError(taskid, cause, jvmContext);
        }
      } finally {
        RPC.stopProxy(umbilical);
        shutdownMetrics();
        // Shutting down log4j of the child-vm... 
        // This assumes that on return from Task.run() 
        // there is no more logging done.
        LogManager.shutdown();
      }
    }

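      The task localization in the code above involves: (1) adding task-specific configuration parameters to the job's JobConf, overriding entries with the same name, to form the task's own JobConf, and choosing a directory in round-robin fashion to store the task's configuration file. In other words, the task configuration has two parts: the job's JobConf and the task's own parameters; (2) creating links in the working directory to all data files in the distributed cache so that they can be used directly. taskFinal.run(job,umbilical) then calls the run method of the corresponding MapTask or ReduceTask, which will be analyzed later.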
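      The two ideas in point (1), task parameters overriding job parameters of the same name and round-robin selection of a local directory, can be sketched with plain maps. This is an illustration only: TaskLocalizationSketch and its methods are hypothetical, the configuration is modelled as a Map<String, String> instead of a JobConf, and the keys and directory names are examples.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Hadoop code: builds a task configuration from a
// job configuration and picks local directories in round-robin order.
class TaskLocalizationSketch {
    private final String[] localDirs;
    private int nextDir = 0;

    TaskLocalizationSketch(String... localDirs) {
        this.localDirs = localDirs;
    }

    // Task-specific entries override job entries with the same key.
    static Map<String, String> taskConf(Map<String, String> jobConf,
                                        Map<String, String> taskParams) {
        Map<String, String> merged = new LinkedHashMap<>(jobConf);
        merged.putAll(taskParams);
        return merged;
    }

    // Returns the next directory, wrapping around at the end of the list.
    String pickDir() {
        String dir = localDirs[nextDir];
        nextDir = (nextDir + 1) % localDirs.length;
        return dir;
    }

    public static void main(String[] args) {
        Map<String, String> job = new LinkedHashMap<>();
        job.put("io.sort.mb", "100");
        Map<String, String> task = new LinkedHashMap<>();
        task.put("io.sort.mb", "200");
        System.out.println(taskConf(job, task));

        TaskLocalizationSketch dirs = new TaskLocalizationSketch("/d1", "/d2");
        System.out.println(dirs.pickDir() + " " + dirs.pickDir() + " " + dirs.pickDir());
    }
}
```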

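      Cases A and C of reapJvm above both start a JVM, while B uses an existing one. How does the task get executed in that case? The answer is in Child's main method. In int jvmIdInt = Integer.parseInt(args[4]); the ID is an integer generated by the parent process when it first created the jvmRunner. It is a random number which, together with the jobID, identifies a particular process running tasks of a particular job. The while loop in main then keeps calling JvmTask myTask = umbilical.getTask(context), which goes through jvmManager.getTaskForJvm(jvmId) to fetch the next task the TaskTracker has for that JVM. This is how a task assigned to a reused JVM gets run.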
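      The polling loop that makes JVM reuse work can be modelled without any RPC. The sketch below is a simplified assumption, not the Hadoop code: ChildLoopSketch is a hypothetical name, the TaskTracker's answers are a pre-filled queue of strings (an empty string means "no task yet", "DIE" stands in for shouldDie()), and sleep durations are recorded instead of actually sleeping.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch, not Hadoop code: the Child-side loop that keeps asking
// for tasks, backs off while idle, and stops at the reuse limit.
class ChildLoopSketch {
    static final int SLEEP_LONGER_COUNT = 5;
    static final String DIE = "DIE";

    // Returns the sleep durations (ms) the loop would have used;
    // 'executed' collects the tasks that were run, in order.
    static List<Integer> run(Queue<String> answers, int numTasksToExecute,
                             List<String> executed) {
        List<Integer> sleeps = new ArrayList<>();
        int idleLoopCount = 0;
        int numTasksExecuted = 0;
        while (!answers.isEmpty()) {
            String task = answers.poll();
            if (DIE.equals(task)) {
                break;                       // job gone or killed
            }
            if (task.isEmpty()) {            // nothing to do yet: back off
                sleeps.add(++idleLoopCount >= SLEEP_LONGER_COUNT ? 1500 : 500);
                continue;
            }
            idleLoopCount = 0;
            executed.add(task);              // "run" the task in this JVM
            if (numTasksToExecute > 0
                    && ++numTasksExecuted == numTasksToExecute) {
                break;                       // reuse limit reached
            }
        }
        return sleeps;
    }

    public static void main(String[] args) {
        Queue<String> answers = new ArrayDeque<>(Arrays.asList("", "t1", "t2"));
        List<String> executed = new ArrayList<>();
        System.out.println(run(answers, -1, executed) + " " + executed);
    }
}
```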

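      At this point we have a basic understanding of how the TaskTracker receives the JobTracker's heartbeat response and starts the various kinds of tasks. The next step is the execution of map and reduce tasks.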

      References: 1. Dong Xicheng (董西成), 《hadoop技术内幕---深入理解MapReduce架构设计与实现原理》

           2. http://guoyunsky.iteye.com/blog/1729457 , which includes an explanation of JVM reuse

  • Original article: https://www.cnblogs.com/lxf20061900/p/3780062.html