TaskTracker Task Initialization and Task Launch: A Source-Level Analysis

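In the earlier article on listener-driven Job initialization, the JobTracker's response to TaskTracker heartbeats, and scheduler task assignment, we analyzed how the TaskTracker sends heartbeats. This section analyzes what the TaskTracker does after it receives the JobTracker's response.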

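The transmitHeartBeat method in TaskTracker calls JobTracker.heartbeat to obtain the heartbeat response, a HeartbeatResponse, and returns it to TaskTracker.offerService(). A HeartbeatResponse carries the following important pieces of information:

(1) Possibly one cleanup task or one setup task; a single heartbeat can carry only one task of this kind. Map cleanup is considered first, then map setup, then reduce cleanup, then reduce setup;

(2) The MapTasks assigned by the scheduler (there may be several, but at most one non-local map; as soon as a non-local map is assigned, map assignment stops and the map list is returned) or a ReduceTask (at most one per heartbeat);

(3) Tasks on this TaskTracker that should be killed;

(4) Jobs on this TaskTracker that should be killed;

(5) Tasks on this TaskTracker that may save (commit) their output;

(6) The interval until the next heartbeat;

(7) If the JobTracker has restarted, the list of jobs that need to be recovered;

(8) Alternatively, the response contains only the reinitialization command ReinitTrackerAction. If this is not the TaskTracker's first heartbeat to the JobTracker, the JobTracker has not restarted, and the JobTracker has no record of this TaskTracker's previous heartbeat, something may be seriously wrong, so the TaskTracker is told to reinitialize itself.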

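TaskTracker.offerService() is a while loop that repeatedly waits out the heartbeat interval, sends a heartbeat, receives the response, and processes the tasks in it. After receiving a HeartbeatResponse it does the following:

1. It fetches the list of recovered jobs (if the response contains any), resets the state of each of those jobs, and then marks every running Reduce Task that is in the SHUFFLE phase for rollback by adding it to shouldReset;

2. It calls getActions() on the HeartbeatResponse to obtain all the commands sent by the JobTracker, i.e. an array of TaskTrackerAction: TaskTrackerAction[] actions = heartbeatResponse.getActions().

3. If actions is a reinitialization command, it returns State.STALE directly to run(). That exits the inner while loop; the outer while loop then continues, calls initialize() to reinitialize, and runs offerService() again.

4. It resets the heartbeat interval: heartbeatInterval = heartbeatResponse.getHeartbeatInterval()

5. It sets justStarted and justInited to false, indicating that the service has started and is connected to the JobTracker

6. It iterates over the actions array: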

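(1) For a LaunchTaskAction, it calls addToTaskQueue((LaunchTaskAction)action), which adds the action to the execution queue of a TaskLauncher thread. addToTaskQueue picks mapLauncher or reduceLauncher according to the type of the LaunchTaskAction. Both launchers are instances of TaskLauncher (which extends Thread) and are created in initialize(). The launcher's own addToTaskQueue(action) method appends the task to the List<TaskInProgress> tasksToLaunch; note that this TaskInProgress is TaskTracker.TaskInProgress, not the TaskInProgress class in the mapred package. TaskLauncher.run() watches tasksToLaunch continuously. When a new task appears, it takes the first one, waits until enough slots are free to run it, and checks via canBeLaunched() that the task's run state is one of UNASSIGNED, FAILED_UNCLEAN, or KILLED_UNCLEAN. The task is finally started through startNewTask(tip).

(2) For a CommitTaskAction, it calls commitResponses.add(commitAction.getTaskID()). Committing is the step in which a task that has finished processing its data moves its final output from a temporary directory to the final directory. Only tasks that write their output directly to HDFS go through it, which means two kinds of tasks: reduce tasks and the map tasks of map-only jobs. Every task, whether map task, reduce task, setup task, job-cleanup task, or task-cleanup task, calls done(umbilical, reporter) when it finishes; through several layers of calls this method ends up checking commitResponses while it waits for the JobTracker's commit command.

(3) Any other action, including killing a task or a job, is put directly into the cleanup queue with tasksToCleanup.put(action). The taskCleanupThread thread watches the tasksToCleanup queue continuously and takes one TaskTrackerAction at a time. If the action is a KillJobAction, it calls purgeJob((KillJobAction) action), which looks up the corresponding RunningJob in runningJobs, deletes all files belonging to the job if file cleanup is allowed, and clears all tasks of that RunningJob. If the action is a KillTaskAction, it calls processKillTaskAction((KillTaskAction) action), which fetches the corresponding TaskInProgress from tasks, finds the corresponding RunningJob in runningJobs, and removes the task from that RunningJob's task list.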

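7. markUnresponsiveTasks() kills tasks that have not reported progress for a certain period

8. killOverflowingTasks(): when the remaining disk space drops below mapred.local.dir.minspacekill (default 0), it looks for a suitable task to kill in order to free space

9. By this point cleanup and recovery are done, so if acceptNewTasks == false and the TaskTracker is idle, acceptNewTasks is set to true and new tasks can be accepted again

10. checkJettyPort(server.getPort()). The explanation given in the source is that this is a precaution, because in some cases the Jetty port obtained is inconsistent. If the port number is less than 0, the check sets shuttingDown = true, which makes both loops in run() and the while loop in offerService() exit, so main() finishes and the TaskTracker shuts down.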
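The dispatch in step 6 can be pictured with a small model. This is only a sketch under simplifying assumptions: the Kind enum, the Action class, and the two launch lists are hypothetical stand-ins for Hadoop's TaskTrackerAction subclasses and TaskLauncher threads, and only the routing of each action type is shown.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Simplified model of step 6; these types are illustrative, not Hadoop's.
class ActionDispatchSketch {
    enum Kind { LAUNCH_TASK, COMMIT_TASK, KILL_TASK, KILL_JOB }

    static final class Action {
        final Kind kind;
        final String id;     // task attempt id, or job id for KILL_JOB
        final boolean isMap; // only meaningful for LAUNCH_TASK

        Action(Kind kind, String id, boolean isMap) {
            this.kind = kind;
            this.id = id;
            this.isMap = isMap;
        }
    }

    // Stand-ins for mapLauncher / reduceLauncher, commitResponses, tasksToCleanup.
    final List<String> mapLaunchQueue = new ArrayList<>();
    final List<String> reduceLaunchQueue = new ArrayList<>();
    final Set<String> commitResponses = new HashSet<>();
    final BlockingQueue<Action> tasksToCleanup = new LinkedBlockingQueue<>();

    void dispatch(Action[] actions) {
        for (Action action : actions) {
            switch (action.kind) {
                case LAUNCH_TASK:
                    // Launch actions go to the launcher matching the task type.
                    (action.isMap ? mapLaunchQueue : reduceLaunchQueue).add(action.id);
                    break;
                case COMMIT_TASK:
                    // Commit actions only record that the task may commit.
                    commitResponses.add(action.id);
                    break;
                default:
                    // Everything else (kill task, kill job) is left to the cleanup thread.
                    tasksToCleanup.add(action);
            }
        }
    }

    public static void main(String[] args) {
        ActionDispatchSketch tracker = new ActionDispatchSketch();
        tracker.dispatch(new Action[] {
            new Action(Kind.LAUNCH_TASK, "attempt_m_0", true),
            new Action(Kind.LAUNCH_TASK, "attempt_r_0", false),
            new Action(Kind.COMMIT_TASK, "attempt_r_1", false),
            new Action(Kind.KILL_JOB, "job_1", false)
        });
        System.out.println("map=" + tracker.mapLaunchQueue
            + " reduce=" + tracker.reduceLaunchQueue
            + " commit=" + tracker.commitResponses
            + " cleanup=" + tracker.tasksToCleanup.size());
    }
}
```

Keeping the three destinations separate mirrors the text: launches wait for slots, commits are consulted by finishing tasks, and kills are handled asynchronously by a dedicated thread.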

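Step 6 above covered the different kinds of actions. Both map tasks and reduce tasks are started through startNewTask(tip). For each TaskTracker.TaskInProgress this method starts a separate thread. The main work of that thread's run method is shown in the code that follows; if anything goes wrong while it runs, the exception handling kills the tip and cleans up the related data: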

 　　　　　　RunningJob rjob = localizeJob(tip);

           tip.getTask().setJobFile(rjob.getLocalizedJobConf().toString());

           // Localization is done. Neither rjob.jobConf nor rjob.ugi can be null

           launchTaskForJob(tip, new JobConf(rjob.getJobConf()), rjob); // run the task

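(1) localizeJob(tip) makes sure the job is localized first: the first tip of a job localizes the job itself, and later tips only localize their own task. It calls initializeJob(t, rjob, ttAddr) to localize the job, which downloads the JobToken and job.xml from HDFS to the local disk and then leaves the remaining work to TaskController.initializeJob (DefaultTaskController by default). That initializeJob method creates some local directories, downloads job.jar, and creates job-acls.xml to store information such as the job's access control lists. Apart from job initialization, this method does essentially nothing for task initialization.

(2) launchTaskForJob(tip, new JobConf(rjob.getJobConf()), rjob) runs the task. It calls launchTask() of TaskTracker.TaskInProgress. If the task's state is one of UNASSIGNED, FAILED_UNCLEAN, or KILLED_UNCLEAN, launchTask() calls localizeTask(task) to set up some task configuration and then creates a TaskRunner: a MapTaskRunner for a map task, a ReduceTaskRunner for a reduce task. In both cases the actual launch is done by the run() method of the parent class TaskRunner. The TaskRunner is then started. TaskRunner is a thread class, and its run() method is as follows: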

   @Override

   public final void run() {

     String errorInfo = "Child Error";

     try {

       //before preparing the job localize

       //all the archives

       TaskAttemptID taskid = t.getTaskID();

       final LocalDirAllocator lDirAlloc = new LocalDirAllocator("mapred.local.dir");

       //simply get the location of the workDir and pass it to the child. The

       //child will do the actual dir creation

       final File workDir =

       new File(new Path(localdirs[rand.nextInt(localdirs.length)],

           TaskTracker.getTaskWorkDir(t.getUser(), taskid.getJobID().toString(),

           taskid.toString(),

           t.isTaskCleanupTask())).toString());

       String user = tip.getUGI().getUserName();

       // Set up the child task's configuration. After this call, no localization

       // of files should happen in the TaskTracker's process space. Any changes to

       // the conf object after this will NOT be reflected to the child.

       // setupChildTaskConfiguration(lDirAlloc);

       if (!prepare()) {

         return;

       }

       // Accumulates class paths for child.

       List<String> classPaths = getClassPaths(conf, workDir,

                                               taskDistributedCacheManager);

       long logSize = TaskLog.getTaskLogLength(conf);

       //  Build exec child JVM args.

       Vector<String> vargs = getVMArgs(taskid, workDir, classPaths, logSize);

       tracker.addToMemoryManager(t.getTaskID(), t.isMapTask(), conf);

       // set memory limit using ulimit if feasible and necessary ...

       String setup = getVMSetupCmd();

       // Set up the redirection of the task's stdout and stderr streams

       File[] logFiles = prepareLogFiles(taskid, t.isTaskCleanupTask());

       File stdout = logFiles[0];

       File stderr = logFiles[1];

       tracker.getTaskTrackerInstrumentation().reportTaskLaunch(taskid, stdout,

                  stderr);

       Map<String, String> env = new HashMap<String, String>();

       errorInfo = getVMEnvironment(errorInfo, user, workDir, conf, env, taskid,

                                    logSize);

       // flatten the env as a set of export commands

       List <String> setupCmds = new ArrayList<String>();

       for(Entry<String, String> entry : env.entrySet()) {

         StringBuffer sb = new StringBuffer();

         sb.append("export ");

         sb.append(entry.getKey());

         sb.append("=\"");

         sb.append(entry.getValue());

         sb.append("\"");

         setupCmds.add(sb.toString());

       }

       setupCmds.add(setup);

       launchJvmAndWait(setupCmds, vargs, stdout, stderr, logSize, workDir);

       tracker.getTaskTrackerInstrumentation().reportTaskEnd(t.getTaskID());

       if (exitCodeSet) {

         if (!killed && exitCode != 0) {

           if (exitCode == 65) {

             tracker.getTaskTrackerInstrumentation().taskFailedPing(t.getTaskID());

           }

           throw new IOException("Task process exit with nonzero status of " +

               exitCode + ".");

         }

       }

     } catch (FSError e) {

       LOG.fatal("FSError", e);

       try {

         tracker.fsErrorInternal(t.getTaskID(), e.getMessage());

       } catch (IOException ie) {

         LOG.fatal(t.getTaskID()+" reporting FSError", ie);

       }

     } catch (Throwable throwable) {

       LOG.warn(t.getTaskID() + " : " + errorInfo, throwable);

       Throwable causeThrowable = new Throwable(errorInfo, throwable);

       ByteArrayOutputStream baos = new ByteArrayOutputStream();

       causeThrowable.printStackTrace(new PrintStream(baos));

       try {

         tracker.reportDiagnosticInfoInternal(t.getTaskID(), baos.toString());

       } catch (IOException e) {

         LOG.warn(t.getTaskID()+" Reporting Diagnostics", e);

       }

     } finally {

       // It is safe to call TaskTracker.TaskInProgress.reportTaskFinished with

       // *false* since the task has either

       // a) SUCCEEDED - which means commit has been done

       // b) FAILED - which means we do not need to commit

       tip.reportTaskFinished(false);

     }

   }

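The run method mainly does preparatory work: it obtains the JVM arguments through getVMArgs, obtains the environment variables through getVMEnvironment, and assembles them into the launch commands setupCmds. Finally, launchJvmAndWait(setupCmds, vargs, stdout, stderr, logSize, workDir) hands everything to the jvmManager object, which starts a JVM.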
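The way run() flattens the environment map into shell export lines and appends the setup command can be isolated in a few lines. The class and method names here are hypothetical, and the ulimit string is only an assumed example of what getVMSetupCmd might return.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative helper, not part of Hadoop: mirrors the export loop in TaskRunner.run().
class ExportCommandsSketch {
    static List<String> toExportCommands(Map<String, String> env, String setup) {
        List<String> setupCmds = new ArrayList<>();
        for (Map.Entry<String, String> entry : env.entrySet()) {
            // Each variable becomes: export KEY="VALUE"
            setupCmds.add("export " + entry.getKey() + "=\"" + entry.getValue() + "\"");
        }
        // The setup command (for example a ulimit line) comes after the exports.
        setupCmds.add(setup);
        return setupCmds;
    }

    public static void main(String[] args) {
        Map<String, String> env = new LinkedHashMap<>();
        env.put("HADOOP_WORK_DIR", "/tmp/work");
        env.put("JVM_PID", "$$");
        for (String cmd : toExportCommands(env, "ulimit -v 1048576")) {
            System.out.println(cmd);
        }
    }
}
```

These lines later end up at the top of the generated launch script, ahead of the java command line built from vargs.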

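JvmManager manages all JVMs in use on the TaskTracker, including starting, stopping, and killing them. Map and reduce tasks generally use different amounts of resources, so JvmManager uses mapJvmManager and reduceJvmManager to manage the JVMs of the two task types separately. The following constraints must hold:

A. For each task type, the number of slots in use must not exceed that type's maximum slot count on this TaskTracker;

B. Each JVM can run only one task at a time;

C. A JVM can be reused, but only a limited number of times and only by tasks of the same type from the same job.

launchJvmAndWait calls jvmManager.launchJvm(this, jvmManager.constructJvmEnv(setup, vargs, stdout, stderr, logSize, workDir, conf)) to start the task. Depending on the task type, launchJvm uses reapJvm(t, env) of either mapJvmManager or reduceJvmManager to start the JVM; both managers share the same method. Its code is as follows: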

     private synchronized void reapJvm(

         TaskRunner t, JvmEnv env) throws IOException, InterruptedException {

       if (t.getTaskInProgress().wasKilled()) {

         //the task was killed in-flight

         //no need to do the rest of the operations

         return;

       }

       boolean spawnNewJvm = false;

       JobID jobId = t.getTask().getJobID();

       //Check whether there is a free slot to start a new JVM.

       //,or, Kill a (idle) JVM and launch a new one

       //When this method is called, we *must*

       // (1) spawn a new JVM (if we are below the max)

       // (2) find an idle JVM (that belongs to the same job), or,

       // (3) kill an idle JVM (from a different job)

       // (the order of return is in the order above)

       int numJvmsSpawned = jvmIdToRunner.size();

       JvmRunner runnerToKill = null;

       if (numJvmsSpawned >= maxJvms) {

         //go through the list of JVMs for all jobs.

         Iterator<Map.Entry<JVMId, JvmRunner>> jvmIter =

           jvmIdToRunner.entrySet().iterator();

         while (jvmIter.hasNext()) {

           JvmRunner jvmRunner = jvmIter.next().getValue();

           JobID jId = jvmRunner.jvmId.getJobId();

           //look for a free JVM for this job; if one exists then just break

           if (jId.equals(jobId) && !jvmRunner.isBusy() && !jvmRunner.ranAll()){

             setRunningTaskForJvm(jvmRunner.jvmId, t); //reserve the JVM

             LOG.info("No new JVM spawned for jobId/taskid: " +

                      jobId+"/"+t.getTask().getTaskID() +

                      ". Attempting to reuse: " + jvmRunner.jvmId);

             return;

           }

           //Cases when a JVM is killed:

           // (1) the JVM under consideration belongs to the same job

           //     (passed in the argument). In this case, kill only when

           //     the JVM ran all the tasks it was scheduled to run (in terms

           //     of count).

           // (2) the JVM under consideration belongs to a different job and is

           //     currently not busy

           //But in both the above cases, we see if we can assign the current

           //task to an idle JVM (hence we continue the loop even on a match)

           if ((jId.equals(jobId) && jvmRunner.ranAll()) ||

               (!jId.equals(jobId) && !jvmRunner.isBusy())) {

             runnerToKill = jvmRunner;

             spawnNewJvm = true;

           }

         }

       } else {

         spawnNewJvm = true;

       }

       if (spawnNewJvm) {

         if (runnerToKill != null) {

           LOG.info("Killing JVM: " + runnerToKill.jvmId);

           killJvmRunner(runnerToKill);

         }

         spawnNewJvm(jobId, env, t);  // the Child JVM is launched here

         return;

       }

       //*MUST* never reach this

       LOG.fatal("Inconsistent state!!! " +

               "JVM Manager reached an unstable state " +

             "while reaping a JVM for task: " + t.getTask().getTaskID()+

             " " + getDetails() + ". Aborting. ");

       System.exit(-1);

     }

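A. First check whether the number of JVMs already started is below the slot limit for this type (map or reduce). If it is, start a new JVM directly; otherwise go to B;

B. Scan all started JVMs (jvmIdToRunner) for one that (1) is currently idle, i.e. !jvmRunner.isBusy(); (2) has not reached its reuse limit, i.e. !jvmRunner.ranAll(); and (3) belongs to the same job as the task to be started, i.e. jId.equals(jobId). Such a JVM can be reused directly without starting a new one, and it is reserved with setRunningTaskForJvm(jvmRunner.jvmId, t).

C. Otherwise, look among the JVMs started on this TaskTracker for one that satisfies either of the following: (1) it has reached its reuse limit and belongs to the same job as the new task; (2) it is currently idle but belongs to a different job. That JVM is killed with killJvmRunner(runnerToKill), and a new JVM is started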
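The three cases can be condensed into a pure decision function. This is a sketch of the rules as described above, not Hadoop code: the Jvm class and the Decision enum are hypothetical, and the INCONSISTENT outcome stands for the branch where the original method logs a fatal error and exits.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative model of the reapJvm decision; names are not Hadoop's.
class ReapJvmSketch {
    enum Decision { SPAWN_NEW, REUSE, KILL_AND_SPAWN, INCONSISTENT }

    static final class Jvm {
        final String jobId;
        final boolean busy;   // currently running a task
        final boolean ranAll; // reuse limit reached

        Jvm(String jobId, boolean busy, boolean ranAll) {
            this.jobId = jobId;
            this.busy = busy;
            this.ranAll = ranAll;
        }
    }

    static Decision decide(List<Jvm> started, int maxJvms, String jobId) {
        // Case A: still below the slot limit, so a new JVM can be spawned.
        if (started.size() < maxJvms) {
            return Decision.SPAWN_NEW;
        }
        boolean foundVictim = false;
        for (Jvm jvm : started) {
            boolean sameJob = jvm.jobId.equals(jobId);
            // Case B: an idle JVM of the same job with reuse budget left.
            if (sameJob && !jvm.busy && !jvm.ranAll) {
                return Decision.REUSE;
            }
            // Case C: remember a JVM that may be killed, but keep scanning
            // in case a reusable JVM shows up later in the list.
            if ((sameJob && jvm.ranAll) || (!sameJob && !jvm.busy)) {
                foundVictim = true;
            }
        }
        return foundVictim ? Decision.KILL_AND_SPAWN : Decision.INCONSISTENT;
    }

    public static void main(String[] args) {
        List<Jvm> none = Collections.emptyList();
        List<Jvm> full = Arrays.asList(
            new Jvm("job_2", false, false),
            new Jvm("job_1", true, false));
        System.out.println(decide(none, 2, "job_1"));
        System.out.println(decide(full, 2, "job_1"));
    }
}
```

Note that the scan continues after a kill candidate is found, so reuse always wins over kill-and-spawn when both are possible.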

spawnNewJvm(jobId, env, t) creates a JvmRunner thread, adds it to jvmIdToRunner, calls setRunningTaskForJvm to update some data structures, and starts the JvmRunner. Its run method directly calls runChild(env), whose code is as follows:

  public void runChild(JvmEnv env) throws IOException, InterruptedException{

         int exitCode = 0;

         try {

           env.vargs.add(Integer.toString(jvmId.getId()));

           TaskRunner runner = jvmToRunningTask.get(jvmId);

           if (runner != null) {

             Task task = runner.getTask();

             //Launch the task controller to run task JVM

             String user = task.getUser();

             TaskAttemptID taskAttemptId = task.getTaskID();

             String taskAttemptIdStr = task.isTaskCleanupTask() ?

                 (taskAttemptId.toString() + TaskTracker.TASK_CLEANUP_SUFFIX) :

                   taskAttemptId.toString();

                 exitCode = tracker.getTaskController().launchTask(user, // DefaultTaskController by default; this call runs the task

                     jvmId.jobId.toString(), taskAttemptIdStr, env.setup,

                     env.vargs, env.workDir, env.stdout.toString(),

                     env.stderr.toString());

           }

         } catch (IOException ioe) {

           // do nothing

           // error and output are appropriately redirected

         } finally { // handle the exit code

           // although the process has exited before we get here,

           // make sure the entire process group has also been killed.

           kill();

           updateOnJvmExit(jvmId, exitCode);

           LOG.info("JVM : " + jvmId + " exited with exit code " + exitCode

               + ". Number of tasks it ran: " + numTasksRan);

           deleteWorkDir(tracker, firstTask);

         }

       }

The most important call is tracker.getTaskController().launchTask. Its code is as follows (DefaultTaskController is the default):

 /**

    * Create all of the directories for the task and launches the child jvm.

    * @param user the user name

    * @param attemptId the attempt id

    * @throws IOException

    */

   @Override

   public int launchTask(String user,

                                   String jobId,

                                   String attemptId,

                                   List<String> setup,

                                   List<String> jvmArguments,

                                   File currentWorkDirectory,

                                   String stdout,

                                   String stderr) throws IOException {

     ShellCommandExecutor shExec = null;

     try {

       FileSystem localFs = FileSystem.getLocal(getConf());

       //create the attempt dirs

       new Localizer(localFs,

           getConf().getStrings(JobConf.MAPRED_LOCAL_DIR_PROPERTY)).

           initializeAttemptDirs(user, jobId, attemptId);

       // create the working-directory of the task

       if (!currentWorkDirectory.mkdir()) {

         throw new IOException("Mkdirs failed to create "

                     + currentWorkDirectory.toString());

       }

       //mkdir the loglocation

       String logLocation = TaskLog.getAttemptDir(jobId, attemptId).toString();

       if (!localFs.mkdirs(new Path(logLocation))) {

         throw new IOException("Mkdirs failed to create "

                    + logLocation);

       }

       //read the configuration for the job

       FileSystem rawFs = FileSystem.getLocal(getConf()).getRaw();

       long logSize = 0; //TODO MAPREDUCE-1100

       // get the JVM command line.

       String cmdLine =

         TaskLog.buildCommandLine(setup, jvmArguments,

             new File(stdout), new File(stderr), logSize, true);

       // write the command to a file in the

       // task specific cache directory

       // TODO copy to user dir

       Path p = new Path(allocator.getLocalPathForWrite(

           TaskTracker.getPrivateDirTaskScriptLocation(user, jobId, attemptId),

           getConf()), COMMAND_FILE);        // the "taskjvm.sh" file

       String commandFile = writeCommand(cmdLine, rawFs, p); // write the command into "taskjvm.sh"; p is the file path

       rawFs.setPermission(p, TaskController.TASK_LAUNCH_SCRIPT_PERMISSION);

       shExec = new ShellCommandExecutor(new String[]{

           "bash", "-c", commandFile},

           currentWorkDirectory);

       shExec.execute();

     } catch (Exception e) {

       if (shExec == null) {

         return -1;

       }

       int exitCode = shExec.getExitCode();

       LOG.warn("Exit code from task is : " + exitCode);

       LOG.info("Output from DefaultTaskController's launchTask follows:");

       logOutput(shExec.getOutput());

       return exitCode;

     }

     return 0;

   }
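The core of launchTask above, writing the command line into a script file and running that script with bash, can be tried in isolation. The following is a minimal sketch using only the JDK: it assumes bash is available on the PATH, the class and method names are hypothetical, the shebang line is an addition for the sketch, and ProcessBuilder is used directly where Hadoop goes through ShellCommandExecutor.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative only: write a command into taskjvm.sh and run it with "bash -c".
class LaunchScriptSketch {
    static final String COMMAND_FILE = "taskjvm.sh";

    // Returns the script's exit code, or -1 if it could not be started.
    static int runViaScript(String cmdLine, Path workDir) {
        try {
            Path script = workDir.resolve(COMMAND_FILE);
            String body = "#!/bin/bash\n" + cmdLine + "\n";
            Files.write(script, body.getBytes(StandardCharsets.UTF_8));
            if (!script.toFile().setExecutable(true)) {
                return -1;
            }
            Process process = new ProcessBuilder("bash", "-c", script.toString())
                .directory(workDir.toFile()) // run inside the task's working directory
                .inheritIO()
                .start();
            return process.waitFor();
        } catch (IOException e) {
            return -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    // Convenience wrapper that uses a fresh temporary directory as the work dir.
    static int demo(String cmdLine) {
        try {
            return runViaScript(cmdLine, Files.createTempDirectory("launch-sketch"));
        } catch (IOException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("exit code: " + demo("exit 3"));
    }
}
```

Going through a script file rather than a long argument list keeps the export lines, ulimit setup, and output redirection in one place that can be inspected on disk after a failure.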

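launchTask first creates the task's working directory on disk, then writes the task launch command into the shell script "taskjvm.sh", builds a ShellCommandExecutor, and calls its execute() method, which runs the command "bash -c taskjvm.sh" through ProcessBuilder. This starts a JVM to execute the task. The script ultimately launches the org.apache.hadoop.mapred.Child class to run the task. Its main method is fairly long; the code is as follows: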

 // The real map and reduce tasks run inside the Child process; the main logic of Child.main is as follows

   public static void main(String[] args) throws Throwable {

     LOG.debug("Child starting");

 // Create the RPC client and start the log-sync thread

     final JobConf defaultConf = new JobConf();

     String host = args[0];

     int port = Integer.parseInt(args[1]);

     final InetSocketAddress address = NetUtils.makeSocketAddr(host, port);

     final TaskAttemptID firstTaskid = TaskAttemptID.forName(args[2]);

     final String logLocation = args[3];

     final int SLEEP_LONGER_COUNT = 5;

     int jvmIdInt = Integer.parseInt(args[4]);

     JVMId jvmId = new JVMId(firstTaskid.getJobID(),firstTaskid.isMap(),jvmIdInt);

     String prefix = firstTaskid.isMap() ? "MapTask" : "ReduceTask";

     cwd = System.getenv().get(TaskRunner.HADOOP_WORK_DIR);

     if (cwd == null) {

       throw new IOException("Environment variable " +

                              TaskRunner.HADOOP_WORK_DIR + " is not set");

     }

     // file name is passed thru env

     String jobTokenFile =

       System.getenv().get(UserGroupInformation.HADOOP_TOKEN_FILE_LOCATION);

     Credentials credentials =

       TokenCache.loadTokens(jobTokenFile, defaultConf);

     LOG.debug("loading token. # keys =" +credentials.numberOfSecretKeys() +

         "; from file=" + jobTokenFile);

     Token<JobTokenIdentifier> jt = TokenCache.getJobToken(credentials);

     SecurityUtil.setTokenService(jt, address);

     UserGroupInformation current = UserGroupInformation.getCurrentUser();

     current.addToken(jt);

     UserGroupInformation taskOwner

      = UserGroupInformation.createRemoteUser(firstTaskid.getJobID().toString());

     taskOwner.addToken(jt);

     // Set the credentials

     defaultConf.setCredentials(credentials);

     final TaskUmbilicalProtocol umbilical =

       taskOwner.doAs(new PrivilegedExceptionAction<TaskUmbilicalProtocol>() {

         @Override

         public TaskUmbilicalProtocol run() throws Exception {

           return (TaskUmbilicalProtocol)RPC.getProxy(TaskUmbilicalProtocol.class,

               TaskUmbilicalProtocol.versionID,

               address,

               defaultConf);

         }

     });

     int numTasksToExecute = -1; //-1 signifies "no limit"

     int numTasksExecuted = 0;

     Runtime.getRuntime().addShutdownHook(new Thread() {

       public void run() {

         try {

           if (taskid != null) {

             TaskLog.syncLogs

               (logLocation, taskid, isCleanup, currentJobSegmented);

           }

         } catch (Throwable throwable) {

         }

       }

     });

     Thread t = new Thread() {

       public void run() {

         //every so often wake up and syncLogs so that we can track

         //logs of the currently running task

         while (true) {

           try {

             Thread.sleep(5000);

             if (taskid != null) {

               TaskLog.syncLogs

                 (logLocation, taskid, isCleanup, currentJobSegmented);

             }

           } catch (InterruptedException ie) {

           } catch (IOException iee) {

             LOG.error("Error in syncLogs: " + iee);

             System.exit(-1);

           }

         }

       }

     };

     t.setName("Thread for syncLogs");

     t.setDaemon(true);

     t.start();

     String pid = "";

     if (!Shell.WINDOWS) {

       pid = System.getenv().get("JVM_PID");

     }

     JvmContext context = new JvmContext(jvmId, pid);

     int idleLoopCount = 0;

     Task task = null;

     UserGroupInformation childUGI = null;

     final JvmContext jvmContext = context;

     try {

       while (true) { // keep asking the TaskTracker for a new task

         taskid = null;

         currentJobSegmented = true;

         // obtain a JvmTask object from the TaskTracker over RPC

         JvmTask myTask = umbilical.getTask(context); // fetch a new task

         if (myTask.shouldDie()) { // the job this JVM belongs to no longer exists or was killed

           break;

         } else {

           if (myTask.getTask() == null) {    // no new task for now

             taskid = null;

             currentJobSegmented = true;

             // wait a while, then ask the TaskTracker again

             if (++idleLoopCount >= SLEEP_LONGER_COUNT) {

               //we sleep for a bigger interval when we don't receive

               //tasks for a while

               Thread.sleep(1500);

             } else {

               Thread.sleep(500);

             }

             continue;

           }

         }

         // a new task has arrived; localize it

         idleLoopCount = 0;

         task = myTask.getTask();

         task.setJvmContext(jvmContext);

         taskid = task.getTaskID();

         // Create the JobConf and determine if this job gets segmented task logs

         final JobConf job = new JobConf(task.getJobFile());

         currentJobSegmented = logIsSegmented(job);

         isCleanup = task.isTaskCleanupTask();

         // reset the statistics for the task

         FileSystem.clearStatistics();

         // Set credentials

         job.setCredentials(defaultConf.getCredentials());

         //forcefully turn off caching for localfs. All cached FileSystems

         //are closed during the JVM shutdown. We do certain

         //localfs operations in the shutdown hook, and we don't

         //want the localfs to be "closed"

         job.setBoolean("fs.file.impl.disable.cache", false);

         // set the jobTokenFile into task

         task.setJobTokenSecret(JobTokenSecretManager.

             createSecretKey(jt.getPassword()));

         // setup the child's mapred-local-dir. The child is now sandboxed and

         // can only see files down and under attemtdir only.

         TaskRunner.setupChildMapredLocalDirs(task, job);

         // setup the child's attempt directories

         localizeTask(task, job, logLocation);

         //setupWorkDir actually sets up the symlinks for the distributed

         //cache. After a task exits we wipe the workdir clean, and hence

         //the symlinks have to be rebuilt.

         TaskRunner.setupWorkDir(job, new File(cwd));

         //create the index file so that the log files

         //are viewable immediately

         TaskLog.syncLogs

           (logLocation, taskid, isCleanup, logIsSegmented(job));

         numTasksToExecute = job.getNumTasksToExecutePerJvm();

         assert(numTasksToExecute != 0);

         task.setConf(job);

         // Initiate Java VM metrics

         initMetrics(prefix, jvmId.toString(), job.getSessionId());

         LOG.debug("Creating remote user to execute task: " + job.get("user.name"));

         childUGI = UserGroupInformation.createRemoteUser(job.get("user.name"));

         // Add tokens to new user so that it may execute its task correctly.

         for(Token<?> token : UserGroupInformation.getCurrentUser().getTokens()) {

           childUGI.addToken(token);

         }

         // Create a final reference to the task for the doAs block

         final Task taskFinal = task;

         childUGI.doAs(new PrivilegedExceptionAction<Object>() {

           @Override

           public Object run() throws Exception {

             try {

               // use job-specified working directory

               FileSystem.get(job).setWorkingDirectory(job.getWorkingDirectory());

               taskFinal.run(job, umbilical);        // run the task

             } finally {

               TaskLog.syncLogs

                 (logLocation, taskid, isCleanup, logIsSegmented(job));

               TaskLogsTruncater trunc = new TaskLogsTruncater(defaultConf);

               trunc.truncateLogs(new JVMInfo(

                   TaskLog.getAttemptDir(taskFinal.getTaskID(),

                     taskFinal.isTaskCleanupTask()), Arrays.asList(taskFinal)));

             }

             return null;

           }

         });

         // exit once the JVM has reached its reuse limit

         if (numTasksToExecute > 0 && ++numTasksExecuted == numTasksToExecute) {

           break;

         }

       }

     } catch (FSError e) {

       LOG.fatal("FSError from child", e);

       umbilical.fsError(taskid, e.getMessage(), jvmContext);

     } catch (Exception exception) {

       LOG.warn("Error running child", exception);

       try {

         if (task != null) {

           // do cleanup for the task

           if(childUGI == null) {

             task.taskCleanup(umbilical);

           } else {

             final Task taskFinal = task;

             childUGI.doAs(new PrivilegedExceptionAction<Object>() {

               @Override

               public Object run() throws Exception {

                 taskFinal.taskCleanup(umbilical);

                 return null;

               }

             });

           }

         }

       } catch (Exception e) {

         LOG.info("Error cleaning up", e);

       }

       // Report back any failures, for diagnostic purposes

       ByteArrayOutputStream baos = new ByteArrayOutputStream();

       exception.printStackTrace(new PrintStream(baos));

       if (taskid != null) {

         umbilical.reportDiagnosticInfo(taskid, baos.toString(), jvmContext);

       }

     } catch (Throwable throwable) {

       LOG.fatal("Error running child : "

                 + StringUtils.stringifyException(throwable));

       if (taskid != null) {

         Throwable tCause = throwable.getCause();

         String cause = tCause == null

                        ? throwable.getMessage()

                        : StringUtils.stringifyException(tCause);

         umbilical.fatalError(taskid, cause, jvmContext);

       }

     } finally {

       RPC.stopProxy(umbilical);

       shutdownMetrics();

       // Shutting down log4j of the child-vm...

       // This assumes that on return from Task.run()

       // there is no more logging done.

       LogManager.shutdown();

     }

   }

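The task localization performed in the code above consists of: (1) adding the task-specific configuration parameters to the job configuration JobConf, overriding entries with the same name, to form the task's own JobConf, and choosing a directory in round-robin fashion to store the configuration file of that task. In other words, a task's configuration has two parts: the job's JobConf and the task's own specific parameters; (2) creating links in the working directory to all data files in the distributed cache so that the task can use them directly. taskFinal.run(job, umbilical) calls the run method of the corresponding MapTask or ReduceTask; that will be analyzed later.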
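The "job configuration plus task-specific overrides" idea and the round-robin directory choice can be illustrated with plain maps. This sketch does not use Hadoop's JobConf or LocalDirAllocator; the class, the method names, and the sample keys and paths are assumptions made for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative model of task localization using plain maps instead of JobConf.
class TaskConfOverlaySketch {
    // Start from the job-level settings, then let task-level settings win on conflicts.
    static Map<String, String> localize(Map<String, String> jobConf,
                                        Map<String, String> taskParams) {
        Map<String, String> taskConf = new LinkedHashMap<>(jobConf);
        taskConf.putAll(taskParams);
        return taskConf;
    }

    // Round-robin choice among the configured local directories.
    static String pickDir(String[] localDirs, int counter) {
        return localDirs[Math.floorMod(counter, localDirs.length)];
    }

    public static void main(String[] args) {
        Map<String, String> jobConf = new LinkedHashMap<>();
        jobConf.put("mapred.job.name", "wordcount");
        jobConf.put("mapred.task.partition", "-1");

        Map<String, String> taskParams = new LinkedHashMap<>();
        taskParams.put("mapred.task.id", "attempt_example_m_000003_0");
        taskParams.put("mapred.task.partition", "3");

        System.out.println(localize(jobConf, taskParams));
        String[] dirs = {"/data1/mapred/local", "/data2/mapred/local"};
        for (int i = 0; i < 3; i++) {
            System.out.println(pickDir(dirs, i));
        }
    }
}
```

The job-level map is copied rather than modified, which matches the point that each task ends up with its own configuration object.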

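Cases A and C of reapJvm above both start a new JVM, while case B uses an existing one. How does a task get executed in that case? The answer lies in Child's main method. The id in int jvmIdInt = Integer.parseInt(args[4]); is an integer generated by the parent process when it first created the jvmRunner. It is a random number, and together with the jobID it identifies a specific process running tasks of a specific job. The while loop in main then calls JvmTask myTask = umbilical.getTask(context) repeatedly, which on the TaskTracker side goes through jvmManager.getTaskForJvm(jvmId) to fetch the next task assigned to that JVM. This is how tasks get executed in a reused JVM.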
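The polling loop that makes JVM reuse work can be reduced to a small model. This is a sketch, not the Child class: Umbilical and Reply are hypothetical stand-ins for TaskUmbilicalProtocol and JvmTask, a task is just a string, and "running" a task only records its name. The sleep lengths and the idle threshold follow the values in the listing above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustrative model of the Child main loop; names are not Hadoop's.
class JvmReuseSketch {
    static final int SLEEP_LONGER_COUNT = 5;

    static final class Reply {
        final boolean shouldDie;
        final String task; // null means "no task yet"

        private Reply(boolean shouldDie, String task) {
            this.shouldDie = shouldDie;
            this.task = task;
        }
        static Reply die() { return new Reply(true, null); }
        static Reply of(String task) { return new Reply(false, task); }
    }

    interface Umbilical {
        Reply getTask(int jvmId); // the TaskTracker looks up work by JVM id
    }

    // Poll more slowly once the JVM has been idle for a while.
    static long idleSleepMillis(int idleLoopCount) {
        return idleLoopCount >= SLEEP_LONGER_COUNT ? 1500 : 500;
    }

    // numTasksToExecute of -1 means "no reuse limit".
    static List<String> childLoop(int jvmId, Umbilical umbilical, int numTasksToExecute) {
        List<String> executed = new ArrayList<>();
        int idleLoopCount = 0;
        while (true) {
            Reply reply = umbilical.getTask(jvmId);
            if (reply.shouldDie) {
                break; // the job is gone or was killed
            }
            if (reply.task == null) {
                try {
                    Thread.sleep(idleSleepMillis(++idleLoopCount));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
                continue;
            }
            idleLoopCount = 0;
            executed.add(reply.task); // stand-in for localizing and running the task
            if (numTasksToExecute > 0 && executed.size() == numTasksToExecute) {
                break; // reuse limit reached, the JVM exits
            }
        }
        return executed;
    }

    public static void main(String[] args) {
        Queue<String> pending = new ArrayDeque<>(Arrays.asList("t1", "t2", "t3"));
        Umbilical tracker = id -> pending.isEmpty() ? Reply.die() : Reply.of(pending.poll());
        System.out.println(childLoop(7, tracker, 2));
    }
}
```

Because the child pulls work by its own JVM id, the TaskTracker only has to reserve a task for that id (case B) and the already running process picks it up on its next poll.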

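At this point we have a first understanding of how the TaskTracker receives the JobTracker's heartbeat response and starts the various kinds of tasks. The next step is the execution of map and reduce.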

References: 1. Dong Xicheng (董西成), 《Hadoop技术内幕：深入理解MapReduce架构设计与实现原理》

2. http://guoyunsky.iteye.com/blog/1729457 , which includes an explanation of JVM reuse
