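I. OutputFormat
OutputFormat describes the output format of a MapReduce job. Its main tasks are:
1. Validate the job's output specification, e.g. check that the output directory does not already exist.
2. Provide a RecordWriter implementation that writes the job's results to files in the file system.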
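To make these two tasks concrete, here is a minimal sketch in plain Java. It does not use the Hadoop API: `SketchOutputFormat`, `SketchRecordWriter` and `OutputFormatDemo` are hypothetical stand-ins invented for illustration, and the `part-00000` file name and tab-separated line layout are assumptions chosen to resemble typical text output.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for a RecordWriter (not the Hadoop class):
// writes each key/value pair as one "key<TAB>value" line.
class SketchRecordWriter<K, V> implements AutoCloseable {
    private final BufferedWriter out;

    SketchRecordWriter(Path file) throws IOException {
        this.out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);
    }

    void write(K key, V value) throws IOException {
        out.write(key + "\t" + value);
        out.newLine();
    }

    @Override
    public void close() throws IOException {
        out.close();
    }
}

// Hypothetical stand-in for an OutputFormat (not the Hadoop class).
class SketchOutputFormat<K, V> {
    // Task 1: reject the job if the output directory already exists,
    // so that earlier results are never overwritten.
    void checkOutputSpecs(Path outputDir) throws IOException {
        if (Files.exists(outputDir)) {
            throw new IOException("Output directory " + outputDir + " already exists");
        }
    }

    // Task 2: hand back a writer bound to one task's part file.
    SketchRecordWriter<K, V> getRecordWriter(Path outputDir, int taskId) throws IOException {
        Files.createDirectories(outputDir);
        Path part = outputDir.resolve(String.format("part-%05d", taskId));
        return new SketchRecordWriter<>(part);
    }
}

class OutputFormatDemo {
    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("sketch-job").resolve("out");
        SketchOutputFormat<String, Integer> format = new SketchOutputFormat<>();

        format.checkOutputSpecs(outputDir); // passes: the directory is new
        try (SketchRecordWriter<String, Integer> writer = format.getRecordWriter(outputDir, 0)) {
            writer.write("hello", 2);
            writer.write("world", 1);
        }
        System.out.println(Files.readAllLines(outputDir.resolve("part-00000")));
    }
}
```

Calling `checkOutputSpecs` a second time on the same directory would throw, which is the behaviour that protects existing output.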
OutputFormat consists mainly of three abstract methods. The annotated source code below shows what each one does:
public abstract class OutputFormat<K, V> {

  /**
   * Get the {@link RecordWriter} for the given task.
   * Note: the RecordWriter is the object that writes the task's output key-value pairs.
   * @param context the information about the current task.
   * @return a {@link RecordWriter} to write the output for the job.
   * @throws IOException
   */
  public abstract RecordWriter<K, V> getRecordWriter(TaskAttemptContext context)
      throws IOException, InterruptedException;

  /**
   * Check for validity of the output-specification for the job.
   * <p>This is to validate the output specification for the job when it is
   * a job is submitted. Typically checks that it does not already exist,
   * throwing an exception when it already exists, so that output is not
   * overwritten.</p>
   * Note: "it" here refers to the job's output directory.
   * @param context information about the job
   * @throws IOException when output should not be attempted
   */
  public abstract void checkOutputSpecs(JobContext context)
      throws IOException, InterruptedException;

  /**
   * Get the output committer for this output format. This is responsible
   * for ensuring the output is committed correctly.
   * @param context the task context
   * @return an output committer
   * @throws IOException
   * @throws InterruptedException
   */
  public abstract OutputCommitter getOutputCommitter(TaskAttemptContext context)
      throws IOException, InterruptedException;
}
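The source says only that the committer ensures output is committed correctly. One common way to achieve that is to stage a task's output in a temporary location and move it into place only on success. The sketch below illustrates that idea in plain Java. It is loosely modelled on the staging approach of Hadoop's FileOutputCommitter but is not its implementation: `SketchCommitter`, `workFile`, `commitTask`, `abortTask` and `CommitterDemo` are hypothetical stand-ins, and the `_temporary` subdirectory layout is an assumption made for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical stand-in for an OutputCommitter (not the Hadoop class).
class SketchCommitter {
    private final Path tempFile;   // where the task writes while it runs
    private final Path finalFile;  // where the output becomes visible after commit

    SketchCommitter(Path outputDir, String fileName) throws IOException {
        Path tempDir = outputDir.resolve("_temporary");
        Files.createDirectories(tempDir);
        this.tempFile = tempDir.resolve(fileName);
        this.finalFile = outputDir.resolve(fileName);
    }

    // The task writes here, never directly to the final location.
    Path workFile() {
        return tempFile;
    }

    // On success, move the staged file into its final location.
    void commitTask() throws IOException {
        Files.move(tempFile, finalFile);
    }

    // On failure, discard the staged file so no partial output is left behind.
    void abortTask() throws IOException {
        Files.deleteIfExists(tempFile);
    }
}

class CommitterDemo {
    public static void main(String[] args) throws IOException {
        Path outputDir = Files.createTempDirectory("sketch-commit");
        SketchCommitter committer = new SketchCommitter(outputDir, "part-00000");

        Files.write(committer.workFile(), Arrays.asList("hello\t2"), StandardCharsets.UTF_8);
        committer.commitTask();

        System.out.println(Files.exists(outputDir.resolve("part-00000")));
    }
}
```

With this design a failed task calls `abortTask` instead of `commitTask`, so readers of the output directory only ever see complete files.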