• Hadoop Source Code Analysis 36: Reduce in the Child Process


    Analyzing task reduce_1

    args =[127.0.0.1, 42767, attempt_201405060431_0003_r_000001_0,/opt/hadoop-1.0.0/logs/userlogs/job_201405060431_0003/attempt_201405060431_0003_r_000001_0,1844231936]


    myTask = JvmTask{ shouldDie=false, t=ReduceTask{jobFile="/tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405060431_0003/job.xml", taskId=attempt_201405020918_0003_r_000001_0,taskProgress=reduce,taskStatus=ReduceTaskStatus{UNASSIGNED}} }


    job=JobConf{Configuration:core-default.xml, core-site.xml, mapred-default.xml,mapred-site.xml, hdfs-default.xml, hdfs-site.xml,/tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405060431_0003/job.xml}


    outputFormat = TextOutputFormat@51386c70


    committer = FileOutputCommitter{outputFileSystem=DFSClient,

    outputPath=/user/admin/out/123,

    workPath=hdfs://server1:9000/user/admin/out/123/_temporary/_attempt_201405060431_0003_r_000001_0}


    ReduceCopier workDir = /tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405020918_0003/_attempt_201405060431_0003_r_000001_0

    ReduceCopier jar = /tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405020918_0003/jars/job.jar

    ReduceCopier jobCacheDir = /tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405020918_0003/jars

    ReduceCopier numCopiers = 5

    ReduceCopier maxInFlight = 20

    ReduceCopier combinerRunner = CombinerRunner{ job={Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, hdfs-default.xml, hdfs-site.xml, /tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405020918_0003/job.xml}, committer=null, keyClass=org.apache.hadoop.io.Text, valueClass=org.apache.hadoop.io.IntWritable}

    ReduceCopier combineCollector = Task$CombineOutputCollector@72447399{progressBar=10000}

    ReduceCopier ioSortFactor = 10

    ReduceCopier maxInMemOutputs = 1000

    ReduceCopier maxInMemCopyPer = 0.66

    ReduceCopier maxRedPer = 0.0

    ReduceCopier ramManager = ReduceTask$ReduceCopier$ShuffleRamManager@46e9d255 {maxSize=141937872, maxSingleShuffleLimit=35484468}
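    The two ramManager limits above follow directly from the JVM heap: maxSize is the heap times mapred.job.shuffle.input.buffer.percent (default 0.70), and a single map output may occupy at most 25% of that buffer. A minimal standalone sketch of the arithmetic (a reconstruction of Hadoop 1.0's ShuffleRamManager sizing, not the actual class):

```java
// Sketch of how ShuffleRamManager sizes its buffers (Hadoop 1.0 behavior;
// simplified, standalone reconstruction -- not the actual Hadoop class).
public class ShuffleRamLimits {
    // A single shuffled map output may occupy at most 25% of the buffer.
    static final float MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION = 0.25f;

    // maxSize = JVM max heap * mapred.job.shuffle.input.buffer.percent (default 0.70),
    // capped at Integer.MAX_VALUE because the buffer is backed by byte arrays.
    static long maxSize(long maxHeapBytes, float shuffleInputBufferPercent) {
        return (long) Math.min(maxHeapBytes * shuffleInputBufferPercent,
                               Integer.MAX_VALUE);
    }

    static long maxSingleShuffleLimit(long maxSize) {
        return (long) (maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
    }

    public static void main(String[] args) {
        // Working backwards from the trace: maxSize=141937872 implies roughly
        // 193 MB of reduce-task heap at the default 70%.
        long maxSize = 141937872L;
        System.out.println(maxSingleShuffleLimit(maxSize)); // prints 35484468
    }
}
```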


    ReduceCopier threads copiers = {

    [Thread[MapOutputCopier attempt_201405020918_0003_r_000000_1.0,5,main],

    Thread[MapOutputCopier attempt_201405020918_0003_r_000000_1.1,5,main],

    Thread[MapOutputCopier attempt_201405020918_0003_r_000000_1.2,5,main],

    Thread[MapOutputCopier attempt_201405020918_0003_r_000000_1.3,5,main],

    Thread[MapOutputCopier attempt_201405020918_0003_r_000000_1.4,5,main]]}


    ReduceCopier thread localFSMergerThread = Thread[Thread for merging on-disk files,5,main]

    ReduceCopier thread inMemFSMergeThread = Thread[Thread for merging in memory files,5,main]

    ReduceCopier thread getMapEventsThread = Thread[Thread for polling Map Completion Events,5,main]


    Thread getMapEventsThread RPC request: getMapCompletionEvents(JobID=job_201405060431_0003, fromEventId=2, MAX_EVENTS_TO_FETCH=10000, TaskID=attempt_201405060431_0003_r_000001_0, jvmContext={jvmId=jvm_201405060431_0003_r_1844231936, pid=10727})


    Thread getMapEventsThread RPC response:

    [Task Id : attempt_201405060431_0003_m_000001_0, Status : SUCCEEDED,

    Task Id : attempt_201405060431_0003_m_000000_0, Status : SUCCEEDED]


    Put into mapLocations = {

    server2=[ReduceTask$ReduceCopier$MapOutputLocation{taskAttemptId=attempt_201405060431_0003_m_000000_0,taskId=task_201405060431_0003_m_000000,taskOutput=http://server2:50060/mapOutput?job=job_201405060431_0003&map=attempt_201405060431_0003_m_000000_0&reduce=0}],

    server3=[ReduceTask$ReduceCopier$MapOutputLocation{taskAttemptId=attempt_201405060431_0003_m_000001_0,taskId=task_201405060431_0003_m_000001,taskOutput=http://server3:50060/mapOutput?job=job_201405060431_0003&map=attempt_201405060431_0003_m_000001_0&reduce=0}]

    }


    Shuffle the servers (randomize the server order):

    hostList.addAll(mapLocations.keySet());
    Collections.shuffle(hostList, this.random);

    Then put them one by one into the containers:

    uniqueHosts.add(host);
    scheduledCopies.add(loc);


    The main thread wakes the MapOutputCopier threads: scheduledCopies.notifyAll()
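    The scheduling step above can be sketched as the producer side of a producer/consumer queue: shuffle the hosts, enqueue at most one pending fetch per host, then notify the waiting copier threads. This is a simplified, hypothetical reconstruction of ReduceCopier.fetchOutputs' scheduling logic (String URLs stand in for MapOutputLocation objects):

```java
import java.util.*;

// Simplified sketch of how the reduce task schedules map-output fetches
// (hypothetical reconstruction; not the actual ReduceCopier code).
public class CopyScheduler {
    final List<String> scheduledCopies = new ArrayList<>();  // stands in for List<MapOutputLocation>
    final Set<String> uniqueHosts = new HashSet<>();         // at most one in-flight fetch per host

    // mapLocations: host -> pending map-output URLs on that host
    void schedule(Map<String, List<String>> mapLocations, Random random) {
        List<String> hostList = new ArrayList<>(mapLocations.keySet());
        Collections.shuffle(hostList, random);       // randomize host order to spread load
        synchronized (scheduledCopies) {
            for (String host : hostList) {
                List<String> locs = mapLocations.get(host);
                if (!locs.isEmpty() && uniqueHosts.add(host)) {
                    scheduledCopies.add(locs.remove(locs.size() - 1));
                }
            }
            scheduledCopies.notifyAll();             // wake idle MapOutputCopier threads
        }
    }
}
```

    In the real code each MapOutputCopier thread blocks in scheduledCopies.wait() until this notifyAll, then pops one location and fetches it over HTTP.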


    Thread MapOutputCopier HTTP request: http://server3:50060/mapOutput?job=job_201405060431_0003&map=attempt_201405060431_0003_m_000000_0&reduce=1

    Since the data is small, it is written to memory (shuffleData -> mapOutput.data -> ReduceCopier.mapOutputsFilesInMemory)

    Thread MapOutputCopier HTTP request: http://server2:50060/mapOutput?job=job_201405060431_0003&map=attempt_201405060431_0003_m_000001_0&reduce=1

    Since the data is small, it is written to memory (shuffleData -> mapOutput.data -> ReduceCopier.mapOutputsFilesInMemory)
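    The memory-versus-disk decision hinges on the segment size relative to the ramManager's maxSingleShuffleLimit seen earlier. A hedged sketch of the predicate (the real check lives inside MapOutputCopier.getMapOutput and also has to reserve space in the ShuffleRamManager, possibly blocking until memory frees up; the class and method names below are invented for illustration):

```java
// Sketch of the in-memory vs. on-disk shuffle decision (simplified; the real
// code also reserves buffer space and may block until the merge thread frees some).
public class ShufflePlacement {
    // A segment is shuffled into RAM only if it is below maxSingleShuffleLimit
    // (25% of the shuffle buffer); otherwise it streams straight to local disk.
    static boolean shuffleInMemory(long decompressedLength, long maxSingleShuffleLimit) {
        return decompressedLength < maxSingleShuffleLimit;
    }
}
```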


    Update copyResults = {CopyResult{MapOutputLocation=http://server3:50060/mapOutput?job=job_201405060431_0003&map=attempt_201405060431_0003_m_000000_0&reduce=1}, CopyResult{MapOutputLocation=http://server2:50060/mapOutput?job=job_201405060431_0003&map=attempt_201405060431_0003_m_000001_0&reduce=1}}


    Merge contents: reduceCopier.createKVIterator(job, rfs, reporter)


    Merge the two parts into: /tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405060431_0003/attempt_201405060431_0003_r_000001_0/output/map_0.out

    A priority queue (min-heap) is used here as well.
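    The merge above is a k-way merge over sorted segments driven by a min-heap: pop the smallest head, emit it, then re-insert that segment's next record. A self-contained sketch of the same pattern over sorted int lists (illustrative only, not the actual Merger.MergeQueue code, which compares serialized keys):

```java
import java.util.*;

// K-way merge of sorted runs via a min-heap (PriorityQueue), the same idea
// the reduce-side Merger uses on its sorted map-output segments.
public class KWayMerge {
    static List<Integer> merge(List<List<Integer>> runs) {
        // Heap entries: {value, runIndex, offsetWithinRun}, ordered by value.
        PriorityQueue<int[]> heap =
            new PriorityQueue<>(Comparator.comparingInt((int[] e) -> e[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[]{runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();             // smallest head across all runs
            out.add(top[0]);
            int next = top[2] + 1;
            List<Integer> run = runs.get(top[1]);
            if (next < run.size()) {             // advance the run we took from
                heap.add(new int[]{run.get(next), top[1], next});
            }
        }
        return out;
    }
}
```

    With k segments and n total records this costs O(n log k), which is why io.sort.factor (10 above) bounds how many segments one merge pass touches.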


    Add to mapOutputFilesOnDisk = {/tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405060431_0003/attempt_201405060431_0003_r_000001_0/output/map_0.out}


    Since there is only one file, no further merge is needed.


    Run reduce: reducer.run(reducerContext)


    RPC request: commitPending(taskId=attempt_201405060431_0003_r_000001_0, taskStatus=COMMIT_PENDING, jvmContext);

    RPC response: none

     


    RPC request: canCommit(taskId=attempt_201405060431_0003_r_000001_0, jvmContext);

    RPC response: true


    Commit the task:

    Copy hdfs://server1:9000/user/admin/out/123/_temporary/_attempt_201405060431_0003_r_000001_0/part-r-00001

    to /user/admin/out/123/part-r-00001
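    The commit handshake above is two-phase: report COMMIT_PENDING, poll until the JobTracker approves via canCommit, then promote the _temporary attempt file to its final path. A sketch of that control flow (the Umbilical and Fs interfaces below are hypothetical stand-ins for TaskUmbilicalProtocol and FileSystem; this is a reconstruction, not the actual Task.commit code):

```java
// Sketch of the two-phase task commit: report COMMIT_PENDING, wait for
// approval, then rename the temporary attempt output into the final path.
public class TaskCommit {
    interface Umbilical {                       // hypothetical stand-in for TaskUmbilicalProtocol
        void commitPending(String taskId);
        boolean canCommit(String taskId);
    }
    interface Fs {                              // hypothetical stand-in for FileSystem.rename
        boolean rename(String src, String dst);
    }

    static boolean commit(Umbilical umbilical, Fs fs, String taskId,
                          String workPath, String outputPath) {
        umbilical.commitPending(taskId);        // RPC: taskStatus=COMMIT_PENDING
        while (!umbilical.canCommit(taskId)) {  // poll until the JobTracker approves
            try {
                Thread.sleep(1000);
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                return false;                   // give up if interrupted
            }
        }
        // e.g. .../_temporary/_attempt_..._r_000001_0/part-r-00001 -> .../part-r-00001
        return fs.rename(workPath, outputPath);
    }
}
```

    The approval step is what guarantees that only one attempt of a speculatively executed task ever promotes its output.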


     

    The final CleanUp task

    [127.0.0.1, 42767,attempt_201405060431_0003_m_000002_0,/opt/hadoop-1.0.0/logs/userlogs/job_201405060431_0003/attempt_201405060431_0003_m_000002_0,47579841]


    JvmTask = { shouldDie=false, t=MapTask{jobCleanup=true, jobFile="/tmp/hadoop-admin/mapred/local/taskTracker/admin/jobcache/job_201405060431_0003/job.xml"} }


    Delete file: /user/admin/out/123/_temporary

    Create file: /user/admin/out/123/_SUCCESS

    Delete file: hdfs://server1:9000/tmp/hadoop-admin/mapred/staging/admin/.staging/job_201405060431_0003

     

     

     

     

• Original article: https://www.cnblogs.com/leeeee/p/7276475.html