• 2.28 Common MapReduce Optimizations in Practice


    I. Optimization Points

    • Reduce Task Number
    • Map Task output compression
    • Shuffle Phase parameters
    • Virtual CPU cores allocated to map and reduce tasks

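    II. Reduce Task Number

    By default, a job runs with a single Reduce Task.

    More Reduce Tasks is not always better. In practice, test with different values and adjust until you find the best count for the job. The count is set in the driver as follows: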

    job.setNumReduceTasks(2);
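
    The same setting can also be expressed as configuration rather than code, through the mapreduce.job.reduces property. The fragment below is a minimal sketch, assuming it is placed in mapred-site.xml and that the driver does not call job.setNumReduceTasks itself. The value 2 only mirrors the illustrative figure above and is not a recommendation.

    ```xml
    <!-- Sketch: job-level default for the number of reduce tasks. -->
    <!-- The value 2 is illustrative; tune it by testing on your own job. -->
    <property>
      <name>mapreduce.job.reduces</name>
      <value>2</value>
    </property>
    ```

    If the driver is run through ToolRunner, the property can usually also be passed on the command line as -D mapreduce.job.reduces=2, which makes it easier to try several values without recompiling.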

     

    III. Map Task Output Compression

    This was covered in the previous section.
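
    As a brief reminder, map output compression is normally switched on with two properties. The fragment below is a sketch only: it assumes the Snappy codec is available on the cluster, and the codec choice here may differ from the one used in the previous section.

    ```xml
    <!-- Sketch: compress intermediate map output before it is shuffled. -->
    <property>
      <name>mapreduce.map.output.compress</name>
      <value>true</value>
    </property>
    <!-- Assumption: Snappy native libraries are installed on every node. -->
    <property>
      <name>mapreduce.map.output.compress.codec</name>
      <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
    ```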

    IV. Shuffle Phase Parameters

    For details, see mapred-default.xml.

    The following parameters can be tuned:

    mapreduce.task.io.sort.factor:

    <property>
      <name>mapreduce.task.io.sort.factor</name>
      <value>10</value>
      <description>The number of streams to merge at once while sorting
      files.  This determines the number of open file handles.</description>
    </property>

    mapreduce.task.io.sort.mb:

    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>100</value>
      <description>The total amount of buffer memory to use while sorting 
      files, in megabytes.  By default, gives each merge stream 1MB, which
      should minimize seeks.</description>
    </property>

    mapreduce.map.sort.spill.percent:

    <property>
      <name>mapreduce.map.sort.spill.percent</name>
      <value>0.80</value>
      <description>The soft limit in the serialization buffer. Once reached, a
      thread will begin to spill the contents to disk in the background. Note that
      collection will not block if this threshold is exceeded while a spill is
      already in progress, so spills may be larger than this threshold when it is
      set to less than .5</description>
    </property>
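
    The three blocks above show the defaults. To change them, the usual approach is to override the properties in mapred-site.xml (or per job). The fragment below is a sketch with purely illustrative values, not tuned recommendations. It assumes the map task JVM heap is large enough to hold the bigger sort buffer, because that buffer is allocated inside the task's heap.

    ```xml
    <!-- Sketch: override the shuffle defaults; all values are illustrative. -->
    <!-- Merge more streams at once (uses more open file handles). -->
    <property>
      <name>mapreduce.task.io.sort.factor</name>
      <value>50</value>
    </property>
    <!-- Larger in-memory sort buffer, in MB; must fit in the map task heap. -->
    <property>
      <name>mapreduce.task.io.sort.mb</name>
      <value>200</value>
    </property>
    <!-- Buffer fill ratio at which a background spill to disk starts. -->
    <property>
      <name>mapreduce.map.sort.spill.percent</name>
      <value>0.80</value>
    </property>
    ```

    A larger sort buffer generally means fewer spills to disk on the map side, and a larger sort factor means fewer merge passes. Both trade memory or file handles for less disk I/O, so measure the effect on a real job before adopting a change.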

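    V. Virtual CPU Cores Allocated to map and reduce

    By default, each map task and each reduce task is allocated one virtual CPU core. This can also be adjusted in practice.

    1. map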

    mapreduce.map.cpu.vcores:

    <property>
      <name>mapreduce.map.cpu.vcores</name>
      <value>1</value>
      <description>
          The number of virtual cores required for each map task.
      </description>
    </property>

    2. reduce

    mapreduce.reduce.cpu.vcores:

    <property>
      <name>mapreduce.reduce.cpu.vcores</name>
      <value>1</value>
      <description>
          The number of virtual cores required for each reduce task.
      </description>
    </property>
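
    To request more than the default, override both properties in mapred-site.xml (or per job). The fragment below is a sketch with illustrative values. It assumes the requested value does not exceed the cluster's yarn.scheduler.maximum-allocation-vcores limit and that the YARN scheduler is configured to take CPU into account when allocating containers.

    ```xml
    <!-- Sketch: request 2 vcores per task; the values are illustrative. -->
    <property>
      <name>mapreduce.map.cpu.vcores</name>
      <value>2</value>
    </property>
    <property>
      <name>mapreduce.reduce.cpu.vcores</name>
      <value>2</value>
    </property>
    ```

    Requesting more vcores per task reduces how many tasks can run on a node at the same time, so it mainly helps when the individual tasks are CPU-bound.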
  • Original article: https://www.cnblogs.com/weiyiming007/p/10717143.html