I. What to tune
- Number of reduce tasks
- Map task output compression
- Shuffle phase parameters
- Virtual CPUs allocated to map and reduce tasks
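II. Number of reduce tasks
By default a job runs a single reduce task (mapreduce.job.reduces defaults to 1).
More reduce tasks is not always better. In practice, test with different values and adjust until you find the best number for the job. The number is set on the job like this: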
job.setNumReduceTasks(2);
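As a starting point for that testing, the Hadoop MapReduce tutorial suggests roughly 0.95 or 1.75 times (number of nodes × maximum containers per node). The sketch below only does that arithmetic; the class name, method name, and the cluster size in main are hypothetical, and the result is a first guess to benchmark from, not a recommended value.

```java
// Hypothetical helper: turns the tutorial's rule of thumb into a starting value.
final class ReducerCountHint {

    // factor is 0.95 (all reduces start at once) or 1.75 (two waves, better load balancing).
    static int suggest(int nodes, int containersPerNode, double factor) {
        if (nodes <= 0 || containersPerNode <= 0 || factor <= 0) {
            throw new IllegalArgumentException("all arguments must be positive");
        }
        long slots = (long) nodes * containersPerNode;
        // Round to the nearest whole task and never go below one reducer.
        return (int) Math.max(1L, Math.round(factor * slots));
    }

    public static void main(String[] args) {
        // Assumed cluster: 10 nodes with 8 containers each (illustrative numbers only).
        System.out.println("single wave: " + suggest(10, 8, 0.95));
        System.out.println("two waves:   " + suggest(10, 8, 1.75));
    }
}
```

The value returned would then be passed to job.setNumReduceTasks(...) and adjusted after measuring the job's actual run time.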
III. Map task output compression
This was covered in the previous section.
IV. Shuffle phase parameters
For the full list and default values, see mapred-default.xml.
The following parameters can be tuned:
mapreduce.task.io.sort.factor:
<property>
  <name>mapreduce.task.io.sort.factor</name>
  <value>10</value>
  <description>The number of streams to merge at once while sorting files. This determines the number of open file handles.</description>
</property>
mapreduce.task.io.sort.mb:
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>100</value>
  <description>The total amount of buffer memory to use while sorting files, in megabytes. By default, gives each merge stream 1MB, which should minimize seeks.</description>
</property>
mapreduce.map.sort.spill.percent:
<property>
  <name>mapreduce.map.sort.spill.percent</name>
  <value>0.80</value>
  <description>The soft limit in the serialization buffer. Once reached, a thread will begin to spill the contents to disk in the background. Note that collection will not block if this threshold is exceeded while a spill is already in progress, so spills may be larger than this threshold when it is set to less than .5</description>
</property>
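To see how these three values interact, here is a rough back-of-the-envelope model in Java. It assumes a spill is triggered each time sort.mb × spill.percent of map output has been collected, and that each merge pass combines up to sort.factor files into one. It ignores per-record bookkeeping in the buffer, combiners, and compression, so real jobs will differ. The class and method names are hypothetical.

```java
// Hypothetical estimator for map-side spills and merge passes (simplified model).
final class SpillEstimate {

    // Approximate number of spill files for one map task.
    static int estimatedSpills(double mapOutputMb, int sortMb, double spillPercent) {
        if (mapOutputMb <= 0 || sortMb <= 0 || spillPercent <= 0 || spillPercent > 1) {
            throw new IllegalArgumentException("invalid sizes or spill percent");
        }
        double thresholdMb = sortMb * spillPercent; // buffer fill level that starts a spill
        return (int) Math.ceil(mapOutputMb / thresholdMb);
    }

    // Number of merge passes needed to reduce the spill files to a single file.
    static int mergePasses(int spills, int sortFactor) {
        if (spills < 1 || sortFactor < 2) {
            throw new IllegalArgumentException("need spills >= 1 and sortFactor >= 2");
        }
        int passes = 0;
        int files = spills;
        while (files > 1) {
            files = (files + sortFactor - 1) / sortFactor; // ceil(files / sortFactor)
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        // Defaults from mapred-default.xml; the 1000 MB map output is an assumed figure.
        int spills = estimatedSpills(1000, 100, 0.80);
        System.out.println("spills: " + spills);
        System.out.println("merge passes with factor 10: " + mergePasses(spills, 10));
    }
}
```

Under this model, raising mapreduce.task.io.sort.mb lowers the number of spills, and raising mapreduce.task.io.sort.factor lowers the number of merge passes at the cost of more open file handles.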
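V. Virtual CPUs allocated to map and reduce tasks
By default each map task and each reduce task requests one virtual CPU (vcore). Both values can be adjusted in practice.
1. map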
<property>
  <name>mapreduce.map.cpu.vcores</name>
  <value>1</value>
  <description>The number of virtual cores required for each map task.</description>
</property>
2. reduce
<property>
  <name>mapreduce.reduce.cpu.vcores</name>
  <value>1</value>
  <description>The number of virtual cores required for each reduce task.</description>
</property>