• Hive Tuning (3): Tez Optimization


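    We run a cluster built on Amazon EMR, and a Hive query failed with: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. I went through the parameters below and found them quite helpful. The one I ended up setting was: set hive.tez.auto.reducer.parallelism=true;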

    Tez memory tuning

    1. AM and container size settings

    tez.am.resource.memory.mb

    Description: Set tez.am.resource.memory.mb to be the same as yarn.scheduler.minimum-allocation-mb, the YARN minimum container size.

    hive.tez.container.size

    Description: Set hive.tez.container.size to be the same as, or a small multiple (1 or 2 times) of, the YARN container size yarn.scheduler.minimum-allocation-mb, but NEVER more than yarn.scheduler.maximum-allocation-mb.
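The two sizing rules above can be checked mechanically. The following is a minimal sketch, assuming hypothetical YARN limits of 1024 MB and 8192 MB; the function name `container_sizes` is illustrative and not part of Hive or Tez.

```python
def container_sizes(yarn_min_mb, yarn_max_mb, multiple=2):
    """Return (tez.am.resource.memory.mb, hive.tez.container.size) in MB.

    multiple is the "small multiple" (1 or 2) of the YARN minimum
    container size used for Hive task containers.
    """
    if multiple not in (1, 2):
        raise ValueError("the guideline suggests a multiple of 1 or 2")
    # The AM gets exactly the YARN minimum container size.
    am_mb = yarn_min_mb
    # Task containers never exceed yarn.scheduler.maximum-allocation-mb.
    container_mb = min(yarn_min_mb * multiple, yarn_max_mb)
    return am_mb, container_mb


# Hypothetical cluster limits, for illustration only.
am_mb, container_mb = container_sizes(yarn_min_mb=1024, yarn_max_mb=8192)
print(f"tez.am.resource.memory.mb={am_mb}")
print(f"hive.tez.container.size={container_mb}")
```

The printed values are only a starting point; the real limits should be read from the cluster's yarn-site.xml.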

    2. AM and container JVM settings

    tez.am.launch.cmd-opts 

    Default: 80% * tez.am.resource.memory.mb

    Description: usually does not need to be changed

    hive.tez.java.opts

    Default: 80% * hive.tez.container.size

    Description: Hortonworks recommends "-server -Djava.net.preferIPv4Stack=true -XX:NewRatio=8 -XX:+UseNUMA -XX:+UseG1GC"

    tez.container.max.java.heap.fraction

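    Default: 0.8

    Description: the fraction of container memory given to the task/AM JVM heap (Xmx). Adjusting this parameter is recommended; tune it to the actual workload.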
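As a rough illustration of the 80% defaults above, the heap size can be derived from the container size. This is a sketch only: `default_heap_mb` is a hypothetical helper, the 4096 MB container is an assumed value, and rounding down to a whole megabyte is my own choice.

```python
def default_heap_mb(container_mb, heap_fraction=0.8):
    """Approximate -Xmx in MB as a fraction of the container size."""
    if not 0.0 < heap_fraction <= 1.0:
        raise ValueError("heap_fraction must be in (0, 1]")
    # Round down so the heap never exceeds the intended share.
    return int(container_mb * heap_fraction)


container_mb = 4096  # assumed hive.tez.container.size, for illustration
print(f"-Xmx{default_heap_mb(container_mb)}m")
```

The memory left over (about 20% here) is headroom for JVM overhead outside the heap, which is why the fraction is kept below 1.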

    3. Hive in-memory map join settings

    tez.runtime.io.sort.mb

    Default: 100

    Description: memory needed for sorting output. Recommended: 40% * hive.tez.container.size, generally no more than 2 GB.

    hive.auto.convert.join.noconditionaltask

    Default: true

    Description: whether to merge multiple map joins into one; keep the default.

    hive.auto.convert.join.noconditionaltask.size

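    Default:

    Description: when multiple map joins are converted into one, the maximum combined file size of all the small tables. This value only limits the size of the input table files; it does not reflect the actual size of the hash table during the map join. Recommended: 1/3 * hive.tez.container.size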

    tez.runtime.unordered.output.buffer.size-mb

    Default: 100

    Description: Size of the buffer to use if not writing directly to disk. Recommended: 10% * hive.tez.container.size
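The three recommendations in this section all derive from hive.tez.container.size, so they can be computed together. A minimal sketch, assuming a 4096 MB container; `join_memory_settings` is a hypothetical helper, and the map join threshold is converted to bytes on the understanding that Hive reads that property in bytes.

```python
def join_memory_settings(container_mb):
    """Derive the recommended values from hive.tez.container.size (MB)."""
    # 40% of the container, capped at 2 GB.
    sort_mb = min(container_mb * 40 // 100, 2048)
    # One third of the container, expressed in bytes.
    noconditional_bytes = container_mb * 1024 * 1024 // 3
    # 10% of the container.
    unordered_mb = container_mb * 10 // 100
    return {
        "tez.runtime.io.sort.mb": sort_mb,
        "hive.auto.convert.join.noconditionaltask.size": noconditional_bytes,
        "tez.runtime.unordered.output.buffer.size-mb": unordered_mb,
    }


# Assumed container size, for illustration only.
for key, value in join_memory_settings(4096).items():
    print(f"set {key}={value};")
```

Integer arithmetic is used throughout so the results are whole numbers that can be pasted into a Hive session as-is.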

    4. Container reuse settings

    tez.am.container.reuse.enabled

    Default: true

    Description: switch for container reuse

    Mapper/Reducer tuning

    1. Number of mappers

    tez.grouping.min-size

    Default: 50*1024*1024

    Description: Lower bound on the size (in bytes) of a grouped split, to avoid generating too many small splits.

    tez.grouping.max-size

    Default: 1024*1024*1024

    Description: Upper bound on the size (in bytes) of a grouped split, to avoid generating excessively large splits.
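Because every grouped split must fall between the two bounds, they also bracket how many mapper tasks a given input can produce. The sketch below is a simplification under that assumption: `grouped_split_bounds` is a hypothetical helper, the 10 GiB input is an assumed value, and Tez picks the actual count within this range based on cluster capacity, which is not modeled here.

```python
MIN_SIZE = 50 * 1024 * 1024      # tez.grouping.min-size default
MAX_SIZE = 1024 * 1024 * 1024    # tez.grouping.max-size default


def ceil_div(a, b):
    """Integer ceiling division, avoiding float rounding."""
    return (a + b - 1) // b


def grouped_split_bounds(input_bytes, min_size=MIN_SIZE, max_size=MAX_SIZE):
    """Return (fewest, most) grouped splits allowed by the size bounds."""
    fewest = max(1, ceil_div(input_bytes, max_size))  # all splits at max size
    most = max(1, ceil_div(input_bytes, min_size))    # all splits at min size
    return fewest, most


fewest, most = grouped_split_bounds(10 * 1024**3)  # assumed 10 GiB input
print(f"between {fewest} and {most} mapper tasks")
```

Raising tez.grouping.min-size lowers the upper end of this range, which is the usual lever when a query spawns too many small mappers.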


    2. Number of reducers

    hive.tez.auto.reducer.parallelism

    Default: false

    Description: Turn on Tez' auto reducer parallelism feature. When enabled, Hive will still estimate data sizes and set parallelism estimates. Tez will sample source vertices' output sizes and adjust the estimates at runtime as necessary.

    Recommended setting: true.

    hive.tez.min.partition.factor

    Default: 0.25

    Description: When auto reducer parallelism is enabled this factor will be used to put a lower limit to the number of reducers that Tez specifies.

    hive.tez.max.partition.factor

    Default: 2.0

    Description: When auto reducer parallelism is enabled this factor will be used to over-partition data in shuffle edges.

    hive.exec.reducers.bytes.per.reducer

    Default: 256,000,000

    Description: Size per reducer. The default before Hive 0.14.0 is 1 GB, that is, if the input size is 10 GB then 10 reducers will be used. In Hive 0.14.0 and later the default is 256 MB, that is, if the input size is 1 GB then 4 reducers will be used.

    The number of reducers is determined by the following formula:

    Max(1, Min(hive.exec.reducers.max [1009], ReducerStage estimate / hive.exec.reducers.bytes.per.reducer)) x hive.tez.max.partition.factor [2]
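The formula above can be evaluated directly to see how the settings interact. This is a sketch of the formula as written, not of Hive's internal code: `estimated_reducers` is a hypothetical helper, and rounding the size ratio up to a whole reducer is my assumption.

```python
import math


def estimated_reducers(stage_bytes,
                       bytes_per_reducer=256_000_000,
                       max_reducers=1009,
                       max_partition_factor=2.0):
    """Evaluate Max(1, Min(max, estimate / bytes_per_reducer)) x factor."""
    # Assumption: a fractional reducer is rounded up.
    by_size = math.ceil(stage_bytes / bytes_per_reducer)
    base = max(1, min(max_reducers, by_size))
    # Over-partition by hive.tez.max.partition.factor.
    return int(base * max_partition_factor)


# Assumed reducer stage estimates, for illustration only.
for size in (1_000_000_000, 10_000_000_000):
    print(size, "bytes ->", estimated_reducers(size), "reducers")
```

With auto reducer parallelism enabled, this over-partitioned count is the starting point that Tez can then reduce at runtime once it has sampled the actual output sizes.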

    3. Shuffle settings

    tez.shuffle-vertex-manager.min-src-fraction

    Default: 0.25

    Description: the fraction of source tasks which should complete before tasks for the current vertex are scheduled.

    tez.shuffle-vertex-manager.max-src-fraction

    Default: 0.75

    Description: once this fraction of source tasks have completed, all tasks on the current vertex can be scheduled. Number of tasks ready for scheduling on the current vertex scales linearly between min-fraction and max-fraction.

     

    Example:

    hive.exec.reducers.bytes.per.reducer=1073741824; // 1 GB

    tez.shuffle-vertex-manager.min-src-fraction=0.25;

    tez.shuffle-vertex-manager.max-src-fraction=0.75;

    This indicates that the decision will be made between 25% of mappers finishing and 75% of mappers finishing, provided there's at least 1 GB of data being output (i.e. if 25% of mappers don't send 1 GB of data, we will wait till at least 1 GB is sent out).
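The linear scaling between the two fractions can be sketched as a small function. This is a simplified model based only on the parameter descriptions: `schedulable_fraction` is a hypothetical name, and the 1 GB data threshold from the example is deliberately not modeled.

```python
def schedulable_fraction(completed_fraction, min_src=0.25, max_src=0.75):
    """Share of the current vertex's tasks that may be scheduled.

    completed_fraction is the share of source tasks already finished.
    """
    if completed_fraction < min_src:
        return 0.0  # too early, nothing is scheduled yet
    if completed_fraction >= max_src:
        return 1.0  # everything may be scheduled
    # Linear ramp between the min and max source fractions.
    return (completed_fraction - min_src) / (max_src - min_src)


for done in (0.10, 0.25, 0.50, 0.75):
    print(f"{done:.0%} of sources done -> {schedulable_fraction(done):.0%} schedulable")
```

Lowering both fractions starts reducers earlier and overlaps the shuffle with the map phase, at the cost of holding containers that may sit idle while waiting for data.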

    Hope this helps!

  • Original article: https://www.cnblogs.com/mobiwangyue/p/8405780.html