• [Hadoop]


    While running a MapReduce job, the task failed with Error: GC overhead limit exceeded. Checking the logs revealed the following exception:

    2015-12-11 11:48:44,716 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.io.DataInputStream.readUTF(DataInputStream.java:661)
        at java.io.DataInputStream.readUTF(DataInputStream.java:564)
        at xxxx.readFields(DateDimension.java:186)
        at xxxx.readFields(StatsUserDimension.java:67)
        at xxxx.readFields(StatsBrowserDimension.java:68)
        at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
        at org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:239)
        at xxx.reduce(BrowserReducer.java:37)
        at xxx.reduce(BrowserReducer.java:16)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)

    From the stack trace we can see that the reducer ran out of memory while reading the next key/value pair. Looking at the code, I found that the reduce side builds several in-memory Map collections, which is what exhausts the heap. In Hadoop 2.x the YARN child JVM that runs inside a container defaults to a 200 MB heap, specified by the parameter mapred.child.java.opts. It is a client-side parameter: it can be given at job-submission time or configured in mapred-site.xml. Changing it to -Xms200m -Xmx1000m enlarged the JVM heap and resolved the exception.
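
    Since mapred.child.java.opts is a client-side parameter, one way to apply the fix is to set it on the job configuration before submission. Below is a minimal sketch of a driver doing so; the class name, job name, and the omitted mapper/reducer wiring are hypothetical, and only the parameter name and value come from the fix above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class SubmitWithBiggerHeap {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Raise the child JVM heap from the 200 MB default to 1000 MB.
            conf.set("mapred.child.java.opts", "-Xms200m -Xmx1000m");

            // Set the configuration before Job.getInstance, because the Job copies it.
            Job job = Job.getInstance(conf, "browser-stats");
            // ... wire up mapper, reducer, input and output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

    If the driver goes through ToolRunner, the same value can also be passed on the command line as -D mapred.child.java.opts="-Xms200m -Xmx1000m".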

    The relevant parameters, their defaults, and what they control:

    mapred.child.java.opts
        Default: -Xmx200m
        JVM options for the container JVM that runs MapReduce tasks (both map and reduce).
    mapred.map.child.java.opts
        Default: (unset)
        JVM options for map tasks only.
    mapred.reduce.child.java.opts
        Default: (unset)
        JVM options for reduce tasks only.
    mapreduce.admin.map.child.java.opts
        Default: -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
        Administrator-specified JVM options for map tasks.
    mapreduce.admin.reduce.child.java.opts
        Default: -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN
        Administrator-specified JVM options for reduce tasks.
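
    Cluster-wide, the same fix can be written into mapred-site.xml. A minimal sketch of the entry, using the values from the fix above:

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xms200m -Xmx1000m</value>
    </property>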

     The five parameters above take effect with the following precedence:

      Map phase: mapreduce.admin.map.child.java.opts < mapred.child.java.opts < mapred.map.child.java.opts. In other words, when the same flag is set in more than one place, the value from mapred.map.child.java.opts ultimately wins.

      Reduce phase: mapreduce.admin.reduce.child.java.opts < mapred.child.java.opts < mapred.reduce.child.java.opts.

     For reference, see the Hadoop source: the org.apache.hadoop.mapred.MapReduceChildJVM.getChildJavaOpts method.

    private static String getChildJavaOpts(JobConf jobConf, boolean isMapTask) {
        String userClasspath = "";
        String adminClasspath = "";
        if (isMapTask) {
            userClasspath = jobConf.get(JobConf.MAPRED_MAP_TASK_JAVA_OPTS,
                    jobConf.get(JobConf.MAPRED_TASK_JAVA_OPTS,
                            JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
            adminClasspath = jobConf.get(
                    MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
                    MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
        } else {
            userClasspath = jobConf.get(JobConf.MAPRED_REDUCE_TASK_JAVA_OPTS,
                    jobConf.get(JobConf.MAPRED_TASK_JAVA_OPTS,
                            JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
            adminClasspath = jobConf.get(
                    MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,
                    MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
        }
    
        // Add admin classpath first so it can be overridden by user.
        return adminClasspath + " " + userClasspath;
    }
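
    Two details of this method are worth noting. First, the task-specific user key is read with the generic mapred.child.java.opts as its fallback, so a task-specific value replaces the generic one entirely rather than being merged with it. Second, the admin opts are concatenated in front of the user opts, and since the JVM honors the last occurrence of a repeated flag such as -Xmx, user opts win on conflict. A minimal sketch of the fallback lookup (the class name and values are hypothetical):

    import org.apache.hadoop.conf.Configuration;

    public class OptsPrecedenceDemo {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            conf.set("mapred.child.java.opts", "-Xmx200m");
            conf.set("mapred.reduce.child.java.opts", "-Xmx1000m");

            // Same lookup shape as getChildJavaOpts: the reduce-specific key wins,
            // and the generic key is consulted only when the specific one is absent.
            String userOpts = conf.get("mapred.reduce.child.java.opts",
                    conf.get("mapred.child.java.opts", "-Xmx200m"));

            System.out.println(userOpts); // -> -Xmx1000m
        }
    }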
  • Original post: https://www.cnblogs.com/liuming1992/p/5040169.html