[Original] Big Data Basics of Spark (2) Spark on YARN: Container Memory Allocation


    spark 2.1.1

    Recently a Spark job (Spark on YARN) failed with the following error:

    Diagnostics: Container [pid=5901,containerID=container_1542879939729_30802_01_000001] is running beyond physical memory limits. Current usage: 11.0 GB of 11 GB physical memory used; 12.2 GB of 23.1 GB virtual memory used. Killing container.
    Dump of the process-tree for container_1542879939729_30802_01_000001 :
    |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
    |- 5901 5899 5901 5901 (bash) 3 4 115843072 361 /bin/bash -c LD_LIBRARY_PATH=/export/App/hadoop-2.6.1/lib/native::/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native::/export/App/hadoop-2.6.1/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native:/usr/lib/hadoop/lib/native::/export/App/hadoop-2.6.1/lib/native:/export/App/hadoop-2.6.1/lib/native /export/App/jdk1.8.0_60/bin/java -server -Xmx10240m -Djava.io.tmpdir=/export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/tmp '-XX:+PrintGCDetails' '-XX:+UseG1GC' '-XX:G1HeapRegionSize=32M' '-XX:+UseGCOverheadLimit' '-XX:+ExplicitGCInvokesConcurrent' '-XX:+HeapDumpOnOutOfMemoryError' '-XX:-UseCompressedClassPointers' '-XX:CompressedClassSpaceSize=3G' '-XX:+PrintGCTimeStamps' '-Xloggc:/export/Logs/hadoop/g1gc.log' -Dspark.yarn.app.container.log.dir=/export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class 'app.package.AppClass' --jar file:/jarpath/app.jar --properties-file /export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/__spark_conf__/__spark_conf__.properties 1> /export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001/stdout 2> /export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001/stderr
    |- 6406 5901 5901 5901 (java) 1834301 372741 13026095104 2888407 /export/App/jdk1.8.0_60/bin/java -server -Xmx10240m -Djava.io.tmpdir=/export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/tmp -XX:+PrintGCDetails -XX:+UseG1GC -XX:G1HeapRegionSize=32M -XX:+UseGCOverheadLimit -XX:+ExplicitGCInvokesConcurrent -XX:+HeapDumpOnOutOfMemoryError -XX:-UseCompressedClassPointers -XX:CompressedClassSpaceSize=3G -XX:+PrintGCTimeStamps -Xloggc:/export/Logs/hadoop/g1gc.log -Dspark.yarn.app.container.log.dir=/export/Logs/hadoop/userlogs/application_1542879939729_30802/container_1542879939729_30802_01_000001 org.apache.spark.deploy.yarn.ApplicationMaster --class app.package.AppClass --jar file:/jarpath/app.jar --properties-file /export/Data/tmp/hadoop-tmp/nm-local-dir/usercache/hadoop/appcache/application_1542879939729_30802/container_1542879939729_30802_01_000001/__spark_conf__/__spark_conf__.properties
    Container killed on request. Exit code is 143
    Container exited with a non-zero exit code 143
    Failing this attempt

    From containerID=container_1542879939729_30802_01_000001 and org.apache.spark.deploy.yarn.ApplicationMaster we can tell that this container is the YARN ApplicationMaster, which in cluster mode runs the Spark driver.

    The puzzle: the job was submitted with --driver-memory 10g, and the launch command indeed contains -Xmx10240m, so why was the container killed for exceeding 11 GB?

    Container [pid=5901,containerID=container_1542879939729_30802_01_000001] is running beyond physical memory limits. Current usage: 11.0 GB of 11 GB physical memory used;

    To answer that, trace the Spark submission process (covered in detail at https://www.cnblogs.com/barneywill/p/9820684.html):

    org.apache.spark.launcher.SparkSubmitCommandBuilder

          String tsMemory =
            isThriftServer(mainClass) ? System.getenv("SPARK_DAEMON_MEMORY") : null;
          String memory = firstNonEmpty(tsMemory, config.get(SparkLauncher.DRIVER_MEMORY),
            System.getenv("SPARK_DRIVER_MEMORY"), System.getenv("SPARK_MEM"), DEFAULT_MEM);
          cmd.add("-Xmx" + memory);

    Here the driver memory value is resolved with firstNonEmpty, which walks the candidate sources in priority order and takes the first non-empty one.
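
    To make the resolution order concrete, here is a minimal Scala sketch (not Spark's actual Java helper) of that priority chain; the literal values are assumptions for illustration:

      // The first non-null, non-empty candidate wins.
      def firstNonEmpty(candidates: String*): Option[String] =
        candidates.find(s => s != null && s.nonEmpty)

      val memory = firstNonEmpty(
        null,                                 // SPARK_DAEMON_MEMORY (Thrift server only)
        "10g",                                // spark.driver.memory, i.e. --driver-memory
        System.getenv("SPARK_DRIVER_MEMORY"), // env var fallback
        System.getenv("SPARK_MEM"),           // legacy env var fallback
        "1g"                                  // DEFAULT_MEM
      ).get
      // memory == "10g" -> the driver JVM is launched with -Xmx10g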

    org.apache.spark.deploy.SparkSubmit

        // In yarn-cluster mode, use yarn.Client as a wrapper around the user class
        if (isYarnCluster) {
          childMainClass = "org.apache.spark.deploy.yarn.Client"

    When --master yarn is used in cluster mode, it is this yarn.Client class that gets submitted as the main class.

    org.apache.spark.deploy.yarn.Client

      // AM related configurations
      private val amMemory = if (isClusterMode) {
        sparkConf.get(DRIVER_MEMORY).toInt
      } else {
        sparkConf.get(AM_MEMORY).toInt
      }
      private val amMemoryOverhead = {
        val amMemoryOverheadEntry = if (isClusterMode) DRIVER_MEMORY_OVERHEAD else AM_MEMORY_OVERHEAD
        sparkConf.get(amMemoryOverheadEntry).getOrElse(
          math.max((MEMORY_OVERHEAD_FACTOR * amMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt
      }
      private val amCores = if (isClusterMode) {
        sparkConf.get(DRIVER_CORES)
      } else {
        sparkConf.get(AM_CORES)
      }
    
      // Executor related configurations
      private val executorMemory = sparkConf.get(EXECUTOR_MEMORY)
      private val executorMemoryOverhead = sparkConf.get(EXECUTOR_MEMORY_OVERHEAD).getOrElse(
        math.max((MEMORY_OVERHEAD_FACTOR * executorMemory).toLong, MEMORY_OVERHEAD_MIN)).toInt

    This is where amMemoryOverhead and executorMemoryOverhead are computed.
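
    To make that rule concrete, here is a standalone sketch of the same computation (the helper name memoryOverheadMb is ours; the constants come from YarnSparkHadoopUtil, quoted further below):

      val MEMORY_OVERHEAD_FACTOR = 0.10
      val MEMORY_OVERHEAD_MIN = 384L

      // Explicit config wins; otherwise max(10% of memory, 384 MB).
      def memoryOverheadMb(memoryMb: Long, configuredMb: Option[Long]): Long =
        configuredMb.getOrElse(
          math.max((MEMORY_OVERHEAD_FACTOR * memoryMb).toLong, MEMORY_OVERHEAD_MIN))

      memoryOverheadMb(10240, None)       // 1024 MB: 10% of a 10 GB driver
      memoryOverheadMb(1024, None)        // 384 MB: 10% would be 102 MB, the floor applies
      memoryOverheadMb(10240, Some(2048)) // 2048 MB: spark.yarn.driver.memoryOverhead set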

        val capability = Records.newRecord(classOf[Resource])
        capability.setMemory(amMemory + amMemoryOverhead)
        capability.setVirtualCores(amCores)

    Resources are then requested from YARN based on amMemory + amMemoryOverhead.

    The relevant defaults and configuration entries are:

    org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

    object YarnSparkHadoopUtil {
      // Additional memory overhead
      // 10% was arrived at experimentally. In the interest of minimizing memory waste while covering
      // the common cases. Memory overhead tends to grow with container size.
    
      val MEMORY_OVERHEAD_FACTOR = 0.10
      val MEMORY_OVERHEAD_MIN = 384L

    org.apache.spark.deploy.yarn.config

      private[spark] val DRIVER_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.driver.memoryOverhead")
        .bytesConf(ByteUnit.MiB)
        .createOptional
    
      private[spark] val EXECUTOR_MEMORY_OVERHEAD = ConfigBuilder("spark.yarn.executor.memoryOverhead")
        .bytesConf(ByteUnit.MiB)
        .createOptional

    So by default the driver's container memory is determined as follows:

    1 if spark.yarn.driver.memoryOverhead is configured, that value is used as the overhead;

    2 otherwise overhead = math.max((0.1 * driverMemory).toLong, 384)

    and the container request is driverMemory + overhead. With --driver-memory 10g the overhead is max(1024, 384) = 1024 MB, so the container requested from YARN is 10240 + 1024 = 11264 MB, i.e. 11 GB.
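
    Putting the numbers together, a minimal worked sketch of the calculation for this job (variable names are ours):

      // Worked calculation for --driver-memory 10g (all values in MiB)
      val driverMemoryMb = 10240L
      val overheadMb = math.max((0.10 * driverMemoryMb).toLong, 384L) // 1024
      val containerMb = driverMemoryMb + overheadMb                   // 11264 MB = 11 GB

      // -Xmx10240m only bounds the JVM heap; metaspace, thread stacks and
      // direct buffers also count toward the container's physical limit,
      // so once the whole process reached 11 GB of RSS, YARN killed it.

    The practical fix is to give the driver more headroom, for example by setting spark.yarn.driver.memoryOverhead (in MiB) to something larger than the 10% default, or by lowering -Xmx relative to the container size.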

    Original: https://www.cnblogs.com/barneywill/p/10102353.html