Problem 1
ERROR storage.DiskBlockObjectWriter: Uncaught exception while reverting partial writes to file /hadoop/application_1415632483774_448143/spark-local-20141127115224-9ca8/04/shuffle_1_1562_27 java.io.FileNotFoundException: /hadoop/application_1415632483774_448143/spark-local-20141127115224-9ca8/04/shuffle_1_1562_27 (No such file or directory)
Solution: On the surface this looks like the shuffle has nowhere left to write, and if the stack trace that follows points to a local-space problem, clearing the disk is enough. The error above, however, comes from an executor being allocated too little memory; in that case, reducing executor-cores and increasing executor-memory should resolve it.
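As a minimal sketch (the application name and values below are placeholders, not taken from the job above), the executor resources can be set through SparkConf before the context is created; the same settings correspond to the --executor-memory and --executor-cores options of spark-submit.
import org.apache.spark.{SparkConf, SparkContext}
// Trade cores for per-task memory headroom: fewer concurrent tasks per executor,
// each with a larger share of the executor's memory. Values are illustrative only.
val conf = new SparkConf()
  .setAppName("shuffle-retry")
  .set("spark.executor.memory", "6g")   // raise executor-memory
  .set("spark.executor.cores", "2")     // reduce executor-cores
val sc = new SparkContext(conf)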
Problem 2
ERROR executor.CoarseGrainedExecutorBackend: Driver Disassociated [akka.tcp://sparkExecutor@pc-jfqdfx31:48586] -> [akka.tcp://sparkDriver@pc-jfqdfx30:41656] disassociated! Shutting down.
15/07/23 10:50:56 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
Solution: This error is rather opaque; the message alone does not reveal the cause, but at root it is still a memory problem. There are two ways to fix it. Method 1: as above, increase executor-memory and reduce executor-cores. Method 2: increase the executor memory overhead, although that does not address the root cause. If the cluster has the resources, prefer Method 1.
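For Method 2, the overhead referred to above corresponds, on Spark 1.x with YARN, to the spark.yarn.executor.memoryOverhead setting (in MB). A hedged sketch with a placeholder value:
import org.apache.spark.SparkConf
// Assumption: Spark 1.x on YARN, where per-executor off-heap headroom is
// controlled by spark.yarn.executor.memoryOverhead (value in MB).
val conf = new SparkConf()
  .set("spark.yarn.executor.memoryOverhead", "1024") // illustrative value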
This error also appears when calling partitionBy(new HashPartitioner(partitionNum)): if partitionNum is too large or too small, the same error is thrown. It is ultimately still a memory issue, but in that case increasing memory or overhead does not help; partitionNum itself has to be tuned.
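A sketch of that partition-count tuning; pairRdd and partitionNum are placeholder names, and the right count depends on your data volume and executor memory.
import org.apache.spark.HashPartitioner
// Assuming an existing SparkContext `sc` (see the sketch under Problem 1).
val pairRdd = sc.parallelize(Seq(("a", 1), ("b", 2)))  // stand-in key-value RDD
val partitionNum = 200  // placeholder: too large or too small both trigger the error
val repartitioned = pairRdd.partitionBy(new HashPartitioner(partitionNum))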
Problem 3
Container is running beyond physical memory limits
Check the virtual-to-physical memory ratio from the Hive shell; the default is 2.1:
hive> set yarn.nodemanager.vmem-pmem-ratio;
yarn.nodemanager.vmem-pmem-ratio=2.1
Solution:
Step 1: Modify the YARN configuration by adding the following property to both the ResourceManager Default Group and the Gateway Default Group:
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>10</value>
</property>
Step 2: Change the following YARN settings:
Change mapreduce.map.memory.mb from 0 GB to 4 GB.
Change mapreduce.reduce.memory.mb from 0 GB to 4 GB.
Step 3: Restart the services with stale configuration, then verify the settings in Hive:
hive> set yarn.nodemanager.vmem-pmem-ratio;
yarn.nodemanager.vmem-pmem-ratio=10
hive> set mapreduce.map.memory.mb;
mapreduce.map.memory.mb=4
hive> set mapreduce.reduce.memory.mb;
mapreduce.reduce.memory.mb=4
Summary:
1. The stack trace printed on the Spark console is often only a symptom; the stack trace that actually explains the failure is usually in the YARN logs.
2. Investigate and fix the stack-trace errors in the YARN logs first.
3. Pay attention to how mapreduce.map.memory.mb, mapreduce.reduce.memory.mb, yarn.scheduler.minimum-allocation-mb, mapreduce.reduce.java.opts, and mapreduce.map.java.opts are set.
Memory can be configured both at the YARN container level and at the mapper/reducer level. Memory is requested in increments of the YARN container size, and mapper and reducer tasks run inside containers. mapreduce.map.memory.mb and mapreduce.reduce.memory.mb set the upper memory limit for map and reduce tasks respectively; if a task subscribes more memory than its limit, the corresponding container is killed. Their interaction with the scheduler also matters: if yarn.scheduler.minimum-allocation-mb is larger than mapreduce.map.memory.mb, the scheduler minimum is respected and containers of that size are handed out, so these parameters must be set carefully or they can lead to poor performance or OutOfMemory errors. Finally, the JVM heap sizes in mapreduce.map.java.opts and mapreduce.reduce.java.opts must be smaller than the corresponding mapreduce.map.memory.mb / mapreduce.reduce.memory.mb, because the heap has to fit within the memory allocated to the map or reduce task.
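A small worked sketch of those sizing rules; the numbers are illustrative, and the ~80% heap figure is only a common rule of thumb, not something mandated by YARN.
// Illustrative numbers only.
val minimumAllocationMb = 1024   // yarn.scheduler.minimum-allocation-mb
val mapMemoryMb         = 4096   // mapreduce.map.memory.mb
// YARN hands out container memory in increments of the scheduler minimum and
// never below it, so the effective container is the request rounded up.
val containerMb = math.max(
  minimumAllocationMb,
  math.ceil(mapMemoryMb.toDouble / minimumAllocationMb).toInt * minimumAllocationMb)
// mapreduce.map.java.opts must fit inside the container; ~80% is a common rule
// of thumb, which here works out to roughly -Xmx3276m.
val mapJavaOptsXmxMb = (containerMb * 0.8).toInt
println(s"container = $containerMb MB, suggested -Xmx ≈ $mapJavaOptsXmxMb MB")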