Problem: Task not serializable
Diagnosis: check the Spark job log to pinpoint the problem:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:558)
at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:558)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:260)
at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:557)
at org.apache.spark.streaming.api.java.JavaDStreamLike$class.map(JavaDStreamLike.scala:155)
at org.apache.spark.streaming.api.java.AbstractJavaDStreamLike.map(JavaDStreamLike.scala:42)
at cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl.streaming(SparkStreamServiceImpl.java:78)
at cn.com.conversant.swiftsight.streaming.SwiftCoderStreamToHBase.main(SwiftCoderStreamToHBase.java:28)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)
Caused by: java.io.NotSerializableException: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl
Serialization stack:
- object not serializable (class: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl, value: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl@33280368)
- field (class: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl$1, name: this$0, type: class cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl)
- object (class cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl$1, cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl$1@630984ec)
- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 19 more
Solution:
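SparkStreamServiceImpl cannot be serialized. The serialization stack shows why that matters: the anonymous function passed to map (SparkStreamServiceImpl$1) holds a this$0 reference to the enclosing SparkStreamServiceImpl, so serializing the function also serializes the service object. The fix is to add "implements java.io." to the declaration of SparkStreamServiceImpl.
Because the job runs on a YARN cluster, Spark has to serialize the function passed to map in order to ship the task to the cluster for execution; the trace shows this check failing in ClosureCleaner.ensureSerializable, inside the driver launched by the ApplicationMaster. Every class the function references at task-submission time must therefore be serializable.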
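The capture behaviour behind this exception can be reproduced without Spark. The following is a minimal sketch, not the original service code. It assumes the truncated "java.io." in the fix refers to java.io.Serializable, the standard marker interface that resolves a NotSerializableException. ClosureCaptureDemo, SerializableFunction, PlainService, SerializableService and canSerialize are hypothetical names introduced only for illustration; SerializableFunction stands in for Spark's org.apache.spark.api.java.function.Function, which is also Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureCaptureDemo {

    // Hypothetical stand-in for Spark's Java Function interface.
    interface SerializableFunction<T, R> extends Serializable {
        R call(T value);
    }

    // Mirrors the failing case: the enclosing service is NOT serializable.
    static class PlainService {
        private final int offset;

        PlainService(int offset) {
            this.offset = offset;
        }

        SerializableFunction<String, Integer> lengthFunction() {
            // The anonymous class reads a field of the enclosing instance,
            // so it carries a hidden this$0 reference to PlainService.
            return new SerializableFunction<String, Integer>() {
                @Override
                public Integer call(String value) {
                    return value.length() + offset;
                }
            };
        }
    }

    // Mirrors the assumed fix: the enclosing service implements Serializable.
    static class SerializableService implements Serializable {
        private static final long serialVersionUID = 1L;
        private final int offset;

        SerializableService(int offset) {
            this.offset = offset;
        }

        SerializableFunction<String, Integer> lengthFunction() {
            return new SerializableFunction<String, Integer>() {
                @Override
                public Integer call(String value) {
                    return value.length() + offset;
                }
            };
        }
    }

    // Attempts plain Java serialization of the object, the same mechanism
    // that appears in the trace (JavaSerializerInstance.serialize).
    static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The function drags the non-serializable PlainService along.
        System.out.println(canSerialize(new PlainService(1).lengthFunction()));        // false
        // With a serializable enclosing class the same function serializes.
        System.out.println(canSerialize(new SerializableService(1).lengthFunction())); // true
    }
}
```

Making the enclosing class serializable means the whole service object travels with every task. If the function does not need the service's state, a common alternative is to move it into a static nested or top-level class, so that no this$0 reference is captured in the first place.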