• [Spark] Task not serializable


    Problem: Task not serializable
    Diagnosis: check the Spark job log to identify the cause:
     
    org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:558)
    at org.apache.spark.streaming.dstream.DStream$$anonfun$map$1.apply(DStream.scala:558)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.SparkContext.withScope(SparkContext.scala:714)
    at org.apache.spark.streaming.StreamingContext.withScope(StreamingContext.scala:260)
    at org.apache.spark.streaming.dstream.DStream.map(DStream.scala:557)
    at org.apache.spark.streaming.api.java.JavaDStreamLike$class.map(JavaDStreamLike.scala:155)
    at org.apache.spark.streaming.api.java.AbstractJavaDStreamLike.map(JavaDStreamLike.scala:42)
    at cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl.streaming(SparkStreamServiceImpl.java:78)
    at cn.com.conversant.swiftsight.streaming.SwiftCoderStreamToHBase.main(SwiftCoderStreamToHBase.java:28)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:558)
    Caused by: java.io.NotSerializableException: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl
    Serialization stack:
    - object not serializable (class: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl, value: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl@33280368)
    - field (class: cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl$1, name: this$0, type: class cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl)
    - object (class cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl$1, cn.com.conversant.swiftsight.streaming.service.SparkStreamServiceImpl$1@630984ec)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    ... 19 more
     
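    Solution:
    SparkStreamServiceImpl cannot be serialized. Make the class serializable by adding
    implements java.io. to its declaration. The job runs on a YARN cluster with the driver inside the ApplicationMaster, and Spark must serialize the function passed to map before it can run as a task. The serialization stack shows that the anonymous function class SparkStreamServiceImpl$1 holds a this$0 reference to its enclosing SparkStreamServiceImpl instance, so every class reachable from that function, including SparkStreamServiceImpl itself, must be serializable.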
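The capture behind this error can be reproduced without Spark. The following is a minimal sketch, not the original service code: ClosureCaptureDemo, MapFunction, PlainService and SerializableService are hypothetical names, MapFunction only stands in for Spark's serializable Function interface, and the fix assumes that the truncated "implements java.io." refers to java.io.Serializable.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureCaptureDemo {

    // Hypothetical stand-in for Spark's Java Function interface, which is also Serializable.
    public interface MapFunction<T, R> extends Serializable {
        R call(T value) throws Exception;
    }

    // Mirrors the failing case: the enclosing service class is NOT Serializable.
    public static class PlainService {
        int measure(String value) {
            return value.length();
        }

        public MapFunction<String, Integer> lengthFunction() {
            // The anonymous class calls an instance method of the enclosing class,
            // so it keeps a hidden reference (this$0) to the PlainService instance.
            return new MapFunction<String, Integer>() {
                @Override
                public Integer call(String value) {
                    return measure(value);
                }
            };
        }
    }

    // Assumed fix: the enclosing class implements java.io.Serializable.
    public static class SerializableService implements Serializable {
        private static final long serialVersionUID = 1L;

        int measure(String value) {
            return value.length();
        }

        public MapFunction<String, Integer> lengthFunction() {
            // Same capture as above, but the captured instance can now be written out.
            return new MapFunction<String, Integer>() {
                @Override
                public Integer call(String value) {
                    return measure(value);
                }
            };
        }
    }

    // Tries Java serialization on the object, roughly what Spark's closure check does.
    public static boolean canSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("PlainService function serializable: "
                + canSerialize(new PlainService().lengthFunction()));
        System.out.println("SerializableService function serializable: "
                + canSerialize(new SerializableService().lengthFunction()));
    }
}
```

Running main should report false for the PlainService function and true for the SerializableService function, which matches the this$0 entry in the serialization stack above. Implementing Serializable on the enclosing class only works when all of its non-transient fields are serializable as well.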
  • Original article: https://www.cnblogs.com/lily-tiantian/p/7614862.html