• 【Spark】object not serializable (class: A)


    The exception message is as follows:

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:345)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:335)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2299)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:371)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:370)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
        at org.apache.spark.rdd.RDD.map(RDD.scala:370)
        at com.sangfor.sdp.hbase.bulkload.BulkLoadData$$anonfun$main$1.apply$mcVI$sp(BulkLoadData.scala:86)
        at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
        at com.sangfor.sdp.hbase.bulkload.BulkLoadData$.main(BulkLoadData.scala:84)
        at com.sangfor.sdp.hbase.bulkload.BulkLoadData.main(BulkLoadData.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.io.NotSerializableException: com.sdp.hbase.entity.IndexRowkeyMetaData
    Serialization stack:
        - object not serializable (class: com.sangfor.sdp.hbase.entity.IndexRowkeyMetaData, value: com.sangfor.sdp.hbase.entity.IndexRowkeyMetaData@4745bcc6)
        - field (class: com.sangfor.sdp.hbase.bulkload.BulkLoadData$$anonfun$main$1$$anonfun$5, name: indexRowkeyMetaData$1, type: class com.sangfor.sdp.hbase.entity.IndexRowkeyMetaData)
        - object (class com.sangfor.sdp.hbase.bulkload.BulkLoadData$$anonfun$main$1$$anonfun$5, <function1>)
        at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:46)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:342)
        ... 23 more

    This happens because when Spark distributes tasks, every object captured by the task closure must be serialized. If an object is not serializable, it cannot be sent between the driver and executor processes over RPC.
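
    For illustration, a minimal sketch of how the error arises (the class and values here are hypothetical stand-ins, not the original project's code): an object that does not implement Serializable is created on the driver and then referenced inside an RDD transformation, so Spark has to ship it to the executors and fails while cleaning the closure.

    import org.apache.spark.{SparkConf, SparkContext}

    // Hypothetical stand-in for a metadata class that does NOT implement Serializable
    class IndexRowkeyMetaData(val indexName: String)

    object NotSerializableDemo {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("demo").setMaster("local[2]"))
        val meta = new IndexRowkeyMetaData("idx_user")   // lives on the driver

        // The map closure captures `meta`; ClosureCleaner tries to serialize the closure
        // and throws "Task not serializable ... object not serializable (class: ...)"
        sc.parallelize(Seq("a", "b", "c"))
          .map(v => meta.indexName + ":" + v)
          .collect()
          .foreach(println)
      }
    }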

    There are two ways to fix this:

    One is to have the entity class implement the java.io.Serializable interface.
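
    A minimal sketch of that fix (only the class name comes from the stack trace above; the fields are hypothetical):

    // Marking the entity class as Serializable lets Spark's default Java serializer
    // ship instances of it to the executors inside the task closure.
    class IndexRowkeyMetaData(val indexName: String,
                              val rowkeyPrefix: String) extends java.io.Serializable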

    The other is to configure Kryo serialization:

    sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

    sparkConf.registerKryoClasses(Array(classOf[com.sdp.hbase.entity.IndexRowkeyMetaData]))

    to tell Spark which serializer to use and which classes to register with it.
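
    For context, a sketch of where those two settings would sit when building the Spark context (the app name and master URL are placeholders; com.sdp.hbase.entity.IndexRowkeyMetaData is assumed to be the project's own entity class on the classpath):

    import org.apache.spark.{SparkConf, SparkContext}

    val sparkConf = new SparkConf()
      .setAppName("BulkLoadData")        // placeholder app name
      .setMaster("yarn")                 // placeholder master URL
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      // Kryo can serialize the class even though it does not implement java.io.Serializable
      .registerKryoClasses(Array(classOf[com.sdp.hbase.entity.IndexRowkeyMetaData]))

    val sc = new SparkContext(sparkConf)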

  • Original article: https://www.cnblogs.com/yankang/p/10582686.html