• ERROR BatchJobMain: Task not serializable


    Spark reports a "Task not serializable" error when log4j is used inside a class

    Error message

    21/06/16 11:45:22 ERROR BatchJobMain: Task not serializable
    org.apache.spark.SparkException: Task not serializable
    	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    	at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    	at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:341)
    	at org.apache.spark.rdd.RDD$$anonfun$filter$1.apply(RDD.scala:340)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    	at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    	at org.apache.spark.rdd.RDD.filter(RDD.scala:340)
    	at com.winner.clu.spark.batch.analysis.AccPresetConditionData.mainFun(AccPresetConditionData.scala:110)
    	at com.winner.clu.spark.batch.BatchJobMain$.main(BatchJobMain.scala:53)
    	at com.winner.clu.spark.batch.BatchJobMain.main(BatchJobMain.scala)
    Caused by: java.io.NotSerializableException: org.apache.log4j.Logger
    Serialization stack:
    	- object not serializable (class: org.apache.log4j.Logger, value: org.apache.log4j.Logger@2d728a9c)
    	- field (class: com.winner.clu.spark.batch.analysis.AccPresetConditionData, name: log, type: class org.apache.log4j.Logger)
    	- object (class com.winner.clu.spark.batch.analysis.AccPresetConditionData, com.winner.clu.spark.batch.analysis.AccPresetConditionData@67599bae)
    	- field (class: com.winner.clu.spark.batch.analysis.AccPresetConditionData$$anonfun$9, name: $outer, type: class com.winner.clu.spark.batch.analysis.AccPresetConditionData)
    	- object (class com.winner.clu.spark.batch.analysis.AccPresetConditionData$$anonfun$9, <function1>)
    	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
    	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
    	... 12 more
    

    Problem description

    When debugging the program in IDEA, using the log4j logger inside a class makes Spark fail with the serialization error above, whereas using log4j inside an object works without any problem.
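
    For reference, here is a minimal sketch of the kind of code that triggers the error, assuming the logger is referenced inside an RDD closure (the RDD.filter frame in the stack trace points at such a call); the sample data and method body are illustrative only:

    import org.apache.log4j.Logger
    import org.apache.spark.SparkContext

    class AccPresetConditionData extends Serializable {

      // plain instance field of a non-serializable type (org.apache.log4j.Logger)
      val log = Logger.getLogger(this.getClass.getSimpleName)

      def mainFun(args: Array[String], sc: SparkContext): Unit = {
        val rdd = sc.parallelize(Seq("a", "", "b"))
        // referencing `log` inside the closure captures `this`, and `this`
        // carries the Logger field => "Task not serializable"
        rdd.filter { s =>
          log.info(s"checking $s")
          s.nonEmpty
        }.count()
      }
    }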

    Cause

    The log4j Logger instance in this class is neither static nor transient, and Logger implements neither Serializable nor Externalizable. The fix is therefore simple: keep the logger instance out of the default serialization process by declaring it static or transient. This also explains why defining the logger in an object causes no problem, since an object's members are effectively static. The best choice is usually static final: if the field is merely transient, it will be null after deserialization, whereas a static final logger is thread-safe and shared by every instance of the class.
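
    A quick demo of the "transient means null after deserialization" point, using plain Java serialization (the Holder class below is made up for illustration, not taken from the post):

    import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

    class Holder extends Serializable {
      @transient val note: String = "hello"   // the backing field is skipped by Java serialization
    }

    object TransientDemo {
      def main(args: Array[String]): Unit = {
        val buf = new ByteArrayOutputStream()
        val out = new ObjectOutputStream(buf)
        out.writeObject(new Holder)
        out.close()

        val in   = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray))
        val copy = in.readObject().asInstanceOf[Holder]
        println(copy.note)   // prints null: the transient field was not restored
      }
    }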

    Solution

    • Declare the logger transient.
    • Declare the logger static and final (preferred; a sketch of this option follows the code below).

    I take the first approach here and mark the logger with @transient:

    import org.apache.log4j.Logger
    import org.apache.spark.SparkContext

    class AccPresetConditionData extends Serializable {
    
      // @transient keeps the logger out of serialization; lazy val re-creates it
      // on first use after deserialization, so it is never null on the executors
      @transient lazy val log = Logger.getLogger(this.getClass.getSimpleName)
    
      var siteKey: String = _
      var execDate: String = _
    
      // business analysis
      def mainFun(args: Array[String], sc: SparkContext): Unit = {}
    }
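
    For the second (preferred) option, the Scala counterpart of a static final logger is a val held in the companion object, so it is never part of instance serialization and every instance shares the same logger. A minimal sketch, assuming the same class (the method body is illustrative):

    import org.apache.log4j.Logger
    import org.apache.spark.SparkContext

    object AccPresetConditionData {
      // lives on the singleton object (effectively static final), never serialized with instances
      val log: Logger = Logger.getLogger(classOf[AccPresetConditionData])
    }

    class AccPresetConditionData extends Serializable {
      import AccPresetConditionData.log

      var siteKey: String = _
      var execDate: String = _

      def mainFun(args: Array[String], sc: SparkContext): Unit = {
        log.info(s"running for siteKey=$siteKey, execDate=$execDate")
      }
    }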
    
  • Original post: https://www.cnblogs.com/Gxiaobai/p/14891047.html