• A summary of Spark errors


    1. Escape characters are needed: these exceptions appear when an unescaped regex metacharacter (e.g. [ or |) is passed as the pattern.
    java.util.regex.PatternSyntaxException: Unclosed character class near index 0
    java.util.regex.PatternSyntaxException: Unexpected internal error near index 1
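
    Both exceptions come from code like line.split("["). A minimal sketch of the fix (the input string is made up for illustration):

    val line = "a[1]|b[2]"
    // line.split("[")   // throws PatternSyntaxException: Unclosed character class near index 0
    val byHand = line.split("\\|")                                   // escape the metacharacter by hand
    val quoted = line.split(java.util.regex.Pattern.quote("["))      // or quote the whole literal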

    2. Data in Kafka was lost or expired before it could be consumed, i.e. the requested offset for the topic is out of range; maxRatePerPartition may be set too small. [https://blog.csdn.net/yxgxy270187133/article/details/53666760]
    org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions: {newsfeed-100-content-docidlog-1=103944288}
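
    A hedged sketch of raising the per-partition rate cap (both property names are real Spark settings; the values and app name are assumptions). The "no configured reset policy" part of the message means auto.offset.reset was not set in the Kafka consumer params; setting it to "earliest" or "latest" lets the consumer recover instead of throwing.

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("kafka-rate-demo")
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")  // max records/sec pulled per partition
      .set("spark.streaming.backpressure.enabled", "true")        // adapt ingest rate to processing speed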


    3. Memory settings are too small; increase them, e.g. --executor-memory 8G --driver-memory 8G
    Application application_1547156777102_0243 failed 2 times due to AM Container for appattempt_1547156777102_0243_000002 exited with exitCode: -104
    For more detailed output, check the application tracking page:https://host-10-11-11-11:26001/cluster/app/application_1547156777102_0243 Then click on links to logs of each attempt.
    Diagnostics: Container [pid=5064,containerID=container_e62_1547156777102_0243_02_000001] is running beyond physical memory limits. Current usage: 4.6 GB of 4.5 GB physical memory used; 6.3 GB of 22.5 GB virtual memory used. Killing container.
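
    A sketch of a submit command with larger memory settings. The flags are standard spark-submit options; the sizes, overhead value, class name and jar are assumptions. Note that in yarn-cluster mode the AM container killed above is the driver, so the driver memory and its overhead are what need raising.

    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --driver-memory 8G \
      --executor-memory 8G \
      --conf spark.yarn.driver.memoryOverhead=2048 \
      --class com.example.Main app.jar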


    4. A value must be defined before it is used: the call has to come after the definition.
    forward reference extends over definition of value xxx
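
    A minimal reproduction (names are made up):

    object ForwardRef {
      def main(args: Array[String]): Unit = {
        // println(greeting)   // error: forward reference extends over definition of value greeting
        val greeting = "hello"
        println(greeting)      // fine: the call comes after the definition
      }
    }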

    *******************************************************************
    https://blog.csdn.net/appleyuchi/article/details/81633335
    In the pom, provided means a dependency is needed at compile time but not shipped at deployment. When we submit with spark-submit, Spark supplies the needed streaming jars itself, but IntelliJ launches the program through plain java, so the streaming jars are still needed at runtime and the scope must be dropped.
    1. Solution: comment out <scope>provided</scope> for local runs, then reimport the Maven projects.
    java.lang.ClassNotFoundException: org.apache.spark.SparkConf
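
    A sketch of the pom entry in question (the spark-streaming artifact is real; the version is an assumption):

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-streaming_2.11</artifactId>
        <version>2.3.0</version>          <!-- version is an assumption -->
        <scope>provided</scope>           <!-- comment out for local IntelliJ runs -->
    </dependency>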

    2.

    [ERROR] E:git3_commit2hellohellosrcmainscalacomhello cmhello extcontenthello.scala:206: error: No org.json4s.Formats found. Try to bring an instance of org.json4s.Formats in scope or use the org.json4s.DefaultFormats.
    [INFO] val str = write(map)

    Add:
    implicit val formats: DefaultFormats = DefaultFormats
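
    A minimal sketch of the fix in context (map contents are made up; the jackson backend is assumed, the native one works the same way):

    import org.json4s.DefaultFormats
    import org.json4s.jackson.Serialization.write

    object JsonDemo {
      implicit val formats: DefaultFormats = DefaultFormats   // brings an org.json4s.Formats into scope

      def main(args: Array[String]): Unit = {
        val map = Map("id" -> "100", "type" -> "content")
        val str = write(map)      // compiles now that Formats is in scope
        println(str)              // {"id":"100","type":"content"}
      }
    }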

    3.
    Spark 2.0: analyzing and fixing "Unable to find encoder for type stored in a Dataset." in DataFrame map operations

    The problem is the dataframe.map call: it ran fine on Spark 1.x but no longer compiles on Spark 2.0; changing it to dataframe.rdd.map fixes it (importing the session implicits also works, see the sketch below).
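
    A hedged sketch of both workarounds (spark is an existing SparkSession and df an existing DataFrame with a "name" column; both are assumptions):

    import org.apache.spark.sql.{DataFrame, SparkSession}

    object EncoderDemo {
      def demo(spark: SparkSession, df: DataFrame): Unit = {
        // the fix above: drop to the RDD API, which needs no Encoder
        val names = df.rdd.map(row => row.getAs[String]("name"))

        // alternative: import the session's implicits so an Encoder is found
        import spark.implicits._
        val ds = df.map(row => row.getAs[String]("name"))   // Dataset[String]
      }
    }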


    4.
    https://blog.csdn.net/someby/article/details/90715799

    When converting a DataFrame to a Dataset, first import the implicit conversions, and define the custom case class at the top level (as a global) rather than inside a method.
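
    A minimal sketch (the file path and fields are assumptions):

    import org.apache.spark.sql.SparkSession

    case class Person(name: String, age: Long)   // top level, outside any method

    object DsDemo {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("ds-demo").master("local[*]").getOrCreate()
        import spark.implicits._                             // the implicit conversions / Encoders
        val ds = spark.read.json("people.json").as[Person]   // DataFrame -> Dataset[Person]
        ds.show()
      }
    }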


    https://stackoverflow.com/questions/30033043/hadoop-job-fails-resource-manager-doesnt-recognize-attemptid/30391973#30391973


    5.
    The same SQL statement returns different results in Spark SQL and the hive shell
    https://blog.csdn.net/HappyLin0x29a/article/details/88557168
    [To speed up reading Parquet files, Spark by default uses its own Parquet reader, and here the data it reads back is wrong; setting spark.sql.hive.convertMetastoreParquet to false fixes it]
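
    A sketch of applying the setting (both forms are real Spark configuration mechanisms; the app name is an assumption):

    val spark = org.apache.spark.sql.SparkSession.builder()
      .appName("parquet-consistency")
      .config("spark.sql.hive.convertMetastoreParquet", "false")
      .enableHiveSupport()
      .getOrCreate()

    // or switch it at runtime, before the query:
    spark.sql("set spark.sql.hive.convertMetastoreParquet=false")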

    Spark closures and serialization
    https://blog.csdn.net/bluishglc/article/details/50945032

    Mark a field @transient to exclude it from serialization.
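
    A minimal sketch of the pattern (the JDBC connection is a made-up example of a non-serializable field; the URL is deliberately elided):

    class Worker extends Serializable {
      // not shipped with the closure; re-created lazily on each executor
      @transient lazy val conn: java.sql.Connection =
        java.sql.DriverManager.getConnection("jdbc:...")

      def process(x: String): String = x.toUpperCase
    }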


    6. A subclass overrides a member value of its parent; without lazy initialization, resultArray may not pick up the overridden fruitName:

    class A {
      lazy val fruitName = "apple"
      lazy val resultArray = Array(fruitName, "2")
    }

    class B extends A {
      override lazy val fruitName = "orange"
    }
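
    A quick check of the behavior (output noted in the comments):

    println(new B().resultArray.mkString(","))   // "orange,2" — lazy defers the read until after the override
    // with plain (non-lazy) vals, A's constructor would build resultArray while
    // fruitName is still null in B, yielding "null,2"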


    7.
    MetadataFetchFailedException: Missing an output location for shuffle
    (commonly seen when an executor that held shuffle output is lost, e.g. killed for exceeding memory limits; raising executor memory or the shuffle partition count usually helps)

    8. Differences between Hive's STRING and VARCHAR
    ① VARCHAR is similar to STRING, but STRING stores variable-length text with no length limit, while VARCHAR only allows lengths between 1 and 65535.
    ② There are no generic UDFs that take VARCHAR directly; use String UDFs instead — a VARCHAR is converted to String before being passed to the UDF.
