• sparksql读取hive数据报错:java.lang.RuntimeException: serious problem


    问题:

    Caused by: java.util.concurrent.ExecutionException: java.lang.IndexOutOfBoundsException: Index: 0
    	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    	at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1016)
    	... 65 more
    Caused by: java.lang.IndexOutOfBoundsException: Index: 0
    	at java.util.Collections$EmptyList.get(Collections.java:4454)
    	at org.apache.hadoop.hive.ql.io.orc.OrcProto$Type.getSubtypes(OrcProto.java:12240)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getColumnIndicesFromNames(ReaderImpl.java:651)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.getRawDataSizeOfColumns(ReaderImpl.java:634)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:927)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:836)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:702)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:748)
    java.lang.RuntimeException: serious problem
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1021)
    	at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1048)
    	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    	at scala.Option.getOrElse(Option.scala:121)
    	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
    	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:46)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
    	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
    	at scala.Option.getOrElse(Option.scala:121)
    
    

    原因:

    sparksql生成的hive表有空文件,但是sparksql读取空文件的时候,因为表示orc格式的,导致sparksql解析orc文件出错。但是用hive却可以正常读取。
    

    解决办法:

    设置set spark.sql.hive.convertMetastoreOrc=true
    单纯的设置以上参数还是会报错:

    java.lang.IndexOutOfBoundsException
    	at java.nio.Buffer.checkIndex(Buffer.java:540)
    	at java.nio.HeapByteBuffer.get(HeapByteBuffer.java:139)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.extractMetaInfoFromFooter(ReaderImpl.java:374)
    	at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:316)
    	at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:187)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:75)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$getFileReader$2.apply(OrcFileOperator.scala:73)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    	at scala.collection.TraversableOnce$class.collectFirst(TraversableOnce.scala:145)
    	at scala.collection.AbstractIterator.collectFirst(Iterator.scala:1336)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$.getFileReader(OrcFileOperator.scala:86)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
    	at org.apache.spark.sql.hive.orc.OrcFileOperator$$anonfun$readSchema$1.apply(OrcFileOperator.scala:95)
    	at scala.collection.immutable.Stream.flatMap(Stream.scala:489)
    

    需要再设置set spark.sql.orc.impl=native
    参考https://issues.apache.org/jira/browse/SPARK-19809

  • 相关阅读:
    list转datatable c#
    按钮靠右css小结
    IE浏览器打印合格证相关问题
    vue项目插入视频-mp4
    vue项目bug-Couldn’t find preset "es2015"
    Mac打开swf文件
    mac+windows下从git上拉取项目及运行
    echarts.js制作中国地图
    前端数据可视化echarts.js
    vue-router 基本使用
  • 原文地址:https://www.cnblogs.com/goldenSky/p/11971422.html
Copyright © 2020-2023  润新知