• Hive Bug修复:ORC表中array数据类型长度超过1024报异常


    目前HVIE里查询如下语句报错:

    select * from dw.ticket_user_mtime limit 10;

    错误如下:

    17/07/06 16:45:38 [main]: DEBUG impl.RecordReaderImpl: merge = [{data range [22733, 19927580), size: 19904847 type: array-backed}]
    Failed with exception java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 1024
    17/07/06 16:45:38 [main]: ERROR CliDriver: Failed with exception java.io.IOException:java.lang.ArrayIndexOutOfBoundsException: 1024
    java.io.IOException: java.lang.ArrayIndexOutOfBoundsException: 1024
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:517)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.pushRow(FetchOperator.java:424)
    at org.apache.hadoop.hive.ql.exec.FetchTask.fetch(FetchTask.java:144)
    at org.apache.hadoop.hive.ql.Driver.getResults(Driver.java:1885)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:252)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:497)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 1024
    at org.apache.orc.impl.RunLengthIntegerReaderV2.nextVector(RunLengthIntegerReaderV2.java:369)
    at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.commonReadByteArrays(TreeReaderFactory.java:1231)
    at org.apache.orc.impl.TreeReaderFactory$BytesColumnVectorUtil.readOrcByteArrays(TreeReaderFactory.java:1268)
    at org.apache.orc.impl.TreeReaderFactory$StringDirectTreeReader.nextVector(TreeReaderFactory.java:1368)
    at org.apache.orc.impl.TreeReaderFactory$StringTreeReader.nextVector(TreeReaderFactory.java:1212)
    at org.apache.orc.impl.TreeReaderFactory$ListTreeReader.nextVector(TreeReaderFactory.java:1902)
    at org.apache.orc.impl.TreeReaderFactory$StructTreeReader.nextBatch(TreeReaderFactory.java:1737)
    at org.apache.orc.impl.RecordReaderImpl.nextBatch(RecordReaderImpl.java:1045)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.ensureBatch(RecordReaderImpl.java:77)
    at org.apache.hadoop.hive.ql.io.orc.RecordReaderImpl.hasNext(RecordReaderImpl.java:89)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:230)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$OrcRecordReader.next(OrcInputFormat.java:205)
    at org.apache.hadoop.hive.ql.exec.FetchOperator.getNextRow(FetchOperator.java:484)
    ... 15 more

    hive 补丁位置https://issues.apache.org/jira/browse/HIVE-14483

    该BUG触发的条件是,orc表中,array数据类型,并且当array这个字段中数组的元素超过1024个。

    可以通过打补丁的方式修复,重新编译hive的源码得到 hive-exec-2.1.0.jar 以及hive-orc-2.1.0.jar,

    将这两个jar包更新到所有线上的hive客户端的lib,然后需要重启hive相关的服务(轮流重启hive metasotre,然后轮流重启hiveserver2)。

  • 相关阅读:
    Redis网络连接库剖析
    如何下载和安装pywin32
    Python游戏开发入门:pygame事件处理机制
    python常见错误
    波特率与比特率
    __gcd最大公约数
    动态规划算法之矩阵连乘问题
    二分插入排序+二分搜索
    office 总结
    javaWeb总结
  • 原文地址:https://www.cnblogs.com/honeybee/p/8668771.html
Copyright © 2020-2023  润新知