• Hive MoveTask fails when moving query results



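    Apache Hive 2.1.0: when running "INSERT OVERWRITE TABLE ...... select" or "insert overwrite directory /tmp/data/hive-test", the MoveTask that moves the result files into place fails whenever the job writes more than one result file. The latest release, Apache Hive 2.1.1, has the same problem; Apache Hive 1.2.1 does not.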

    The SQL that was executed:

    insert overwrite directory '/tmp/fuxin.zhao/hive-test'
    select 
    shippingorderid 
    ,logisticsplatformid 
    ,stockoutorderid 
    ,logisticstypeid 
    ,externalshippingorderno 
    ,packageweight 
    ,freight 
    ,freightstatus 
    ,entertime 
    ,shippingorderstatus 
    ,shippinglog 
    ,shippinglogupdatetime 
    ,shippingstatustime 
    ,confirmreceivetime 
    ,remarks 
    ,createtype 
    ,lastmodifytime 
    ,enteruser 
    ,updatetime 
    ,updateuser 
    from 
    ( 
    select *,row_number() over(partition by shippingorderid order by LastModifyTime desc) as rn 
    from 
    (select * from ods.m1_shippingorder where dt = '2014-01-01' 
    union all 
    select * from fds.m1_shippingorder where dt = '2099-12-31' 
    ) a )b where b.rn = 1
    
    

    The resulting exception:

    Failed with exception org.apache.hadoop.hdfs.protocol.AclException: Invalid ACL: multiple entries with same scope, type and name.
        at org.apache.hadoop.hdfs.server.namenode.AclTransformation.buildAndValidateAcl(AclTransformation.java:285)
        at org.apache.hadoop.hdfs.server.namenode.AclTransformation.replaceAclEntries(AclTransformation.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSDirAclOp.unprotectedSetAcl(FSDirAclOp.java:206)
        at org.apache.hadoop.hdfs.server.namenode.FSDirAclOp.setAcl(FSDirAclOp.java:146)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setAcl(FSNamesystem.java:7938)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setAcl(NameNodeRpcServer.java:1813)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setAcl(ClientNamenodeProtocolServerSideTranslatorPB.java:1330)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    

    Exception: org.apache.hadoop.hive.ql.exec.MoveTask failed

    ##insert overwrite directory:
    2016-12-13 18:27:15,630 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 81.52 sec
    MapReduce Total cumulative CPU time: 1 minutes 21 seconds 520 msec
    Ended Job = job_1480497945656_0288
    Moving data to directory /tmp/t_FDS/m1_shippingorder/dt=2099-12-31
    Failed with exception Unable to move source hdfs://dbmtimehadoop/tmp/t_FDS/m1_shippingorder/dt=2099-12-31/.hive-staging_hive_2016-12-13_18-26-28_695_9094454822676037473-1/-ext-10000 to destination /tmp/t_FDS/m1_shippingorder/dt=2099-12-31
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://dbmtimehadoop/tmp/t_FDS/m1_shippingorder/dt=2099-12-31/.hive-staging_hive_2016-12-13_18-26-28_695_9094454822676037473-1/-ext-10000 to destination /tmp/t_FDS/m1_shippingorder/dt=2099-12-31
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 5  Reduce: 4   Cumulative CPU: 81.52 sec   HDFS Read: 778870925 HDFS Write: 778698546 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 21 seconds 520 msec
    

    Exception: java.util.ConcurrentModificationException

    Failed with exception Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    16/12/22 11:45:59 [main]: ERROR exec.Task: Failed with exception Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    	at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:103)
    	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:255)
    	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
    	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
    	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
    	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:497)
    	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.ConcurrentModificationException
    	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2984)
    	at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileInDfs(MoveTask.java:119)
    	at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:96)
    	... 20 more
    Caused by: java.util.ConcurrentModificationException
    	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    	at java.util.ArrayList$Itr.next(ArrayList.java:851)
    	at java.util.AbstractCollection.toString(AbstractCollection.java:461)
    	at java.lang.String.valueOf(String.java:2982)
    	at java.lang.StringBuilder.append(StringBuilder.java:131)
    	at org.apache.hadoop.fs.permission.AclStatus.toString(AclStatus.java:108)
    	at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:75)
    	at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2961)
    	at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2953)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    16/12/22 11:45:59 [main]: ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
    
    

    Exception: java.lang.ArrayIndexOutOfBoundsException

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
            at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2984)
            at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileInDfs(MoveTask.java:119)
            at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:96)
            ... 20 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException
            at java.lang.System.arraycopy(Native Method)
            at java.util.ArrayList.removeRange(ArrayList.java:634)
            at java.util.ArrayList$SubList.removeRange(ArrayList.java:1063)
            at java.util.AbstractList.clear(AbstractList.java:234)
            at com.google.common.collect.Iterables.removeIfFromRandomAccessList(Iterables.java:209)
            at com.google.common.collect.Iterables.removeIf(Iterables.java:180)
            at org.apache.hadoop.hive.io.HdfsUtils.removeBaseAclEntries(HdfsUtils.java:155)
            at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:77)
            at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2961)
            at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2953)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    

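    The switch in the Hive source that controls permission inheritance (conf is the session's HiveConf):
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.conf.HiveConf.ConfVars;

    // true (the default): moved files inherit the permissions of the destination's parent directory
    boolean inheritPerms = HiveConf.getBoolVar(conf, HiveConf.ConfVars.HIVE_WAREHOUSE_SUBDIR_INHERIT_PERMS);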

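    Cause:
    When Hive moves query results into place, it also sets each file's permissions so that they are inherited from the destination.
    With several result files this is done concurrently: each file is handled by a task in a thread pool (the Hive$3.call frames in the traces above).
    Judging from the traces, these tasks share one list of ACL entries: HdfsUtils.setFullFileStatus logs it through AclStatus.toString while another task
    strips base entries from it through HdfsUtils.removeBaseAclEntries, with no synchronization. Depending on timing this shows up as a
    ConcurrentModificationException, an ArrayIndexOutOfBoundsException, or a corrupted ACL that the NameNode rejects with "multiple entries with same scope, type and name".
    This is a Hive bug that is still unfixed in Apache Hive 2.1.1, the latest release at the time of writing.
    It can be worked around by turning off Hive's permission inheritance with hive.warehouse.subdir.inherit.perms=false.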
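    The race described above is a classic shared-mutable-state bug, and the usual remedy is to give every pool task its own copy of the data it mutates. The plain-Java sketch below is not Hive's code (Hive works on HDFS AclEntry objects inside HdfsUtils); it is a model under stated assumptions: ACL entries are represented as hypothetical "type:name:perms" strings, an empty name stands in for a base entry, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PerTaskAclCopy {

    // Hypothetical model of an ACL entry: "type:name:perms".
    // An empty name (e.g. "user::rwx") stands in for a base entry.
    static boolean isBaseEntry(String entry) {
        return entry.split(":", -1)[1].isEmpty();
    }

    // One task per result file. Every task filters its OWN copy of the parent's
    // entries, so no thread mutates a list that another thread is iterating.
    static List<List<String>> applyToFiles(List<String> parentEntries, int fileCount, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < fileCount; i++) {
                futures.add(pool.submit(() -> {
                    List<String> entries = new ArrayList<>(parentEntries); // defensive copy per task
                    entries.removeIf(PerTaskAclCopy::isBaseEntry);          // safe: list is task-local
                    return entries;
                }));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> f : futures) {
                results.add(f.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Read-only view: tasks may copy it concurrently, but nobody can modify it.
        List<String> parent = Collections.unmodifiableList(Arrays.asList(
                "user::rwx", "user:etl:rwx", "group::r-x", "group:analysts:r-x", "other::---"));
        for (List<String> perFile : applyToFiles(parent, 4, 2)) {
            System.out.println(perFile);
        }
    }
}
```

    Running main prints [user:etl:rwx, group:analysts:r-x] once per file. Copying costs one small list allocation per file, which is negligible next to an HDFS rename, and it removes the need for any locking around the shared parent status.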

    Workaround:
    Set hive.warehouse.subdir.inherit.perms to false (it defaults to true), for example in hive-site.xml:

      <property>
        <name>hive.warehouse.subdir.inherit.perms</name>
        <value>false</value>
        <description>
          Set this to false if the table directories should be created
          with the permissions derived from dfs umask instead of
          inheriting the permission of the warehouse or database directory.
        </description>
      </property>
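    Why the workaround helps can be sketched as follows: with the switch off, the per-file permission step is skipped entirely, so the shared ACL list is never touched. This is a simplified model, not Hive's source: java.util.Properties stands in for HiveConf, moveAll and the step counter are hypothetical, and the file names are made up; only the property name and its default of true come from the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class InheritPermsSwitch {
    static final String KEY = "hive.warehouse.subdir.inherit.perms";

    // An absent property falls back to "true", matching the documented default.
    static boolean inheritPerms(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(KEY, "true"));
    }

    // Hypothetical model of the parallel move: one task per result file.
    // Returns how many times the permission-inheritance step ran.
    static int moveAll(List<String> files, Properties conf) {
        final boolean inherit = inheritPerms(conf);
        AtomicInteger permissionSteps = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (String file : files) {
                futures.add(pool.submit(() -> {
                    // (the rename of this file into the destination would happen here)
                    if (inherit) {
                        // Stand-in for copying the parent's permissions/ACLs onto the file,
                        // i.e. the step that races when several files are moved at once.
                        permissionSteps.incrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every task before reading the counter
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
        return permissionSteps.get();
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("000000_0", "000001_0", "000002_0", "000003_0");
        Properties defaults = new Properties();
        Properties workaround = new Properties();
        workaround.setProperty(KEY, "false");
        System.out.println("default: " + moveAll(files, defaults) + " permission steps");
        System.out.println("workaround: " + moveAll(files, workaround) + " permission steps");
    }
}
```

    main reports 4 permission steps with the default and 0 with the workaround. The trade-off, per the property's description, is that new table directories then get permissions derived from the dfs umask rather than inheriting those of the warehouse or database directory.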
    
  • Original article: https://www.cnblogs.com/honeybee/p/6401479.html