• Hive MoveTask fails when moving query results


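    In Apache Hive 2.1.0, an "INSERT OVERWRITE TABLE ...... SELECT" or "INSERT OVERWRITE DIRECTORY /tmp/data/hive-test" statement fails in the MoveTask step, which moves the result files, whenever the query produces more than one result file. The latest release, Apache Hive 2.1.1, has the same problem; Apache Hive 1.2.1 does not.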

    The SQL that was executed:

    insert overwrite directory '/tmp/fuxin.zhao/hive-test'
    select 
    shippingorderid 
    ,logisticsplatformid 
    ,stockoutorderid 
    ,logisticstypeid 
    ,externalshippingorderno 
    ,packageweight 
    ,freight 
    ,freightstatus 
    ,entertime 
    ,shippingorderstatus 
    ,shippinglog 
    ,shippinglogupdatetime 
    ,shippingstatustime 
    ,confirmreceivetime 
    ,remarks 
    ,createtype 
    ,lastmodifytime 
    ,enteruser 
    ,updatetime 
    ,updateuser 
    from 
    ( 
    select *,row_number() over(partition by shippingorderid order by LastModifyTime desc) as rn 
    from 
    (select * from ods.m1_shippingorder where dt = '2014-01-01' 
    union all 
    select * from fds.m1_shippingorder where dt = '2099-12-31' 
    ) a )b where b.rn = 1
    
    

    The resulting exception:

    Failed with exception org.apache.hadoop.hdfs.protocol.AclException: Invalid ACL: multiple entries with same scope, type and name.
        at org.apache.hadoop.hdfs.server.namenode.AclTransformation.buildAndValidateAcl(AclTransformation.java:285)
        at org.apache.hadoop.hdfs.server.namenode.AclTransformation.replaceAclEntries(AclTransformation.java:230)
        at org.apache.hadoop.hdfs.server.namenode.FSDirAclOp.unprotectedSetAcl(FSDirAclOp.java:206)
        at org.apache.hadoop.hdfs.server.namenode.FSDirAclOp.setAcl(FSDirAclOp.java:146)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.setAcl(FSNamesystem.java:7938)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.setAcl(NameNodeRpcServer.java:1813)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.setAcl(ClientNamenodeProtocolServerSideTranslatorPB.java:1330)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:616)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:969)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2049)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2045)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2043)
    

    Exception: org.apache.hadoop.hive.ql.exec.MoveTask fails

    ##insert overwrite directory:
    2016-12-13 18:27:15,630 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 81.52 sec
    MapReduce Total cumulative CPU time: 1 minutes 21 seconds 520 msec
    Ended Job = job_1480497945656_0288
    Moving data to directory /tmp/t_FDS/m1_shippingorder/dt=2099-12-31
    Failed with exception Unable to move source hdfs://dbmtimehadoop/tmp/t_FDS/m1_shippingorder/dt=2099-12-31/.hive-staging_hive_2016-12-13_18-26-28_695_9094454822676037473-1/-ext-10000 to destination /tmp/t_FDS/m1_shippingorder/dt=2099-12-31
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://dbmtimehadoop/tmp/t_FDS/m1_shippingorder/dt=2099-12-31/.hive-staging_hive_2016-12-13_18-26-28_695_9094454822676037473-1/-ext-10000 to destination /tmp/t_FDS/m1_shippingorder/dt=2099-12-31
    MapReduce Jobs Launched:
    Stage-Stage-1: Map: 5  Reduce: 4   Cumulative CPU: 81.52 sec   HDFS Read: 778870925 HDFS Write: 778698546 SUCCESS
    Total MapReduce CPU Time Spent: 1 minutes 21 seconds 520 msec
    

    Exception: java.util.ConcurrentModificationException

    Failed with exception Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    16/12/22 11:45:59 [main]: ERROR exec.Task: Failed with exception Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    org.apache.hadoop.hive.ql.metadata.HiveException: Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    	at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:103)
    	at org.apache.hadoop.hive.ql.exec.MoveTask.execute(MoveTask.java:255)
    	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:197)
    	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
    	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1858)
    	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1562)
    	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1313)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1084)
    	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
    	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:232)
    	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:183)
    	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:399)
    	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:776)
    	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:714)
    	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:641)
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:497)
    	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.util.ConcurrentModificationException
    	at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2984)
    	at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileInDfs(MoveTask.java:119)
    	at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:96)
    	... 20 more
    Caused by: java.util.ConcurrentModificationException
    	at java.util.ArrayList$Itr.checkForComodification(ArrayList.java:901)
    	at java.util.ArrayList$Itr.next(ArrayList.java:851)
    	at java.util.AbstractCollection.toString(AbstractCollection.java:461)
    	at java.lang.String.valueOf(String.java:2982)
    	at java.lang.StringBuilder.append(StringBuilder.java:131)
    	at org.apache.hadoop.fs.permission.AclStatus.toString(AclStatus.java:108)
    	at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:75)
    	at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2961)
    	at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2953)
    	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    	at java.lang.Thread.run(Thread.java:745)
    
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Unable to move source hdfs://dbmtimehadoop/tmp/fuxin.zhao/hive-test/.hive-staging_hive_2016-12-22_11-45-12_256_5450334497172511865-1/-ext-10000 to destination /tmp/fuxin.zhao/hive-test
    16/12/22 11:45:59 [main]: ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
    
    

    Exception: java.lang.ArrayIndexOutOfBoundsException

    Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.ArrayIndexOutOfBoundsException
            at org.apache.hadoop.hive.ql.metadata.Hive.moveFile(Hive.java:2984)
            at org.apache.hadoop.hive.ql.exec.MoveTask.moveFileInDfs(MoveTask.java:119)
            at org.apache.hadoop.hive.ql.exec.MoveTask.moveFile(MoveTask.java:96)
            ... 20 more
    Caused by: java.lang.ArrayIndexOutOfBoundsException
            at java.lang.System.arraycopy(Native Method)
            at java.util.ArrayList.removeRange(ArrayList.java:634)
            at java.util.ArrayList$SubList.removeRange(ArrayList.java:1063)
            at java.util.AbstractList.clear(AbstractList.java:234)
            at com.google.common.collect.Iterables.removeIfFromRandomAccessList(Iterables.java:209)
            at com.google.common.collect.Iterables.removeIf(Iterables.java:180)
            at org.apache.hadoop.hive.io.HdfsUtils.removeBaseAclEntries(HdfsUtils.java:155)
            at org.apache.hadoop.hive.io.HdfsUtils.setFullFileStatus(HdfsUtils.java:77)
            at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2961)
            at org.apache.hadoop.hive.ql.metadata.Hive$3.call(Hive.java:2953)
            at java.util.concurrent.FutureTask.run(FutureTask.java:266)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
            at java.lang.Thread.run(Thread.java:745)
    

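    The switch for file permission inheritance appears in the Hive source as follows (fragments only):
    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.conf.HiveConf.ConfVars;
    // Tail of the call that reads the switch; the start of the statement is not shown here.
    HiveConf.ConfVars.HIVE_WAREHOUSE_SUBDIR_INHERIT_PERMS);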

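    Cause:
    When Hive moves query results to their destination, it also has to set permissions on the moved files. With several files, the permissions are set concurrently: the Hive code that does this runs in a thread pool, and it is not properly synchronized across threads, which produces the exceptions above. This is a Hive bug, and as of the latest release at the time of writing, Apache Hive 2.1.1, it has not been fixed.
    It can be worked around by turning off Hive's file permission inheritance: hive.warehouse.subdir.inherit.perms=false.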
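
    Both stack traces end in an ArrayList that is being read or shrunk from a pool thread (AclStatus.toString iterating it, Iterables.removeIf clearing part of it), which is consistent with one list of ACL entries being shared by several tasks. The sketch below illustrates that hazard and one common remedy, giving each task its own copy of the list. It is not Hive's code: the class AclCopyDemo, the methods stripBaseEntries, isBaseEntry and processFiles, and the "type:name:perms" string encoding are all hypothetical and exist only for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical demo class; not part of Hive or Hadoop.
public class AclCopyDemo {

    // Returns the entries without unnamed "base" entries such as "user::rwx".
    // The filtering happens on a private copy, so concurrent callers never
    // mutate the list they share.
    static List<String> stripBaseEntries(List<String> sharedEntries) {
        List<String> copy = new ArrayList<>(sharedEntries);
        copy.removeIf(AclCopyDemo::isBaseEntry);
        return copy;
    }

    // Assumed encoding "type:name:perms"; an empty name marks a base entry.
    static boolean isBaseEntry(String entry) {
        String[] parts = entry.split(":", -1);
        return parts.length == 3 && parts[1].isEmpty();
    }

    // Submits one task per (pretend) result file to a thread pool,
    // mimicking a move step that handles several files in parallel.
    static List<List<String>> processFiles(List<String> sharedEntries, int fileCount)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (int i = 0; i < fileCount; i++) {
                futures.add(pool.submit(() -> stripBaseEntries(sharedEntries)));
            }
            List<List<String>> results = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                results.add(future.get()); // rethrows any failure from the task
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> shared = Arrays.asList(
                "user::rwx", "user:hive:rwx", "group::r-x", "group:etl:r-x", "other::r--");
        for (List<String> result : processFiles(shared, 8)) {
            System.out.println(result);
        }
    }
}
```

    If stripBaseEntries called removeIf on sharedEntries directly, the tasks would race on one ArrayList, and failures of the kind shown in the traces (ConcurrentModificationException, ArrayIndexOutOfBoundsException) could appear intermittently. Copying first trades a small allocation per file for thread safety.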

    Solution:
    Set hive.warehouse.subdir.inherit.perms to false in hive-site.xml (the shipped default is true):

      <property>
        <name>hive.warehouse.subdir.inherit.perms</name>
        <value>false</value>
        <description>
          Set this to false if the table directories should be created
          with the permissions derived from dfs umask instead of
          inheriting the permission of the warehouse or database directory.
        </description>
      </property>
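
    To confirm which value a given hive-site.xml actually carries for this property, a small check like the following can help. It is only a sketch built on the JDK's XML parser: the class HiveSiteCheck and its methods are hypothetical helpers, not part of Hive, and the default file path is an assumption. It reads a single file and does not reproduce how Hive merges several configuration sources.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper; not part of Hive.
public class HiveSiteCheck {

    // Returns the <value> of the named <property>, or null if this file does not set it.
    // If the property appears more than once, the last occurrence is returned
    // (a simplification made for this sketch).
    static String readProperty(InputStream xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
        NodeList props = doc.getElementsByTagName("property");
        String found = null;
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            if (name.equals(childText(prop, "name"))) {
                found = childText(prop, "value");
            }
        }
        return found;
    }

    // Trimmed text of the first child element with the given tag, or null if absent.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Assumed default path; pass the real location of hive-site.xml as the first argument.
        String path = args.length > 0 ? args[0] : "hive-site.xml";
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            String value = readProperty(in, "hive.warehouse.subdir.inherit.perms");
            System.out.println("hive.warehouse.subdir.inherit.perms = " + value);
        }
    }
}
```

    A null result means the file does not set the property at all, in which case Hive falls back to its built-in default.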
    
  • Original article: https://www.cnblogs.com/honeybee/p/6401479.html