• Analysis of a Hive Metastore bug: the ObjectStore PersistenceManager gets closed automatically


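    I have been testing HCatalog recently. HCatalog ships as a standalone JAR; it can also run a service, but that service is just the metastore Thrift server. To write an HCatalog-based MapReduce job, it is enough to add the HCatalog JAR and the corresponding hive-site.xml to libjars and HADOOP_CLASSPATH. Testing still turned up a problem, though: after running for a while, the Hive metastore server threw the following error: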

     

    2013-06-19 10:35:51,718 ERROR server.TThreadPoolServer (TThreadPoolServer.java:run(182)) - Error occurred during processing of message.
    javax.jdo.JDOFatalUserException: Persistence Manager has been closed
            at org.datanucleus.jdo.JDOPersistenceManager.assertIsOpen(JDOPersistenceManager.java:2124)
            at org.datanucleus.jdo.JDOPersistenceManager.currentTransaction(JDOPersistenceManager.java:315)
            at org.apache.hadoop.hive.metastore.ObjectStore.openTransaction(ObjectStore.java:294)
            at org.apache.hadoop.hive.metastore.ObjectStore.getTable(ObjectStore.java:732)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
            at java.lang.reflect.Method.invoke(Method.java:597)
            at org.apache.hadoop.hive.metastore.RetryingRawStore.invoke(RetryingRawStore.java:111)
            at com.sun.proxy.$Proxy5.getTable(Unknown Source)
            at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.get_table(HiveMetaStore.java:982)
            at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table.getResult(ThriftHiveMetastore.java:5017)
            at org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Processor$get_table.getResult(ThriftHiveMetastore.java:5005)
            at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
            at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)

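    PersistenceManager manages a set of persistent objects, including creating and querying them. It is an instance variable of ObjectStore, and each ObjectStore owns one pm. RawStore is the interface through which the metastore's logic layer talks to the underlying metadata database (Derby, for example), and ObjectStore is the default implementation of RawStore. When the Hive Metastore Server starts, it is given a TProcessor that wraps an HMSHandler. HMSHandler holds an instance variable, ThreadLocal<RawStore> threadLocalMS, so each thread keeps its own RawStore: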

     

        private final ThreadLocal<RawStore> threadLocalMS =
          new ThreadLocal<RawStore>() {
            @Override
            protected synchronized RawStore initialValue() {
              return null;
            }
          };


    Every request from a Hive metastore client is handled by a WorkerProcess assigned from the thread pool. Every method in HMSHandler obtains the RawStore instance through getMS() and uses it to do the actual work:

     

        public RawStore getMS() throws MetaException {
          RawStore ms = threadLocalMS.get();
          if (ms == null) {
            ms = newRawStore();
            threadLocalMS.set(ms);
            ms = threadLocalMS.get();
          }
          return ms;
        }

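    As the code shows, the RawStore is created lazily; once initialized, it is bound to the thread-local variable so that later calls on the same thread can reuse it.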


     

        private RawStore newRawStore() throws MetaException {
          LOG.info(addPrefix("Opening raw store with implemenation class:"
              + rawStoreClassName));
          Configuration conf = getConf();
    
          return RetryingRawStore.getProxy(hiveConf, conf, rawStoreClassName, threadLocalId.get());
        }
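
    The lazy per-thread pattern used by getMS() can be tried in isolation. The sketch below is not Hive code: DemoStore and LazyPerThreadStore are hypothetical names, and DemoStore is only a stand-in for RawStore. It shows that repeated calls on one thread return the same instance, while a different thread gets its own.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for RawStore; it only records a creation number. */
final class DemoStore {
    static final AtomicInteger CREATED = new AtomicInteger();
    final int id = CREATED.incrementAndGet();
}

final class LazyPerThreadStore {
    // No initial value: the store is created on first use, as in getMS().
    private static final ThreadLocal<DemoStore> STORE = new ThreadLocal<>();

    static DemoStore get() {
        DemoStore s = STORE.get();
        if (s == null) {            // first call on this thread
            s = new DemoStore();
            STORE.set(s);
        }
        return s;
    }

    /** Calls get() on a fresh thread and returns what that thread saw. */
    static DemoStore getOnNewThread() {
        DemoStore[] holder = new DemoStore[1];
        Thread t = new Thread(() -> holder[0] = get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return holder[0];
    }

    public static void main(String[] args) {
        DemoStore first = get();
        DemoStore second = get();
        System.out.println(first == second);            // same thread reuses its store
        System.out.println(getOnNewThread() != first);  // another thread has its own
    }
}
```

    With a thread pool, "the same thread" can mean many unrelated client requests over time, so whatever state the cached object carries is shared by all of them.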


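    The RawStore returned here is a dynamic proxy: RetryingRawStore implements the InvocationHandler interface, and its invoke function runs the real logic through method.invoke(). The benefit is that extra logic can be added around method.invoke(); RetryingRawStore catches the exceptions thrown by the call in order to retry. Because the call goes through reflection, the exception arrives wrapped in an InvocationTargetException. In Hive 0.9, however, the code rethrows right after catching that exception instead of retrying, which is clearly wrong. I changed it to pull out the wrapped target exception, check whether it is an instance of JDOException, and handle it accordingly: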

     

      @Override
      public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        Object ret = null;
    
        boolean gotNewConnectUrl = false;
        boolean reloadConf = HiveConf.getBoolVar(hiveConf,
            HiveConf.ConfVars.METASTOREFORCERELOADCONF);
        boolean reloadConfOnJdoException = false;
    
        if (reloadConf) {
          updateConnectionURL(getConf(), null);
        }
    
        int retryCount = 0;
        Exception caughtException = null;
        while (true) {
          try {
            if (reloadConf || gotNewConnectUrl || reloadConfOnJdoException) {
              initMS();
            }
            ret = method.invoke(base, args);
            break;
          } catch (javax.jdo.JDOException e) {
            // Thrown directly (for example from initMS()). Keep e itself: its
            // cause may be null or not a JDOException.
            caughtException = e;
          } catch (UndeclaredThrowableException e) {
            throw e.getCause();
          } catch (InvocationTargetException e) {
            // Reflection wraps the real exception; unwrap it before deciding.
            Throwable t = e.getTargetException();
            if (t instanceof JDOException) {
              caughtException = (JDOException) t;
              reloadConfOnJdoException = true;
              LOG.error("rawstore jdoexception:" + caughtException.toString());
            } else {
              throw t;
            }
          }
    
          if (retryCount >= retryLimit) {
            throw caughtException;
          }
    
          assert (retryInterval >= 0);
          retryCount++;
          LOG.error(
              String.format(
                  "JDO datastore error. Retrying metastore command " +
                      "after %d ms (attempt %d of %d)", retryInterval, retryCount, retryLimit));
          Thread.sleep(retryInterval);
          // If we have a connection error, the JDO connection URL hook might
          // provide us with a new URL to access the datastore.
          String lastUrl = getConnectionURL(getConf());
          gotNewConnectUrl = updateConnectionURL(getConf(), lastUrl);
        }
        return ret;
      }
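
    The unwrap-then-retry idea can be reproduced without Hive or JDO. In the following minimal sketch every name (TableStore, TransientStoreException, FlakyStore, RetryingHandler) is hypothetical; TransientStoreException merely plays the role of JDOException. Only java.lang.reflect.Proxy, InvocationHandler and InvocationTargetException are real JDK APIs.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

/** Hypothetical service interface, standing in for RawStore. */
interface TableStore {
    String getTable(String name);
}

/** Hypothetical retryable error, standing in for javax.jdo.JDOException. */
class TransientStoreException extends RuntimeException {
    TransientStoreException(String msg) { super(msg); }
}

/** Fails a fixed number of times, then succeeds. */
final class FlakyStore implements TableStore {
    private int failuresLeft;
    FlakyStore(int failures) { this.failuresLeft = failures; }

    @Override
    public String getTable(String name) {
        if (failuresLeft > 0) {
            failuresLeft--;
            throw new TransientStoreException("connection lost");
        }
        return "table:" + name;
    }
}

final class RetryingHandler implements InvocationHandler {
    private final Object base;
    private final int retryLimit;
    int attempts = 0;   // exposed so the demo can show how many calls were made

    RetryingHandler(Object base, int retryLimit) {
        this.base = base;
        this.retryLimit = retryLimit;
    }

    @Override
    public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
        int retries = 0;
        while (true) {
            attempts++;
            try {
                return method.invoke(base, args);
            } catch (InvocationTargetException e) {
                // method.invoke wraps whatever the target threw; unwrap first.
                Throwable target = e.getTargetException();
                if (!(target instanceof TransientStoreException) || retries >= retryLimit) {
                    throw target;   // not retryable, or out of retries
                }
                retries++;
            }
        }
    }

    static TableStore proxyFor(RetryingHandler handler) {
        return (TableStore) Proxy.newProxyInstance(
            TableStore.class.getClassLoader(), new Class<?>[] {TableStore.class}, handler);
    }

    public static void main(String[] args) {
        RetryingHandler h = new RetryingHandler(new FlakyStore(2), 3);
        System.out.println(proxyFor(h).getTable("t1") + " after " + h.attempts + " attempts");
    }
}
```

    Catching only the retryable exception type directly around method.invoke(), as Hive 0.9 did, never matches, because the reflective call always delivers the target's exception inside InvocationTargetException.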


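    A RawStore is initialized in one of two ways. The first is in the RetryingRawStore constructor, which calls "this.base = (RawStore) ReflectionUtils.newInstance(rawStoreClass, conf);". Because ObjectStore implements Configurable, newInstance calls its setConf(conf) method, which initializes the RawStore. The second is the retry after a caught exception, which also ends up calling base.setConf(getConf()):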

     

    private void initMS() {
        base.setConf(getConf());
      }


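    ObjectStore's setConf method first acquires pmfPropLock, the lock that protects the static PersistenceManagerFactory (pmf) and prop, then closes pm, sets it to null, and initializes a new pm: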

     

    public void setConf(Configuration conf) {
        // Although an instance of ObjectStore is accessed by one thread, there may
        // be many threads with ObjectStore instances. So the static variables
        // pmf and prop need to be protected with locks.
        pmfPropLock.lock();
        try {
          isInitialized = false;
          hiveConf = conf;
          Properties propsFromConf = getDataSourceProps(conf);
          boolean propsChanged = !propsFromConf.equals(prop);
    
          if (propsChanged) {
            pmf = null;
            prop = null;
          }
    
          assert(!isActiveTransaction());
          shutdown();
          // Always want to re-create pm as we don't know if it were created by the
          // most recent instance of the pmf
          pm = null;
          openTrasactionCalls = 0;
          currentTransaction = null;
          transactionStatus = TXN_STATUS.NO_STATE;
    
          initialize(propsFromConf);
    
          if (!isInitialized) {
            throw new RuntimeException(
            "Unable to create persistence manager. Check dss.log for details");
          } else {
            LOG.info("Initialized ObjectStore");
          }
        } finally {
          pmfPropLock.unlock();
        }
      }

    private void initialize(Properties dsProps) {
        LOG.info("ObjectStore, initialize called");
        prop = dsProps;
        pm = getPersistenceManager();
        isInitialized = pm != null;
        return;
      }


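    Back to the error at the top: how does the Persistence Manager end up closed? After some digging, the cause turned out to be that HCatalog explicitly calls close() on its HiveMetaStoreClient when it is done with it, whereas Hive itself does not normally call this method.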

    HiveMetaStoreClient.java

     

    public void close() {
        isConnected = false;
        try {
          if (null != client) {
            client.shutdown();
          }
        } catch (TException e) {
          LOG.error("Unable to shutdown local metastore client", e);
        }
        // Transport would have got closed via client.shutdown(), so we dont need this, but
        // just in case, we make this call.
        if ((transport != null) && transport.isOpen()) {
          transport.close();
        }
      }


    This maps to the shutdown method of HMSHandler on the server side:

    @Override
        public void shutdown() {
          logInfo("Shutting down the object store...");
          RawStore ms = threadLocalMS.get();
          if (ms != null) {
            ms.shutdown();
            ms = null;
          }
          logInfo("Metastore shutdown complete.");
        }


    The shutdown method of ObjectStore:

     

    public void shutdown() {
        if (pm != null) {
          pm.close();
        }
      }

     

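    The HMSHandler shutdown method only fetches the current thread's ObjectStore and calls its shutdown method, which closes pm. It does not destroy the ObjectStore, which stays in threadLocalMS. The next time this thread serves another request, getMS() returns the same ObjectStore, and because its pm is already closed, the call is bound to throw. The correct approach is to also call threadLocalMS.remove(), which deletes the entry from the thread's ThreadLocalMap. Calling threadLocalMS.set(null) also works here, since getMS() treats null as "not yet initialized".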

    The modified shutdown method:

     

    public void shutdown() {
          logInfo("Shutting down the object store...");
          RawStore ms = threadLocalMS.get();
          if (ms != null) {
            ms.shutdown();
            ms = null;
            threadLocalMS.remove();
          }
          logInfo("Metastore shutdown complete.");
        }
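
    The failure and the fix can both be reproduced in a few lines outside Hive. The sketch below uses hypothetical stand-ins (ClosableStore for ObjectStore, StoreHandler for HMSHandler) and simulates two client sessions served by the same worker thread; the removeOnShutdown flag exists only so the demo can show the behavior before and after the fix.

```java
/** Hypothetical stand-in for ObjectStore: usable until close() is called. */
final class ClosableStore {
    private boolean open = true;

    void close() { open = false; }

    String getTable(String name) {
        if (!open) {
            throw new IllegalStateException("Persistence Manager has been closed");
        }
        return "table:" + name;
    }
}

/** Hypothetical stand-in for HMSHandler with a lazily created per-thread store. */
final class StoreHandler {
    private final ThreadLocal<ClosableStore> threadLocalStore = new ThreadLocal<>();
    private final boolean removeOnShutdown;

    StoreHandler(boolean removeOnShutdown) { this.removeOnShutdown = removeOnShutdown; }

    private ClosableStore getStore() {
        ClosableStore s = threadLocalStore.get();
        if (s == null) {
            s = new ClosableStore();
            threadLocalStore.set(s);
        }
        return s;
    }

    String getTable(String name) { return getStore().getTable(name); }

    void shutdown() {
        ClosableStore s = threadLocalStore.get();
        if (s != null) {
            s.close();
            if (removeOnShutdown) {
                threadLocalStore.remove();   // the fix: drop the closed store
            }
        }
    }
}

final class StaleThreadLocalDemo {
    /** Two client sessions handled one after the other on the calling thread. */
    static String serveTwoClients(StoreHandler h) {
        h.getTable("a");   // client 1 issues a request
        h.shutdown();      // client 1 closes its connection
        try {
            return h.getTable("b");   // client 2 lands on the same thread
        } catch (IllegalStateException e) {
            return "error: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(serveTwoClients(new StoreHandler(false)));  // stale store is reused
        System.out.println(serveTwoClients(new StoreHandler(true)));   // fresh store is created
    }
}
```

    Without the remove() call the second session fails on the closed store; with it, getStore() sees null and builds a new one, which is the behavior the patched HMSHandler.shutdown() relies on.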


  • Original article: https://www.cnblogs.com/dyllove98/p/3155361.html