• SparkThriftServer Source Code Analysis


    Version

    Spark 2.2.0

    Starting Point

    • Spark Thrift Server reuses the HiveServer2 source code and plugs in its own overriding methods.
    • The walkthrough therefore moves back and forth between Hive and Spark source code.
    • The whole flow starts from Beeline, which belongs to the Hive codebase, so that is where we begin:

    Client: Beeline

    • Jar: hive-beeline-1.2.1.spark2.jar
    • Spark JDBC uses Beeline as the client to send requests and interact with the Spark service
    • Beeline's entry point:
    // Location: src/java/org/apache/hive/beeline/BeeLine.java
      public static void main(String[] args) throws IOException {
        mainWithInputRedirection(args, null);
    }
      public static void mainWithInputRedirection(String[] args, InputStream inputStream)
        throws IOException
      {
        BeeLine beeLine = new BeeLine();
        int status = beeLine.begin(args, inputStream);
        if (!Boolean.getBoolean("beeline.system.exit")) {
          System.exit(status);
        }
      }
    
    • beeLine.begin: parses the incoming arguments and then calls the execute method
      public int begin(String[] args, InputStream inputStream)
        throws IOException
      {
        try
        {
          getOpts().load();
        }
        catch (Exception e) {}
        try
        {
          int code = initArgs(args);
          int i;
          if (code != 0) {
            return code;
          }
          if (getOpts().getScriptFile() != null) {
            return executeFile(getOpts().getScriptFile());
          }
          try
          {
            info(getApplicationTitle());
          }
          catch (Exception e) {}
          ConsoleReader reader = getConsoleReader(inputStream);
          return execute(reader, false);
        }
        finally
        {
          close();
        }
      }
      
    
    String line = getOpts().isSilent() ? reader.readLine(null, Character.valueOf('\000')) : reader.readLine(getPrompt());
    if ((!dispatch(line)) && (exitOnError)) {
      return 2;
    }
    
    • execute reads the input stream and calls the dispatch method
    • dispatch: handles invalid characters and help requests,
      • if the line starts with !, a CommandHandler is created to handle it;
      • otherwise the sql function of Commands handles the SQL command; org.apache.hive.beeline.DatabaseConnection's getConnection creates the connection
        this.commands.sql(line, getOpts().getEntireLineAsCommand());
    • Executing the SQL: Commands.execute calls the standard JDBC interface, stmnt.execute(sql); see HiveStatement's execute in the jdbc section below for the detailed flow
    • Handling the result set: Commands.execute processes the result set as follows:
        if (hasResults)
                  {
                    do
                    {
                      ResultSet rs = stmnt.getResultSet();
                      try
                      {
                        int count = this.beeLine.print(rs);
                        long end = System.currentTimeMillis();
                        
                        this.beeLine.info(this.beeLine.loc("rows-selected", count) + " " + this.beeLine.locElapsedTime(end - start));
                      }
                      finally
                      {
                        if (logThread != null)
                        {
                          logThread.join(10000L);
                          showRemainingLogsIfAny(stmnt);
                          logThread = null;
                        }
                        rs.close();
                      }
                    } while (BeeLine.getMoreResults(stmnt));
                  }
                  else
                  {
                    int count = stmnt.getUpdateCount();
                    long end = System.currentTimeMillis();
                    this.beeLine.info(this.beeLine.loc("rows-affected", count) + " " + this.beeLine.locElapsedTime(end - start));
                  }
    

    Server Side

    • Spark JDBC implements an RPC service on top of the Thrift framework and reuses a large amount of the HiveServer code
    • hive-jdbc implements the standard JDBC interface by wrapping RPC client requests and result-set handling
    • SparkThrift implements the actual computation flow
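    • To make the client/server split concrete, here is a minimal sketch (not from the original article) of driving Spark Thrift Server through plain hive-jdbc; the host, port, and query are placeholder assumptions (10000 is the default binary-transport port):
      // Scala sketch: an ordinary JDBC program takes the same path Beeline does.
      import java.sql.DriverManager

      object ThriftServerClientSketch {
        def main(args: Array[String]): Unit = {
          // Register the HiveServer2-compatible driver shipped in hive-jdbc-1.2.1.spark2.jar.
          Class.forName("org.apache.hive.jdbc.HiveDriver")
          val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "user", "")
          try {
            val stmt = conn.createStatement()
            // HiveStatement.execute() sends a TExecuteStatementReq over Thrift and polls
            // GetOperationStatus until the operation finishes (see the execute code below).
            val rs = stmt.executeQuery("SELECT 1")
            while (rs.next()) println(rs.getInt(1))
            rs.close()
            stmt.close()
          } finally {
            conn.close()
          }
        }
      }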

    Hive-jdbc

    TCLIService.Iface client requests

    • Cancel request
    TCancelOperationResp cancelResp = this.client.CancelOperation(cancelReq);
    
    • Close request
        TCloseOperationReq closeReq = new TCloseOperationReq(this.stmtHandle);
        TCloseOperationResp closeResp = this.client.CloseOperation(closeReq);
    
    • Execute statement
    TExecuteStatementResp execResp = this.client.ExecuteStatement(execReq);
    
    • Check operation status
    statusResp = this.client.GetOperationStatus(statusReq);
    

    Flow

    • Jar: hive-jdbc-1.2.1.spark2.jar
    • hive-jdbc is built on the Thrift framework (an RPC framework open-sourced by Facebook); the client (TCLIService.Iface client) makes RPC calls to the TCLIService service running inside HiveServer
    • Walking through HiveStatement's execute method:
      • Execute the query: TExecuteStatementResp execResp = this.client.ExecuteStatement(execReq);
      • Check operationComplete; while the operation has not finished, keep calling client.GetOperationStatus to poll the server-side execution state
      • Inspect the operation state; once it is FINISHED (or CLOSED), execution is done. Then check whether a result set exists and, if so, parse it
      • Parse the result set
      this.resultSet = new HiveQueryResultSet
          .Builder(this)
          .setClient(this.client)
          .setSessionHandle(this.sessHandle)
          .setStmtHandle(this.stmtHandle)
          .setMaxRows(this.maxRows)
          .setFetchSize(this.fetchSize)
          .setScrollable(this.isScrollableResultset)
          .setTransportLock(this.transportLock)
          .build();
      
    • The execute code
    public boolean execute(String sql)
        throws SQLException
      {
    	checkConnection("execute");
    	closeClientOperation();
    	initFlags();
    	TExecuteStatementReq execReq = new TExecuteStatementReq(this.sessHandle, sql);
    	execReq.setRunAsync(true);
    	execReq.setConfOverlay(this.sessConf);
    	this.transportLock.lock();
    	try
    	    {
    		TExecuteStatementResp execResp = this.client.ExecuteStatement(execReq);
    		Utils.verifySuccessWithInfo(execResp.getStatus());
    		this.stmtHandle = execResp.getOperationHandle();
    		this.isExecuteStatementFailed = false;
    	}
    	catch (SQLException eS)
    	    {
    		this.isExecuteStatementFailed = true;
    		throw eS;
    	}
    	catch (Exception ex)
    	    {
    		this.isExecuteStatementFailed = true;
    		throw new SQLException(ex.toString(), "08S01", ex);
    	}
    	finally
    	    {
    		this.transportLock.unlock();
    	}
    	TGetOperationStatusReq statusReq = new TGetOperationStatusReq(this.stmtHandle);
    	Boolean operationComplete = false;
    	while (!operationComplete) {
    		try
    		      {
    			this.transportLock.lock();
    			TGetOperationStatusResp statusResp;
    			try
    			        {
    				statusResp = this.client.GetOperationStatus(statusReq);
    			}
    			finally
    			        {
    				this.transportLock.unlock();
    			}
    			Utils.verifySuccessWithInfo(statusResp.getStatus());
    			if (statusResp.isSetOperationState()) {
    				switch (statusResp.getOperationState()) {
    				case CLOSED_STATE:
    				case FINISHED_STATE:
    					operationComplete = true;
    					break;
    				case CANCELED_STATE:
    					throw new SQLException("Query was cancelled", "01000");
    				case ERROR_STATE:
    					throw new SQLException(statusResp.getErrorMessage(), statusResp.getSqlState(), statusResp.getErrorCode());
    				case UKNOWN_STATE:
    					throw new SQLException("Unknown query", "HY000");
    				}
    			}
    		}
    		catch (SQLException e)
    		      {
    			this.isLogBeingGenerated = false;
    			throw e;
    		}
    		catch (Exception e)
    		      {
    			this.isLogBeingGenerated = false;
    			throw new SQLException(e.toString(), "08S01", e);
    		}
    	}
    	this.isLogBeingGenerated = false;
    	if (!this.stmtHandle.isHasResultSet()) {
    		return false;
    	}
    	this.resultSet = new HiveQueryResultSet.Builder(this).setClient(this.client).setSessionHandle(this.sessHandle).setStmtHandle(this.stmtHandle).setMaxRows(this.maxRows).setFetchSize(this.fetchSize).setScrollable(this.isScrollableResultset).setTransportLock(this.transportLock).build();
    	return true;
    }
    

    SparkThrift

    • The SparkThrift server starts two services: SparkSQLCLIService and ThriftHttpCLIService (or ThriftBinaryCLIService)
    • ThriftHttpCLIService: the channel that carries the RPC calls
    • SparkSQLCLIService: the service that actually handles client requests
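    • Besides launching it with sbin/start-thriftserver.sh, the server can be started inside an existing Spark application. A hedged sketch (not from the original article; API name per Spark 2.2):
      import org.apache.spark.sql.SparkSession
      import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

      object EmbeddedThriftServer {
        def main(args: Array[String]): Unit = {
          val spark = SparkSession.builder()
            .appName("embedded-thrift-server")
            .enableHiveSupport()
            .getOrCreate()

          // Boots SparkSQLCLIService plus the Thrift transport service (binary or HTTP,
          // depending on hive.server2.transport.mode) against this application's SQLContext.
          HiveThriftServer2.startWithContext(spark.sqlContext)
        }
      }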

    Main Entry: HiveThriftServer2

    • Calls SparkSQLEnv.init() to create the SparkSession, SparkConf, and so on
    • init
      • Creates a SparkSQLCLIService, sets cliService on the superclass via reflection, adds it to the parent's serviceList via addService, and then calls initCompositeService
        public void init(HiveConf hiveConf)
        {
          SparkSQLCLIService sparkSqlCliService = new SparkSQLCLIService(this, this.sqlContext);
          ReflectionUtils$.MODULE$.setSuperField(this, "cliService", sparkSqlCliService);
          addService(sparkSqlCliService);

          ThriftCLIService thriftCliService = isHTTPTransportMode(hiveConf) ?
            new ThriftHttpCLIService(sparkSqlCliService) :
            new ThriftBinaryCLIService(sparkSqlCliService);

          ReflectionUtils$.MODULE$.setSuperField(this, "thriftCLIService", thriftCliService);
          addService(thriftCliService);
          initCompositeService(hiveConf);
        }
      
      • initCompositeService: this function wraps ReflectedCompositeService's initCompositeService
      private[thriftserver] trait ReflectedCompositeService { this: AbstractService =>
        def initCompositeService(hiveConf: HiveConf) {
          // Emulating `CompositeService.init(hiveConf)`
          val serviceList = getAncestorField[JList[Service]](this, 2, "serviceList")
          serviceList.asScala.foreach(_.init(hiveConf))
      
          // Emulating `AbstractService.init(hiveConf)`
          invoke(classOf[AbstractService], this, "ensureCurrentState", classOf[STATE] -> STATE.NOTINITED)
          setAncestorField(this, 3, "hiveConf", hiveConf)
          invoke(classOf[AbstractService], this, "changeState", classOf[STATE] -> STATE.INITED)
          getAncestorField[Log](this, 3, "LOG").info(s"Service: $getName is inited.")
        }
      }
      
      • Via reflection it grabs the ancestor class's serviceList field and calls init on every service in that list (a sketch of how such reflection helpers can be written appears at the end of this section)
      • Since ThriftHttpCLIService and SparkSQLCLIService have already been placed in this list, their init methods are invoked here.
    • A HiveThriftServer2Listener is added to the SparkContext; it also extends SparkListener and records the time, state, and other information of every sql statement.
        listener = new HiveThriftServer2Listener(server, SparkSQLEnv.sqlContext.conf)
        SparkSQLEnv.sparkContext.addSparkListener(listener)
        
        The callbacks implemented by HiveThriftServer2Listener include:
        onJobStart
        onSessionCreated
        onSessionClosed
        onStatementStart
        onStatementParsed
        onStatementError
        onStatementFinish
        This information is recorded mainly in sessionList and executionList
    
    • After the thrift server starts, it registers a tab on the Spark UI: "JDBC/ODBC Server".
    
          uiTab = if (SparkSQLEnv.sparkContext.getConf.getBoolean("spark.ui.enabled", true)) {
            Some(new ThriftServerTab(SparkSQLEnv.sparkContext))
          } else {
            None
          }
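    • The setSuperField/getAncestorField/setAncestorField calls used in this section are thin wrappers over java.lang.reflect. The following is a minimal sketch of how such helpers can be written (illustrative only, not the verbatim Spark ReflectionUtils):
      import java.lang.reflect.Field

      object ReflectionSketch {
        // Walk `level` steps up the class hierarchy (level 1 = direct superclass)
        // and expose the named private field of that ancestor class.
        private def ancestorField(obj: AnyRef, level: Int, name: String): Field = {
          val ancestor = Iterator.iterate[Class[_]](obj.getClass)(_.getSuperclass).drop(level).next()
          val field = ancestor.getDeclaredField(name)
          field.setAccessible(true)
          field
        }

        def getAncestorField[T](obj: AnyRef, level: Int, name: String): T =
          ancestorField(obj, level, name).get(obj).asInstanceOf[T]

        def setAncestorField(obj: AnyRef, level: Int, name: String, value: AnyRef): Unit =
          ancestorField(obj, level, name).set(obj, value)

        // setSuperField is just the level-1 case.
        def setSuperField(obj: AnyRef, name: String, value: AnyRef): Unit =
          setAncestorField(obj, 1, name, value)
      }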
    

    ThriftHttpCLIService/ThriftBinaryCLIService

    • Wraps SparkSQLCLIService
    • Exposes it over HTTP or TCP

    ThriftHttpCLIService

    • Spark does not implement this class; it reuses Hive's source code as-is, so let's look directly at Hive's ThriftHttpCLIService.start method. Since ThriftHttpCLIService does not implement start itself, we follow it up into its parent class:
    // Location: hive/hive-1.1.0-cdh5.7.0/service/src/java/org/apache/hive/service/cli/thrift/ThriftCLIService.java
      @Override
      public synchronized void start() {
        super.start();
        if (!isStarted && !isEmbedded) {
          new Thread(this).start();
          isStarted = true;
        }
      }
    
    • The method is simple: it first calls the parent's start method; the parent here is again AbstractService, so the service state moves from INITED to STARTED. It then starts a thread wrapping this object, and that thread calls the run method of ThriftHttpCLIService
     @Override
      public void run() {
        try {
          ...
          // HTTP Server
          httpServer = new org.eclipse.jetty.server.Server(threadPool);
    
          TProcessor processor = new TCLIService.Processor<Iface>(this);
          TProtocolFactory protocolFactory = new TBinaryProtocol.Factory();
          
          // Configure the servlet; the main processing logic lives in the processor
          TServlet thriftHttpServlet = new ThriftHttpServlet(processor, protocolFactory, authType,
              serviceUGI, httpUGI);
          context.addServlet(new ServletHolder(thriftHttpServlet), httpPath);
    
          httpServer.join();
        } catch (Throwable t) {
          LOG.fatal(
              "Error starting HiveServer2: could not start "
                  + ThriftHttpCLIService.class.getSimpleName(), t);
          System.exit(-1);
        }
      }
    
    • This method starts an HTTP server via Jetty and installs ThriftHttpServlet to handle user requests.
    • Note the processor object in this code: it carries the main processing logic of the Jetty service. Let's follow it:
        protected Processor(I iface, Map<String,  org.apache.thrift.ProcessFunction<I, ? extends  org.apache.thrift.TBase>> processMap) {
          super(iface, getProcessMap(processMap));
        }
    
    • getProcessMap here builds the functions that handle the various requests:
        private static <I extends Iface> Map<String,  org.apache.thrift.ProcessFunction<I, ? extends  org.apache.thrift.TBase>> getProcessMap(Map<String,  org.apache.thrift.ProcessFunction<I, ? extends  org.apache.thrift.TBase>> processMap) {
          processMap.put("OpenSession", new OpenSession());
          processMap.put("CloseSession", new CloseSession());
          processMap.put("GetInfo", new GetInfo());
          processMap.put("ExecuteStatement", new ExecuteStatement());
          processMap.put("GetTypeInfo", new GetTypeInfo());
          processMap.put("GetCatalogs", new GetCatalogs());
          processMap.put("GetSchemas", new GetSchemas());
          processMap.put("GetTables", new GetTables());
          processMap.put("GetTableTypes", new GetTableTypes());
          processMap.put("GetColumns", new GetColumns());
          processMap.put("GetFunctions", new GetFunctions());
          processMap.put("GetOperationStatus", new GetOperationStatus());
          processMap.put("CancelOperation", new CancelOperation());
          processMap.put("CloseOperation", new CloseOperation());
          processMap.put("GetResultSetMetadata", new GetResultSetMetadata());
          processMap.put("FetchResults", new FetchResults());
          processMap.put("GetDelegationToken", new GetDelegationToken());
          processMap.put("CancelDelegationToken", new CancelDelegationToken());
          processMap.put("RenewDelegationToken", new RenewDelegationToken());
          return processMap;
        }
    
    • Look at the ExecuteStatement() handler; its implementation lives in CLIService
      @Override
      public OperationHandle executeStatement(SessionHandle sessionHandle, String statement,
          Map<String, String> confOverlay)
              throws HiveSQLException {
        OperationHandle opHandle = sessionManager.getSession(sessionHandle)
            .executeStatement(statement, confOverlay);
        LOG.debug(sessionHandle + ": executeStatement()");
        return opHandle;
      }
    
      @Override
      public OperationHandle executeStatementAsync(SessionHandle sessionHandle, String statement,
          Map<String, String> confOverlay) throws HiveSQLException {
        OperationHandle opHandle = sessionManager.getSession(sessionHandle)
            .executeStatementAsync(statement, confOverlay);
        LOG.debug(sessionHandle + ": executeStatementAsync()");
        return opHandle;
      }
    
    • As can be seen, to change the execution layer, the sessionManager and the OperationHandle/Operation it produces need to be overridden

    Summary

    For Spark to piggyback on the Hive JDBC service, it needs to do the following:

    • Override the Operation/OperationHandle layer so that execution is handed to Spark SQL (see the sketch after this list)
    • Override the sessionManager to guarantee that the OperationHandle obtained is a Spark OperationHandle
    • Override SparkSQLCLIService to guarantee that the Spark-related configuration reaches the execution classes, and that the overridden sparkSqlSessionManager is added to the serviceList
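    • As an illustration of the first point, the shape of such an override looks roughly like this (a simplified sketch, not the verbatim SparkSQLOperationManager source; createSparkOperation is a placeholder standing in for the construction of SparkExecuteStatementOperation):
      import java.util.{Map => JMap}
      import java.util.concurrent.ConcurrentHashMap

      import org.apache.hive.service.cli.SessionHandle
      import org.apache.hive.service.cli.operation.{ExecuteStatementOperation, OperationManager}
      import org.apache.hive.service.cli.session.HiveSession
      import org.apache.spark.sql.SQLContext

      class SparkBackedOperationManager extends OperationManager {
        // Which SQLContext serves which session; filled in by the session manager
        // when a session is opened.
        val sessionToContexts = new ConcurrentHashMap[SessionHandle, SQLContext]()

        override def newExecuteStatementOperation(
            parentSession: HiveSession,
            statement: String,
            confOverlay: JMap[String, String],
            async: Boolean): ExecuteStatementOperation = {
          val sqlContext = sessionToContexts.get(parentSession.getSessionHandle)
          // This is where Spark creates a SparkExecuteStatementOperation, whose
          // runInternal ends up calling sqlContext.sql(statement) instead of
          // handing the query to Hive's execution engine.
          createSparkOperation(sqlContext, parentSession, statement, confOverlay, async)
        }

        // Placeholder for the Spark-specific operation construction.
        private def createSparkOperation(
            sqlContext: SQLContext,
            session: HiveSession,
            statement: String,
            confOverlay: JMap[String, String],
            async: Boolean): ExecuteStatementOperation = ???
      }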

    SparkSQLCLIService

    • Extends CLIService

    • The init code
      override def init(hiveConf: HiveConf) {
        setSuperField(this, "hiveConf", hiveConf)
    
        val sparkSqlSessionManager = new SparkSQLSessionManager(hiveServer, sqlContext)
        setSuperField(this, "sessionManager", sparkSqlSessionManager)
        addService(sparkSqlSessionManager)
        var sparkServiceUGI: UserGroupInformation = null
    
        if (UserGroupInformation.isSecurityEnabled) {
          try {
            HiveAuthFactory.loginFromKeytab(hiveConf)
            sparkServiceUGI = Utils.getUGI()
            setSuperField(this, "serviceUGI", sparkServiceUGI)
          } catch {
            case e @ (_: IOException | _: LoginException) =>
              throw new ServiceException("Unable to login to kerberos with given principal/keytab", e)
          }
        }
    
        initCompositeService(hiveConf)
      }
    
    • This creates a SparkSQLSessionManager instance, installs it into the parent class, and adds it as a service. Then initCompositeService is used to invoke the SparkSQLSessionManager instance's init method

    SparkSQLSessionManager

    • When the SessionManager is initialized, it registers a SparkSQLOperationManager, which:

      • manages the mapping between sessions and their HiveContext, so a session's hc can be looked up;
      • replaces Hive's OperationManager for managing the mapping between handles and operations.
    • The thrift server can be configured as single-session, i.e. all Beeline clients share one HiveContext, or to start a new session per connection, where each session gets its own HiveContext and therefore independent UDFs/UDAFs, temporary tables, session state, and so on. The default is a new session per connection (see the configuration sketch at the end of this section).

    • When a new session is created, the session-to-HiveContext mapping is added to the OperationManager's sessionToContexts map

      sparkSqlOperationManager.sessionToContexts.put(sessionHandle, ctx)
      
    • init

      • Creates a SparkSQLOperationManager object and then uses initCompositeService to invoke that object's init method. Since the object does not override init, we have to follow it up to its parent class OperationManager:
        private lazy val sparkSqlOperationManager = new SparkSQLOperationManager()
      
        override def init(hiveConf: HiveConf) {
          setSuperField(this, "hiveConf", hiveConf)
      
          // Create operation log root directory, if operation logging is enabled
          if (hiveConf.getBoolVar(ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)) {
            invoke(classOf[SessionManager], this, "initOperationLogRootDir")
          }
      
          val backgroundPoolSize = hiveConf.getIntVar(ConfVars.HIVE_SERVER2_ASYNC_EXEC_THREADS)
          setSuperField(this, "backgroundOperationPool", Executors.newFixedThreadPool(backgroundPoolSize))
          getAncestorField[Log](this, 3, "LOG").info(
            s"HiveServer2: Async execution pool size $backgroundPoolSize")
      
          setSuperField(this, "operationManager", sparkSqlOperationManager)
          addService(sparkSqlOperationManager)
      
          initCompositeService(hiveConf)
        }
      
      • OperationManager.init
       public synchronized void init(HiveConf hiveConf)
        {
          if (hiveConf.getBoolVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_ENABLED)) {
            initOperationLogCapture(hiveConf.getVar(HiveConf.ConfVars.HIVE_SERVER2_LOGGING_OPERATION_LEVEL));
          } else {
            this.LOG.debug("Operation level logging is turned off");
          }
          super.init(hiveConf);
        }
      
    • execute

      • CLIService locates the Session from the handle and executes the statement
        public OperationHandle executeStatement(SessionHandle sessionHandle, String statement,
        Map<String, String> confOverlay)
            throws HiveSQLException {
          OperationHandle opHandle = sessionManager.getSession(sessionHandle)
              .executeStatement(statement, confOverlay);
          LOG.debug(sessionHandle + ": executeStatement()");
          return opHandle;
        }
      
      • Every statement execution allocates a new operation, which is added to the OperationManager for management. newExecuteStatementOperation is overridden by SparkSQLOperationManager.newExecuteStatementOperation, so the operation actually created is a SparkExecuteStatementOperation.
        @Override
      public OperationHandle executeStatement(String statement, Map<String, String> confOverlay)
        throws HiveSQLException {
      return executeStatementInternal(statement, confOverlay, false);
      }
      
      private OperationHandle executeStatementInternal(String statement, Map<String, String> confOverlay,
        boolean runAsync)
            throws HiveSQLException {
      acquire(true);
      
      OperationManager operationManager = getOperationManager();
      ExecuteStatementOperation operation = operationManager
          .newExecuteStatementOperation(getSession(), statement, confOverlay, runAsync);
      OperationHandle opHandle = operation.getHandle();
      try {
      // Invoke the operation's run method
        operation.run();
        opHandleSet.add(opHandle);
        return opHandle;
      } catch (HiveSQLException e) {
        // Refering to SQLOperation.java,there is no chance that a HiveSQLException throws and the asyn
        // background operation submits to thread pool successfully at the same time. So, Cleanup
        // opHandle directly when got HiveSQLException
        operationManager.closeOperation(opHandle);
        throw e;
      } finally {
        release(true);
      }
      }


      // The implementation of Operation's run method
      public void run() throws HiveSQLException {
        beforeRun();
        try {
          runInternal();
        } finally {
          afterRun();
        }
      }
      
    • The whole flow is essentially the same as Hive's native flow. In native Hive, HiveServer2 creates a CLIService, which corresponds to SparkSQLCLIService in Spark; CLIService then creates a SessionManager service, which corresponds to SparkSQLSessionManager; and SessionManager in turn creates an OperationManager service, which corresponds to SparkSQLOperationManager.
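    • For the session mode mentioned above, a hedged configuration sketch (the key name follows Spark 2.2's SQLConf; it is a static conf, so it must be set before the session is created), e.g. in spark-shell or application code:
      import org.apache.spark.sql.SparkSession

      // true  => all Beeline connections share one session (temp views, UDFs, session state)
      // false => each connection gets its own session (the default)
      val spark = SparkSession.builder()
        .appName("thrift-server-single-session")
        .config("spark.sql.hive.thriftServer.singleSession", "true")
        .enableHiveSupport()
        .getOrCreate()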

    SparkExecuteStatementOperation

    • As mentioned above, an Operation runs in three steps: beforeRun, runInternal, and afterRun. beforeRun and afterRun only handle logging, so we look straight at runInternal.
    • runInternal is asynchronous by default, i.e. it runs in the background: SparkExecuteStatementOperation creates a Runnable and submits it to backgroundOperationPool, so execute runs on a new thread.
    • The core of execute is sqlContext.sql(statement)
    private def execute(): Unit = {
        statementId = UUID.randomUUID().toString
        logInfo(s"Running query '$statement' with $statementId")
        setState(OperationState.RUNNING)
    
      result = sqlContext.sql(statement)
      logDebug(result.queryExecution.toString())
      result.queryExecution.logical match {
        case SetCommand(Some((SQLConf.THRIFTSERVER_POOL.key, Some(value)))) =>
          sessionToActivePool.put(parentSession.getSessionHandle, value)
          logInfo(s"Setting spark.scheduler.pool=$value for future statements in this session.")
        case _ =>
      }
      HiveThriftServer2.listener.onStatementParsed(statementId, result.queryExecution.toString())
      iter = {
        if (sqlContext.getConf(SQLConf.THRIFTSERVER_INCREMENTAL_COLLECT.key).toBoolean) {
          resultList = None
          result.toLocalIterator.asScala
        } else {
          resultList = Some(result.collect())
          resultList.get.iterator
        }
      }
      dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray
    
        setState(OperationState.FINISHED)
        HiveThriftServer2.listener.onStatementFinish(statementId)
      }
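    • The branch on SQLConf.THRIFTSERVER_INCREMENTAL_COLLECT above decides whether the whole result is materialized on the driver with collect() or streamed through toLocalIterator. A hedged way to turn the streaming path on (the key name is assumed to match Spark 2.2's SQLConf; it is an internal option):
      // In spark-shell, or any code that shares the Thrift server's SQLContext:
      spark.conf.set("spark.sql.thriftServer.incrementalCollect", "true")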
    

    Member Variables

    // The result set as a DataFrame
    private var result: DataFrame = _
    // The result set as a list
    private var resultList: Option[Array[SparkRow]] = _
    // Iterator over the result set
    private var iter: Iterator[SparkRow] = _
    // The data types of the result columns
    private var dataTypes: Array[DataType] = _
    // The statement ID, e.g. "61146141-2a0a-41ec-bce4-dd691a0fa63c"
    private var statementId: String = _
    
    
    The result-set schema, used when the getResultSetMetadata interface is called:
      private lazy val resultSchema: TableSchema = {
        if (result == null || result.schema.isEmpty) {
          new TableSchema(Arrays.asList(new FieldSchema("Result", "string", "")))
        } else {
          logInfo(s"Result Schema: ${result.schema}")
          SparkExecuteStatementOperation.getTableSchema(result.schema)
        }
      }
    

    execute

    • Run the query: sqlContext.sql(statement)
    • Save the result set: private var result: DataFrame
    • Save the result-set iterator: private var iter: Iterator[SparkRow] = _
    • Save the result schema: dataTypes = result.queryExecution.analyzed.output.map(_.dataType).toArray

    getNextRowSet

    • def getNextRowSet(order: FetchOrientation, maxRowsL: Long)
    • order: whether to fetch from the beginning; maxRowsL: the maximum number of rows to fetch
    • Build the RowSet according to the schema:
    val resultRowSet: RowSet = RowSetFactory.create(getResultSetSchema, getProtocolVersion)
    
    • Parse the result set
    resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]])
    
    • addRow calls RowBasedSet's addRow
    • ColumnValue.toTColumnValue is called to convert each object to the corresponding Thrift data type
      public static TColumnValue toTColumnValue(Type type, Object value) {
        switch (type) {
        case BOOLEAN_TYPE:
          return booleanValue((Boolean)value);
        case TINYINT_TYPE:
          return byteValue((Byte)value);
        case SMALLINT_TYPE:
          return shortValue((Short)value);
        case INT_TYPE:
          return intValue((Integer)value);
        case BIGINT_TYPE:
          return longValue((Long)value);
        case FLOAT_TYPE:
          return floatValue((Float)value);
        case DOUBLE_TYPE:
          return doubleValue((Double)value);
        case STRING_TYPE:
          return stringValue((String)value);
        case CHAR_TYPE:
          return stringValue((HiveChar)value);
        case VARCHAR_TYPE:
          return stringValue((HiveVarchar)value);
        case DATE_TYPE:
          return dateValue((Date)value);
        case TIMESTAMP_TYPE:
          return timestampValue((Timestamp)value);
        case INTERVAL_YEAR_MONTH_TYPE:
          return stringValue((HiveIntervalYearMonth) value);
        case INTERVAL_DAY_TIME_TYPE:
          return stringValue((HiveIntervalDayTime) value);
        case DECIMAL_TYPE:
          return stringValue(((HiveDecimal)value));
        case BINARY_TYPE:
          return stringValue((String)value);
        case ARRAY_TYPE:
        case MAP_TYPE:
        case STRUCT_TYPE:
        case UNION_TYPE:
        case USER_DEFINED_TYPE:
          return stringValue((String)value);
        default:
          return null;
        }
      }
    

    Overall Startup Call Chain

    • CLIService (parent class: CompositeService)
    // Location: hive/hive-1.1.0-cdh5.7.0/service/src/java/org/apache/hive/service/cli/CLIService.java
      @Override
      public synchronized void start() {
        super.start();
      }
    
    • CompositeService: its start calls the start method of every service in serviceList and then calls the parent's start
    // Location: hive/hive-1.1.0-cdh5.7.0/service/src/java/org/apache/hive/service/CompositeService.java
      @Override
      public synchronized void start() {
        int i = 0;
        try {
          for (int n = serviceList.size(); i < n; i++) {
            Service service = serviceList.get(i);
            service.start();
          }
          super.start();
        } catch (Throwable e) {
          LOG.error("Error starting services " + getName(), e);
          // Note that the state of the failed service is still INITED and not
          // STARTED. Even though the last service is not started completely, still
          // call stop() on all services including failed service to make sure cleanup
          // happens.
          stop(i);
          throw new ServiceException("Failed to Start " + getName(), e);
        }
      }
    
    • SparkSQLCLIService's parent class is CLIService, and its serviceList contains SparkSQLSessionManager
    • SparkSQLSessionManager's parent class is SessionManager, whose parent is CompositeService; its serviceList contains SparkSQLOperationManager
    • Flow diagram (figure not reproduced here)

    References

  • Original article: https://www.cnblogs.com/bigbigtree/p/8872514.html