• How can we view the abstract syntax tree generated by Spark SQL?


    Preface

        Section 4.3 of the book 《Spark SQL内核剖析》 notes that in the Catalyst framework, the nodes of the generated abstract syntax tree all have names ending in "Context". The tree is produced when ANTLR4 and the generated SqlBaseParser parse the SQL text (this is the syntax-parsing part of the source code), and every node of the resulting AST is a subclass of ParserRuleContext.

    The question

        ANTLR4 parses SQL into an abstract syntax tree. What does this tree ultimately look like, and how can we inspect it?

    Source-code analysis

    Test example

    spark.sql("select id, count(name) from student group by id").show()
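    The statement above presupposes a `student` table. A minimal, hypothetical setup might look like this (the local session, the sample rows, and the column names are assumptions for the demo, not from Spark itself):

    ```scala
    import org.apache.spark.sql.SparkSession

    // Hypothetical local session; any existing SparkSession works the same way.
    val spark = SparkSession.builder().master("local[*]").appName("ast-demo").getOrCreate()
    import spark.implicits._

    // Register an in-memory "student" view so the SQL below has something to query.
    Seq((1, "alice"), (2, "bob"), (1, "carol"))
      .toDF("id", "name")
      .createOrReplaceTempView("student")

    spark.sql("select id, count(name) from student group by id").show()
    ```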

    Source entry point

    The sql method of SparkSession is as follows:

    def sql(sqlText: String): DataFrame = {
        // 1. Generate the LogicalPlan
        // sqlParser is a SparkSqlParser
        val logicalPlan: LogicalPlan = sessionState.sqlParser.parsePlan(sqlText)
        // 2. Build a DataFrame from the LogicalPlan
        val frame: DataFrame = Dataset.ofRows(self, logicalPlan)
        frame
      }

    Locating SparkSqlParser

    The entry code involves the key class SessionState, whose initialization is as follows:

    lazy val sessionState: SessionState = {
        parentSessionState
          .map(_.clone(this))
          .getOrElse {
        // Build org.apache.spark.sql.internal.SessionStateBuilder
            val state = SparkSession.instantiateSessionState(
              SparkSession.sessionStateClassName(sparkContext.conf),
              self)
            initialSessionOptions.foreach { case (k, v) => state.conf.setConfString(k, v) }
            state
          }
      }

    The method org.apache.spark.sql.SparkSession$#sessionStateClassName is as follows:

    private def sessionStateClassName(conf: SparkConf): String = {
        // spark.sql.catalogImplementation is either "hive" or "in-memory"; the default is "in-memory"
        conf.get(CATALOG_IMPLEMENTATION) match {
          case "hive" => HIVE_SESSION_STATE_BUILDER_CLASS_NAME // hive 实现 org.apache.spark.sql.hive.HiveSessionStateBuilder
          case "in-memory" => classOf[SessionStateBuilder].getCanonicalName // org.apache.spark.sql.internal.SessionStateBuilder
        }
      }
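    As a side note, the hive branch above is normally reached by enabling Hive support when the session is built. A sketch (assuming Hive classes are on the classpath; the session values here are illustrative):

    ```scala
    import org.apache.spark.sql.SparkSession

    // enableHiveSupport() sets spark.sql.catalogImplementation to "hive",
    // so sessionStateClassName will pick HiveSessionStateBuilder.
    // Without it, the default "in-memory" branch selects SessionStateBuilder.
    val spark = SparkSession.builder()
      .master("local[*]")
      .enableHiveSupport() // requires Hive classes on the classpath
      .getOrCreate()

    println(spark.conf.get("spark.sql.catalogImplementation"))
    ```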

    The builder pattern is used here: org.apache.spark.sql.internal.SessionStateBuilder is what builds the SessionState. This is spelled out in SparkSession.instantiateSessionState, as follows:

    /**
       * Helper method to create an instance of `SessionState` based on `className` from conf.
       * The result is either `SessionState` or a Hive based `SessionState`.
       */
      private def instantiateSessionState(
          className: String,
          sparkSession: SparkSession): SessionState = {
        try {
          // org.apache.spark.sql.internal.SessionStateBuilder
          // invoke `new [Hive]SessionStateBuilder(SparkSession, Option[SessionState])`
          val clazz = Utils.classForName(className)
          val ctor = clazz.getConstructors.head
          ctor.newInstance(sparkSession, None).asInstanceOf[BaseSessionStateBuilder].build()
        } catch {
          case NonFatal(e) =>
            throw new IllegalArgumentException(s"Error while instantiating '$className':", e)
        }
      }

    BaseSessionStateBuilder has two main implementations: org.apache.spark.sql.hive.HiveSessionStateBuilder (hive mode) and org.apache.spark.sql.internal.SessionStateBuilder (in-memory mode, the default).

    The org.apache.spark.sql.internal.BaseSessionStateBuilder#build method is as follows:

    /**
       * Build the [[SessionState]].
       */
      def build(): SessionState = {
        new SessionState(
          session.sharedState,
          conf,
          experimentalMethods,
          functionRegistry,
          udfRegistration,
          () => catalog,
          sqlParser,
          () => analyzer,
          () => optimizer,
          planner,
          streamingQueryManager,
          listenerManager,
          () => resourceLoader,
          createQueryExecution,
          createClone)
      }

    SessionState takes many parameters; the key ones are described below:

    conf: a SQLConf object holding the SQL configuration of this SparkSession

    functionRegistry: a FunctionRegistry object responsible for function registration; internally it maintains a map of the registered functions.

    udfRegistration: a UDFRegistration object used to register UDFs; it relies on FunctionRegistry.

    catalogBuilder: () => SessionCatalog: returns the SessionCatalog, which manages the catalog (databases, tables, functions, etc.) of this SparkSession

    sqlParser: ParserInterface, in practice a SparkSqlParser instance, which internally uses AstBuilder to parse SQL into an abstract syntax tree

    analyzerBuilder: () => Analyzer: built by org.apache.spark.sql.internal.BaseSessionStateBuilder.analyzer, a customized org.apache.spark.sql.catalyst.analysis.Analyzer

    optimizerBuilder: () => Optimizer: built by org.apache.spark.sql.internal.BaseSessionStateBuilder.optimizer, a customized org.apache.spark.sql.execution.SparkOptimizer

    planner: SparkPlanner: built by org.apache.spark.sql.internal.BaseSessionStateBuilder.planner, a customized org.apache.spark.sql.execution.SparkPlanner

    resourceLoaderBuilder: () => SessionResourceLoader: returns the resource loader, mainly used to load jars or other resources for functions

    createQueryExecution: LogicalPlan => QueryExecution: builds a QueryExecution from a LogicalPlan

    The parsePlan method

    SparkSqlParser does not implement this method itself; the implementation lives in its parent class AbstractSqlParser, as follows:

    /** Creates LogicalPlan for a given SQL string. */
      // Generate the logical plan (LogicalPlan) from the SQL text
      override def parsePlan(sqlText: String): LogicalPlan = parse(sqlText) { parser =>
        val singleStatementContext: SqlBaseParser.SingleStatementContext = parser.singleStatement()
        astBuilder.visitSingleStatement(singleStatementContext) match {
          case plan: LogicalPlan => plan
          case _ =>
            val position = Origin(None, None)
            throw new ParseException(Option(sqlText), "Unsupported SQL statement", position, position)
        }
      }

    The function passed after parse is a callback; it is invoked inside the parse method itself.

    The source of org.apache.spark.sql.execution.SparkSqlParser#parse is as follows:

    private val substitutor = new VariableSubstitution(conf) // variable substitutor
    
      protected override def parse[T](command: String)(toResult: SqlBaseParser => T): T = {
        super.parse(substitutor.substitute(command))(toResult)
      }
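    To see what the substitutor does before any parsing happens, here is a hedged sketch (the variable name myMinAge is made up, and the exact `${...}` prefixes supported, such as `hivevar:` or `system:`, depend on the Spark version):

    ```scala
    // Variable substitution expands ${...} references in the SQL text
    // before it ever reaches the ANTLR lexer.
    spark.conf.set("spark.sql.variable.substitute", "true")
    spark.sql("SET myMinAge=18")
    // The parser never sees "${myMinAge}"; it sees the substituted value 18.
    spark.sql("select name from student where age > ${myMinAge}").show()
    ```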

    Here, substitutor is a variable substitutor that expands variables embedded in the SQL text. Continue to the parse method of the parent class AbstractSqlParser:

    protected def parse[T](command: String)(toResult: SqlBaseParser => T): T = {
        logDebug(s"Parsing command: $command")
    
        // Lexical analysis
        val lexer = new SqlBaseLexer(new UpperCaseCharStream(CharStreams.fromString(command)))
        lexer.removeErrorListeners()
        lexer.addErrorListener(ParseErrorListener)
        lexer.legacy_setops_precedence_enbled = SQLConf.get.setOpsPrecedenceEnforced
    
        // Syntactic analysis
        val tokenStream = new CommonTokenStream(lexer)
        val parser = new SqlBaseParser(tokenStream)
        parser.addParseListener(PostProcessor)
        parser.removeErrorListeners()
        parser.addErrorListener(ParseErrorListener)
        parser.legacy_setops_precedence_enbled = SQLConf.get.setOpsPrecedenceEnforced
    
        try {
          try {
            // first, try parsing with potentially faster SLL mode
            parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
        // Use AstBuilder to generate the Unresolved LogicalPlan
            toResult(parser)
          }
          catch {
            case e: ParseCancellationException =>
              // if we fail, parse with LL mode
              tokenStream.seek(0) // rewind input stream
              parser.reset()
    
              // Try Again.
              parser.getInterpreter.setPredictionMode(PredictionMode.LL)
              toResult(parser)
          }
        }
        catch {
          case e: ParseException if e.command.isDefined =>
            throw e
          case e: ParseException =>
            throw e.withCommand(command)
          case e: AnalysisException =>
            val position = Origin(e.line, e.startPosition)
            throw new ParseException(Option(command), e.message, position, position)
        }
      }

    This method calls the ANTLR4 API to turn the SQL text into an abstract syntax tree and then calls toResult(parser), which is the callback supplied by parsePlan.

    By the time astBuilder.visitSingleStatement is invoked, the AST has already been generated.

    Printing the generated AST

    Modifying the source

    Now look at the astBuilder.visitSingleStatement method:

    override def visitSingleStatement(ctx: SingleStatementContext): LogicalPlan = withOrigin(ctx) {
        val statement: StatementContext = ctx.statement
        printRuleContextInTreeStyle(statement, 1)
        // call accept (via visit) to generate the logical operator tree
        visit(statement).asInstanceOf[LogicalPlan]
      }

    Before the visitor pattern walks the AST to produce the Unresolved LogicalPlan, I defined a method to print the freshly parsed abstract syntax tree. The code of printRuleContextInTreeStyle is as follows:

    /**
       * Print the abstract syntax tree in tree style.
       * Requires `import scala.collection.JavaConverters._` so the Java list
       * of children can be traversed with a Scala foreach.
       */
      private def printRuleContextInTreeStyle(ctx: ParserRuleContext, level: Int): Unit = {
        val prefix: String = "|"
        val curLevelStr: String = "-" * level
        val childLevelStr: String = "-" * (level + 1)
        println(s"${prefix}${curLevelStr} ${ctx.getClass.getCanonicalName}")
        val children: util.List[ParseTree] = ctx.children
        if (children == null || children.size() == 0) {
          return
        }
        children.iterator().asScala.foreach {
          case context: ParserRuleContext => printRuleContextInTreeStyle(context, level + 1)
          // terminal (token) children are printed with the parent context's class
          // name, which is why some class names appear twice in the output below
          case _ => println(s"${prefix}${childLevelStr} ${ctx.getClass.getCanonicalName}")
        }
      }
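    If modifying Spark's source is not an option, ANTLR itself can print the parse tree: every RuleContext has a toStringTree(Parser) method that renders the tree in LISP style using the parser's rule names. A sketch (Spark's UpperCaseCharStream is package-private, so this simply upper-cases the whole statement, which is fine for keyword recognition in a demo but would also upper-case string literals):

    ```scala
    import org.antlr.v4.runtime.{CharStreams, CommonTokenStream}
    import org.apache.spark.sql.catalyst.parser.{SqlBaseLexer, SqlBaseParser}

    val sql = "select name from student where age > 18"
    // The generated lexer expects upper-case input, mimicking UpperCaseCharStream.
    val lexer = new SqlBaseLexer(CharStreams.fromString(sql.toUpperCase))
    val parser = new SqlBaseParser(new CommonTokenStream(lexer))
    // Prints the parse tree in LISP style, e.g. (singleStatement (statement ...) <EOF>)
    println(parser.singleStatement().toStringTree(parser))
    ```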

    Printed trees for three example SQL statements

    SQL example 1 (with WHERE)

    select name from student where age > 18

    The generated AST is:

    |- org.apache.spark.sql.catalyst.parser.SqlBaseParser.StatementDefaultContext
    |-- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryContext
    |--- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SingleInsertQueryContext
    |---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryTermDefaultContext
    |----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryPrimaryDefaultContext
    |------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.RelationContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableNameContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableIdentifierContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableAliasContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ConstantDefaultContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NumericLiteralContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
    |---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext

    SQL example 2 (with ORDER BY)

    select name from student where age > 18 order by id desc

    The generated AST is:

    |- org.apache.spark.sql.catalyst.parser.SqlBaseParser.StatementDefaultContext
    |-- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryContext
    |--- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SingleInsertQueryContext
    |---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryTermDefaultContext
    |----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryPrimaryDefaultContext
    |------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.RelationContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableNameContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableIdentifierContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableAliasContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ComparisonOperatorContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ConstantDefaultContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NumericLiteralContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IntegerLiteralContext
    |---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext
    |----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext
    |----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext
    |----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SortItemContext
    |------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.SortItemContext

    SQL example 3 (with GROUP BY)

    select id, count(name) from student group by id

    The generated AST is:

    |- org.apache.spark.sql.catalyst.parser.SqlBaseParser.StatementDefaultContext
    |-- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryContext
    |--- org.apache.spark.sql.catalyst.parser.SqlBaseParser.SingleInsertQueryContext
    |---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryTermDefaultContext
    |----- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryPrimaryDefaultContext
    |------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QuerySpecificationContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionSeqContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.NamedExpressionContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.FunctionCallContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QualifiedNameContext
    |-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |---------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FunctionCallContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
    |-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |--------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |---------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |----------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |------------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FunctionCallContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.FromClauseContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.RelationContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableNameContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableIdentifierContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.TableAliasContext
    |------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.AggregationContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.AggregationContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.AggregationContext
    |-------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ExpressionContext
    |--------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.PredicatedContext
    |---------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ValueExpressionDefaultContext
    |----------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.ColumnReferenceContext
    |------------ org.apache.spark.sql.catalyst.parser.SqlBaseParser.IdentifierContext
    |------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |-------------- org.apache.spark.sql.catalyst.parser.SqlBaseParser.UnquotedIdentifierContext
    |---- org.apache.spark.sql.catalyst.parser.SqlBaseParser.QueryOrganizationContext

    Summary

    This article started from a small test snippet, traced how ANTLR4 is invoked to parse SQL into an AST, and modified the source to print that tree. Parsing SQL with ANTLR may still feel like a black box, but for Spark SQL the important point is that the input to the subsequent phases (analysis, optimization, planning) has now been obtained.

  • Original article: https://www.cnblogs.com/johnny666888/p/12345142.html