• 手写一个简单的ElasticSearch SQL转换器(一)


       一.前言

       之前有个需求,是使ElasticSearch支持使用SQL进行简单查询,较新版本的ES已经支持该特性(不过貌似还是实验性质的?) ,而且git上也有elasticsearch-sql

    插件,之所以决定手写一个,主要有两点原因:

          1. 目前用的ES版本较老

          2. elasticsearch-sql虽好,但比较复杂,代码也不易维护

          3. 练练手

     二.技术选型

       目前主流软件中通常使用ANTLR做词法语法分析,诸如著名的Hibernate,Spark,Hive等项目,之前因为工作原因也有所接触,不过如果只是解析标准SQL的话,

     其实还有更好的选择,如使用Hibernate或阿里巴巴的数据库Druid(Druid采用了手写词法语法分析器的方案,这种方式当然比自动ANTLR生成的解析器性能高得多), 这里

     我选择了第二种方案。

         开始之前先看下我们可以通过Druid拿到的SQL语言的抽象语法树:

        

                                                      图片:https://www.jianshu.com/p/437aa22ea3ca

     三.技术实现

         首先我们创建一个SqlParser类,主流程都在parse方法中,该方法负责将一个SQL字符串解析(顺便说一句,Druid支持多种SQL方言,这里我选择了MySQL),

     并返回SearchSourceBuilder对象,这是一个ElasticSearch提供的DSL构建器,以该对象作为参数,ES client端即可发起对ES 服务端搜索请求。

        

     1 /**
     2  * 
     3  * @author fred
     4  *
     5  */
     6 public class SqlParser {
     7     private final static String dbType = JdbcConstants.MYSQL;
     8     private final static Logger logger = LoggerFactory.getLogger(SqlParser.class);
     9     private SearchSourceBuilder builder;
    10 
    11     public SqlParser(SearchSourceBuilder builder) {
    12         this.builder = builder;
    13     }
    14     /**
    15      * 将SQL解析为ES查询
    16      */
    17     public SearchSourceBuilder parse(String sql) throws Exception {
    18         if (Objects.isNull(sql)) {
    19             throw new IllegalArgumentException("输入语句不得为空");
    20         }
    21         sql = sql.trim().toLowerCase();
    22         List<SQLStatement> stmtList = SQLUtils.parseStatements(sql, dbType);
    23         if (Objects.isNull(stmtList) || stmtList.size() != 1) {
    24             throw new IllegalArgumentException("必须输入一句查询语句");
    25         }
    26         // 使用Parser解析生成AST
    27         SQLStatement stmt = stmtList.get(0);
    28         if (!(stmt instanceof SQLSelectStatement)) {
    29             throw new IllegalArgumentException("输入语句须为Select语句");
    30         }
    31         SQLSelectStatement sqlSelectStatement = (SQLSelectStatement) stmt;
    32         SQLSelectQuery sqlSelectQuery = sqlSelectStatement.getSelect().getQuery();
    33         SQLSelectQueryBlock sqlSelectQueryBlock = (SQLSelectQueryBlock) sqlSelectQuery;
    34 
    35         SQLExpr whereExpr = sqlSelectQueryBlock.getWhere();
    36 
    37         // 生成ES查询条件
    38         BoolQueryBuilder bridge = QueryBuilders.boolQuery();
    39         bridge.must();
    40 
    41         QueryBuilder whereBuilder = whereHelper(whereExpr); // 处理where
    42         bridge.must(whereBuilder);
    43         SQLOrderBy orderByExpr = sqlSelectQueryBlock.getOrderBy(); // 处理order by
    44         if (Objects.nonNull(orderByExpr)) {
    45             orderByHelper(orderByExpr, bridge);
    46         }
    47         builder.query(bridge);
    48         return builder;
    49     }

         

        主流程很简单,拿到SQL字符串后,直接通过Druid API将其转换为抽象语法树,我们要求输入语句必须为Select语句。接下来是对where语句和order by语句的处理,

      目前的难点其实主要在于如何将where语句映射到ES查询中。

         先从简单的看起,如何处理order by呢?SQL语句中 order by显然可以允许用户根据多字段排序,所以排序字段肯定是一个List<排序字段>,我们要做的就是将这个List映射到

    SearchSourceBuilder对象中。见下面代码:

        

     1     /**
     2      * 处理所有order by字段
     3      * 
     4      * @param orderByExpr
     5      */
     6     private void orderByHelper(SQLOrderBy orderByExpr, BoolQueryBuilder bridge) {
     7         List<SQLSelectOrderByItem> orderByList = orderByExpr.getItems(); // 待排序的列
     8         for (SQLSelectOrderByItem sqlSelectOrderByItem : orderByList) {
     9             if (sqlSelectOrderByItem.getType() == null) {
    10                 sqlSelectOrderByItem.setType(SQLOrderingSpecification.ASC); // 默认升序
    11             }
    12             String orderByColumn = sqlSelectOrderByItem.getExpr().toString();
    13             builder.sort(orderByColumn,
    14                     sqlSelectOrderByItem.getType().equals(SQLOrderingSpecification.ASC) ? SortOrder.ASC
    15                             : SortOrder.DESC);
    16         }
    17     }

       通过Druid的API,我们很容易拿到了SQL语句中所有的排序字段,我们逐个遍历这些字段,拿到排序的列名字面量和顺序,传递给SearchSourceBuilder的sort方法,需注意的

    是, 如果原始SQL中没有指定字段是顺序,我们默认升序。

       

        接下来我们处理稍微有点麻烦的where语句,因为SQL语句被解析成了语法树,很自然的我们想到使用递归方式进行处理。 而通常在处理递归问题的时候,

      我习惯于从递归的base case开始考虑,where语句中的运算符根据Druid API中的定义主要分为以下三种:

        1. 简单二元运算符:包括逻辑处理,如and, or 和大部分关系运算(后续会详细讲)

        2. between或not between运算符:我们可以简单的将其映射成ES中的Range Query

        3. in, not in 运算符: 可以简单的映射成ES中的Term Query

       通过Druid,我们可以很方便的获取每种运算中的运算符与操作数

     1 /**
     2      * 递归遍历“where”子树
     3      * 
     4      * @return
     5      */
     6     private QueryBuilder whereHelper(SQLExpr expr) throws Exception {
     7         if (Objects.isNull(expr)) {
     8             throw new NullPointerException("节点不能为空!");
     9         }
    10         BoolQueryBuilder bridge = QueryBuilders.boolQuery();
    11         if (expr instanceof SQLBinaryOpExpr) { // 二元运算
    12             SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符
    13             if (operator.isLogical()) { // and,or,xor
    14                 return handleLogicalExpr(expr);
    15             } else if (operator.isRelational()) { // 具体的运算,位于叶子节点
    16                 return handleRelationalExpr(expr);
    17             }
    18         } else if (expr instanceof SQLBetweenExpr) { // between运算
    19             SQLBetweenExpr between = ((SQLBetweenExpr) expr);
    20             boolean isNotBetween = between.isNot(); // between or not between ?
    21             String testExpr = between.testExpr.toString();
    22             String fromStr = formatSQLValue(between.beginExpr.toString());
    23             String toStr = formatSQLValue(between.endExpr.toString());
    24             if (isNotBetween) {
    25                 bridge.must(QueryBuilders.rangeQuery(testExpr).lt(fromStr).gt(toStr));
    26             } else {
    27                 bridge.must(QueryBuilders.rangeQuery(testExpr).gte(fromStr).lte(toStr));
    28             }
    29             return bridge;
    30         } else if (expr instanceof SQLInListExpr) { // SQL的 in语句,ES中对应的是terms
    31             SQLInListExpr siExpr = (SQLInListExpr) expr;
    32             boolean isNotIn = siExpr.isNot(); // in or not in?
    33             String leftSide = siExpr.getExpr().toString();
    34             List<SQLExpr> inSQLList = siExpr.getTargetList();
    35             List<String> inList = new ArrayList<>();
    36             for (SQLExpr in : inSQLList) {
    37                 String str = formatSQLValue(in.toString());
    38                 inList.add(str);
    39             }
    40             if (isNotIn) {
    41                 bridge.mustNot(QueryBuilders.termsQuery(leftSide, inList));
    42             } else {
    43                 bridge.must(QueryBuilders.termsQuery(leftSide, inList));
    44             }
    45             return bridge;
    46         }
    47         return bridge;
    48     }

       上述第一种情况比较复杂,首先我们先看看运算符是逻辑运算的情况:

        如下面的代码所示,如果运算符是逻辑运算符,我们需要对左右操作数分别递归,然后根据运算符类型归并结果:or可以映射成ES 中的Should,而and则映射成Must.

       

        /**
         * 逻辑运算符,目前支持and,or
         * 
         * @return
         * @throws Exception
         */
        private QueryBuilder handleLogicalExpr(SQLExpr expr) throws Exception {
            BoolQueryBuilder bridge = QueryBuilders.boolQuery();
            SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符
            SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
            SQLExpr rightExpr = ((SQLBinaryOpExpr) expr).getRight();
    
            // 分别递归左右子树,再根据逻辑运算符将结果归并
            QueryBuilder leftBridge = whereHelper(leftExpr);
            QueryBuilder rightBridge = whereHelper(rightExpr);
            if (operator.equals(SQLBinaryOperator.BooleanAnd)) {
                bridge.must(leftBridge).must(rightBridge);
            } else if (operator.equals(SQLBinaryOperator.BooleanOr)) {
                bridge.should(leftBridge).should(rightBridge);
            }
            return bridge;
        }

       下面来讨论下第一种情况中,如果运算符是关系运算符的情况,我们知道,SQL中的关系运算主要就是一些比较运算符,诸如大于,小于,等于,Like等,这里我还加上了

    正则搜索(不过貌似性能比较差,ES对正则搜索的限制颇多,不太建议使用)。

      

    /**
         * 大于小于等于正则
         * 
         * @param expr
         * @return
         */
        private QueryBuilder handleRelationalExpr(SQLExpr expr) {
            SQLExpr leftExpr = ((SQLBinaryOpExpr) expr).getLeft();
            if (Objects.isNull(leftExpr)) {
                throw new NullPointerException("表达式左侧不得为空");
            }
            String leftExprStr = leftExpr.toString();
            String rightExprStr = formatSQLValue(((SQLBinaryOpExpr) expr).getRight().toString()); // TODO:表达式右侧可以后续支持方法调用
            SQLBinaryOperator operator = ((SQLBinaryOpExpr) expr).getOperator(); // 获取运算符
            QueryBuilder queryBuilder;
            switch (operator) {
            case GreaterThanOrEqual:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gte(rightExprStr);
                break;
            case LessThanOrEqual:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lte(rightExprStr);
                break;
            case Equality:
                queryBuilder = QueryBuilders.boolQuery();
                TermQueryBuilder eqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).must(eqCond);
                break;
            case GreaterThan:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).gt(rightExprStr);
                break;
            case LessThan:
                queryBuilder = QueryBuilders.rangeQuery(leftExprStr).lt(rightExprStr);
                break;
            case NotEqual:
                queryBuilder = QueryBuilders.boolQuery();
                TermQueryBuilder notEqCond = QueryBuilders.termQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).mustNot(notEqCond);
                break;
            case RegExp: // 对应到ES中的正则查询
                queryBuilder = QueryBuilders.boolQuery();
                RegexpQueryBuilder regCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).mustNot(regCond);
                break;
            case NotRegExp:
                queryBuilder = QueryBuilders.boolQuery();
                RegexpQueryBuilder notRegCond = QueryBuilders.regexpQuery(leftExprStr, rightExprStr);
                ((BoolQueryBuilder) queryBuilder).mustNot(notRegCond);
                break;
            case Like:
                queryBuilder = QueryBuilders.boolQuery();
                MatchPhraseQueryBuilder likeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                        rightExprStr.replace("%", ""));
                ((BoolQueryBuilder) queryBuilder).must(likeCond);
                break;
            case NotLike:
                queryBuilder = QueryBuilders.boolQuery();
                MatchPhraseQueryBuilder notLikeCond = QueryBuilders.matchPhraseQuery(leftExprStr,
                        rightExprStr.replace("%", ""));
                ((BoolQueryBuilder) queryBuilder).mustNot(notLikeCond);
                break;
            default:
                throw new IllegalArgumentException("暂不支持该运算符!" + operator.toString());
            }
            return queryBuilder;
        }

        到这里我们就完成了SQL转ES DSL的功能了(其实只是简单查询的转换),下面我们写几个Junit测试一下吧:

        首先是简单的比较运算:

    public void normalSQLTest() {
            String sql = "select * from test where time>= 1";
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             try {
                 searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql);
            } catch (Exception e) {
                e.printStackTrace();
            }
             System.out.println(searchSourceBuilder);
             SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
             QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
             BoolQueryBuilder briage = QueryBuilders.boolQuery();
             briage.must();
             briage.must(whereBuilder);
             builderToCompare.query(briage);
             assertEquals(searchSourceBuilder,builderToCompare);
        }

      下面是输出的ES 查询语句:

    {
      "query" : {
        "bool" : {
          "must" : [
            {
              "range" : {
                "time" : {
                  "from" : "1",
                  "to" : null,
                  "include_lower" : true,
                  "include_upper" : true,
                  "boost" : 1.0
                }
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      }
    }

      再来个带排序的:

       

        @Test
        public void normalSQLWithOrderByTest() {
            String sql = "select * from test where time>= 1 order by time desc";
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             try {
                 searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql);
            } catch (Exception e) {
                e.printStackTrace();
            }
             System.out.println(searchSourceBuilder);
             SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
             QueryBuilder whereBuilder = QueryBuilders.rangeQuery("time").gte("1");
             BoolQueryBuilder briage = QueryBuilders.boolQuery();
             briage.must();
             briage.must(whereBuilder);
             builderToCompare.sort("time",SortOrder.DESC);
             builderToCompare.query(briage);
             assertEquals(searchSourceBuilder,builderToCompare);
        }

       between, in这些没什么区别,就不贴代码了,最后看看稍微复杂点儿,带逻辑运算的查询:

      

    @Test
        public void sqlLogicTest() {
            String sql = "select * from test where raw_log not like"+"'%aaa' && b=1 or c=0";
            SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
             try {
                 searchSourceBuilder = new SqlParser(searchSourceBuilder).parse(sql);
            } catch (Exception e) {
                e.printStackTrace();
            }
             System.out.println(searchSourceBuilder);
             SearchSourceBuilder builderToCompare = new SearchSourceBuilder();
             QueryBuilder builder =QueryBuilders.matchPhraseQuery("raw_log","aaa"); 
             
             BoolQueryBuilder briage1 = QueryBuilders.boolQuery();//raw log not like
             briage1.mustNot(builder);  
             
             BoolQueryBuilder briage2 = QueryBuilders.boolQuery();  //b=1
             briage2.must(QueryBuilders.termQuery("b","1"));
             
             BoolQueryBuilder briage3 = QueryBuilders.boolQuery();   // not like and b=1
             briage3.must(briage1).must(briage2);
             
             BoolQueryBuilder briage4 = QueryBuilders.boolQuery();    //c =0
             briage4.must(QueryBuilders.termQuery("c","0"));
             
             BoolQueryBuilder briage5 = QueryBuilders.boolQuery();  // not like and b =1 or c =0
             briage5.should(briage3).should(briage4);
             
             
             
             BoolQueryBuilder briage6 = QueryBuilders.boolQuery();
             briage6.must();
             briage6.must(briage5);
             builderToCompare.query(briage6);
             assertEquals(searchSourceBuilder,builderToCompare);
        }

     下面是生成的查询语句:

       

    {
      "query" : {
        "bool" : {
          "must" : [
            {
              "bool" : {
                "should" : [
                  {
                    "bool" : {
                      "must" : [
                        {
                          "bool" : {
                            "must_not" : [
                              {
                                "match_phrase" : {
                                  "raw_log" : {
                                    "query" : "aaa",
                                    "slop" : 0,
                                    "boost" : 1.0
                                  }
                                }
                              }
                            ],
                            "disable_coord" : false,
                            "adjust_pure_negative" : true,
                            "boost" : 1.0
                          }
                        },
                        {
                          "bool" : {
                            "must" : [
                              {
                                "term" : {
                                  "b" : {
                                    "value" : "1",
                                    "boost" : 1.0
                                  }
                                }
                              }
                            ],
                            "disable_coord" : false,
                            "adjust_pure_negative" : true,
                            "boost" : 1.0
                          }
                        }
                      ],
                      "disable_coord" : false,
                      "adjust_pure_negative" : true,
                      "boost" : 1.0
                    }
                  },
                  {
                    "bool" : {
                      "must" : [
                        {
                          "term" : {
                            "c" : {
                              "value" : "0",
                              "boost" : 1.0
                            }
                          }
                        }
                      ],
                      "disable_coord" : false,
                      "adjust_pure_negative" : true,
                      "boost" : 1.0
                    }
                  }
                ],
                "disable_coord" : false,
                "adjust_pure_negative" : true,
                "boost" : 1.0
              }
            }
          ],
          "disable_coord" : false,
          "adjust_pure_negative" : true,
          "boost" : 1.0
        }
      }
    }

         

       四.总结

         本篇文章主要讲述了如何使用Druid实现SQL语句转换ES DSL进行搜索的功能,后续文章中会陆续完善这个功能,实现诸如聚合查询,分页查询等功能。

  • 相关阅读:
    常用网站
    我的第一个 python 爬虫脚本
    在文件夹下所有文件中查找字符串(linux/windows)
    Python 列表 insert() 方法
    mysql 替换 tab 键 ( )
    访问权限的修饰符
    eclipse 快捷键
    位运算
    hadoop 环境搭建
    Hadoop 快速入门
  • 原文地址:https://www.cnblogs.com/showing/p/11774719.html
Copyright © 2020-2023  润新知