• MYSQL Nested Join Optimization


      table_factor的语法和标准sql比较,后者只接受table_reference,每个逗号项都等于一个inner Join,e.g.

    SELECT * FROM t1 LEFT JOIN (t2, t3, t4)
                     ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

    等于:

    SELECT * FROM t1 LEFT JOIN (t2 CROSS JOIN t3 CROSS JOIN t4)
                     ON (t2.a=t1.a AND t3.b=t1.b AND t4.c=t1.c)

       在mysql中,cross join 语法上等效与 inner join(他们可以互相替换),在标准sql中,他们并不等效,

       一般情况下,在只包含inner join操作的join表达式中,圆括号可以被忽略,但外连接时,省略圆括号会带来不一样的结果

    t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b OR t2.b IS NULL)
       ON t1.a=t2.a

      转变:

    (t1 LEFT JOIN t2 ON t1.a=t2.a) LEFT JOIN t3
        ON t2.b=t3.b OR t2.b IS NULL

       然而,这两个表达式并不等效,假设:

     1: 表t1包含 行(1),(2);
    2: 表t2包含行(1,101);
    3: 表t3包含行(101);


    在这种情况下,第一个表达式返回的结果集包括(1,1,101,101) ,(2,null,null,null),然而第二表达式返回(1,1,101,101),(2,null,null,101);

    mysql> select * from 
        ->    t1 left join 
        ->      (t2 left join t3 on t2.b = t3.b or t2.b is null )
        ->         on t1.a = t2.a;
    +------+------+------+------+
    | a    | a    | b    | b    |
    +------+------+------+------+
    |    1 |    1 |  101 |  101 |
    |    2 | NULL | NULL | NULL |
    +------+------+------+------+
    2 rows in set (0.02 sec)
    
    mysql> select * from
        ->    (t1 left join t2 on t1.a = t2.a)
        ->       left join t3
        ->           on t2.b = t3.b or t2.b is null;
    +------+------+------+------+
    | a    | a    | b    | b    |
    +------+------+------+------+
    |    1 |    1 |  101 |  101 |
    |    2 | NULL | NULL |  101 |
    +------+------+------+------+
    2 rows in set (0.00 sec)


      <============================================================================>

    t1 LEFT JOIN (t2, t3) ON t1.a=t2.a

     该表达式不能转变成:

    t1 LEFT JOIN t2 ON t1.a=t2.a, t3.
    Database changed
    mysql> select * from t1 left join (t2,t3) on t1.a = t2.a;
    +------+------+------+------+
    | a    | a    | b    | b    |
    +------+------+------+------+
    |    1 |    1 |  101 |  101 |
    |    2 | NULL | NULL | NULL |
    +------+------+------+------+
    2 rows in set (0.00 sec)
    
    mysql> select * from t1 left join t2 on t1.a = t2.a, t3;
    +------+------+------+------+
    | a    | a    | b    | b    |
    +------+------+------+------+
    |    1 |    1 |  101 |  101 |
    |    2 | NULL | NULL |  101 |
    +------+------+------+------+
    2 rows in set (0.00 sec)

    不同的结果集

        因此,如果我们去掉有outer join操作的join表达式的圆括号,我们会改变原始表达式的结果集。

        更加准确的,我们不能去掉左外连接的右括号操作,和右外连接的左括号操作。换句话,我们不能去掉外连接操作中的内表表达式的括号,外表的其他操作括号可以被去掉;

    (t1,t2) LEFT JOIN t3 ON P(t2.b,t3.b)

    等效于:

    t1, t2 LEFT JOIN t3 ON P(t2.b,t3.b)
     
    SELECT * FROM t1 LEFT JOIN (t2 LEFT JOIN t3 ON t2.b=t3.b) ON t1.a=t2.a
      WHERE t1.a > 1
    
    SELECT * FROM t1 LEFT JOIN (t2, t3) ON t1.a=t2.a
      WHERE (t2.b=t3.b OR t2.b IS NULL) AND t1.a > 1

    这梁查询都包含内连接:

    t2 LEFT JOIN t3 ON t2.b=t3.b
    t2, t3
    SELECT * FROM T1 INNER JOIN T2 ON P1(T1,T2)
                     INNER JOIN T3 ON P2(T2,T3)
      WHERE P(T1,T2,T3).

    该表达式的执行顺序:

    FOR each row t1 in T1 {
      FOR each row t2 in T2 such that P1(t1,t2) {
        FOR each row t3 in T3 such that P2(t2,t3) {
          IF P(t1,t2,t3) {
             t:=t1||t2||t3; OUTPUT t;
          }
        }
      }
    }
    SELECT * FROM T1 LEFT JOIN
                  (T2 LEFT JOIN T3 ON P2(T2,T3))
                  ON P1(T1,T2)
      WHERE P(T1,T2,T3).

    顺序:

    FOR each row t1 in T1 {
      BOOL f1:=FALSE;
      FOR each row t2 in T2 such that P1(t1,t2) {
        BOOL f2:=FALSE;
        FOR each row t3 in T3 such that P2(t2,t3) {
          IF P(t1,t2,t3) {
            t:=t1||t2||t3; OUTPUT t;
          }
          f2=TRUE;
          f1=TRUE;
        }
        IF (!f2) {
          IF P(t1,t2,NULL) {
            t:=t1||t2||NULL; OUTPUT t;
          }
          f1=TRUE;
        }
      }
      IF (!f1) {
        IF P(t1,NULL,NULL) {
          t:=t1||NULL||NULL; OUTPUT t;
        }
      }
    }

        

        在一般情况下,对于在一外联接操作的任何嵌套循环的第一内表,一个标志被引入在循环之前处于关闭并在循环之后被检查。该标志被打开时,用于从外部表的当前行从代表内操作数表中找到一个匹配的。如果在循环周期结束的标志仍然关闭,没有发现匹配的外部表的当前行。在这种情况下,该行被用于内表的列补充NULL值。结果行被传递到最后的检查为输出或到下一个嵌套循环,但仅当行满足所有嵌入式外的连接条件。

         对于外连接的查询,优化器可以只选择为循环表外循环在内部表之前的顺序。因此,我们外部连接的查询,只有一个嵌套顺序是可能的。对于下面的查询,优化器将评估两个不同的嵌套:

    SELECT * T1 LEFT JOIN (T2,T3) ON P1(T1,T2) AND P2(T1,T3)
      WHERE P(T1,T2,T3)
    FOR each row t1 in T1 {
      BOOL f1:=FALSE;
      FOR each row t2 in T2 such that P1(t1,t2) {
        FOR each row t3 in T3 such that P2(t1,t3) {
          IF P(t1,t2,t3) {
            t:=t1||t2||t3; OUTPUT t;
          }
          f1:=TRUE
        }
      }
      IF (!f1) {
        IF P(t1,NULL,NULL) {
          t:=t1||NULL||NULL; OUTPUT t;
        }
      }
    }

    和:

    FOR each row t1 in T1 {
      BOOL f1:=FALSE;
      FOR each row t3 in T3 such that P2(t1,t3) {
        FOR each row t2 in T2 such that P1(t1,t2) {
          IF P(t1,t2,t3) {
            t:=t1||t2||t3; OUTPUT t;
          }
          f1:=TRUE
        }
      }
      IF (!f1) {
        IF P(t1,NULL,NULL) {
          t:=t1||NULL||NULL; OUTPUT t;
        }
      }
    }

    如果P可以拆分:

    P(T1,T2,T2) = C1(T1) AND C2(T2) AND C3(T3).
    

    算法会变成

    FOR each row t1 in T1 such that C1(t1) {
      FOR each row t2 in T2 such that P1(t1,t2) AND C2(t2)  {
        FOR each row t3 in T3 such that P2(t2,t3) AND C3(t3) {
          IF P(t1,t2,t3) {
             t:=t1||t2||t3; OUTPUT t;
          }
        }
      }
    }

         每个合取C1(T1)的,C 2(T2),C 3(T3)被压出的最内环到最外循环在那里可以进行评估。如果C1(T 1)是一个非常严格的条件下,此条件下推可能大大从传递给内表T1减少行数。其结果是,在执行时间的查询可能提高极大。

  • 相关阅读:
    排查程序死循环,死锁的方法 ——pstack
    可变参数使用
    snprintf 返回值陷阱 重新封装
    linux 查看cpu个数,内存情况,系统版本
    nginx取结构体地址
    fuser命令使用心得
    Linux中dos2unix批量转换
    rpm中config,config(noreplace)区别
    slowhttptest慢攻击工具介绍
    jmeter性能测试
  • 原文地址:https://www.cnblogs.com/onlysun/p/4525570.html
Copyright © 2020-2023  润新知