• MySQL Execution PlanLEFT JOIN语句中多条件执行顺序导致的性能问题


    问题描述

    有慢SQL如下:

    SELECT
    vendor.id,
    third_vendor_id,
    ctrip_id,
    vendor_type,
    vendor.lat as lat,
    vendor.lon as lon,
    vendor.create_time,
    vendor.update_time,
    l.hotel_name as name,
    l.hotel_address as address,
    l.hotel_seq as seq
    FROM vendor
    LEFT JOIN hotel_linkage_new l 
    ON vendor.ctrip_id != 0
    AND vendor.ctrip_id = l.extra 
    AND l.status = 'on'
    where vendor.id>18891
    ORDER BY vendor.id
    LIMIT  1000;
    

    对应执行计划为:

    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: vendor
             type: range
    possible_keys: PRIMARY
              key: PRIMARY
          key_len: 4
              ref: NULL
             rows: 30095
            Extra: Using where
    *************************** 2. row ***************************
               id: 1
      select_type: SIMPLE
            table: l
             type: ref
    possible_keys: idx_extra
              key: idx_extra
          key_len: 8
              ref: prmt_raven.vendor.ctrip_id
             rows: 1
            Extra: Using where
    

    问题分析

    测试vendor表单表查询性能(返回数据较快,耗时低于10ms)

    SELECT
    vendor.id,
    third_vendor_id,
    ctrip_id,
    vendor_type,
    vendor.lat as lat,
    vendor.lon as lon,
    vendor.create_time
    FROM vendor
    WHERE vendor.id>18891
    ORDER BY vendor.id
    LIMIT  1000;
    1000 rows in set (0.00 sec)
    

    尝试使用子查询来改写SQL:

    SELECT 
    T1.id,
    T1.third_vendor_id,
    T1.ctrip_id,
    T1.vendor_type,
    T1.lat as lat,
    T1.lon as lon,
    T1.create_time,
    T1.update_time,
    l.hotel_name as name,
    l.hotel_address as address,
    l.hotel_seq as seq
    FROM (
    	SELECT
    	vendor.id,
    	third_vendor_id,
    	ctrip_id,
    	vendor_type,
    	vendor.lat as lat,
    	vendor.lon as lon,
    	vendor.create_time,
    	vendor.update_time
    	FROM vendor
    	WHERE vendor.id>18891
    	ORDER BY vendor.id
    	LIMIT  1000
    ) AS T1
    LEFT JOIN hotel_linkage_new l 
    ON T1.ctrip_id != 0
    AND T1.ctrip_id = l.extra 
    AND l.status = 'on'
    ORDER BY T1.id
    LIMIT  1000
    

    改写后SQL执行性能无明显提升。

    查看LEFT JOIN操作关联的数据量:

    SELECT 
    COUNT(1) AS total_rows
    FROM (
    	SELECT
    	vendor.id,
    	ctrip_id
    	FROM vendor
    	WHERE vendor.id>18891
    	ORDER BY vendor.id
    	LIMIT  1000
    ) AS T1
    LEFT JOIN hotel_linkage_new l 
    ON T1.ctrip_id = l.extra;
    
    +------------+
    | total_rows |
    +------------+
    |    1028824 |
    +------------+
    

    查看按照匹配记录较多的ctrip_id:

    SELECT 
    T1.ctrip_id,
    COUNT(1) AS total_rows
    FROM (
    	SELECT
    	vendor.id,
    	ctrip_id
    	FROM vendor
    	WHERE vendor.id>18891
    	ORDER BY vendor.id
    	LIMIT  1000
    ) AS T1
    LEFT JOIN hotel_linkage_new l 
    ON T1.ctrip_id = l.extra
    GROUP BY T1.ctrip_id
    ORDER BY COUNT(1) DESC
    LIMIT 10;
    
    +----------+------------+
    | ctrip_id | total_rows |
    +----------+------------+
    |        0 |    1028223 |
    |   968878 |          2 |
    |  1978038 |          1 |
    |   732266 |          1 |
    |   764158 |          1 |
    |   445344 |          1 |
    |   420028 |          1 |
    |   669306 |          1 |
    |   437412 |          1 |
    |   918312 |          1 |
    +----------+------------+
    

    由于ctrip_id=0的记录超过100万,而LEFT JOIN中包含有vendor.ctrip_id != 0 AND vendor.ctrip_id = l.extra,便存在下面两种执行可能(伪代码):

    • 先执行vendor.ctrip_id != 0 再执行vendor.ctrip_id = l.extra:
    def get_left_join_rows1():
        vendor_rows = get_table_rows_by_primary_index(
            'SELECT * FROM vendor WHERE vendor.id>18891 ORDER BY vendor.id LIMIT 1000'
        )
        left_join_rows = []
        for vendor_row in vendor_rows:
            if vendor_row.ctrip_id != 0:
                linkage_rows = []
                index_rows = get_index_rows_by_secondary_index(
                    'SELECT extra,id FROM hotel_linkage_new WHERE extra={}'.format(vendor_row.ctrip_id)
                )
                for index_row in index_rows:
                    linkage_row = get_table_row_by_primary_index(
                        'SELECT * FROM hotel_linkage_new WHERE id={}'.format(index_row.id)
                    )
                    linkage_rows.append(linkage_row)
                if len(linkage_rows) > 0:
                    match_rows = linkage_rows
                else:
                    match_rows = [None]
            else:
                match_rows = [None]
            for match_row in match_rows:
                left_join_row = (vendor_row, match_row)
                left_join_rows.append(left_join_row)
        return left_join_rows
    
    • 先执行vendor.ctrip_id = l.extra 再执行vendor.ctrip_id != 0 :
    def get_left_join_rows2():
        vendor_rows = get_table_rows_by_primary_index(
            'SELECT * FROM vendor WHERE vendor.id>18891 ORDER BY vendor.id LIMIT 1000'
        )
        left_join_rows = []
        for vendor_row in vendor_rows:
            linkage_rows = []
            index_rows = get_index_rows_by_secondary_index(
                'SELECT extra,id FROM hotel_linkage_new WHERE extra={}'.format(vendor_row.ctrip_id)
            )
            for index_row in index_rows:
                linkage_row = get_table_row_by_primary_index(
                    'SELECT * FROM hotel_linkage_new WHERE id={}'.format(index_row.id)
                )
                linkage_rows.append(linkage_row)
            if len(linkage_rows) > 0:
                if vendor_row.ctrip_id != 0:
                    match_rows = linkage_rows
                else:
                    match_rows = [None]
            else:
                match_rows = [None]
            for match_row in match_rows:
                left_join_row = (vendor_row, match_row)
                left_join_rows.append(left_join_row)
        return left_join_rows
    

    虽然上面两种执行方式很相近,但执行性能相差极大:

    • 如果先执行vendor.ctrip_id = l.extra 再执行vendor.ctrip_id != 0,则需对hotel_linkage_new表执行 l.extra = vendor.ctrip_id = 0`的查询操作,该操作会造成1次索引范围扫描和1028223次主键回表查找,对另外的999条记录产生的999次索引范围扫描和999次主键回表查询。
    • 如果先执行vendor.ctrip_id != 0 再执行vendor.ctrip_id = l.extra,则可跳过对hotel_linkage_new表执行 l.extra = vendor.ctrip_id = 0`的查询操作,对另外的999条记录产生的999次索引范围扫描和999次主键回表查询。

    经过对比跟踪会话统计变量Handler_read_next和Handler_read_key可以断定执行计划采用先执行vendor.ctrip_id = l.extra 再执行vendor.ctrip_id != 0的执行计划,导致查询性能极差。

    优化建议

    在确认性能问题原因后,我们可以通过修改SQL语句来避免,修改后SQL为:

    SELECT * FROM (
    SELECT 
    T1.id,
    T1.third_vendor_id,
    T1.ctrip_id,
    T1.vendor_type,
    T1.lat as lat,
    T1.lon as lon,
    T1.create_time,
    T1.update_time,
    l.hotel_name as name,
    l.hotel_address as address,
    l.hotel_seq as seq
    FROM (
    	SELECT
    	vendor.id,
    	third_vendor_id,
    	ctrip_id,
    	vendor_type,
    	vendor.lat as lat,
    	vendor.lon as lon,
    	vendor.create_time,
    	vendor.update_time
    	FROM vendor
    	WHERE vendor.id>18891
    	ORDER BY vendor.id
    	LIMIT  1000
    ) AS T1
    LEFT JOIN hotel_linkage_new l 
    ON T1.ctrip_id = l.extra 
    AND l.status = 'on'
    WHERE T1.ctrip_id != 0
    UNION ALL
    SELECT 
    T1.id,
    T1.third_vendor_id,
    T1.ctrip_id,
    T1.vendor_type,
    T1.lat as lat,
    T1.lon as lon,
    T1.create_time,
    T1.update_time,
    NULL as name,
    NULL as address,
    NULL as seq
    FROM (
    	SELECT
    	vendor.id,
    	third_vendor_id,
    	ctrip_id,
    	vendor_type,
    	vendor.lat as lat,
    	vendor.lon as lon,
    	vendor.create_time,
    	vendor.update_time
    	FROM vendor
    	WHERE vendor.id>18891
    	ORDER BY vendor.id
    	LIMIT  1000
    ) AS T1
    WHERE T1.ctrip_id = 0
    ) AS T2
    ORDER BY T2.id
    LIMIT  1000
    
    
  • 相关阅读:
    Support依赖库大全
    反射调用泛型
    会爬行的小乌龟
    改进版——使用了双缓冲技术
    启动运行发现窗体不能最大化,添加
    添加图层
    实现放大,缩小,漫游,复位等功能
    从上一个项目中我得到的反思
    ​Error -4075: File not found. An error occurred merging module <MODULENAME> for feature <FEATURENAME>.
    总结—angularjs项目
  • 原文地址:https://www.cnblogs.com/gaogao67/p/15717375.html
Copyright © 2020-2023  润新知