Mysql大范围分页优化案例

在BBS线上业务抓到如下分页SQL：

142597301   meizu_bbs   192.168.17.72:39096 meizu_bbs   Query   217 Sending data    SELECT * FROM pre_forum_thread  WHERE fid=22 AND displayorder>=0 ORDER BY lastpost DESC  LIMIT 1933100, 50
142597338   meizu_bbs   192.168.17.72:39128 meizu_bbs   Query   216 Sending data    SELECT * FROM pre_forum_thread  WHERE fid=22 AND displayorder>=0 ORDER BY lastpost DESC  LIMIT 1933100, 50
142604367   nagiosuser  127.0.0.1:39893 NULL    Query   0   NULL    show full processlist

这个SQL一共有3个问题：

1：select * 这种写法不符合SQL编写规范，任何时候都不要用*来代替具体的列名称，需要什么列就取什么列。如果表里有个text/blob等大字段，影响就更加明显。

2：pre_forum_thread 表在tid字段做了分区，但是查询里面没有用到分区字段，这样就需要扫描全部分区，性能比不分区更差。

3：在这个分页SQL里，偏移量高到193万。

LIMIT语法：

[LIMIT {[offset,] row_count | row_count OFFSET offset}]

MYSQL是处理LIMIT语句的方式是：取出全部offset+rowcount,然后丢弃掉前面所有行，只返回row_count行。

在这个案例里，在mysql server端需要查询的行数为1933100+50，217S还未得出结果。

优化方案：最终需要的只是50行记录，如果先取出这50行记录的主键ID，这样是不是会很快？执行计划和执行时间：

mysql> explain partitions SELECT tid FROM pre_forum_thread  WHERE fid=22 AND displayorder>=0 ORDER BY lastpost DESC  LIMIT 1933100, 50;
+----+-------------+------------------+-----------------------------------------------------------------------------------------------------------+-------+-------------------------------------------------------------------------------+--------------+---------+------+---------+------------------------------------------+
| id | select_type | table            | partitions                                                                                                | type  | possible_keys                                                                 | key          | key_len | ref  | rows    | Extra                                    |
+----+-------------+------------------+-----------------------------------------------------------------------------------------------------------+-------+-------------------------------------------------------------------------------+--------------+---------+------+---------+------------------------------------------+
|  1 | SIMPLE      | pre_forum_thread | p0,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,p15,p16,p17,p18,p19,p20,p21,p22,p23,p24,p25,p26,p27,p28 | range | displayorder,rate,lastpost,fd,fdd,idx_fid_displayorder_heats,idx_displayorder | displayorder | 4       | NULL | 2673718 | Using where; Using index; Using filesort |
+----+-------------+------------------+-----------------------------------------------------------------------------------------------------------+-------+-------------------------------------------------------------------------------+--------------+---------+------+---------+------------------------------------------+
1 row in set (0.00 sec)
 
mysql> SELECT sql_no_cache tid FROM pre_forum_thread  WHERE fid=22 AND displayorder>=0 ORDER BY lastpost DESC  LIMIT 1933100, 50;
+--------+
| tid    |
+--------+
| 795442 |
.........
| 795387 |
| 795168 |
+--------+
50 rows in set (1.02 sec)

分析一下为什么只取出PK值会很快。在INNODB索引树里面，每个二级索引的叶子节点都会保存一份PK值，通过二级索引查找数据的过程是：从索引树的根节点开始比较索引值是否和查询值匹配，如果不匹配，根据情况进入左或右分支，再比较，直到在找到符合要求的节点，然后从叶节点里取出PK值，再回表根据主键得到全部数据。如果只是查找主键，那么就少了”然后从叶节点里取出PK值，再回表根据主键得到全部数据“这一部分，而这一部分正是最耗时的。在执行计划里可以看到”Using index“，这就说明优化器使用了”覆盖索引“，只需要扫描索引数据就可以得到最终数据，索引一般情况下比数据要小，往往常驻内存，所以虽然偏移量193万，也能给在1.02秒内返回结果。

得到这50个主键ID值之后，用这50条记录再关联原表查询：

mysql> explain select sql_no_cache * from pre_forum_thread A inner join (SELECT tid FROM pre_forum_thread  WHERE fid=22 AND displayorder>=0 ORDER BY lastpost DESC  LIMIT 1933100, 50) B on A.tid=B.tid;
+----+-------------+------------------+--------+-------------------------------------------------------------------------------+---------+---------+-------+---------+----------------+
| id | select_type | table            | type   | possible_keys                                                                 | key     | key_len | ref   | rows    | Extra          |
+----+-------------+------------------+--------+-------------------------------------------------------------------------------+---------+---------+-------+---------+----------------+
|  1 | PRIMARY     | <derived2>       | ALL    | NULL                                                                          | NULL    | NULL    | NULL  |      50 |                |
|  1 | PRIMARY     | A                | eq_ref | PRIMARY                                                                       | PRIMARY | 4       | B.tid |       1 |                |
|  2 | DERIVED     | pre_forum_thread | ALL    | displayorder,rate,lastpost,fd,fdd,idx_fid_displayorder_heats,idx_displayorder | NULL    | NULL    | NULL  | 3307262 | Using filesort |
+----+-------------+------------------+--------+-------------------------------------------------------------------------------+---------+---------+-------+---------+----------------+
3 rows in set (1.03 sec)
 
#执行时间
50 rows in set (1.06 sec)

处理分页的方法有很多种，在业务层面可以限制翻页的起始位置，不允许直接定位到10000页。在数据库查询方面也有别的方法来处理，根据相应的业务需要寻找最佳的处理方法。本案例里的 LIMIT 1933100, 50 需要规避。

相关阅读:
python 格式化 json输出
 python
回顾2013
C扩展Python
C扩展Python
Python
Python interview preparing
Python用smtplib发送邮件
 Python 安装路径， dist-packages 和 site-packages 区别
 nginx worker_cpu_affinity使用方法
原文地址：https://www.cnblogs.com/zuoxingyu/p/5642882.html