MySQL Internals-How MySQL optimize single range
Louis Hust
0 Perface
When MySQL calculates the cost of QEP(Query execute plan), it will do a quick test for range scan with const values. But when the condition in where is complex, it should do the right thing for optimization, otherwise the exact plan may be lost. So this article showes how MySQL handle complex condition in WHERE with a sample in MySQL manual.
0 SEL_TREE & SEL_ARG
SEL_TREE & SEL_ARG are two important structs which is used in range optimization. Here we don't care what the two struct constains, what we should know is what they stand for. SEL_TREE can stand for a simple condition, such as c1 < 'aaa', and also can stand for complex condition, such as c1 > 'aaa' and c1 < 'bbb', so it's a process of recursion.
Just like the condition c1 > 'aaa' and c1 < 'bbb', first of all, there are two SEL_TREE structs st1, st2 stand for c1>'aaa', c1 < 'bbb' respectively, and the s1 and s1 do an 'tree and' operation to generate a new SEL_TREE st which stands for (c1 > 'aaa' and c1 < 'bbb').
What does SEL_ARG stand for? SEL_ARG stands for a range, such as c1 > 'aaa', so there is a SEL_ARG sa1 which belongs to st1, stands for c1 in range ('aaa', +inf) and a SEL_ARG sa2 which belongs to st2 stands for c1 in range (-inf, 'bbb'). And at last, the two will combine togother to generate a new SEL_ARG sa which stards for c1 in range ('aaa', 'bbb') and sa belongs to st.
0 Examples
mysql> show create table range_opt_single\G *************************** 1. row *************************** Table: range_opt_single Create Table: CREATE TABLE `range_opt_single` ( `c1` varchar(20) DEFAULT NULL, `c2` int(11) DEFAULT NULL, KEY `c1` (`c1`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec) mysql> EXPLAIN SELECT * FROM range_opt_single -> WHERE (c1 < 'a' AND (c1 LIKE 'abcde%' OR c1 LIKE '%b')) -> OR (c1 < 'aaa' AND c2 = 4) -> OR (c1 < 'uux' AND c1 > 'z')\G *************************** 1. row *************************** id: 1 select_type: SIMPLE table: range_opt_single type: range possible_keys: c1 key: c1 key_len: 23 ref: NULL rows: 1 Extra: Using index condition; Using where 1 row in set (0.00 sec)
The WHERE condition is some kind of complex, three conditions combined with OR, let's look every part in deep.
- (c1 < 'a' AND (c1 LIKE 'abcde%' OR c1 LIKE '%b'))
(c1 LIKE 'abcde%') can generate a SEL_TREE st_12 with c1 in range ('abcde', 0x 61 62 63 64 65 ff ff);
(c1 LIKE '%b') can not generate a SEL_TREE, cause the '%b' starts with % which can not be optimized;
So tree_or(st_12, NULL) returns NULL;
(c1 < 'a') generates a SEL_TREE st_11 with c1 in range (-inf, 'a');
At last, this condition returns st_11. - (c1 < 'aaa' AND c2 = 4)
(c1 < 'aaa') generates a SEL_TREE st_21 with c1 in range (-inf, 'aaa');
(c2 = 4) can not generate SEL_TREE, cause c2 is not in the index;
So tree_and(st_21, NULL) returns st_21; - (c1 < 'uux' AND c1 > 'z')
(c1 < 'uux') generates a SEL_TREE st_31 with c1 in range (-inf, 'uux');
(c1 > 'uux') generates a SEL_TREE st_32 with c1 in range ('z', +inf);
tree_and(st_31, st_32) return a new SEL_TREE st_3 with an impossible SEL_ARG ('z', 'uux');
- tree_or(st_11, st_21)
Cause st_11 stands for (-inf, 'a') and st_21 stands for (-inf, 'aaa'), the OR operater will get the large range, so generate a new SEL_TREE st_n12 with range (-inf, 'aaa'); - tree_or(st_n12, st_3)
Cause st_3 is impossible, so just return st_n12.
At last, the WHERE condition return a SEL_TREE st_n12 with SEL_ARG stands for (-inf, 'aaa'), and the the optimizer will check the cost of a plan scan with the INDEX on c1 with the range (-inf, 'aaa').
0 Code Inspection
You may have figured out that the function for get SEL_TREE is recursive, actually it is. The name of the function is get_mm_tree.
References
File translated from
TEX
by
TTH,
version 4.03.
On 26 Jan 2013, 19:03.