CASE WHEN Performance Optimization

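Background: Performance should be an important consideration for any feature, especially with large data volumes. A SQL statement written with only the business logic in mind, and no regard for efficiency, can perform so badly that the feature becomes unusable or consumes excessive resources. One such case is a job that processes daily incremental data but performs a full table scan at run time, so it is no faster than the full-load job.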

Case study:

mio_log row count: 134,092,418 rows

freph_a01_fromtask3 row count: 176,581,388 rows

The core of the SQL statement written for the production system, following the business logic, is:
    SELECT (CASE
              WHEN c.in_force_date IS NOT NULL
                THEN (CASE
                        WHEN a.mio_date >= c.in_force_date THEN a.mio_date
                        ELSE c.in_force_date
                      END)
              WHEN c.in_force_date IS NULL
                THEN (CASE
                        WHEN a.mio_date >= a.plnmio_date THEN a.mio_date
                        ELSE a.plnmio_date
                      END)
              ELSE a.mio_date
            END) mio_date
    FROM dbo.mio_log a
    INNER JOIN dbo.freph_a01_fromtask3 c
      ON a.cntr_no = c.cntr_no
     AND a.pol_code = c.pol_code
    -- The date filter is applied to the result of a CASE over columns of both tables
    WHERE ((c.in_force_date IS NOT NULL
            AND (CASE
                   WHEN a.mio_date >= c.in_force_date THEN a.mio_date
                   ELSE c.in_force_date
                 END) BETWEEN @stat_begindate AND @stat_enddate)
        OR (c.in_force_date IS NULL
            AND (CASE
                   WHEN a.mio_date >= a.plnmio_date THEN a.mio_date
                   ELSE a.plnmio_date
                 END) BETWEEN @stat_begindate AND @stat_enddate))
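Although the mio_date and plnmio_date columns of mio_log and the in_force_date column of freph_a01_fromtask3 are all indexed, the CASE WHEN compares columns from the two tables, so those indexes are not used and the execution plan is a clustered index scan.

Optimization approach:

Because mio_date and plnmio_date on mio_log and in_force_date on freph_a01_fromtask3 are all indexed, we can first use each single-column index (mio_date, in_force_date, plnmio_date) to fetch the rows in the incremental date range, and then run the cross-table, cross-column comparison on that incremental data only. This is safe because every CASE branch returns one of those three columns, so any row whose computed date falls in the range must have at least one of the three columns in the range.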
    SELECT (CASE
              WHEN in_force_date IS NOT NULL
                THEN (CASE
                        WHEN mio_date >= in_force_date THEN mio_date
                        ELSE in_force_date
                      END)
              WHEN in_force_date IS NULL
                THEN (CASE
                        WHEN mio_date >= plnmio_date THEN mio_date
                        ELSE plnmio_date
                      END)
              ELSE mio_date
            END) mio_date
    FROM (
          -- Branch 1: rows whose mio_date is in the incremental range
          SELECT a.mio_date,
                 c.in_force_date,
                 a.plnmio_date,
                 a.MIO_LOG_ID
          FROM dbo.mio_log a
          INNER JOIN dbo.freph_a01_fromtask3 c
            ON a.cntr_no = c.cntr_no
           AND a.pol_code = c.pol_code
          WHERE a.mio_date BETWEEN @stat_begindate AND @stat_enddate
          UNION
          -- Branch 2: rows whose in_force_date is in the incremental range
          SELECT a.mio_date,
                 c.in_force_date,
                 a.plnmio_date,
                 a.MIO_LOG_ID
          FROM dbo.mio_log a
          INNER JOIN dbo.freph_a01_fromtask3 c
            ON a.cntr_no = c.cntr_no
           AND a.pol_code = c.pol_code
          WHERE c.in_force_date BETWEEN @stat_begindate AND @stat_enddate
          UNION
          -- Branch 3: rows whose plnmio_date is in the incremental range
          SELECT a.mio_date,
                 c.in_force_date,
                 a.plnmio_date,
                 a.MIO_LOG_ID
          FROM dbo.mio_log a
          INNER JOIN dbo.freph_a01_fromtask3 c
            ON a.cntr_no = c.cntr_no
           AND a.pol_code = c.pol_code
          WHERE a.plnmio_date BETWEEN @stat_begindate AND @stat_enddate
         ) T
    WHERE ((in_force_date IS NOT NULL
            AND (CASE
                   WHEN mio_date >= in_force_date THEN mio_date
                   ELSE in_force_date
                 END) BETWEEN @stat_begindate AND @stat_enddate)
        OR (in_force_date IS NULL
            AND (CASE
                   WHEN mio_date >= plnmio_date THEN mio_date
                   ELSE plnmio_date
                 END) BETWEEN @stat_begindate AND @stat_enddate))
This statement has two problems:

1. If mio_log and freph_a01_fromtask3 in the subqueries have no primary key, a ROWID is needed to identify distinct rows. UNION removes duplicate rows, so each row must carry a unique identifier (MIO_LOG_ID above); where there is no primary key, ROWID can take its place.

ROWID is an important and widely used concept in Oracle. It is defined as follows:

ROWID Pseudocolumn

For each row in the database, the ROWID pseudocolumn returns the address of the row. Oracle Database rowid values contain information necessary to locate a row:

· The data object number of the object

· The data block in the datafile in which the row resides

· The position of the row in the data block (first row is 0)

· The datafile in which the row resides (first file is 1). The file number is relative to the tablespace.

SQL Server has no ROWID concept. In SQL Server 2008 and later, the %%physloc%% virtual column is the closest equivalent, as described below:

The closest equivalent to this in SQL Server is the rid, which has three components: File:Page:Slot.

In SQL Server 2008 it is possible to use the undocumented and unsupported %%physloc%% virtual column to see this. This returns a binary(8) value with the Page ID in the first four bytes, then 2 bytes for File ID, followed by 2 bytes for the slot location on the page.

The scalar function sys.fn_PhysLocFormatter or the sys.fn_PhysLocCracker TVF can be used to convert this into a more readable form.
    CREATE TABLE T (X INT);

    INSERT INTO T VALUES (1), (2);

    SELECT %%physloc%% AS [%%physloc%%],
           sys.fn_PhysLocFormatter(%%physloc%%) AS [File:Page:Slot]
    FROM T;
%%physloc%%	File:Page:Slot
0x7600000001000000	(1:118:0)
0x7600000001000100	(1:118:1)

Note that this is not leveraged by the query processor. Whilst it is possible to use this in a WHERE clause

    SELECT *
    FROM T
    WHERE %%physloc%% = 0x7600000001000000

SQL Server will not directly seek to the specified row. Instead it will do a full table scan, evaluate %%physloc%% for each row and return the one that matches (if any do).

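2. The statement is subject to parameter sniffing:

Stored procedures always involve variables, and there are two kinds. The first kind is defined outside the procedure body (a parameter): a value must be supplied when the procedure is called, so SQL Server knows the value at compile time. The second kind is defined inside the procedure (a local variable): its value is assigned while the procedure's statements execute, so SQL Server does not know it at compile time.

To save compilation time, SQL Server compiles a stored procedure once and reuses the plan many times. Plan reuse raises two potential problems:

(1) For the first kind of variable, is the execution plan generated from the value passed on the first run suitable for every possible value?

(2) For the second kind, local variables, SQL Server does not know the value at compile time, so how does it choose a "suitable" execution plan?

Definition of the "parameter sniffing" problem: the statement's execution plan is sensitive to the variable value, so reusing a cached plan causes performance problems. A plan built for local variables is a "middle-of-the-road" plan; it generally does not suffer as badly as a sniffed plan, and it is often a candidate fix for parameter sniffing.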
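The "Sniff" procedure that the later examples refer to is not shown in this article. The following is a hypothetical reconstruction, assuming it wraps the same statement that appears in the plan guide example further down; the value 50000 is an arbitrary illustration, while 75124 is taken from the OPTIMIZE FOR example.

```sql
-- Hypothetical reconstruction of the baseline procedure "Sniff" (not from the original text).
CREATE PROC Sniff (@i INT)
AS
select count(b.SalesOrderID),sum(p.Weight)
from dbo.SalesOrderHeader_test a
inner join dbo.SalesOrderDetail_test b
on a.SalesOrderID = b.SalesOrderID
inner join Production.Product p
on b.ProductID = p.ProductID
where a.SalesOrderID =@i
GO

-- The first call compiles a plan for @i = 50000 and caches it.
EXEC Sniff 50000
GO

-- The second call reuses the cached plan, even if a different plan
-- would be cheaper for @i = 75124.
EXEC Sniff 75124
GO
```

Whether the second call is actually slow depends on how unevenly SalesOrderID values are distributed in the detail table.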

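Ways to address parameter sniffing:

(1) Run a dynamic SQL statement with exec(): if the stored procedure does not run the statement directly, but instead builds a string with the variable value embedded and lets exec() run that dynamic statement, SQL Server compiles the dynamic statement only when execution reaches it. At that point SQL Server already knows the variable value and generates a plan optimized for it, which bypasses parameter sniffing.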
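A minimal sketch of the exec() approach, assuming the same test tables as the other examples; the procedure name NoSniff_Exec is hypothetical.

```sql
-- Hypothetical procedure name; tables as in the other examples.
CREATE PROC NoSniff_Exec (@i INT)
AS
DECLARE @cmd VARCHAR(1000)

-- Embed the parameter value in the statement text as a literal.
SET @cmd = 'SELECT COUNT(b.SalesOrderID), SUM(p.Weight)
FROM dbo.SalesOrderHeader_test a
INNER JOIN dbo.SalesOrderDetail_test b
  ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product p
  ON b.ProductID = p.ProductID
WHERE a.SalesOrderID = ' + CAST(@i AS VARCHAR(12))

-- The string is compiled here, when the literal value is already known.
EXEC (@cmd)
GO
```

Because @i is an INT, concatenating it is safe; with string parameters this pattern would need protection against SQL injection. Each distinct value also produces a distinct statement text, so the plan cache fills with single-use plans.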

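(2) Use a local variable: if the parameter value is assigned to a local variable, SQL Server cannot know the local variable's value at compile time, so it "guesses" a row count from the overall data distribution in the table. Whatever value the caller passes to the stored procedure, the resulting plan is the same. Such a plan is usually "middle-of-the-road": not the best plan, but not a bad one for most values either. The advantage is that the benefits of a stored procedure are kept; the drawbacks are that the procedure must be modified and the plan is not optimal.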
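The local-variable approach can be sketched as follows; the procedure name NoSniff_LocalVariable and the variable @j are hypothetical.

```sql
-- Hypothetical procedure name; tables as in the other examples.
CREATE PROC NoSniff_LocalVariable (@i INT)
AS
DECLARE @j INT
SET @j = @i   -- copy the parameter into a local variable

SELECT COUNT(b.SalesOrderID),
       SUM(p.Weight)
FROM dbo.SalesOrderHeader_test a
INNER JOIN dbo.SalesOrderDetail_test b
  ON a.SalesOrderID = b.SalesOrderID
INNER JOIN Production.Product p
  ON b.ProductID = p.ProductID
WHERE a.SalesOrderID = @j   -- value unknown at compile time, so the estimate is generic
GO
```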

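(3) Use a query hint in the statement to direct the execution plan:

At the end of a SELECT, INSERT, UPDATE, or DELETE statement you can add an "OPTION (<query_hint>)" clause to guide the execution plan that SQL Server generates. Query hints are powerful, and more than a dozen are available. The full definition is: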
    <query_hint> ::=
    { { HASH | ORDER } GROUP
      | { CONCAT | HASH | MERGE } UNION
      | { LOOP | MERGE | HASH } JOIN
      | FAST number_rows
      | FORCE ORDER
      | MAXDOP number_of_processors
      | OPTIMIZE FOR ( @variable_name = literal_constant [ , ...n ] )
      | PARAMETERIZATION { SIMPLE | FORCED }
      | RECOMPILE
      | ROBUST PLAN
      | KEEP PLAN
      | KEEPFIXED PLAN
      | EXPAND VIEWS
      | MAXRECURSION number
      | USE PLAN N'xml_plan'
    }
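These hints serve different purposes. Some steer which operators the plan uses, such as { HASH | ORDER } GROUP, { CONCAT | HASH | MERGE } UNION, and { LOOP | MERGE | HASH } JOIN. Some prevent recompilation, such as PARAMETERIZATION { SIMPLE | FORCED }, KEEP PLAN, and KEEPFIXED PLAN; some force recompilation, such as RECOMPILE. Others influence plan selection, such as FAST number_rows, FORCE ORDER, MAXDOP number_of_processors, and OPTIMIZE FOR ( @variable_name = literal_constant [ , ...n ] ). Each suits a different situation. See SQL Server Books Online for the exact definitions.

To avoid parameter sniffing, the following query hints are the most commonly used:

(1) Recompile

The RECOMPILE query hint tells SQL Server to recompile the statement every time the stored procedure runs, so that SQL Server can pick the best plan for the current variable value. The example query can be rewritten like this: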
    CREATE PROC Nosniff_queryhint_recompile (@i INT)
    AS
    SELECT COUNT(b.SalesOrderID),
           SUM(p.Weight)
    FROM dbo.SalesOrderHeader_test a
    INNER JOIN dbo.SalesOrderDetail_test b
      ON a.SalesOrderID = b.SalesOrderID
    INNER JOIN Production.Product p
      ON b.ProductID = p.ProductID
    WHERE a.SalesOrderID = @i
    OPTION (RECOMPILE)   -- recompile this statement on every execution
    GO
A similar approach is to specify "recompile" directly in the stored procedure definition, which also avoids parameter sniffing.
    CREATE PROC Nosniff_spcreate_recompile (@i INT)
    WITH RECOMPILE   -- recompile the whole procedure on every execution
    AS
    SELECT COUNT(b.SalesOrderID),
           SUM(p.Weight)
    FROM dbo.SalesOrderHeader_test a
    INNER JOIN dbo.SalesOrderDetail_test b
      ON a.SalesOrderID = b.SalesOrderID
    INNER JOIN Production.Product p
      ON b.ProductID = p.ProductID
    WHERE a.SalesOrderID = @i
    GO
(2) Specify the JOIN operator
    CREATE PROC Nosniff_queryhint_joinhint (@i INT)
    AS
    SELECT COUNT(b.SalesOrderID),
           SUM(p.Weight)
    FROM dbo.SalesOrderHeader_test a
    INNER JOIN dbo.SalesOrderDetail_test b
      ON a.SalesOrderID = b.SalesOrderID
    INNER HASH JOIN Production.Product p   -- force a hash join for this join
      ON b.ProductID = p.ProductID
    WHERE a.SalesOrderID = @i
    GO
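(3) OPTIMIZE FOR ( @variable_name = literal_constant [ , ...n ] )

The OPTIMIZE FOR query hint makes SQL Server build the plan for a specified parameter value, whatever value is actually passed in. It has been available since SQL Server 2005.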
    CREATE PROC NoSniff_QueryHint_OptimizeFor (@i INT)
    AS
    SELECT COUNT(b.SalesOrderID),
           SUM(p.Weight)
    FROM dbo.SalesOrderHeader_test a
    INNER JOIN dbo.SalesOrderDetail_test b
      ON a.SalesOrderID = b.SalesOrderID
    INNER JOIN Production.Product p
      ON b.ProductID = p.ProductID
    WHERE a.SalesOrderID = @i
    OPTION (OPTIMIZE FOR (@i = 75124))   -- always compile as if @i were 75124
    GO
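(4) Plan Guide

The methods above share an obvious limitation: they require changing the stored procedure definition. Sometimes that is not allowed without the application development team's approval. The problem is worse for statements invoked through sp_executesql, because they may be written in the application rather than in SQL Server, and the database administrator cannot modify the application. SQL Server 2005 introduced, and later versions refined, a feature called Plan Guide: the database administrator can tell SQL Server to use a specified plan whenever a given statement runs. No change to the stored procedure or the application is needed. For example, the following fixes the sniffing problem for the original stored procedure "Sniff", which suffers from parameter sniffing.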
    EXEC sp_create_plan_guide
        @name = N'Guide1',
        @stmt = N'select count(b.SalesOrderID),sum(p.Weight)
    from dbo.SalesOrderHeader_test a
    inner join dbo.SalesOrderDetail_test b
    on a.SalesOrderID = b.SalesOrderID
    inner join Production.Product p
    on b.ProductID = p.ProductID
    where a.SalesOrderID =@i',
        @type = N'OBJECT',
        @module_or_batch = N'Sniff',
        @params = NULL,
        @hints = N'OPTION (optimize for (@i = 75124))';
    GO
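Because of these two problems, this approach does not work well in practice.

Best solution:

The overall idea is similar to the one above, except that the incremental range is obtained as a mio_log_id range, using the indexes on the mio_date, in_force_date, and plnmio_date columns. Take the largest of the three maximum mio_log_id values as @mio_log_id_max and the smallest of the three minimum values as @mio_log_id_min; the incremental data is then the rows with mio_log_id BETWEEN @mio_log_id_min AND @mio_log_id_max. The three index lookups complete almost instantly, and filtering on mio_log_id ensures that the clustered index is used.

The concrete solution is: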
    -- mio_log_id range of rows whose in_force_date is in the incremental range
    SELECT @mio_log_id_max3 = MAX(a.mio_log_id),
           @mio_log_id_min3 = MIN(a.mio_log_id)
    FROM dbo.freph_a01_fromtask3 c WITH (INDEX (i2_freph_a01_fromtask3))
    INNER LOOP JOIN mio_log a WITH (NOLOCK)
      ON a.cntr_no = c.cntr_no
     AND a.pol_code = c.pol_code
    WHERE c.in_force_date BETWEEN @date_min AND @date_max

    -- mio_log_id range of rows whose plnmio_date is in the incremental range
    SELECT @mio_log_id_max2 = MAX(mio_log_id),
           @mio_log_id_min2 = MIN(mio_log_id)
    FROM mio_log WITH (INDEX (idx_mio_log_plnmio_date))
    WHERE plnmio_date BETWEEN @date_min AND @date_max

    -- mio_log_id range of rows whose mio_date is in the incremental range
    SELECT @mio_log_id_max1 = MAX(mio_log_id),
           @mio_log_id_min1 = MIN(mio_log_id)
    FROM mio_log WITH (INDEX (idx_mio_log_mio_date))
    WHERE mio_date BETWEEN @date_min AND @date_max

    -- Overall range: largest maximum and smallest minimum of the three
    SELECT @mio_log_id_max = dbo.F_find_max(@mio_log_id_max1, @mio_log_id_max2, @mio_log_id_max3)

    SELECT @mio_log_id_min = dbo.F_find_min(@mio_log_id_min1, @mio_log_id_min2, @mio_log_id_min3)

    SELECT (CASE
              WHEN in_force_date IS NOT NULL
                THEN (CASE
                        WHEN mio_date >= in_force_date THEN mio_date
                        ELSE in_force_date
                      END)
              WHEN in_force_date IS NULL
                THEN (CASE
                        WHEN mio_date >= plnmio_date THEN mio_date
                        ELSE plnmio_date
                      END)
              ELSE mio_date
            END) mio_date
    FROM (SELECT a.mio_date,
                 a.plnmio_date,
                 c.in_force_date
          FROM dbo.mio_log a WITH (NOLOCK)
          INNER JOIN dbo.freph_a01_fromtask3 c WITH (NOLOCK)
            ON a.cntr_no = c.cntr_no
           AND a.pol_code = c.pol_code
          -- Range filter on the clustered index key
          WHERE a.mio_log_id BETWEEN @mio_log_id_min AND @mio_log_id_max) T
    WHERE ((t.in_force_date IS NOT NULL
            AND (CASE
                   WHEN t.mio_date >= t.in_force_date THEN t.mio_date
                   ELSE t.in_force_date
                 END) BETWEEN @date_min AND @date_max)
        OR (t.in_force_date IS NULL
            AND (CASE
                   WHEN t.mio_date >= t.plnmio_date THEN t.mio_date
                   ELSE t.plnmio_date
                 END) BETWEEN @date_min AND @date_max))
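The helper functions dbo.F_find_max and dbo.F_find_min are called above but not defined in this article. A minimal sketch of what they might look like, assuming INT arguments (mio_log_id is declared as int later on) and assuming NULL inputs should be ignored:

```sql
-- Hypothetical implementation; the original definition is not shown in the article.
CREATE FUNCTION dbo.F_find_max (@a INT, @b INT, @c INT)
RETURNS INT
AS
BEGIN
    DECLARE @result INT

    -- MAX skips NULLs, so a lookup that matched no rows does not affect the result.
    SELECT @result = MAX(v)
    FROM (SELECT @a AS v
          UNION ALL SELECT @b
          UNION ALL SELECT @c) AS candidates

    RETURN @result
END
GO

-- Hypothetical implementation, mirror image of dbo.F_find_max.
CREATE FUNCTION dbo.F_find_min (@a INT, @b INT, @c INT)
RETURNS INT
AS
BEGIN
    DECLARE @result INT

    SELECT @result = MIN(v)
    FROM (SELECT @a AS v
          UNION ALL SELECT @b
          UNION ALL SELECT @c) AS candidates

    RETURN @result
END
GO
```

If all three inputs are NULL, both functions return NULL and the final BETWEEN filter matches no rows.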
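Two issues need attention when implementing this solution:

1. When the maximum and minimum clustered index key values are fetched through a nonclustered index, the plan the optimizer generates on its own is inefficient; a hint is needed to steer the SQL Server optimizer to the right plan: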
    SET STATISTICS IO ON
    SET STATISTICS TIME ON

    DECLARE @date_min DATETIME
    DECLARE @date_max DATETIME

    SET @date_min = '2013-07-15'
    SET @date_max = '2013-07-25'

    DECLARE @mio_log_id_max1 INT
    DECLARE @mio_log_id_min1 INT

    SELECT @mio_log_id_max1 = MAX(mio_log_id),
           @mio_log_id_min1 = MIN(mio_log_id)
    FROM mio_log
    WHERE mio_date BETWEEN @date_min AND @date_max
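The execution plan consists of two parallel clustered index scans.

These clustered index scans, used to find the maximum and minimum mio_log_id, are not meant to be complete scans. The SQL Server optimizer assumes that scanning from each end of the clustered index and stopping at the first row that satisfies the WHERE condition is the cheapest algorithm. In this test, however, the rows matching the parameters all sit at the high end of mio_log, so finding the minimum mio_log_id scans almost the whole table. No logical-read figure is available: the query had still not returned when I stopped waiting.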
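As a rough illustration of that strategy, the unhinted MIN/MAX query behaves like the two TOP 1 queries below. This is a conceptual sketch, assuming mio_log_id is the clustered index key as described above; it is not the literal plan text.

```sql
DECLARE @date_min DATETIME
DECLARE @date_max DATETIME

SET @date_min = '2013-07-15'
SET @date_max = '2013-07-25'

-- MIN(mio_log_id): walk the clustered index upward from the smallest key
-- and stop at the first row whose mio_date is in range.
SELECT TOP 1 mio_log_id
FROM mio_log
WHERE mio_date BETWEEN @date_min AND @date_max
ORDER BY mio_log_id ASC

-- MAX(mio_log_id): walk the clustered index downward from the largest key.
SELECT TOP 1 mio_log_id
FROM mio_log
WHERE mio_date BETWEEN @date_min AND @date_max
ORDER BY mio_log_id DESC
```

When the qualifying rows cluster near the largest keys, the descending walk stops almost immediately, while the ascending walk has to pass nearly every row first.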

The problem can be solved by steering the SQL Server optimizer to the right plan with an index hint:
    SELECT @mio_log_id_max1 = MAX(mio_log_id),
           @mio_log_id_min1 = MIN(mio_log_id)
    FROM mio_log WITH (INDEX (idx_mio_log_mio_date))   -- force the mio_date index
    WHERE mio_date BETWEEN @date_min AND @date_max
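With this plan the statement needs 673 logical reads and takes 215 ms.

2. When the mio_log_id values of mio_log are obtained through the in_force_date column of freph_a01_fromtask3, the plan generated by default is also inefficient; hints (a forced index and a forced loop join) are needed to steer the SQL Server optimizer to the right plan: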
    SELECT @mio_log_id_max3 = MAX(a.mio_log_id),
           @mio_log_id_min3 = MIN(a.mio_log_id)
    FROM dbo.freph_a01_fromtask3 c WITH (INDEX (i2_freph_a01_fromtask3))
    INNER LOOP JOIN mio_log a WITH (NOLOCK)
      ON a.cntr_no = c.cntr_no
     AND a.pol_code = c.pol_code
    WHERE c.in_force_date BETWEEN @date_min AND @date_max
In addition, the optimization also relied on covering indexes, indexes on the join columns, and dirty reads (NOLOCK).
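The article does not list the index definitions. As a hedged sketch of what a covering index on the join columns could look like, the statement below uses a hypothetical index name and an assumed column choice:

```sql
-- Hypothetical index: name and column choice are assumptions, not from the article.
-- The key columns serve the join; the INCLUDE columns let the query read
-- mio_date and plnmio_date from the index without touching the base rows.
CREATE NONCLUSTERED INDEX idx_mio_log_cntr_pol
ON dbo.mio_log (cntr_no, pol_code)
INCLUDE (mio_date, plnmio_date)
GO
```

If mio_log_id is the clustered index key, as described earlier, every nonclustered index on mio_log already carries it, so it does not need to be listed.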

References:

1. SQL Server ROWID: http://stackoverflow.com/questions/909155/equivalent-of-oracles-rowid-in-sql-server

2. 徐海蔚. Microsoft SQL Server企业级平台管理实践
Original article: https://www.cnblogs.com/lxl57610/p/7382697.html
