通过AWR报告发现数据库cpu告警的问题,定位产生问题的sql语句
首先查看SQL的执行计划
SELECT count(1) FROM cmds.psn_cntr_holder a ,cmds.pol_attrib c ,cmds.std_contract b INNER JOIN cmds.CUSTOMINFO_E_SINGLE_CNTR k ON k.cntr_no = b.cntr_no AND k.e_flag = 'Y' AND k.branch_src = b.branch_src AND b.sys_src IN ( 'NBPS' ,'OBPS' ) WHERE 1 = 1 AND ( 1 = 0 OR a.hldr_cust_no = '2004140804000044007' AND a.branch_src = '140000' AND a.sys_src = 'CBPS8' OR a.hldr_cust_no = '2004140804000044007' AND a.branch_src = '000002' AND a.sys_src = 'UBPS' ) AND a.cntr_id = b.cntr_id AND b.pol_code = c.pol_cbps_code AND b.cntr_stat IN ('K') AND c.pol_term_code = 'L' AND b.sys_src IN ( SELECT syssrc FROM cmds.paramSystem p WHERE 1 = 1 AND p.orisys = 'ECSS' );
执行计划没有问题
-------------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 169 | 71 (0)| 00:00:01 | | | | 1 | SORT AGGREGATE | | 1 | 169 | | | | | |* 2 | HASH JOIN SEMI | | 1 | 169 | 71 (0)| 00:00:01 | | | | 3 | NESTED LOOPS | | 1 | 155 | 68 (0)| 00:00:01 | | | | 4 | NESTED LOOPS | | 2 | 155 | 68 (0)| 00:00:01 | | | | 5 | NESTED LOOPS | | 1 | 148 | 64 (0)| 00:00:01 | | | | 6 | MERGE JOIN CARTESIAN | | 1 | 85 | 60 (0)| 00:00:01 | | | | 7 | INLIST ITERATOR | | | | | | | | |* 8 | TABLE ACCESS BY GLOBAL INDEX ROWID| PSN_CNTR_HOLDER | 1 | 53 | 6 (0)| 00:00:01 | ROWID | ROWID | |* 9 | INDEX RANGE SCAN | IDX_PSN_CNTR_HOLDERNO | 3 | | 4 (0)| 00:00:01 | | | | 10 | BUFFER SORT | | 67 | 2144 | 54 (0)| 00:00:01 | | | | 11 | PARTITION RANGE ALL | | 67 | 2144 | 54 (0)| 00:00:01 | 1 | 35 | | 12 | PARTITION LIST ALL | | 67 | 2144 | 54 (0)| 00:00:01 | 1 | 2 | |* 13 | TABLE ACCESS FULL | CUSTOMINFO_E_SINGLE_CNTR | 67 | 2144 | 54 (0)| 00:00:01 | 1 | 70 | |* 14 | TABLE ACCESS BY GLOBAL INDEX ROWID | STD_CONTRACT | 1 | 63 | 4 (0)| 00:00:01 | ROWID | ROWID | |* 15 | INDEX RANGE SCAN | IDX_STD_CONTACT_CNTRID | 1 | | 3 (0)| 00:00:01 | | | |* 16 | INDEX RANGE SCAN | POL_ATTRIB_CBPS_CODE | 2 | | 1 (0)| 00:00:01 | | | |* 17 | TABLE ACCESS BY INDEX ROWID | POL_ATTRIB | 1 | 7 | 4 (0)| 00:00:01 | | | |* 18 | TABLE ACCESS FULL | PARAMSYSTEM | 2 | 28 | 3 (0)| 00:00:01 | | | --------------------------------------------------------------------------------------------------------------------------------------
检查下面的SQL的执行的session,因为比较奇怪,所以看看是谁执行的
SELECT DISTINCT TBL$OR$IDX$PART$NUM("STD_CONTRACT", 0, 2, 0, "SYS_SRC") FROM ( SELECT "P"."SYSSRC" "SYS_SRC" FROM "PARAMSYSTEM" "P" WHERE "P"."ORISYS" = 'ECSS' AND ( "P"."SYSSRC" = 'NBPS' OR "P"."SYSSRC" = 'OBPS' ) ) ORDER BY 1
无论是在当前的session视图还是历史的session视图都查不到这个sql的任何信息,定位sql里面的TBL$OR$IDX$PART$NUM 函数,oracle通过TBL$OR$IDX$PART$NUM函数来判断是否普通表满足分区置换的条件。
检查第一个sql的内容,发现INNER JOIN 部分存在一定的问题,建议开发调整该部分
INNER JOIN cmds.CUSTOMINFO_E_SINGLE_CNTR k ON k.cntr_no = b.cntr_no AND k.e_flag = 'Y' AND k.branch_src = b.branch_src AND b.sys_src IN ( 'NBPS' ,'OBPS' )
再看看sql的统计信息
可以看出,该sql返回一样产生的逻辑读非常高,不合理,但是sql从10046时间里面看,使用的执行计划是正确的,跟估算的是一样的。
这时猜测应该是触发了ORACLE bug,该sql引起了oracle内部非常高频率的反复运行分区置换条件的判断。
在oracle support里面确定BUG编号为:17572606
BUG处理方式:
_subquery_pruning_enabled=FALSE on a session or system level. OR use the no_subquery_pruning hint on the problematic queries.
It's a table-level hint so it would be "/*+ no_subquery_pruning(t1) ..
OR
Apply Patch 19889960 if available for your version and platform.