之前碰到一个隐式转换的问题,这里回顾总结下
业务需求是每天晚上要从生产环境同步数据到分析数据库,两个库的表按照系统设计之处表结构是要一样的,现在只模拟两个字段。
创建测试表
1 create table source_tab as select a.col1 pc_id,a.col2 pc_name from sreal_tab a;--模拟生产环境表 2 create table target_tab as select col1 tpc_id,col2 tpc_name from treal_tab where 1=2;--模拟分析数据库表 3 create index idx_tpc_id on target_tab (tpc_id);
如果记录已经在分析环境中存在的情况下,则更新tpc_name(名称字段);
以下是存储过程实现代码
1 create procedure proc_data_sync is 2 n_num number :=0; 3 n_cnt number; 4 begin 5 for s in (select pc_id,pc_name from source_tab) 6 loop 7 n_num := n_num + 1; 8 begin 9 n_cnt := 0; 10 select count(*) 11 into n_cnt 12 from target_tab t 13 where s.pc_id = tpc_id; 14 15 if n_cnt = 0 then 16 insert into target_tab 17 (tpc_id, 18 tpc_name 19 ) 20 values 21 (s.pc_id, 22 s.pc_name 23 ); 24 else 25 update target_tab 26 set tpc_name = s.pc_name 27 where tpc_id = s.pc_id; 28 end if; 29 exception 30 when others then 31 dbms_output.put_line('执行到pc_id='||s.pc_id||'时出错'||sqlerrm); 32 end; 33 if mod(n_num, 5000) = 0 then 34 --每5000条提交一次 35 commit; 36 end if; 37 end loop; 38 commit; 39 end proc_data_sync;
在存储过程中为了让
select count(*)
into n_cnt
from target_tab t
where s.pc_id = tpc_id;
以及update
update target_tab
set tpc_name = s.pc_name
where tpc_id = s.pc_id;
的查询速度创建了下列索引
create index idx_tpc_id on target_tab (tpc_id);
等待测试很长时间没有结果
看看会话在做什么
1 select sid from v$session where machine='pc-wxc' and status='ACTIVE' and module='PL/SQL Developer';
2 select c.SAMPLE_TIME,c.SQL_EXEC_ID,c.SQL_OPNAME,c.SESSION_ID,c.SQL_ID,c.SQL_PLAN_OPERATION,c.SQL_PLAN_OPTIONS from v$active_session_history c where session_id='396' ;
看到几乎都是全表扫描
10 05-2月 -16 11.24.23.784 上午 20791275 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 11 05-2月 -16 11.24.22.774 上午 20791180 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 12 05-2月 -16 11.24.21.774 上午 20791074 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 13 05-2月 -16 11.24.20.774 上午 20790967 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 14 05-2月 -16 11.24.19.774 上午 20790870 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 15 05-2月 -16 11.24.18.761 上午 20790766 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 16 05-2月 -16 11.24.17.761 上午 20790663 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 17 05-2月 -16 11.24.16.761 上午 20790558 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL 18 05-2月 -16 11.24.15.751 上午 20790452 SELECT 396 1pt181n76fvzc TABLE ACCESS FULL
检查sql 执行计划
select * from TABLE(dbms_xplan.display_cursor('1pt181n76fvzc',1,'ADVANCED +PEEKED_BINDS'));
SQL_ID 1pt181n76fvzc, child number 0
-------------------------------------
SELECT COUNT(*) FROM TARGET_TAB T WHERE :B1 = TPC_ID
Plan hash value: 3416452088
---------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 2 (100)| |
| 1 | SORT AGGREGATE | | 1 | 34 | | |
|* 2 | TABLE ACCESS FULL| TARGET_TAB | 1 | 34 | 2 (0)| 00:00:01 |
---------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
1 - SEL$1
2 - SEL$1 / T@SEL$1
Outline Data
-------------
/*+
BEGIN_OUTLINE_DATA
IGNORE_OPTIM_EMBEDDED_HINTS
OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
DB_VERSION('11.2.0.3')
ALL_ROWS
OUTLINE_LEAF(@"SEL$1")
FULL(@"SEL$1" "T"@"SEL$1")
END_OUTLINE_DATA
*/
Peeked Binds (identified by position):
--------------------------------------
1 - :B1 (NUMBER): 27629632
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter(TO_NUMBER("TPC_ID")=:B1)
Column Projection Information (identified by operation id):
-----------------------------------------------------------
1 - (#keys=0) COUNT(*)[22]
Note
-----
- dynamic sampling used for this statement (level=2)
开始以为优化器选择执行计划应该是有它的道理吧,那评估下全部执行总时间
加入dbms_utility.get_time来测量时间
1 create or replace procedure proc_data_sync is 2 n_num number :=0; 3 n_cnt number; 4 start_time number; 5 end_time number; 6 begin 7 for s in (select pc_id,pc_name from source_tab where rownum <=10) 8 loop 9 n_num := n_num + 1; 10 start_time := dbms_utility.get_time; 11 begin 12 n_cnt := 0; 13 select count(*) 14 into n_cnt 15 from target_tab t 16 where s.pc_id = tpc_id; 17 18 if n_cnt = 0 then 19 insert into target_tab 20 (tpc_id, 21 tpc_name 22 ) 23 values 24 (s.pc_id, 25 s.pc_name 26 ); 27 else 28 update target_tab 29 set tpc_name = s.pc_name 30 where tpc_id = s.pc_id; 31 end if; 32 exception 33 when others then 34 dbms_output.put_line('执行到pc_id='||s.pc_id||'时出错'||sqlerrm); 35 end; 36 end_time := dbms_utility.get_time; 37 dbms_output.put_line(' 单次执行时间为' ||to_char((end_time - start_time) / 100, 38 'fm999990.999') || 's.'); 39 if mod(n_num, 5000) = 0 then 40 --每5000条提交一次 41 commit; 42 end if; 43 end loop; 44 commit; 45 end proc_data_sync;
end proc_data_sync;
结果是
单次执行时间为0.06s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s.
单次执行时间为0.02s,
select count(*)*0.02/60/60 from source_tab
结果是10个小时可以同步完,这个结果显然不太能接受
是不是因为没有收集优化器统计信息?
那就收一下
1 begin 2 dbms_stats.delete_table_stats(ownname=>'user_name',tabname=>'target_tab',cascade_indexes=>true); 3 end;
收完后重复上述 过程问题依旧,什么情况?看执行计划,没什么发现,后注意到有这样的一行
2 - filter(TO_NUMBER("TPC_ID")=:B1)
这里做了函数的隐式类型转换,原来是这里的问题。
再去看看表结构 souce_tab的PC_ID列是number 类型 ,而target_tab的 TPC_ID是varrchar2的
至此问题就清晰了,在游标中做转换
for s in (select to_char(pc_id) pc_id,pc_name from source_tab)
再测试
单次执行时间为0.s.
单次执行时间为0.s.
单次执行时间为0.s.
单次执行时间为0.s.
单次执行时间为0.s.
单次执行时间为0.s.
单次执行时间为0.s.
单次执行时间为0.s.
单次执行时间为0.s.
再看执行计划
1 SQL_ID 1pt181n76fvzc, child number 2
2 -------------------------------------
3 SELECT COUNT(*) FROM TARGET_TAB T WHERE :B1 = TPC_ID
4
5 Plan hash value: 4054714533
6
7 --------------------------------------------------------------------------------
8 | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
9 --------------------------------------------------------------------------------
10 | 0 | SELECT STATEMENT | | | | 1 (100)| |
11 | 1 | SORT AGGREGATE | | 1 | 34 | | |
12 |* 2 | INDEX RANGE SCAN| IDX_TPC_ID | 1 | 34 | 1 (0)| 00:00:01 |
13 --------------------------------------------------------------------------------
14
15 Query Block Name / Object Alias (identified by operation id):
16 -------------------------------------------------------------
17
18 1 - SEL$1
19 2 - SEL$1 / T@SEL$1
20
21 Outline Data
22 -------------
23
24 /*+
25 BEGIN_OUTLINE_DATA
26 IGNORE_OPTIM_EMBEDDED_HINTS
27 OPTIMIZER_FEATURES_ENABLE('11.2.0.3')
28 DB_VERSION('11.2.0.3')
29 ALL_ROWS
30 OUTLINE_LEAF(@"SEL$1")
31 INDEX(@"SEL$1" "T"@"SEL$1" ("TARGET_TAB"."TPC_ID"))
32 END_OUTLINE_DATA
33 */
34
35 Peeked Binds (identified by position):
36 --------------------------------------
37
38 1 - :B1 (VARCHAR2(30), CSID=852): '27629632'
39
40 Predicate Information (identified by operation id):
41 ---------------------------------------------------
42
43 2 - access("TPC_ID"=:B1)
44
45 Column Projection Information (identified by operation id):
46 -----------------------------------------------------------
47
48 1 - (#keys=0) COUNT(*)[22]
49
50 Note
51 -----
52 - dynamic sampling used for this statement (level=2)
53
"TPC_ID"=:B1 这个已经从过滤条件变成了访问条件,也就是用上tpc_id的列的索引了,隐式类型转换虽然让编写sql变得简单了,但是很多人不会注意到,这成了潜在的性能问题.
再跑测试出来后3分钟就搞定了,这个速度相差太远了吧.