• [转]ORACLE小小说之武朝迷案


    作者:梁敬彬? wabjtam123?
    地址:ORACLE小小说之武朝迷案
    整理:killkill
    下载:原文及附件

    1、武朝迷案

    最近网管组遇到了一个困扰很长时间的棘手问题,就是如下执行频率极高的语句(用于权限管理),在生产环境中运行比较缓慢,执行需要200秒以上,急需优化,最好是控制在10秒以内。(测试环境服务器也是如此状况)

    据同事另外反映两点疑惑:

    1、生产环境及测试服务器环境运行时快时慢
    2、模拟的本机笔记本环境居然跑的比生产快几十倍!

    Select *
       From (Select Rownum Sid, a.*
               From (Select a.Ne_Alarm_List_Id Id,
                            a.Flow_Id, /*流程ID*/
                            a.Oprt_State Oprtstate, /*操作状态*/
                            a.Alarm_State Alarmstate, /*告警状态*/
                            a.Alarm_Level Alarmlevel, /*告警级别*/
                            a.Perf_Msg_Id, /*性能消息ID*/
                            a.Alarm_Class Alarmclass, /*告警类别*/
                            To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                            B1.List_Label Alarm_Type,
                            C1.Ne_Name,
                            d.Kpi_Name,
                            a.Kpi_Value,
                            Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                            '' ||
                            Decode(a.Alarm_Level,
                                   '1',
                                   '严重',
                                   '2',
                                   '重要',
                                   '3',
                                   '一般',
                                   '未知') || '' Alarm_Level,
                            To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                            Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator,
                            a.Alarm_Times,
                            Decode(a.Oprt_State,
                                   '20',
                                   To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '25',
                                   To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '30',
                                   To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '40',
                                   To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                            B2.List_Label Alarm_State,
                            Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                       From Ne_Alarm_List a,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                            Net_Element C1,
                            Net_Element C2,
                            Kpi_Code_List d,
                            Manage_Region e,
                            Kpi_Mapping_Cfg f,
                            Ne_Trans_Alarm Nta,
                            (Select t.Primary_Id "$PRIMARY_ID",
                                    Sum(t.Has_Read) "$HAS_READ"
                               From Tree_Privilege t
                              Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                                And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
                         Group By t.Primary_Id) "$PRI_VIEW"
                 Where Nvl(a.Config_Ne_Id, a.Ne_Id) ="$PRI_VIEW"."$PRIMARY_ID"(+)
                        And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                        And B1.List_Value = a.Alarm_Type
                        And a.Ne_Id = C1.Ne_Id
                        And B2.List_Value = a.Alarm_State
                        And B3.List_Value = a.Dr_Id || ''
                        And B4.List_Value = a.Oprt_State
                        And a.Config_Ne_Id = C2.Ne_Id(+)
                        And a.Kpi_Id = d.Kpi_Id
                        And a.Kpi_Id = f.Kpi_Id(+)
                        And Nvl(a.Alarm_Region_Origin, '-1') =
                            To_Char(e.Region_Id(+))
                        And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                        And (Select Path || '/'
                               From Net_Element
                              Where Ne_Id = a.Ne_Id) Not Like
                            (Select Path || '/%'
                               From Net_Element
                              Where Ne_Type_Id = 30
                                And Ne_Flag = '6'
                                And State = '0SA') /*非业务系统*/
                        And To_Number(a.Dr_Id) = 0
                      Order By B2.Sort_Id,
                               B4.Sort_Id,
                               a.Alarm_Type,
                               a.Ne_Id,
                               Nvl(a.Last_Send_Time, a.Create_Time) Desc) a) b
      Where b.Sid >= 1
        And b.Sid <= 15;
    

    首先自己确认了一下这个情况,确实,测试环境中执行该语句,需要跑230多秒,继而在别人和自己的笔记本环境上实验(有将语句涉及到的数据同步进自己的环境),10秒左右结果就出来了。另外随着如下子查询IN值的变化,测试环境服务器上该脚本的执行速度也在变化,当该子查询的结果集很小的时候,查的很快。越大越慢!

    Select 
    		t.Primary_Id "$PRIMARY_ID",
    	 	Sum(t.Has_Read) "$HAS_READ"
        From Tree_Privilege t
       Where t.Tree_Cfg_Name = 'NET_ELEMENT'
         And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
    Group By t.Primary_Id) "$PRI_VIEW"	
    

    2、疑云重生

    表面情况都了解清楚了,现在要考虑深入其中,探索真相!

    首先想到的就是,为何在本机上可以跑的很快而在服务器上跑的很慢,不是涉及到的表数据量都一样吗?应该是执行计划出了问题,于是,各自做了个set autotrace traceonly来查看执行计划的差异.

    一分析,发现执行计划是有少许差异,但是逻辑读却有天壤之别,笔记本上788910 consistent gets 而服务器上是37494576 consistent gets 。递归调用对比更是离谱 :笔记本为 0 recursive calls而服务器为1305552 recursive calls 。

    服务器上的执行计划及统计信息:

    -----------------------------------------------------------------------------------------------------------------
    | Id  | Operation                             | Name                    | Rows  | Bytes | Cost (%CPU)| Time     |
    -----------------------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                      |                         |     1 |  2574 |  4426   (1)| 00:00:54 |
    |*  1 |  VIEW                                 |                         |     1 |  2574 |  4426   (1)| 00:00:54 |
    |   2 |   COUNT                               |                         |       |       |            |          |
    |*  3 |    FILTER                             |                         |       |       |            |          |
    |   4 |     VIEW                              |                         |     1 |  2574 |  4424   (1)| 00:00:54 |
    |   5 |      SORT ORDER BY                    |                         |     1 |   368 |  4424   (1)| 00:00:54 |
    |*  6 |       FILTER                          |                         |       |       |            |          |
    |*  7 |        HASH JOIN OUTER                |                         |     1 |   368 |  4423   (1)| 00:00:54 |
    |   8 |         NESTED LOOPS                  |                         |     1 |   349 |  3373   (1)| 00:00:41 |
    |   9 |          NESTED LOOPS                 |                         |     1 |   328 |  3372   (1)| 00:00:41 |
    |  10 |           NESTED LOOPS OUTER          |                         |     1 |   282 |  3371   (1)| 00:00:41 |
    |  11 |            NESTED LOOPS OUTER         |                         |     1 |   263 |  3370   (1)| 00:00:41 |
    |* 12 |             HASH JOIN OUTER           |                         |     1 |   255 |  3370   (1)| 00:00:41 |
    |* 13 |              HASH JOIN                |                         |     1 |   245 |  3367   (1)| 00:00:41 |
    |* 14 |               HASH JOIN               |                         |     2 |   420 |  3361   (1)| 00:00:41 |
    |* 15 |                TABLE ACCESS FULL      | TP_DOMAIN_LISTVALUES    |     3 |    87 |     5   (0)| 00:00:01 |
    |* 16 |                HASH JOIN              |                         |    94 | 17014 |  3355   (1)| 00:00:41 |
    |* 17 |                 TABLE ACCESS FULL     | TP_DOMAIN_LISTVALUES    |     1 |    38 |     5   (0)| 00:00:01 |
    |* 18 |                 HASH JOIN RIGHT OUTER |                         |  1144 |   159K|  3350   (1)| 00:00:41 |
    |  19 |                  TABLE ACCESS FULL    | NE_TRANS_ALARM          |     1 |    13 |     3   (0)| 00:00:01 |
    |* 20 |                  HASH JOIN            |                         |  1144 |   145K|  3346   (1)| 00:00:41 |
    |* 21 |                   TABLE ACCESS FULL   | TP_DOMAIN_LISTVALUES    |     3 |    78 |     5   (0)| 00:00:01 |
    |* 22 |                   TABLE ACCESS FULL   | NE_ALARM_LIST           | 99419 |     9M|  3340   (1)| 00:00:41 |
    |* 23 |               TABLE ACCESS FULL       | TP_DOMAIN_LISTVALUES    |    22 |   770 |     5   (0)| 00:00:01 |
    |  24 |              TABLE ACCESS FULL        | MANAGE_REGION           |   237 |  2370 |     3   (0)| 00:00:01 |
    |* 25 |             INDEX UNIQUE SCAN         | PK_KPI_MAPPING_CFG      |     1 |     8 |     0   (0)| 00:00:01 |
    |  26 |            TABLE ACCESS BY INDEX ROWID| NET_ELEMENT             |     1 |    19 |     1   (0)| 00:00:01 |
    |* 27 |             INDEX UNIQUE SCAN         | PK_NET_ELEMENT          |     1 |       |     0   (0)| 00:00:01 |
    |  28 |           TABLE ACCESS BY INDEX ROWID | KPI_CODE_LIST           |     1 |    46 |     1   (0)| 00:00:01 |
    |* 29 |            INDEX UNIQUE SCAN          | PK_KPI_CODE_LIST        |     1 |       |     0   (0)| 00:00:01 |
    |  30 |          TABLE ACCESS BY INDEX ROWID  | NET_ELEMENT             |     1 |    21 |     1   (0)| 00:00:01 |
    |* 31 |           INDEX UNIQUE SCAN           | PK_NET_ELEMENT          |     1 |       |     0   (0)| 00:00:01 |
    |  32 |         VIEW                          |                         |   170 |  3230 |  1050   (1)| 00:00:13 |
    |  33 |          HASH GROUP BY                |                         |   170 |  5100 |  1050   (1)| 00:00:13 |
    |* 34 |           TABLE ACCESS FULL           | TREE_PRIVILEGE          |   175 |  5250 |  1049   (1)| 00:00:13 |
    |  35 |     TABLE ACCESS BY INDEX ROWID       | NET_ELEMENT             |     1 |    38 |     2   (0)| 00:00:01 |
    |* 36 |      INDEX UNIQUE SCAN                | PK_NET_ELEMENT          |     1 |       |     1   (0)| 00:00:01 |
    |* 37 |      TABLE ACCESS BY INDEX ROWID      | NET_ELEMENT             |     1 |    43 |     2   (0)| 00:00:01 |
    |* 38 |       INDEX RANGE SCAN                | IDX_NE_ELEM_NET_TYPE_ID |     3 |       |     1   (0)| 00:00:01 |
    -----------------------------------------------------------------------------------------------------------------
    
    Statistics
    ----------------------------------------------------------
        1305552  recursive calls
       37494576  consistent gets
          16030  physical reads
    

    笔记本上的计划及统计信息:

    ---------------------------------------------------------------------------------------------------
    | Id  | Operation                                  | Name                 | Rows  | Bytes | Cost  |
    ---------------------------------------------------------------------------------------------------
    |   0 | SELECT STATEMENT                           |                      |     1 |  2574 |  1275 |
    |*  1 |  VIEW                                      |                      |     1 |  2574 |  1275 |
    |   2 |   COUNT                                    |                      |       |       |       |
    |   3 |    VIEW                                    |                      |     1 |  2561 |  1275 |
    |   4 |     SORT ORDER BY                          |                      |     1 |   660 |  1275 |
    |*  5 |      FILTER                                |                      |       |       |       |
    |*  6 |       HASH JOIN                            |                      |     1 |   660 |  1271 |
    |*  7 |        HASH JOIN OUTER                     |                      |     1 |   635 |  1267 |
    |*  8 |         FILTER                             |                      |       |       |       |
    |*  9 |          HASH JOIN OUTER                   |                      |     1 |   625 |  1264 |
    |* 10 |           HASH JOIN                        |                      |     1 |   485 |  1256 |
    |* 11 |            HASH JOIN                       |                      |     1 |   451 |  1252 |
    |* 12 |             HASH JOIN                      |                      |     1 |   423 |  1248 |
    |  13 |              NESTED LOOPS                  |                      |     1 |   386 |  1244 |
    |  14 |               NESTED LOOPS OUTER           |                      |     1 |   366 |  1243 |
    |  15 |                NESTED LOOPS                |                      |     1 |   348 |  1242 |
    |  16 |                 NESTED LOOPS OUTER         |                      |  6914 |  2133K|  1242 |
    |  17 |                  NESTED LOOPS OUTER        |                      |  6914 |  2079K|  1242 |
    |  18 |                   TABLE ACCESS FULL        | NE_ALARM_LIST        |  6914 |  1991K|  1242 |
    |* 19 |                   INDEX RANGE SCAN         | IDX_NE_TRAN_LIST_ID  |     1 |    13 |       |
    |* 20 |                  INDEX UNIQUE SCAN         | PK_KPI_MAPPING_CFG   |     1 |     8 |       |
    |  21 |                 TABLE ACCESS BY INDEX ROWID| KPI_CODE_LIST        |     1 |    32 |       |
    |* 22 |                  INDEX UNIQUE SCAN         | PK_KPI_CODE_LIST     |     1 |       |       |
    |  23 |                TABLE ACCESS BY INDEX ROWID | NET_ELEMENT          |     1 |    18 |     1 |
    |* 24 |                 INDEX UNIQUE SCAN          | PK_NET_ELEMENT       |     1 |       |       |
    |  25 |               TABLE ACCESS BY INDEX ROWID  | NET_ELEMENT          |     1 |    20 |     1 |
    |* 26 |                INDEX UNIQUE SCAN           | PK_NET_ELEMENT       |     1 |       |       |
    |* 27 |              TABLE ACCESS FULL             | TP_DOMAIN_LISTVALUES |     4 |   148 |     3 |
    |* 28 |             TABLE ACCESS FULL              | TP_DOMAIN_LISTVALUES |     4 |   112 |     3 |
    |* 29 |            TABLE ACCESS FULL               | TP_DOMAIN_LISTVALUES |    26 |   884 |     3 |
    |  30 |           VIEW                             |                      |    62 |  8680 |     7 |
    |  31 |            SORT GROUP BY                   |                      |    62 |  2418 |     7 |
    |* 32 |             TABLE ACCESS FULL              | TREE_PRIVILEGE       |    63 |  2457 |     4 |
    |  33 |         TABLE ACCESS FULL                  | MANAGE_REGION        |   237 |  2370 |     2 |
    |* 34 |        TABLE ACCESS FULL                   | TP_DOMAIN_LISTVALUES |     4 |   100 |     3 |
    |  35 |       TABLE ACCESS BY INDEX ROWID          | NET_ELEMENT          |     1 |    24 |     2 |
    |* 36 |        INDEX UNIQUE SCAN                   | PK_NET_ELEMENT       |     1 |       |     1 |
    |* 37 |        TABLE ACCESS FULL                   | NET_ELEMENT          |   115 |  3565 |   140 |
    ---------------------------------------------------------------------------------------------------
    
    统计信息
    ----------------------------------------------------------
              0  recursive calls
         788910  consistent gets
              0  physical reads
    

    究竟是什么原因导致如此大的差异呢?执行计划虽有差异,但是却并不感觉特别离谱,限入沉思中。。。。


    3、无所适从

    3.1统计信息不全?

    查询后发现系统确有收集统计信息,心中略有不甘,为防止收集不准确,继续做全收集如下(无分区表,所以不用 dbms_stats包了)

    analyze table Ne_Alarm_List        
            compute statistics for table for all indexes for all indexed columns;
    analyze table Tp_Domain_Listvalues 
            compute statistics for table for all indexes for all indexed columns;
    analyze table Net_Element          
            compute statistics for table for all indexes for all indexed columns;
    analyze table Kpi_Code_List        
            compute statistics for table for all indexes for all indexed columns;
    analyze table Manage_Region        
            compute statistics for table for all indexes for all indexed columns;
    analyze table Kpi_Mapping_Cfg      
            compute statistics for table for all indexes for all indexed columns;
    analyze table Ne_Trans_Alarm       
            compute statistics for table for all indexes for all indexed columns;
    analyze table Tree_Privilege       
            compute statistics for table for all indexes for all indexed columns;
    

    收集完毕,发现执行依旧缓慢,执行计划有变,但是罗极读和递归依旧大到惊人!

    3.2 物理分布OR碎片?

    于是检查如下:

    UNIX服务器上表和索引的情况

     select table_name,num_rows,blocks from user_tables where lower(table_name) in
        ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                       NUM_ROWS     BLOCKS
    ------------------------------ ---------- ----------
    KPI_CODE_LIST                       64018       1378
    KPI_MAPPING_CFG                      4063         20
    MANAGE_REGION                         237          5
    NET_ELEMENT                        103485       1882
    NE_ALARM_LIST                      138278      15197
    NE_TRANS_ALARM                          0          5
    TP_DOMAIN_LISTVALUES                 1121         13
    TREE_PRIVILEGE                       2443       4780
    
     select table_name,index_name,blevel,leaf_blocks from user_indexes where lower(table_name) in
        ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                     INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ------------------------------ ---------- -----------
    KPI_CODE_LIST                  PK_KPI_CODE_LIST                        1         170
    KPI_MAPPING_CFG                PK_KPI_MAPPING_CFG                      1          19
    MANAGE_REGION                  PK_MANAGE_REGION                        0           1
    NET_ELEMENT                    IDX_NE_ELEM_PARENT_NE_ID                1         241
    NET_ELEMENT                    IDX_NE_ELEM_NE_NAME                     2         367
    NET_ELEMENT                    IDX_STATE_PATH                          2         768
    NET_ELEMENT                    PK_NET_ELEMENT                          1         369
    NET_ELEMENT                    IDX_NE_ELEM_STATE                       1         234
    NET_ELEMENT                    IDX_NE_PATH                             2         656
    NET_ELEMENT                    IDX_NEID_NETYPEID                       1         317
    NET_ELEMENT                    IDX_NE_ELEM_NET_TYPE_ID                 1         237
    NE_ALARM_LIST                  PK_NE_ALARM_LIST                        1         338
    NE_TRANS_ALARM                 PK_NE_TRANS_ALARM                       0           0
    TREE_PRIVILEGE                 PK_TREE_PRIVILEGE                       2        5059
    

    笔记本上表和索引的情况

     select table_name,num_rows,blocks from user_tables where lower(table_name) in
        ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                       NUM_ROWS     BLOCKS
    ------------------------------ ---------- ----------
    KPI_CODE_LIST                       64018       1378
    KPI_MAPPING_CFG                      4063         20
    MANAGE_REGION                         237          5
    NET_ELEMENT                        103485       1440
    NE_ALARM_LIST                      138278      12901
    NE_TRANS_ALARM                          0          0
    TP_DOMAIN_LISTVALUES                 1121         13
    TREE_PRIVILEGE                       2443         21
    
     select table_name,index_name,blevel,leaf_blocks from user_indexes where lower(table_name) in
       ('ne_alarm_list'
        ,'tp_domain_listvalues'
        ,'net_element'
        ,'kpi_code_list'
        ,'manage_region'
        ,'kpi_mapping_cfg'
        ,'ne_trans_alarm'
        ,'tree_privilege')
       order by table_name;
    
    TABLE_NAME                     INDEX_NAME                         BLEVEL LEAF_BLOCKS
    ------------------------------ ------------------------------ ---------- -----------
    KPI_CODE_LIST                  PK_KPI_CODE_LIST                        1         230
    KPI_MAPPING_CFG                PK_KPI_MAPPING_CFG                      1          16
    MANAGE_REGION                  PK_MANAGE_REGION                        0           1
    MANAGE_REGION                  IDX_REGION_141                          0           1
    NET_ELEMENT                    PK_NET_ELEMENT                          1         332
    NET_ELEMENT                    IDX_NE_ELEM_PARENT_NE_ID                1         365
    NE_ALARM_LIST                  IDX_CREATE_TIME                         2         544
    NE_ALARM_LIST                  IDX_ALARM_LIST_NE_ID                    1         435
    NE_ALARM_LIST                  IDX_ALARM_NE_CONFIG_ID                  1         339
    NE_ALARM_LIST                  PK_NE_ALARM_LIST                        1         425
    NE_ALARM_LIST                  IDX_ALARM_LIST_FLOW_ID                  1         377
    NE_TRANS_ALARM                 IDX_NE_TRAN_LIST_ID                     0           0
    NE_TRANS_ALARM                 PK_NE_TRANS_ALARM                       0           0
    TREE_PRIVILEGE                 PK_TREE_PRIVILEGE                       1          13
    

    心头一喜,好像看出TREE_PRIVILEGE 的表和索引的统计信息明显有问题,块是4780对21,叶子的高度是2对1 ,差异还算明显啊!

    于是操刀如下,从数据字典中捞出如下两批语句,分别执行,做好了表重组和索引重建两大工作

    select table_name,index_name,blevel,leaf_blocks
    ,'alter table '||table_name||' move;','alter index '||index_name||' rebuild;'
     from user_indexes where lower(table_name) in 
    ('ne_alarm_list'           
    ,'tp_domain_listvalues'  
    ,'net_element'  
    ,'kpi_code_list'  
    ,'manage_region'  
    ,'kpi_mapping_cfg'  
    ,'ne_trans_alarm'  
    ,'tree_privilege') 
    order by table_name;
    

    操作完以后刚才的4780对21和2对1的情况不见了(具体就不贴出了),变得非常接近了,我感觉问题好像已经搞定了。。。

    欢天喜地一跑脚本,天,依旧奇慢无比,再一看autotrace 跟踪的结果,心凉了半截,执行计划又略为小变动了一下,但是巨大惊人的逻辑读和递归调用形同鬼魅,如影随形,挥之不去!

    3.3 版本,BUG?

    为什么会有这么多逻辑读,为什么我的笔记本没有这么多逻辑读?我们的环境可是一摸一样啊,并且在别的同事的机器上也是跑的和我笔记本一样顺畅,百思不得其解。。。。。

    会不会是版本问题,BUG? 跳起身来,进行了以下查询

    服务器版本为

    SQL> show parameter feat
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_features_enable            string      10.2.0.4
    

    我笔记本版本为

    SQL> show parameter feat
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_features_enable            string      10.2.0.1
    

    和我笔记本一样OK的别人环境数据库版本为

    SQL> show parameter feat
    NAME                                 TYPE        VALUE
    ------------------------------------ ----------- ------------------------------
    optimizer_features_enable            string      10.2.0.3
    

    眼睛一亮,看来版本问题啊,版本不一样,怪不得莫名奇妙,心跳加速,心里寻思:离谱真的源于BUG?

    猜测不如动手,咋确认呢?来狠的,直接把服务器的版本降下来

    SQL> alter system set optimizer_features_enable='10.2.0.3';
    System altered.
    

    继续执行服务器环境SQL语句,发现没效果,重启数据库,依然如故!

    3.4 分页调优?

    大家对分页调优写法都有了解,本案例中由于如下子查询语句和整个结果集关联,并有 And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0的条件,导致无法将整个结果集先分页再和子查询关联,因为这样只取ROWNUM范围15行的值参与关联,结果就错了!

    所以本案例中,分页调优难以凑效!

    Select t.Primary_Id "$PRIMARY_ID",
           Sum(t.Has_Read) "$HAS_READ"
        From Tree_Privilege t
       Where t.Tree_Cfg_Name = 'NET_ELEMENT'
         And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
    Group By t.Primary_Id) "$PRI_VIEW"
    

    所以本案例中,分页调优难以凑效!

    唉,长叹一声,无所适从!


    4、峰回路转

    为什么会有那么多次的逻辑读和递归?看来看执行计划还是太表面了,难以回答ORACLE内部在做啥小动作,不查出问题所在,不将谜团大白于天下,誓不罢休!

    试试10046trace吧,看看整个执行开始到结束,到底在执行什么,等待什么?玩啥猫腻!

    分别做了两个10046 TRACE

    服务器上的10046 TRACE

    ********************************************************************************
    count    = number of times OCI procedure was executed
    cpu      = cpu time in seconds executing 
    elapsed  = elapsed time in seconds executing
    disk     = number of physical reads of buffers from disk
    query    = number of buffers gotten for consistent read
    current  = number of buffers gotten in current mode (usually for update)
    rows     = number of rows processed by the fetch or execute call
    ********************************************************************************
    
    Select *
       From (Select Rownum Sid, a.*
               From (Select a.Ne_Alarm_List_Id Id,
                            a.Flow_Id, /*流程ID*/
                            a.Oprt_State Oprtstate, /*操作状态*/
                            a.Alarm_State Alarmstate, /*告警状态*/
                            a.Alarm_Level Alarmlevel, /*告警级别*/
                            a.Perf_Msg_Id, /*性能消息ID*/
                            a.Alarm_Class Alarmclass, /*告警类别*/
                            To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                            B1.List_Label Alarm_Type,
                            C1.Ne_Name,
                            d.Kpi_Name,
                            a.Kpi_Value,
                            Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                            '' ||
                            Decode(a.Alarm_Level,
                                   '1',
                                   '严重',
                                   '2',
                                   '重要',
                                   '3',
                                   '一般',
                                   '未知') || '' Alarm_Level,
                            To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                            Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator,
                            a.Alarm_Times,
                            Decode(a.Oprt_State,
                                   '20',
                                   To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '25',
                                   To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '30',
                                   To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '40',
                                   To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                            B2.List_Label Alarm_State,
                            Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                       From Ne_Alarm_List a,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                            Net_Element C1,
                            Net_Element C2,
                            Kpi_Code_List d,
                            Manage_Region e,
                            Kpi_Mapping_Cfg f,
                            Ne_Trans_Alarm Nta,
                            (Select t.Primary_Id "$PRIMARY_ID",
                                    Sum(t.Has_Read) "$HAS_READ"
                               From Tree_Privilege t
                              Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                                And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
                         Group By t.Primary_Id) "$PRI_VIEW"
                 Where Nvl(a.Config_Ne_Id, a.Ne_Id) ="$PRI_VIEW"."$PRIMARY_ID"(+)
                        And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                        And B1.List_Value = a.Alarm_Type
                        And a.Ne_Id = C1.Ne_Id
                        And B2.List_Value = a.Alarm_State
                        And B3.List_Value = a.Dr_Id || ''
                        And B4.List_Value = a.Oprt_State
                        And a.Config_Ne_Id = C2.Ne_Id(+)
                        And a.Kpi_Id = d.Kpi_Id
                        And a.Kpi_Id = f.Kpi_Id(+)
                        And Nvl(a.Alarm_Region_Origin, '-1') =
                            To_Char(e.Region_Id(+))
                        And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                        And (Select Path || '/'
                               From Net_Element
                              Where Ne_Id = a.Ne_Id) Not Like
                            (Select Path || '/%'
                               From Net_Element
                              Where Ne_Type_Id = 30
                                And Ne_Flag = '6'
                                And State = '0SA') /*非业务系统*/
                        And To_Number(a.Dr_Id) = 0
                      Order By B2.Sort_Id,
                               B4.Sort_Id,
                               a.Alarm_Type,
                               a.Ne_Id,
                               Nvl(a.Last_Send_Time, a.Create_Time) Desc) a) b
      Where b.Sid >= 1
        And b.Sid <= 15
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.58       0.59          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        2     49.06      49.32      12758     665549          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4     49.65      49.91      12758     665549          0          15
    
    Misses in library cache during parse: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 58  
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
         15  VIEW  (cr=37490604 pr=12759 pw=0 time=272081968 us)
      65251   COUNT  (cr=37490604 pr=12759 pw=0 time=272342955 us)
      65251    FILTER  (cr=37490604 pr=12759 pw=0 time=272212450 us)
      65339     VIEW  (cr=37490592 pr=12759 pw=0 time=272270985 us)
      65339      SORT ORDER BY (cr=37490592 pr=12759 pw=0 time=272205640 us)
      65339       HASH JOIN OUTER (cr=665537 pr=12758 pw=0 time=5914370 us)
      65339        HASH JOIN  (cr=665534 pr=12758 pw=0 time=5616725 us)
      65339         FILTER  (cr=665518 pr=12758 pw=0 time=5507572 us)
      94744          HASH JOIN OUTER (cr=665518 pr=12758 pw=0 time=5595780 us)
      94744           NESTED LOOPS OUTER (cr=665485 pr=12758 pw=0 time=5642876 us)
      94744            NESTED LOOPS  (cr=486659 pr=12758 pw=0 time=4600678 us)
      94744             NESTED LOOPS  (cr=297169 pr=12758 pw=0 time=3463708 us)
      94744              NESTED LOOPS OUTER (cr=107679 pr=12758 pw=0 time=2231991 us)
      94744               HASH JOIN  (cr=12933 pr=12758 pw=0 time=1758206 us)
      94744                HASH JOIN  (cr=12917 pr=12758 pw=0 time=1454846 us)
     138299                 NESTED LOOPS OUTER (cr=12901 pr=12758 pw=0 time=1661499 us)
     138299                  HASH JOIN  (cr=12899 pr=12758 pw=0 time=969972 us)
          2                   TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=292 us)
     138299                   TABLE ACCESS FULL NE_ALARM_LIST (cr=12883 pr=12758 pw=0 time=554956 us)
          0                  INDEX RANGE SCAN IDX_NE_TRAN_LIST_ID (cr=2 pr=0 pw=0 time=389048 us)(object id 105412)
         24                 TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=177 us)
          5                TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=78 us)
        845               INDEX UNIQUE SCAN PK_KPI_MAPPING_CFG (cr=94746 pr=0 pw=0 time=311122 us)(object id 66011)
      94744              TABLE ACCESS BY INDEX ROWID KPI_CODE_LIST (cr=189490 pr=0 pw=0 time=888405 us)
      94744               INDEX UNIQUE SCAN PK_KPI_CODE_LIST (cr=94746 pr=0 pw=0 time=393651 us)(object id 66007)
      94744             TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=189490 pr=0 pw=0 time=896002 us)
      94744              INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=94746 pr=0 pw=0 time=383605 us)(object id 66108)
      89412            TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=178826 pr=0 pw=0 time=836400 us)
      89412             INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=89414 pr=0 pw=0 time=373511 us)(object id 66108)
        195           VIEW  (cr=33 pr=0 pw=0 time=3003 us)
        195            HASH GROUP BY (cr=33 pr=0 pw=0 time=2612 us)
        195             TABLE ACCESS BY INDEX ROWID TREE_PRIVILEGE (cr=33 pr=0 pw=0 time=1030 us)
        195              INDEX RANGE SCAN PK_TREE_PRIVILEGE (cr=12 pr=0 pw=0 time=433 us)(object id 100132)
          3         TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=287 us)
        237        TABLE ACCESS FULL MANAGE_REGION (cr=3 pr=0 pw=0 time=548 us)
          3     TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=9 pr=0 pw=0 time=6953 us)
          3      INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=5 pr=0 pw=0 time=109 us)(object id 66108)
          1     TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=3 pr=0 pw=0 time=56 us)
          1      INDEX RANGE SCAN IDX_NE_ELEM_NET_TYPE_ID (cr=2 pr=0 pw=0 time=40 us)(object id 79075)
    
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       2        0.00          0.00
      db file scattered read                        860        0.00          0.16
      db file sequential read                         9        0.00          0.00
      SQL*Net message from client                     2      201.06        201.06
    ********************************************************************************
    
    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65339      0.78       0.75          0          0          0           0
    Fetch   130670    148.79     145.17          0   32714110          0       65331
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   196010    149.58     145.93          0   32714110          0       65331
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 59     (recursive depth: 1)
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      latch: cache buffers chains                     1        0.00          0.00
    ********************************************************************************
    
    SELECT PKP_FLOW_FUNCTION.GETSTAFFNAMEBYID(LTRIM(RTRIM(TO_CHAR(:B1 )))) 
    FROM
     DUAL
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      7.43       6.78          0          0          0           0
    Fetch   522102     42.59      42.17          0          0          0      522102
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     50.03      48.96          0          0          0      522102
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 59     (recursive depth: 1)
    ********************************************************************************
    
    SELECT NAME 
    FROM
     STAFF_INFO WHERE STAFF_ID = :B1 
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      8.51       7.78          0          0          0           0
    Fetch   522102     21.35      20.21          0    4110939          0      456771
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     29.86      27.99          0    4110939          0      456771
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: ALL_ROWS
    Parsing user id: 59     (recursive depth: 2)
    ********************************************************************************
    
    ********************************************************************************
    
    OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        2      0.58       0.59          0          0          0           0
    Execute      2      0.00       0.00          0          0          0           0
    Fetch        2     49.06      49.32      12758     665549          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        6     49.65      49.91      12758     665549          0          15
    
    Misses in library cache during parse: 1
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       3        0.00          0.00
      SQL*Net message from client                     3      201.06        213.83
      SQL*Net more data from client                   2        0.00          0.00
      db file scattered read                        860        0.00          0.16
      db file sequential read                         9        0.00          0.00
    
    
    OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse       11      0.00       0.01          0          0          0           0
    Execute 1109552     16.74      15.33          0          0          0           0
    Fetch   1174883    212.73     207.57          1   36825069          0     1044211
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   2284446    229.48     222.92          1   36825069          0     1044211
    
    Misses in library cache during parse: 6
    Misses in library cache during execute: 6
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      db file sequential read                         1        0.00          0.00
      latch: cache buffers chains                     1        0.00          0.00
    
        5  user  SQL statements in session.
        9  internal SQL statements in session.
       14  SQL statements in session.
    ********************************************************************************
    

    笔记本上的10046 TRACE

    ********************************************************************************
    count    = number of times OCI procedure was executed
    cpu      = cpu time in seconds executing 
    elapsed  = elapsed time in seconds executing
    disk     = number of physical reads of buffers from disk
    query    = number of buffers gotten for consistent read
    current  = number of buffers gotten in current mode (usually for update)
    rows     = number of rows processed by the fetch or execute call
    ********************************************************************************
    
    alter session set events '10046 trace name context forever , level 8'
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        0      0.00       0.00          0          0          0           0
    Execute      1      0.00       0.04          0          0          0           0
    Fetch        0      0.00       0.00          0          0          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        1      0.00       0.04          0          0          0           0
    
    Misses in library cache during parse: 0
    Misses in library cache during execute: 1
    Optimizer mode: CHOOSE
    Parsing user id: 56  
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       1        0.00          0.00
      SQL*Net message from client                     1        0.00          0.00
    ********************************************************************************
    
    Select *
       From (Select Rownum Sid, a.*
               From (Select a.Ne_Alarm_List_Id Id,
                            a.Flow_Id, /*流程ID*/
                            a.Oprt_State Oprtstate, /*操作状态*/
                            a.Alarm_State Alarmstate, /*告警状态*/
                            a.Alarm_Level Alarmlevel, /*告警级别*/
                            a.Perf_Msg_Id, /*性能消息ID*/
                            a.Alarm_Class Alarmclass, /*告警类别*/
                            To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                            B1.List_Label Alarm_Type,
                            C1.Ne_Name,
                            d.Kpi_Name,
                            a.Kpi_Value,
                            Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                            '' ||
                            Decode(a.Alarm_Level,
                                   '1',
                                   '严重',
                                   '2',
                                   '重要',
                                   '3',
                                   '一般',
                                   '未知') || '' Alarm_Level,
                            To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                            To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                            Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator,
                            a.Alarm_Times,
                            Decode(a.Oprt_State,
                                   '20',
                                   To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '25',
                                   To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '30',
                                   To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                   '40',
                                   To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                            B2.List_Label Alarm_State,
                            Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                       From Ne_Alarm_List a,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                            (Select *
                               From Tp_Domain_Listvalues
                              Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                            Net_Element C1,
                            Net_Element C2,
                            Kpi_Code_List d,
                            Manage_Region e,
                            Kpi_Mapping_Cfg f,
                            Ne_Trans_Alarm Nta,
                            (Select t.Primary_Id "$PRIMARY_ID",
                                    Sum(t.Has_Read) "$HAS_READ"
                               From Tree_Privilege t
                              Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                                And t.Assign_Object In ('STAFF_3', 'ORG_2', 'STATION_22')
                         Group By t.Primary_Id) "$PRI_VIEW"
                 Where Nvl(a.Config_Ne_Id, a.Ne_Id) ="$PRI_VIEW"."$PRIMARY_ID"(+)
                        And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                        And B1.List_Value = a.Alarm_Type
                        And a.Ne_Id = C1.Ne_Id
                        And B2.List_Value = a.Alarm_State
                        And B3.List_Value = a.Dr_Id || ''
                        And B4.List_Value = a.Oprt_State
                        And a.Config_Ne_Id = C2.Ne_Id(+)
                        And a.Kpi_Id = d.Kpi_Id
                        And a.Kpi_Id = f.Kpi_Id(+)
                        And Nvl(a.Alarm_Region_Origin, '-1') =
                            To_Char(e.Region_Id(+))
                        And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                        And (Select Path || '/'
                               From Net_Element
                              Where Ne_Id = a.Ne_Id) Not Like
                            (Select Path || '/%'
                               From Net_Element
                              Where Ne_Type_Id = 30
                                And Ne_Flag = '6'
                                And State = '0SA') /*非业务系统*/
                        And To_Number(a.Dr_Id) = 0
                      Order By B2.Sort_Id,
                               B4.Sort_Id,
                               a.Alarm_Type,
                               a.Ne_Id,
                               Nvl(a.Last_Send_Time, a.Create_Time) Desc) a) b
      Where b.Sid >= 1
        And b.Sid <= 15
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.28       1.03          0          0          0           0
    Execute      1      0.00       0.00          0          0          0           0
    Fetch        2     40.96      41.37          0    1388865          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        4     41.25      42.40          0    1388865          0          15
    
    Misses in library cache during parse: 1
    Optimizer mode: CHOOSE
    Parsing user id: 56  
    
    Rows     Row Source Operation
    -------  ---------------------------------------------------
         15  VIEW  (cr=1454108 pr=0 pw=0 time=44434215 us)
      65249   COUNT  (cr=1454108 pr=0 pw=0 time=45608654 us)
      65249    VIEW  (cr=1454108 pr=0 pw=0 time=45151896 us)
      65249     SORT ORDER BY (cr=1454108 pr=0 pw=0 time=44760386 us)
      65249      FILTER  (cr=1388865 pr=0 pw=0 time=24109934 us)
      65332       HASH JOIN  (cr=1387411 pr=0 pw=0 time=23581540 us)
      65332        HASH JOIN  (cr=1387395 pr=0 pw=0 time=23224331 us)
      65332         HASH JOIN  (cr=1387379 pr=0 pw=0 time=23125008 us)
      65332          HASH JOIN  (cr=1387363 pr=0 pw=0 time=22935658 us)
      65332           HASH JOIN OUTER (cr=1387347 pr=0 pw=0 time=22771025 us)
      65332            FILTER  (cr=1387340 pr=0 pw=0 time=22949387 us)
     138272             HASH JOIN OUTER (cr=1387340 pr=0 pw=0 time=22856756 us)
     138272              NESTED LOOPS OUTER (cr=1387320 pr=0 pw=0 time=26824900 us)
     138272               NESTED LOOPS OUTER (cr=1387318 pr=0 pw=0 time=24059444 us)
     138272                NESTED LOOPS  (cr=1119088 pr=0 pw=0 time=17975481 us)
     138272                 NESTED LOOPS  (cr=704272 pr=0 pw=0 time=11200155 us)
     138272                  NESTED LOOPS OUTER (cr=289456 pr=0 pw=0 time=4286553 us)
     138272                   TABLE ACCESS FULL NE_ALARM_LIST (cr=12912 pr=0 pw=0 time=829731 us)
      44379                   INDEX UNIQUE SCAN PK_KPI_MAPPING_CFG (cr=276544 pr=0 pw=0 time=1871242 us)(object id 52751)
     138272                  TABLE ACCESS BY INDEX ROWID KPI_CODE_LIST (cr=414816 pr=0 pw=0 time=5009011 us)
     138272                   INDEX UNIQUE SCAN PK_KPI_CODE_LIST (cr=276544 pr=0 pw=0 time=2152340 us)(object id 52747)
     138272                 TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=414816 pr=0 pw=0 time=5000605 us)
     138272                  INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=276544 pr=0 pw=0 time=2149586 us)(object id 52861)
      89410                TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=268230 pr=0 pw=0 time=3695667 us)
      89410                 INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=178820 pr=0 pw=0 time=1808726 us)(object id 52861)
          0               INDEX RANGE SCAN IDX_NE_TRAN_LIST_ID (cr=2 pr=0 pw=0 time=1327629 us)(object id 52966)
        195              VIEW  (cr=20 pr=0 pw=0 time=2866 us)
        195               SORT GROUP BY (cr=20 pr=0 pw=0 time=1690 us)
        195                TABLE ACCESS FULL TREE_PRIVILEGE (cr=20 pr=0 pw=0 time=1047 us)
        237            TABLE ACCESS FULL MANAGE_REGION (cr=7 pr=0 pw=0 time=1302 us)
          5           TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=118 us)
          3          TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=267 us)
          2         TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=115 us)
         24        TABLE ACCESS FULL TP_DOMAIN_LISTVALUES (cr=16 pr=0 pw=0 time=183 us)
          3       TABLE ACCESS BY INDEX ROWID NET_ELEMENT (cr=9 pr=0 pw=0 time=161 us)
          3        INDEX UNIQUE SCAN PK_NET_ELEMENT (cr=6 pr=0 pw=0 time=79 us)(object id 52861)
          1       TABLE ACCESS FULL NET_ELEMENT (cr=1445 pr=0 pw=0 time=6876 us)
    
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       2        0.00          0.00
      SQL*Net message from client                     2        0.54          0.54
    ********************************************************************************
    
    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
    
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65249      2.26       1.90          0          0          0           0
    Fetch    65249      1.23       1.30          0      65243          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   130499      3.50       3.20          0      65243          0           0
    
    Misses in library cache during parse: 1
    Misses in library cache during execute: 1
    Optimizer mode: CHOOSE
    Parsing user id: 57     (recursive depth: 1)
    ********************************************************************************
    
    ********************************************************************************
    
    OVERALL TOTALS FOR ALL NON-RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        2      0.29       1.03          0          0          0           0
    Execute      3      0.00       0.04          0          0          0           0
    Fetch        2     40.96      41.37          0    1388865          0          15
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total        7     41.26      42.45          0    1388865          0          15
    
    Misses in library cache during parse: 2
    Misses in library cache during execute: 1
    
    Elapsed times include waiting on following events:
      Event waited on                             Times   Max. Wait  Total Waited
      ----------------------------------------   Waited  ----------  ------------
      SQL*Net message to client                       5        0.00          0.00
      SQL*Net message from client                     5       22.57         31.68
      SQL*Net more data from client                   2        0.00          0.00
    
    
    OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse       30      0.00       0.00          0          0          0           0
    Execute  65278      2.26       1.92          0          0          0           0
    Fetch    65278      1.23       1.30          0      65301          0          29
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   130586      3.50       3.23          0      65301          0          29
    
    Misses in library cache during parse: 2
    Misses in library cache during execute: 2
    
        4  user  SQL statements in session.
       29  internal SQL statements in session.
       33  SQL statements in session.
    ********************************************************************************
    

    仔细观察,终于有重大发现了,笔记本中环境10046 TRACE

    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
     
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65249      2.26       1.90          0          0          0           0
    Fetch    65249      1.23       1.30          0      65243          0           0
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   130499      3.50       3.20          0      65243          0           0
    

    这里怎么跑出一个这样的语句,Eeecute6万多次最终获取0行

    而服务器上10046 TRACE中类似的地方确实如下,rows显示为65331和前者为0!有所发现

    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65339      0.78       0.75          0          0          0           0
    Fetch   130670    148.79     145.17          0   32714110          0       65331
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   196010    149.58     145.93          0   32714110          0       65331
    

    继续分析,发现笔记本上的10046 TRACE根本找不到如下两处调用!

    SELECT PKP_FLOW_FUNCTION.GETSTAFFNAMEBYID(LTRIM(RTRIM(TO_CHAR(:B1 )))) 
    FROM
     DUAL
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      7.43       6.78          0          0          0           0
    Fetch   522102     42.59      42.17          0          0          0      522102
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     50.03      48.96          0          0          0     522102
    
    SELECT NAME 
    FROM
     STAFF_INFO WHERE STAFF_ID = :B1 
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute 522102      8.51       7.78          0          0          0           0
    Fetch   522102     21.35      20.21          0    4110939          0      456771
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   1044205     29.86      27.99          0    4110939          0     456771
    

    仔细观察,还有新发现,服务器的执行计划有递归调用,而笔记本上的执行计划没有递归调用。

    RECURSIVE STATEMENTS
    OVERALL TOTALS FOR ALL RECURSIVE STATEMENTS
    
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse       11      0.00       0.01          0          0          0           0
    Execute 1109552     16.74      15.33         0          0          0           0
    Fetch   1174883    212.73     207.57         1   36825069          0     1044211
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   2284446    229.48     222.92         1   36825069          0     1044211
    

    看来,这个递归毫无疑问来自函数的调用,仔细回到复杂的SQL脚本一看,发现了原先SQL中有函数 Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator, 查看这个函数调用的代码写法

    FUNCTION getFlowCurStaffName(in_flow_id in number) RETURN VARCHAR2 IS
        ss         VARCHAR2(255);
        person1    VARCHAR2(255);
        v_tabArray PKP_STRING_UTIL.varArray;
        CURSOR c2 IS
          SELECT person
            FROM tache
           WHERE flow_id = in_flow_id
             AND STATE <> 'F';
      BEGIN
        --循环从表中取得
        FOR rel1 IN c2 LOOP
          v_tabArray := PKP_STRING_UTIL.split(rel1.person, ',');
          FOR i IN 1 .. v_tabArray.COUNT() LOOP
            SELECT getStaffNameById(LTRIM(RTRIM(TO_CHAR(v_tabArray(i)))))
              INTO person1
              FROM dual;
            ss := ss || person1 || ' ';
          END LOOP;
        END LOOP;
    
        RETURN(ss);
      END getFlowCurStaffName;
    

    其中getFlowCurStaffName函数还调用了getStaffNameById函数,具体如下:

    FUNCTION getStaffNameById(in_staff_id IN varchar2) return varchar2 is
      ret varchar2(255);
    begin
      select NAME INTO ret FROM STAFF_INFO WHERE STAFF_ID = in_staff_id;
      return(ret);
    end getStaffNameById;
    

    而这些语句都出现在刚才的服务器的10046 TRACE文件中,看来问题就出在这个函数调用中了! 的是函数中的语句为什么与部分没有出现在笔记本的10046 TRACE文件中呢?这个问题我还无法回答。

    不过我相信。。。。。。真相已经离我们很近了!


    5、大白天下

    .

    目标既锁定函数为“元凶巨恶”,那就除恶务尽!

    在剔除了Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator 后,发现服务器环境的SQL从原先的230秒瞬间降低为3秒,乖乖,料到有效果不假,但没料到这么有效果!

    再一看autotrace ,不得不相信真能有这样的提升,逻辑读为: 82617 consistent gets 而原先为:37494576 consistent gets

    到此其实还有一事未明,就是为什么说笔记本环境没有那么多递归调用和逻辑读呢,在前面的基础上继续探讨,问题就变的容易了,因为前面的函数中

    CURSOR c2 IS
       SELECT person
         FROM tache
        WHERE flow_id = in_flow_id
          AND STATE <> 'F';
    

    这个 tache表在笔记本及其他环境中是没记录的,而服务器及生产是有2万多条记录!

    发现这个以后,一切谜团都解开了!

    1、为什么笔记本环境没有看到递归调用?

    这个外循环没值,递啥归啊?

    2、为什么服务器环境看到大量逻辑读?

    一层套一层,在C2的CURSOR不是空的情况下,自然就调用到了getFlowCurStaffName,加上外面的6万次循环,能不产生大量逻辑读吗?笔记本环境C2是空的,做啥啊,呵呵。

    3、为什么生产服务器环境时快时慢

    存在函数调用,由于函数调用次数和外层结果集的记录有关,所以时快时慢,笔记本环境C2游标是空的,当然稳定!


    6、乘胜追击

    现在题是如何解决这个问题,因为去除该函数毕竟不是解决问题的正道!

    先实验一个自定义普通函数

    (其中set_clinet_info函数这个是为了测试最终该函数被调用多少次,实用技术,简单易学,值得推广)

    create or replace function f(id in number)
      Return  number
    is
    begin
      dbms_application_info.set_client_info(userenv('client_info')+1); 
      return id;
    end a;
    /
    

    接着改写代码,将 Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator 修改为f(a.Flow_Id) Flow_Operator,

    发现整体运行时间非常短,大致5秒,说明Pkp_Flow_Function确实是做太多事了。

    通过开始exec dbms_application_info.set_client_info(0);

    后续执行select userenv('client_info') from dual;来观察,发现该函数F(a.Flow_Id) 是执行了65339次,和10046 trace的一致。具体见附件《自定义函数测试调用次数.txt》 另外Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id)也是执行了65339次 ,具体见附件《测试原函数调用次数.txt》 而该函数的子函数却调用了456771次!!!原因在于 FOR i IN 1 .. v_tabArray.COUNT() LOOP这循环是对表的字段里的内容进行循环,(字段中的内容用逗号分割符进行分割,如何拆开循环) 具体见附件《测试原函数的子函数调用次数.txt》

    看来解决的思路如下

    1、尽量减少外部调用,如何能避免调用65339次

    经过分析,由于分页的缘故,每次都只会展现15条记录,而这个函数经确认只是取得姓名的字段,完全可以放在最后执行,修改为将 Pkp_Flow_Function.Getflowcurstaffname(a.Flow_Id) Flow_Operator放置于整个结果集的外面,这样,调用立减为15次。性能瞬间提高仅4-5秒完成任务!优化的最终代码见《终极解决.sql》,其中分页其实并无效果,只是写写而已。

    Select b.*,Pkp_Flow_Function.Getflowcurstaffname(b.Flow_Id) Flow_Operator
      From (Select Rownum Sid, a.*
              From (Select a.Ne_Alarm_List_Id Id,
                           a.Flow_Id, /*流程ID*/
                           a.Oprt_State Oprtstate, /*操作状态*/
                           a.Alarm_State Alarmstate, /*告警状态*/
                           a.Alarm_Level Alarmlevel, /*告警级别*/
                           a.Perf_Msg_Id, /*性能消息ID*/
                           a.Alarm_Class Alarmclass, /*告警类别*/
                           To_Char(a.Generate_Time, 'yyyy-mm-dd') Generatedate, /*产生日期*/
                           To_Char(a.Last_Generate_Time, 'yyyy-mm-dd') Lastdate, /*最后产生日期*/
                           B1.List_Label Alarm_Type,
                           C1.Ne_Name,
                           d.Kpi_Name,
                           a.Kpi_Value,
                           Decode(C1.Ne_Flag, 6, e.Region_Name, C2.Ne_Name) Datasource,
                           '' ||
                           Decode(a.Alarm_Level,
                                  '1',
                                  '严重',
                                  '2',
                                  '重要',
                                  '3',
                                  '一般',
                                  '未知') || '' Alarm_Level,
                           To_Char(a.Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Generate_Time,
                           To_Char(a.Last_Generate_Time, 'yyyy-mm-dd hh24:mi:ss') Last_Generate_Time,
                           a.Alarm_Times,
                           Decode(a.Oprt_State,
                                  '20',
                                  To_Char(a.Confirm_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                  '25',
                                  To_Char(a.Suspend_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                  '30',
                                  To_Char(a.Clear_Time, 'yyyy-mm-dd hh24:mi:ss'),
                                  '40',
                                  To_Char(a.Delete_Time, 'yyyy-mm-dd hh24:mi:ss')) Executetim,
                           B2.List_Label Alarm_State,
                           Decode(a.Flow_Id, '', '未派单', '已派单') Work_State /*自定义列项*/
                      From Ne_Alarm_List a,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_NE_ALARM_TYPE') B1,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_ALARM_STATE') B2,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_DR_ID_FLAG') B3,
                           (Select *
                              From Tp_Domain_Listvalues
                             Where Domain_Code = 'DOMAIN_ALARM_OPRT_STATE') B4,
                           Net_Element C1,
                           Net_Element C2,
                           Kpi_Code_List d,
                           Manage_Region e,
                           Kpi_Mapping_Cfg f,
                           Ne_Trans_Alarm Nta,
                           (Select t.Primary_Id "$PRIMARY_ID",
                                   Sum(t.Has_Read) "$HAS_READ"
                              From Tree_Privilege t
                             Where t.Tree_Cfg_Name = 'NET_ELEMENT'
                               And t.Assign_Object In
                                   ('STAFF_3', 'ORG_2', 'STATION_22')
                             Group By t.Primary_Id) "$PRI_VIEW"
                     Where Nvl(a.Config_Ne_Id, a.Ne_Id) =
                           "$PRI_VIEW"."$PRIMARY_ID"(+)
                       And Nvl("$PRI_VIEW"."$HAS_READ", 0) > 0
                       And B1.List_Value = a.Alarm_Type
                       And a.Ne_Id = C1.Ne_Id
                       And B2.List_Value = a.Alarm_State
                       And B3.List_Value = a.Dr_Id || ''
                       And B4.List_Value = a.Oprt_State
                       And a.Config_Ne_Id = C2.Ne_Id(+)
                       And a.Kpi_Id = d.Kpi_Id
                       And a.Kpi_Id = f.Kpi_Id(+)
                       And Nvl(a.Alarm_Region_Origin, '-1') =
                           To_Char(e.Region_Id(+))
                       And a.Ne_Alarm_List_Id = Nta.Ne_Alarm_List_Id(+)
                       And (Select Path || '/'
                              From Net_Element
                             Where Ne_Id = a.Ne_Id) Not Like
                           (Select Path || '/%'
                              From Net_Element
                             Where Ne_Type_Id = 30
                               And Ne_Flag = '6'
                               And State = '0SA') /*非业务系统*/
                       And To_Number(a.Dr_Id) = 0
                     Order By B2.Sort_Id,
                              B4.Sort_Id,
                              a.Alarm_Type,
                              a.Ne_Id,
                              Nvl(a.Last_Send_Time, a.Create_Time) Desc) a Where Rownum<=15) b
     Where b.Sid >= 1
    

    2、进一步完善,Pkp_Flow_Function调用次数过多,可考虑用表关联改写该函数的代码,减少不必要的递归和调用(这个待后续再考虑完善)

    另外细心的朋友在分析《10046trace_service.txt》的时候或许注意到

    SELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F' 
    

    该语句有着惊人的逻辑读!32714110/65331=500意味着每获取一行需要进行500次的逻辑读,由此可见,该语句应该可以优化,果然,当前语句是全表扫描。 建如下索引后,继续执行,立即发现代价从原先的111缩短到2,收效相当明显!

    create index idx_union on tache (flow_id,person,state)
    

    收效相当明显,在我未进行上述调整函数调用的时候,优化的效果是:从每获取一行需要进行500次的逻辑读到每获取一行需要3次逻辑读。

    ELECT PERSON 
    FROM
     TACHE WHERE FLOW_ID = :B1 AND STATE <> 'F'
     
    call     count       cpu    elapsed       disk      query    current        rows
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    Parse        1      0.00       0.00          0          0          0           0
    Execute  65339      0.85       0.79          0          0          0           0
    Fetch   130670      1.27       1.11          3     195993          0       65331
    ------- ------  -------- ---------- ---------- ---------- ----------  ----------
    total   196010      2.13       1.90          3     195993          0       65331
    

    大家可以看到由于函数调用这个硬伤未处理,递归依然极大,但是逻辑读变小确实不争的事实。事实上是,这个语句在66秒完成,虽说不如意,但总比以前的230多秒好,加上致命硬伤函数调用一解决,最终在3秒多执行完毕。(只改函数调用的致命伤而不优化这个语句,执行是在5秒完成)


    7、余音绕梁

    事情算是最终解决了,想想,有如下几点总结

    1、开发人员尽量避免写过多的函数调用,这样将有可能导致惊人的递归调用和逻辑读,此外在SQL中大量调用函数,也容易导致SQL引擎和PLSQL引擎进行大量的上下文切换,这样性能必定收到影响。

    2、查问题不能想当然,其实我们的笔记本环境和服务器环境大多记录是一样的,唯独tache表是不同,笔记本是空记录,就这一处差别,让我们限入迷茫中。

    3、解决问题要所有的疑惑都解决了,才是真正解决问题,比如我们考虑改写SQL,分页调优,即便真的变快了,能回答为什么笔记本不需要改写SQL就能快吗?

    4、解决问题可以从 看执行计划----看逻辑读递归-------10046trace 这种方式深入。 当然中间少不了各种猜想。

    比如降低数据库版本,在本案例中无用,或许在下一个案例中,就是你发现问题的一个方法。

    比如分页查询,在本案例中或许无用,但是在别的场合,就有可能会有用武之地。

    比如看统计信息收集情况及表物理索引物理分布,这些或许就是在下一案例中的有力武器。

    此外优化的思路莫过于减少访问量和增大吞吐量(吞吐量有些可遇不可求),而访问量是由单次访问量和执行的次数决定的,本次优化处理函数写法就是减少了执行次数,而修改那条SQL则是减少单次的访问量!

    5、其实本案例并未完,该过程的调用写法一定不是最佳的写法,所以要有精益求精的精神,后续,考虑优化这个过程。因为如果不是需求每次只要展现15页,优化这个函数就已成必然!

  • 相关阅读:
    基础JavaScript练习(一)总结
    零基础JavaScript编码(三)总结
    零基础JavaScript编码(二)
    零基础JavaScript编码(一)
    响应式网格(栅格化)布局总结
    实现常见的技术产品官网的页面架构及样式布局总结
    通过HTML及CSS模拟报纸排版总结
    任务四:定位和居中问题总结
    1. Two Sum
    任务三:三栏式布局总结
  • 原文地址:https://www.cnblogs.com/killkill/p/1687920.html
Copyright © 2020-2023  润新知