• Oracle RAC环境下定位并杀掉最终阻塞的会话-续


    之前在《Oracle RAC环境下定位并杀掉最终阻塞的会话》中,最终使用一个SQL查询出RAC实例之间的所有阻塞关系。但是实际在某些极端的生产环境,是不允许执行复杂的SQL语句,即使允许执行可能现场也不方便复制SQL,手敲的话效率低下,那么本文就介绍另一种简单的方法来快速定位最终阻塞会话,也就是DBA常用的oradebug hanganalyze。

    1.模拟故障

    直接根据《[Oracle RAC环境下定位并杀掉最终阻塞的会话](http://www.cnblogs.com/jyzhao/p/8716546.html)》中的来模拟,不再赘述。 如果按照之前的SQL查询,结果如下:
    SYS@jyzhao1 >--cascade blocking@gv$session
    SYS@jyzhao1 >select *
      2    from (select a.inst_id, a.sid, a.serial#,
      3                 a.sql_id,
      4                 a.event,
      5                 a.status,
      6                 connect_by_isleaf as isleaf,
      7                 sys_connect_by_path(a.SID||'@'||a.inst_id, ' <- ') tree,
      8                 level as tree_level
      9            from gv$session a
     10           start with a.blocking_session is not null
     11          connect by (a.sid||'@'||a.inst_id) = prior (a.blocking_session||'@'||a.blocking_instance))
     12   where isleaf = 1
     13   order by tree_level asc;
    
       INST_ID        SID    SERIAL# SQL_ID        EVENT                               STATUS       ISLEAF TREE                           TREE_LEVEL
    ---------- ---------- ---------- ------------- ----------------------------------- -------- ---------- ------------------------------ ----------
             1         26       3479               SQL*Net message from client         INACTIVE          1  <- 148@2 <- 26@1                       2
             1         26       3479               SQL*Net message from client         INACTIVE          1  <- 29@1 <- 148@2 <- 26@1               3
             1         26       3479               SQL*Net message from client         INACTIVE          1  <- 23@2 <- 148@2 <- 26@1               3
    
    SYS@jyzhao1 >
    

    2.oradebug hanganalyze

    如今假设我们的环境的确不方便使用SQL查询,那么就需要用到oradebug hanganalyze来分析。 oradebug hanganalyze 3

    因为我这里是RAC环境,需要分析所有实例:
    oradebug -g all hanganalyze 3

    SYS@jyzhao1 >oradebug -g all hanganalyze 3
    Hang Analysis in /opt/app/oracle/diag/rdbms/jyzhao/jyzhao1/trace/jyzhao1_diag_1919.trc
    

    3.分析trace文件

    我们去分析这个生成的trc文件,可以很清楚的看到HANG分析部分,存在两个chain,比如我这个实验的情况就是: Chain 1: 可以看到实例1的会话29被实例2的会话148阻塞,实例2的会话148又被实例1的会话26阻塞; Chain 2: 可以看到实例2的会话23被实例2的会话148阻塞,而实例2的会话148又在第一个chain中。 可以发现这与我之前用SQL查询的结果是一样的意思,都可以做到快速定位最终阻塞会话是实例1的会话26,与客户确认后杀掉即可。

    附:oradebug hanganalyze 3分析的trace文件中的核心信息

    *** 2018-04-21 07:51:44.975
    ===============================================================================
    HANG ANALYSIS:
      instances (db_name.oracle_sid): jyzhao.jyzhao1, jyzhao.jyzhao2
      oradebug_node_dump_level: 3
      analysis initiated by oradebug
      os thread scheduling delay history: (sampling every 1.000000 secs)
        0.000000 secs at [ 07:51:44 ]
          NOTE: scheduling delay has not been sampled for 0.305046 secs    0.000000 secs from [ 07:51:40 - 07:51:45 ], 5 sec avg
        0.000000 secs from [ 07:50:45 - 07:51:45 ], 1 min avg
        0.000000 secs from [ 07:46:45 - 07:51:45 ], 5 min avg
      vktm time drift history
    ===============================================================================
     
    Chains most likely to have caused the hang:
     [a] Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
         Chain 1 Signature Hash: 0x42598823
     [b] Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
         Chain 2 Signature Hash: 0x42598823
     
    ===============================================================================
    Non-intersecting chains:
     
    -------------------------------------------------------------------------------
    Chain 1:
    -------------------------------------------------------------------------------
        Oracle session identified by:
        {
                    instance: 1 (jyzhao.jyzhao1)
                       os id: 2712
                  process id: 42, oracle@jyrac1 (TNS V1-V3)
                  session id: 29
            session serial #: 81
        }
        is waiting for 'enq: TX - row lock contention' with wait info:
        {
                          p1: 'name|mode'=0x54580006
                          p2: 'usn<<16 | slot'=0xe0002
                          p3: 'sequence'=0x3a3
                time in wait: 8 min 21 sec
               timeout after: never
                     wait id: 37
                    blocking: 0 sessions
                 current sql: update emp set job = 'CEO' where empno = 7839
                 short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-k
    jusuc()+3400<-ksipgetctxi()+1759<-ksqcmi()+20798<-ksqgtlctx()+3501<-ksqgelctx()+557<-ktuGetTxForXid()+131<-ktcwit1()+336<-kdddgb()+8587<-kdusru()+460<-kauupd()+412<-updrow
    ()+2167<-qerupFetch()+860<-updaul()+1378<-updThreePhaseExe()+318<-updexe()+638<-opiexe()+10378<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-op
    iodr()+917<-opidrv()+570<-sou2o(
                wait history:
                  * time between current wait and wait #1: 0.010123 sec
                  1.       event: 'gc current block 2-way'
                     time waited: 0.000622 sec
                         wait id: 36              p1: ''=0x7
                                                  p2: ''=0x6483
                                                  p3: ''=0x2000001
                  * time between wait #1 and #2: 0.007260 sec
                  2.       event: 'gc cr block 2-way'
                     time waited: 0.000501 sec
                         wait id: 35              p1: ''=0x6
                                                  p2: ''=0x523
                                                  p3: ''=0x2c
                  * time between wait #2 and #3: 0.000462 sec
                  3.       event: 'gc cr block 2-way'
                     time waited: 0.000689 sec
                         wait id: 34              p1: ''=0x6
                                                  p2: ''=0xb0
                                                  p3: ''=0x2b
        }
        and is blocked by
     => Oracle session identified by:
        {
                    instance: 2 (jyzhao.jyzhao2)
                       os id: 23427
                  process id: 41, oracle@jyrac2 (TNS V1-V3)
                  session id: 148
            session serial #: 17715
        }
        which is waiting for 'enq: TX - row lock contention' with wait info:
        {
                          p1: 'name|mode'=0x54580006
                          p2: 'usn<<16 | slot'=0x50008
                          p3: 'sequence'=0x9e6
                time in wait: 9 min 9 sec
               timeout after: never
                     wait id: 152
                    blocking: 2 sessions
                 current sql: update emp set job = 'MANAGER' where empno = 7788
                 short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-k
    jusuc()+3400<-ksipgetctxi()+1759<-ksqcmi()+20798<-ksqgtlctx()+3501<-ksqgelctx()+557<-ktuGetTxForXid()+131<-ktcwit1()+336<-kdddgb()+8587<-kdusru()+460<-kauupd()+412<-updrow
    ()+2167<-qerupFetch()+860<-updaul()+1378<-updThreePhaseExe()+318<-updexe()+638<-opiexe()+10378<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-op
    iodr()+917<-opidrv()+570<-sou2o(
                wait history:
                  * time between current wait and wait #1: 0.005023 sec
                  1.       event: 'gc cr block 2-way'
                     time waited: 0.000478 sec
                         wait id: 151             p1: ''=0x3
                                                  p2: ''=0xc0
                                                  p3: ''=0x19
                  * time between wait #1 and #2: 0.000295 sec
                  2.       event: 'gc cr block 2-way'
                     time waited: 0.000821 sec
                         wait id: 150             p1: ''=0x3
                                                  p2: ''=0x362
                                                  p3: ''=0x1a
                  * time between wait #2 and #3: 0.000294 sec
                  3.       event: 'gc cr block 2-way'
                     time waited: 0.000431 sec
                         wait id: 149             p1: ''=0x3
                                                  p2: ''=0xc0
                                                  p3: ''=0x19
        }
        and is blocked by
     => Oracle session identified by:
        {
                    instance: 1 (jyzhao.jyzhao1)
                       os id: 1648
                  process id: 32, oracle@jyrac1 (TNS V1-V3)
                  session id: 26
            session serial #: 3479
        }
        which is waiting for 'SQL*Net message from client' with wait info:
        {
                          p1: 'driver id'=0x62657100
                          p2: '#bytes'=0x1
                time in wait: 9 min 28 sec
               timeout after: never
                     wait id: 168
                    blocking: 3 sessions
                 current sql: <none>
                 short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-read()+14<-ntpfprd()+117<-nsbasic_brc()+376<-nsbrecv()+69<-nioqrc()+4
    95<-opikndf2()+978<-opitsk()+831<-opiino()+969<-opiodr()+917<-opidrv()+570<-sou2o()+103<-opimai_real()+133<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
                wait history:
                  * time between current wait and wait #1: 0.000010 sec
                  1.       event: 'SQL*Net message to client'
                     time waited: 0.000003 sec
                         wait id: 167             p1: 'driver id'=0x62657100
                                                  p2: '#bytes'=0x1
                  * time between wait #1 and #2: 0.000239 sec
                  2.       event: 'gc current grant 2-way'
                     time waited: 0.000337 sec
                         wait id: 166             p1: ''=0x7
                                                  p2: ''=0x6483
                                                  p3: ''=0x2010001
                  * time between wait #2 and #3: 0.000196 sec
                  3.       event: 'db file sequential read'
                     time waited: 0.000824 sec
                         wait id: 165             p1: 'file#'=0x7
                                                  p2: 'block#'=0x6483
                                                  p3: 'blocks'=0x1
        }
     
    Chain 1 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
    Chain 1 Signature Hash: 0x42598823
    -------------------------------------------------------------------------------
     
    ===============================================================================
    Intersecting chains:
     
    -------------------------------------------------------------------------------
    Chain 2:
    -------------------------------------------------------------------------------
        Oracle session identified by:
        {
                    instance: 2 (jyzhao.jyzhao2)
                       os id: 23488
                  process id: 42, oracle@jyrac2 (TNS V1-V3)
                  session id: 23
            session serial #: 12635
        }
        is waiting for 'enq: TX - row lock contention' with wait info:
        {
                          p1: 'name|mode'=0x54580006
                          p2: 'usn<<16 | slot'=0xe0002
                          p3: 'sequence'=0x3a3
                time in wait: 8 min 34 sec
               timeout after: never
                     wait id: 39
                    blocking: 0 sessions
                 current sql: update emp set sal = 15000 where empno = 7839
                 short stack: ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178<-ksliwat()+2022<-kslwaitctx()+163<-k
    jusuc()+3400<-ksipgetctxi()+1759<-ksqcmi()+20798<-ksqgtlctx()+3501<-ksqgelctx()+557<-ktuGetTxForXid()+131<-ktcwit1()+336<-kdddgb()+8587<-kdusru()+460<-kauupd()+412<-updrow
    ()+2167<-qerupFetch()+860<-updaul()+1378<-updThreePhaseExe()+318<-updexe()+638<-opiexe()+10378<-kpoal8()+2380<-opiodr()+917<-ttcpip()+2183<-opitsk()+1710<-opiino()+969<-op
    iodr()+917<-opidrv()+570<-sou2o(
                wait history:
                  * time between current wait and wait #1: 0.000226 sec
                  1.       event: 'gc cr block 2-way'
                     time waited: 0.000605 sec
                         wait id: 38              p1: ''=0x3
                                                  p2: ''=0xc0
                                                  p3: ''=0x19
                  * time between wait #1 and #2: 0.011878 sec
                  2.       event: 'gc cr block 2-way'
                     time waited: 0.000610 sec
                         wait id: 37              p1: ''=0x3
                                                  p2: ''=0xc0
                                                  p3: ''=0x19
                  * time between wait #2 and #3: 0.000316 sec
                  3.       event: 'Disk file operations I/O'
                     time waited: 0.000007 sec
                         wait id: 36              p1: 'FileOperation'=0x2
                                                  p2: 'fileno'=0x3
                                                  p3: 'filetype'=0x2
        }
        and is blocked by 'instance: 2, os id: 23427, session id: 148',
        which is a member of 'Chain 1'.
     
    Chain 2 Signature: 'SQL*Net message from client'<='enq: TX - row lock contention'<='enq: TX - row lock contention'
    Chain 2 Signature Hash: 0x42598823
    -------------------------------------------------------------------------------
     
    ===============================================================================
    

    可以看到这种oradebug hanganalyze生成的trc文件并没有那么的难理解,反而这种chain的描述更加清楚,身为一名专业的Oracle DBA,我们是必须要掌握这种方法的。

  • 相关阅读:
    Bootstrap Div 居中的方法
    JQuery 实现 锚点跳转
    JQuery 获取页面某一元素的位置
    BootStrap 实现导航栏nav透明,nav子元素文字不透明
    bootstrap3中container与container_fluid容器的区别
    常见向量范数和矩阵范数
    leetcode 202. Happy Number
    leetcode 172. Factorial Trailing Zeroes
    leetcode 168. Excel Sheet Column Title 171 Excel Sheet Column Number
    leetcode Two Sum II
  • 原文地址:https://www.cnblogs.com/jyzhao/p/8897989.html
Copyright © 2020-2023  润新知