• Row Cache lock Problem


    一套AIX上的11.1.0.7系统,应用启动时出现大量row cache lock等待,具体的systemstate dump信息如下:
    FILE VERSIONS
    -----------------------
    Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
    With the Partitioning, Real Application Clusters, Data Mining and Real Application Testing options
    ORACLE_HOME = /oracle/product/11.1.0/database
    System name: AIX
    Node name: HA5_4ADB01
    Release: 3
    Version: 5
    Machine: 000687C2D900
    Instance name: HN4A1
    
    
    TRACE FILE
    ---------------------------
    Filename=HN4A1_ora_2061038.trc
    
    
    
    Chains most likely to have caused the hang:
     [a] Chain 1 Signature: 'row cache lock'<='row cache lock' (cycle)
         Chain 1 Signature Hash: 0x75bdd0c
     [b] Chain 2 Signature: 'row cache lock'<='row cache lock' (cycle)
         Chain 2 Signature Hash: 0x75bdd0c
     [c] Chain 3 Signature: 'row cache lock'<='row cache lock' (cycle)
         Chain 3 Signature Hash: 0x75bdd0c
     
    ===============================================================================
    Cycles:
    -------------------------------------------------------------------------------
    Chain 1:
    -------------------------------------------------------------------------------
        Oracle session identified by:
        {
                    instance: 1 (hn4a.hn4a1)
                       os id: 2114372
                  process id: 211, oracle@HA5_4ADB01
                  session id: 610
            session serial #: 5
        }
        is waiting for 'row cache lock' with wait info:
        {
                          p1: 'cache id'=0x7
                          p2: 'mode'=0x0
                          p3: 'request'=0x5
                time in wait: 0.491964 sec
          heur. time in wait: 26.859741 sec
               timeout after: 2.508036 sec
                     wait id: 311
                    blocking: 1 session
                wait history:
                  1.     event: 'row cache lock'
                       wait id: 310             p1: 'cache id'=0x7
                   time waited: 2.929713 sec    p2: 'mode'=0x0
                                                p3: 'request'=0x5
                  2.     event: 'row cache lock'
                       wait id: 309             p1: 'cache id'=0x7
                   time waited: 2.929726 sec    p2: 'mode'=0x0
                                                p3: 'request'=0x5
                  3.     event: 'row cache lock'
                       wait id: 308             p1: 'cache id'=0x7
                   time waited: 2.929720 sec    p2: 'mode'=0x0
                                                p3: 'request'=0x5
        }
        and is blocked by
     => Oracle session identified by:
        {
                    instance: 1 (hn4a.hn4a1)
                       os id: 2118476
                  process id: 209, oracle@HA5_4ADB01
                  session id: 616
            session serial #: 5
        }
        which is waiting for 'row cache lock' with wait info:
        {
                          p1: 'cache id'=0x7
                          p2: 'mode'=0x0
                          p3: 'request'=0x5
                time in wait: 0.491910 sec
          heur. time in wait: 26.859705 sec
               timeout after: 2.508090 sec
                     wait id: 308
                    blocking: 1 session
                wait history:
                  1.     event: 'row cache lock'
                       wait id: 307             p1: 'cache id'=0x7
                   time waited: 2.929713 sec    p2: 'mode'=0x0
                                                p3: 'request'=0x5
                  2.     event: 'row cache lock'
                       wait id: 306             p1: 'cache id'=0x7
                   time waited: 2.929729 sec    p2: 'mode'=0x0
                                                p3: 'request'=0x5
                  3.     event: 'row cache lock'
                       wait id: 305             p1: 'cache id'=0x7
                   time waited: 2.929715 sec    p2: 'mode'=0x0
                                                p3: 'request'=0x5
        }
        and is blocked by the session at the start of the chain.
     
    Chain 1 Signature: 'row cache lock'<='row cache lock' (cycle)
    Chain 1 Signature Hash: 0x75bdd0c
    
    ...==>many similar chains
    
    PROCESS 211:
      ----------------------------------------
      SO: 0x7000008a1370178, type: 2, owner: 0x0, flag: INIT/-/-/0x00 if: 0x3 c: 0x3
       proc=0x7000008a1370178, name=process, file=ksu.h LINE:10706 ID:, pg=0
      (process) Oracle pid:211, ser:3, calls cur/top: 0x7000008ac61a0e8/0x7000008ac61a0e8
                flags : (0x0) -
                flags2: (0x0),  flags3: (0x0)
                int error: 0, call error: 0, sess error: 0, txn error 0
      ksudlp FALSE at location: 0
      (post info) last post received: 0 0 201
                  last post received-location: kqr.h LINE:2181 ID:kqrbgl: compatible mode
                  last process to post me: 70000089134b800 1 6
                  last post sent: 0 0 0
                  last post sent-location: No post
                  last process posted by me: none
        (latch info) wait_event=0 bits=0
        Process Group: DEFAULT, pseudo proc: 0x7000008a153a4c8
        O/S info: user: oracle, term: UNKNOWN, ospid: 2114372
        OSD pid info: Unix process pid: 2114372, image: oracle@HA5_4ADB01
        Short stack dump:
    <-ksedsts()+0254<-ksdxfstk()+0028<-ksdxcb()+05d8<-sspuser()+0074<-44c0<-skgpwwait()+00b8<-ksliwat()+0c48<-kslwaitctx()+0150<-kqrigt()+06ac<-kqrLockAndPinPo()+0250<-kqrpre1()+061c<-kqrpre()+001c<-kziavua()+0384<-kpolnb()+0ffc<-kpoauth()+07a0<-opiodr()+0b98<-ttcpip()+115c<-opitsk()+1700<-opiino()+09f0<-opiodr()+0b98<-opidrv()+0440<-sou2o()+0090<-opimai_real()+01b0<-main()+0090<-__start()+0098
    
    
        service name: SYS$USERS
        client details:
          O/S info: user: aiuap, term: unknown, ospid: 1234
          machine: HA5-4A26 program: JDBC Thin Client
          application name: JDBC Thin Client, hash value=2546894660
    ...
             proc=0x7000008a1370178, name=row cache enqueues, file=kqr.h LINE:2004 ID:, pg=0
            row cache enqueue: count=1 session=7000008916dee00 object=7000008dfd84428, request=X
            savepoint=0x3e
            row cache parent object: address=7000008dfd84428 cid=7(dc_users)
            hash=481062cf typ=21 transaction=700000888da17a0 flags=00000002
            own=7000008dfd844f0[7000008bf6e18f0,7000008bf6e18f0] wat=7000008dfd84500[7000008bf5d2750,7000008ef34e818] mode=X
            status=VALID/-/-/-/-/-/-/-/-
            request=N release=FALSE flags=8
            instance lock=QK 481062cf bb6f8fe9
    ...
          LIBRARY OBJECT LOCK: 7000008ef5069d8 handle=7000008dfd22338 mod=N
          pnc=0 pns=0 cbb=1 rpr=1 exc=1 ilh=0 ctx=0
          use=7000008916dee00 ses=7000008a15c44f0 cnt=1 flg=CNB/[0001] spn=0x4c3d09ea
          LIBRARY HANDLE:7000008dfd22338 bid=105620 hid=a3c59c94 lmd=N pmd=0 sta=VALD
          name=select privilege#,level from sysauth$ connect by grantee#=prior privilege# and privilege#>0 start with grantee#=:1 and privilege#>0
          hash=8973f7c629f09e3f081962c2a3c59c94 idn=0
    
    众多wait chain的源头均是row cache lock,systemstate转储中该等待事件相关的p1,也就是cache id均为7;cache#为7的字典缓存代表着dc_users,即字典中的用户信息(包括用户名,密码等信息):
    SQL> select cache#,parameter from v$rowcache group by cache#,parameter order by cache#;
    
        CACHE# PARAMETER
    ---------- --------------------------------
             0 dc_tablespaces
             1 dc_free_extents
             2 dc_segments
             3 dc_rollback_segments
             4 dc_used_extents
             5 dc_tablespace_quotas
             6 dc_files
             7 dc_users
             8 dc_object_grants
             8 dc_objects
             9 dc_qmc_cache_entries
    
        CACHE# PARAMETER
    ---------- --------------------------------
            10 dc_usernames
            11 dc_object_ids
            12 dc_constraints
            13 dc_sequences
            14 dc_profiles
            15 dc_database_links
            16 dc_histogram_data
            16 dc_histogram_defs
            17 dc_global_oids
            18 dc_outlines
            19 dc_partition_scns
    
        CACHE# PARAMETER
    ---------- --------------------------------
            19 dc_table_scns
            20 rule_info
            21 rule_fast_operators
            21 rule_or_piece
            22 dc_awr_control
            23 dc_qmc_ldap_cache_entries
            24 outstanding_alerts
            25 dc_hintsets
            26 global database name
            27 qmtmrcin_cache_entries
            28 qmtmrctn_cache_entries
    
        CACHE# PARAMETER
    ---------- --------------------------------
            29 qmtmrcip_cache_entries
            30 qmtmrctp_cache_entries
            31 qmtmrciq_cache_entries
            32 qmtmrctq_cache_entries
            33 kqlsubheap_object
            34 realm auth
            34 realm cache
            35 Command rule cache
            36 Realm Object cache
            36 Realm Subordinate Cache
            37 event map
    
        CACHE# PARAMETER
    ---------- --------------------------------
            38 format
            39 audit collector
            40 Rule Set Cache
    
    47 rows selected.
    大量的dc_users字典缓存锁可能引起后续会话登录挂起,用户登录无法成功,乃至整个实例hang住。那么又是什么样的操作可能引起这种row cache lock并发锁等待事件呢? 应用维护人员反馈问题发生时部分应用程序配置文件中的数据库用户密码不正确,同时该应用会在短期内发起大量会话登录数据库,但应用人员观察到这一问题后并没有联想到可能引发进一步的实例上的dc_user row cache lock问题,仅尝试修改密码并重启该部分的应用程序。至此该问题的root cause已经较为明了:由于短期内出现大量连接请求但连接发起端给出的密码均不正确,引起了Server端严重的用户信息字典缓存锁(users dictionary cache  lock)争用。 该问题诊断过程中提交了SR,Oracle GCS给出了进一步的结论,在11g release 1上Bug 7715339也可能造成类似的row cache lock问题,且该Bug一开始在AIX平台上没有one-off的patch,之后客户要求MOS开发出了该平台上的补丁,目前AIX上也有7715339的one-off patch了:
    1. On 11.1.0.7, Many 'row cache lock' happened , application can not login to database. 2. The issue is caused by password has been changed but some application still used old password, so these application continuously trying connect to database and leads to dc_users contention. 3. This symptom matches Bug 7715339, but there is no patch for AIX available. 4. Customer requested one-off patch 7715339 on AIX Power System(64-bit), because they are afraid the issues again after changing password.
    其实我一直认为Oracle的登录机制存在一定的问题,实际上不管用户提供的密码是否正确,Oracle都会为新的connection分配一个shadow process,该服务进程为了进一步验证用户信息都不得不获取少量资源;如果以此为契机,即便在无法正常登录的情况下依然能在短期内造成实例僵死。
  • 相关阅读:
    MD5值算法原理
    AUTH过程
    锁定应用,解锁应用,锁卡,解卡,更改密码指令
    借/贷记卡的应用
    借记卡,贷记卡,准贷记卡三者的区别
    PBOC2.0与PBOC3.0的区别
    ED/EP简介
    与恒宝有关的一些常用知识
    java卡与native卡的区别
    计算机组和域的区别
  • 原文地址:https://www.cnblogs.com/macleanoracle/p/2967550.html
Copyright © 2020-2023  润新知