• shared pool latch和library cache latch


    一套关键的系统,由于”_kghdsidx_count”这个参数设置为1,导致了严重的性能问题。从故障现象上看是大量的library cache latch的等待,以及shared pool latch的等待,但是前者的等待时间比后者长得多。在文章中,我提到,在当时我推断,由于”_kghdsidx_count”这个隐含参数设置为1,导致shared pool只有1个subpool,引起了shared pool latch的严重竞争,进而引起了library cache cache的更为严重的竞争,这个竞争的过程如下:

    1 由于”_kghdsidx_count”=1,使得shared pool latch只有1个child latch。而library cache latch的child latch数量跟CPU数量有关,最大值为67,编号为child #1-#672 会话1持有shared pool latch。
    3 会话2解析SQL语句,首先持有了library cache latch的child latch,假设这里为child #1,然后去请求shared pool latch。很显然,这时候被会话1持有,那么会话2就会等待shared pool latch。
    4 会话3解析SQL语句,首先需要请求library cache latch,如果请求的library cache child latch刚好是#1,那么由于会话2持有了这个child latch,就会等待library cache latch。
    5 因此,实际上会话1和会话2的shared pool latch的竞争引起了会话3的library cache latch的等待。如果并发数不是太高,那么shared pool latch的竞争看上去就会比library cache latch的竞争多一些。但是如果有几百个活动会话,这个时候,就会有大量的会话首先等待library cache latch,因为在解析SQL时是首先需要获取library cache latch再获取shared pool latch。由于大量的软解析,甚至不需要获取shared pool latch,同时一个大型的OLTP系统中,某几条相同的SQL并发执行的概率很高,这样会使很多会话同时请求同一library cache child latch;另外,在解析过程中,可能会多次请求library cache latch和shared pool latch,而前者请求和释放的次数会比后者多得多;这样大量的会话在获取library cache latch时处于等待状态,从现象上看就比shared pool latch的等待多得多。

    而本文主要表达的是,怎么来验证在解析时,Oracle进程在持有了library cache latch的情况下去请求shared pool latch,而不是在请求shared pool时不需要持有library cache latch。

    由于这个验证过程过于internal,所以没有在《DBA手记III》中描述出来。这里写出来,供有兴趣的朋友参考。

    验证上面的这一点,有2个方法。下面以测试过程来详细描述。
    测试环境 :Oracle 10.2.0.5.1 for Linux X86.

    方法一:使用oradebug。

    1. 将数据库的”_kghdsidx_count”参数值设为1,并重启数据库,以保证只有一个shared pool child latch。
    2. 使用sqlplus连接到数据库,假设这个会话是session-1,查询当前的SID:

    1 SQL> select sid from v$mystat where rownum=1;  
    2   
    3        SID  
    4 ----------  
    5        159  

    同时获得当前连接的spid为2415。
    3. 新建一个连接到数据库,假设会话是session-2,查询shared pool latch的地址,并使用oradebug将这个地址对应的值置为1,以表示该latch已经被持有:

     1 SQL> select addr,latch#,level#,child#,name,gets  from v$latch_children where name='shared pool';  
     2   
     3 ADDR         LATCH#     LEVEL#     CHILD# NAME                                                     GETS  
     4 -------- ---------- ---------- ---------- -------------------------------------------------- ----------  
     5 200999BC        216          7          1 shared pool                                             34949  
     6 20099A20        216          7          2 shared pool                                                 6  
     7 20099A84        216          7          3 shared pool                                                 6  
     8 20099AE8        216          7          4 shared pool                                                 6  
     9 20099B4C        216          7          5 shared pool                                                 6  
    10 20099BB0        216          7          6 shared pool                                                 6  
    11 20099C14        216          7          7 shared pool                                                 6  
    12   
    13 SQL> oradebug poke 0x200999BC 4 1  
    14 BEFORE: [200999BC, 200999C0) = 00000000  
    15 AFTER:  [200999BC, 200999C0) = 00000001

    4. 在session-1会话中执行下面的SQL:

    1 SQL> select sysdate from dual;l

    正如预料之中的反映,这个会话hang住。

    5. 在session-2中,对session-1的进程作process dump。(注意这个时候不能查询v$session_wait、v$latchholder等视图)

    1 SQL> oradebug setospid 2415  
    2 Oracle pid: 15, Unix process pid: 2415, image: oracle@xty (TNS V1-V3)  
    3 SQL> oradebug dump processstate 10  
    4 Statement processed.  
    5 SQL> oradebug tracefile_name  
    6 /oracle/app/oracle/admin/xty/udump/xty_ora_2415.trc  

    然后从/oracle/app/oracle/admin/xty/udump/xty_ora_2415.trc这个TRACE文件中可以找到下面的信息:

     1 Process global information:  
     2      process: 0x51a2dce8, call: 0x51b465e4, xact: (nil), curses: 0x51b2a4b8, usrses: 0x51b2a4b8  
     3   ----------------------------------------  
     4   SO: 0x51a2dce8, type: 2, owner: (nil), flag: INIT/-/-/0x00  
     5   (process) Oracle pid=15, calls cur/top: 0x51b465e4/0x51b465e4, flag: (0) -  
     6             int error: 0, call error: 0, sess error: 0, txn error 0  
     7   (post info) last post received: 110 0 4  
     8               last post received-location: kslpsr  
     9               last process to post me: 51a2a904 1 6  
    10               last post sent: 0 0 24  
    11               last post sent-location: ksasnd  
    12               last process posted by me: 51a2a904 1 6  
    13     (latch info) wait_event=0 bits=20  
    14         Location from where call was made: kghupr1: Chunk Header  
    15         Context saved from call: 1306572176  
    16       waiting for 200999bc Child shared pool level=7 child#=1  
    17         Location from where latch is held: kghupr1: Chunk Header  
    18         Context saved from call: 1299065692  
    19         state=busy, wlstate=free  
    20           waiters [orapid (seconds since: put on list, posted, alive check)]:  
    21            10 (54, 1304393092, 54)  
    22            11 (54, 1304393092, 54)  
    23            24 (15, 1304393092, 15)  
    24            15 (6, 1304393092, 6)  
    25            waiter count=4  
    26           gotten 35138 times wait, failed first 502 sleeps 5  
    27           gotten 0 times nowait, failed: 0  
    28       on wait list for 200999bc  
    29       holding    (efd=3) 4f7c75f8 Child library cache level=5 child#=1  
    30         Location from where latch is held: kgllkdl: child: no lock handle: latch  
    31         Context saved from call: 2  
    32         state=busy, wlstate=free  

    从waiting for 200999bc Child shared pool level=7 child#=1和holding (efd=3) 4f7c75f8 Child library cache level=5 child#=1可以看到,进程的确是在持有了library cache latch的同时去请求 shared pool latch。

    最后使用“oradebug poke 0×200999BC 4 0”命令将shared pool latch置为free状态。

    方法二:使用gdb。
    1. 还原”_kghdsidx_count”参数,重启数据库。
    2. 使用sqlplus连接到数据库,获取对应的Oracle Process的spid为4868。
    3. 使用gdb:

    1 [oracle@xty ~]$ gdb -p 4868
    2 GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)  
    3 Copyright (C) 2009 Free Software Foundation, Inc.  
    4 ...省略部分输出...  
    5 Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.  
    6 Loaded symbols for /lib/libnss_files.so.2  
    7 0x0059b402 in __kernel_vsyscall ()  

    4. 在gdb中,在latch的获取和释放的函数上设置断点:

    注:最后几个函数只是猜测跟latch有关....latch的请求和释放所调用的函数主要为:kslgetl、kslgetsl和kslfre。

     1 (gdb) b kslgpl 
     2 Breakpoint 1 at 0x8322f53
     3 (gdb) b kslgetsl
     4 Breakpoint 2 at 0x8317351
     5 (gdb) b kslgetl
     6 Breakpoint 3 at 0x83170ba
     7 (gdb) b kslfre
     8 Breakpoint 4 at 0x8318d49
     9 (gdb) c
    10 Continuing.

    5. 在sqlplus中执行SQL:

    SQL> select sysdate from dual;l

    正如预期的那样,会话hang住。这是因为运行到了gdb设置的断点。

    6.在gdb中可以看到:

    Breakpoint 2, 0x08317351 in kslgetsl ()
    (gdb) info f
    Stack level 0, frame at 0xbf8386b0:
     eip = 0x8317351 in kslgetsl; saved eip 0x83f5e10
     called by frame at 0xbf8386ec
     Arglist at 0xbf8386a8, args: 
     Locals at 0xbf8386a8, Previous frame's sp is 0xbf8386b0
     Saved registers:
      ebp at 0xbf8386a8, eip at 0xbf8386ac
    (gdb) c
    Continuing.
    
    Breakpoint 4, 0x08318d49 in kslfre ()
    (gdb) info f
    Stack level 0, frame at 0xbf8386c0:
     eip = 0x8318d49 in kslfre; saved eip 0x83f5e49
     called by frame at 0xbf8386ec
     Arglist at 0xbf8386b8, args: 
     Locals at 0xbf8386b8, Previous frame's sp is 0xbf8386c0
     Saved registers:
      ebp at 0xbf8386b8, eip at 0xbf8386bc
    (gdb) x /10x 0xbf8386b8
    0xbf8386b8:     0xbf8386e4      0x083f5e49      0x2fb2c108      0xbf8386e4
    0xbf8386c8:     0x2fafb5a0      0x0cbd2dfc      0x0cbd2d00      0x00000002
    0xbf8386d8:     0x00000000      0x2f7abf68
    (gdb) c
    Continuing.
    ----下面很重要-----  
    Breakpoint 3, 0x083170ba in kslgetl ()
    (gdb) info f
    Stack level 0, frame at 0xbf837acc:
     eip = 0x83170ba in kslgetl; saved eip 0x8329508
     called by frame at 0xbf837af4
     Arglist at 0xbf837ac4, args: 
     Locals at 0xbf837ac4, Previous frame's sp is 0xbf837acc
     Saved registers:
      ebp at 0xbf837ac4, eip at 0xbf837ac8
    (gdb) x /10x 0xbf837ac4
    0xbf837ac4:     0xbf837aec      0x08329508      0x2e3c3a74      0x00000001
    0xbf837ad4:     0x00000000      0x000009f8      0x00000000      0x00000001
    0xbf837ae4:     0x0cb84378      0x00dcf346
    (gdb) c
    Continuing.
    
    Breakpoint 4, 0x08318d49 in kslfre ()
    (gdb) info f
    Stack level 0, frame at 0xbf837b74:
     eip = 0x8318d49 in kslfre; saved eip 0x8329627
     called by frame at 0xbf837b8c
     Arglist at 0xbf837b6c, args: 
     Locals at 0xbf837b6c, Previous frame's sp is 0xbf837b74
     Saved registers:
      ebp at 0xbf837b6c, eip at 0xbf837b70
    (gdb) x /10x 0xbf837b6c
    0xbf837b6c:     0xbf837b84      0x08329627      0x2e3c3a74      0x0cb84378
    0xbf837b7c:     0x0001ffff      0x0832954c      0xbf837d1c      0x0ae96833
    0xbf837b8c:     0x0cbd2d00      0x2e3c3a74
    (gdb) c
    Continuing.
    
    Breakpoint 3, 0x083170ba in kslgetl ()
    (gdb) info f
    Stack level 0, frame at 0xbf837acc:
     eip = 0x83170ba in kslgetl; saved eip 0x8329508
     called by frame at 0xbf837af4
     Arglist at 0xbf837ac4, args: 
     Locals at 0xbf837ac4, Previous frame's sp is 0xbf837acc
     Saved registers:
      ebp at 0xbf837ac4, eip at 0xbf837ac8
    (gdb) x /10x 0xbf837ac4
    0xbf837ac4:     0xbf837aec      0x08329508      0x2e3c3a74      0x00000001
    0xbf837ad4:     0x00000000      0x000009f8      0x00000000      0x00000001
    0xbf837ae4:     0x0cb84378      0x00dcf346
    (gdb) c
    Continuing.
    
    Breakpoint 3, 0x083170ba in kslgetl ()
    (gdb) info f
    Stack level 0, frame at 0xbf837968:
     eip = 0x83170ba in kslgetl; saved eip 0x8329508
     called by frame at 0xbf837990
     Arglist at 0xbf837960, args: 
     Locals at 0xbf837960, Previous frame's sp is 0xbf837968
     Saved registers:
      ebp at 0xbf837960, eip at 0xbf837964
    (gdb) x /10x 0xbf837960
    0xbf837960:     0xbf837988      0x08329508      0x20095ad0      0x00000001
    0xbf837970:     0x00000000      0x0000097f      0x0bfaf900      0x00000158
    0xbf837980:     0x0cb84378      0xbf8379d4
    (gdb) c
    Continuing.
    
    Breakpoint 4, 0x08318d49 in kslfre ()
    (gdb) info f
    Stack level 0, frame at 0xbf837984:
     eip = 0x8318d49 in kslfre; saved eip 0x8329627
     called by frame at 0xbf83799c
     Arglist at 0xbf83797c, args: 
     Locals at 0xbf83797c, Previous frame's sp is 0xbf837984
     Saved registers:
      ebp at 0xbf83797c, eip at 0xbf837980
    (gdb) x /10x 0xbf83797c
    0xbf83797c:     0xbf837994      0x08329627      0x20095ad0      0x0cb84378
    0xbf83798c:     0x00000000      0x2001af7c      0xbf837a20      0x0abfc335
    0xbf83799c:     0x0cbd2d00      0x20095ad0
    
    (gdb) c
    Continuing.
    
    Breakpoint 4, 0x08318d49 in kslfre ()
    (gdb) info f
    Stack level 0, frame at 0xbf837b74:
     eip = 0x8318d49 in kslfre; saved eip 0x8329627
     called by frame at 0xbf837b8c
     Arglist at 0xbf837b6c, args: 
     Locals at 0xbf837b6c, Previous frame's sp is 0xbf837b74
     Saved registers:
      ebp at 0xbf837b6c, eip at 0xbf837b70
    (gdb) x /10x 0xbf837b6c
    0xbf837b6c:     0xbf837b84      0x08329627      0x2e3c3a74      0x0cb84378
    0xbf837b7c:     0xbf837e5c      0x2bd8d494      0xbf837d1c      0x0ae96c1b
    0xbf837b8c:     0x0cbd2d00      0x2e3c3a74
    (gdb) c
    Continuing.

    同时可以从数据库中查到:

    SQL> select addr,latch#,child#,level#,name,gets from v$latch_children  
      2  where name in ('shared pool','library cache');
    
    ADDR         LATCH#     CHILD#     LEVEL# NAME                                           GETS
    -------- ---------- ---------- ---------- ---------------------------------------- ----------
    20095AD0        213          1          7 shared pool                                  403174
    20095B34        213          2          7 shared pool                                      18
    20095B98        213          3          7 shared pool                                      18
    20095BFC        213          4          7 shared pool                                      18
    20095C60        213          5          7 shared pool                                      18
    20095CC4        213          6          7 shared pool                                      18
    20095D28        213          7          7 shared pool                                      18
    2E3C3A74        214          1          5 library cache                                507555

    标记为”下面很重要“的输出,汇总起来就是:

    kslgetl  0x2e3c3a74  library cache  
    kslfre    0x2e3c3a74  library cache  
    kslgetl  0x2e3c3a74    library cache  
    kslgetl  0x20095ad0  shared pool  
    kslfre    0x20095ad0  shared pool  
    kslfre    0x2e3c3a74    library cache 

    这个过程可以理解为:Oracle进程首先获取library cache latch,在library cache中查找是否已经有要执行的SQL语句(这里显然没有找到),然后释放library cache。接下来是硬解析SQL的后半部分,首先获取library cache latch,然后获取shared pool latch,从shared pool中申请内存,然后释放shared pool latch,最后再获取library cache latch。

    后面不再继续跟踪解析的过程,这个过程会有相当多的latch的获取和释放,但是从上面可以很清晰地看到,进程在持有了library cache latch的同时去请求shared pool latch。

  • 相关阅读:
    zhanwei
    站位
    Web前端开发:SQL Jsp小项目(二)------添加修改
    Web前端开发:SQL Jsp小项目(一)
    JDBC(用Eclipse操作数据库Oracle)的基础操作集合
    Thread多线程stopSleep顺序问题
    IO流详解之代码详解
    IO流详解(半教材类型)
    用JQuery编写textarea,input,checkbox,select
    Java编写ArrayBasic制作一个简单的酒店管理系统
  • 原文地址:https://www.cnblogs.com/taowang2016/p/2959258.html
Copyright © 2020-2023  润新知