A critical system ran into a severe performance problem because the parameter "_kghdsidx_count" was set to 1. The symptoms were heavy waits on the library cache latch and on the shared pool latch, with the former's wait time far longer than the latter's. In that article I reasoned that, because the hidden parameter "_kghdsidx_count" was set to 1, the shared pool had only one subpool, which caused severe contention on the shared pool latch, which in turn triggered even worse contention on the library cache latch. The contention unfolds as follows:
1. With "_kghdsidx_count"=1, the shared pool latch has only one child latch. The number of library cache child latches depends on the CPU count, with a maximum of 67, numbered child #1 through #67.
2. Session 1 holds the shared pool latch.
3. Session 2 parses a SQL statement: it first takes a library cache child latch, say child #1, then requests the shared pool latch. Since session 1 holds it, session 2 waits on the shared pool latch.
4. Session 3 parses a SQL statement and first requests a library cache latch. If the child latch it needs happens to be #1, it waits on the library cache latch, because session 2 is holding that child.
5. So in effect, the shared pool latch contention between sessions 1 and 2 causes session 3's library cache latch wait. With modest concurrency, shared pool latch contention looks somewhat heavier than library cache latch contention. But with several hundred active sessions, large numbers of sessions end up waiting on the library cache latch first, because parsing a SQL statement requires taking the library cache latch before the shared pool latch. Heavy soft parsing may not even need the shared pool latch at all; meanwhile, in a large OLTP system the same few SQL statements often execute concurrently, so many sessions request the same library cache child latch at once. In addition, a single parse may request the library cache latch and the shared pool latch several times, and the former is acquired and released far more often than the latter. As a result, large numbers of sessions sit waiting on the library cache latch, and in the wait statistics those waits dwarf the shared pool latch waits.
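The blocking chain above can be sketched with ordinary locks. This is a toy model, not Oracle code: latches are modelled as exclusive Python locks, and the session functions, timings, and variable names are all invented for illustration.

```python
import threading
import time

# Toy model of the blocking chain: latches as exclusive locks.
shared_pool_latch = threading.Lock()      # the single child latch (_kghdsidx_count=1)
library_cache_child_1 = threading.Lock()  # library cache child latch #1

wait_events = {}  # session name -> (latch it waited on, seconds waited)

def session_1():
    with shared_pool_latch:       # session 1 holds the shared pool latch...
        time.sleep(0.5)           # ...and keeps it busy for a while

def session_2():
    with library_cache_child_1:   # parse step 1: take library cache child #1
        t0 = time.time()
        with shared_pool_latch:   # parse step 2: blocks behind session 1
            wait_events['session_2'] = ('shared pool latch', time.time() - t0)

def session_3():
    t0 = time.time()
    with library_cache_child_1:   # same child latch: blocks behind session 2
        wait_events['session_3'] = ('library cache latch', time.time() - t0)

threads = [threading.Thread(target=f) for f in (session_1, session_2, session_3)]
for t in threads:
    t.start()
    time.sleep(0.1)               # enforce arrival order: session 1, then 2, then 3
for t in threads:
    t.join()

print(wait_events['session_2'][0])  # shared pool latch
print(wait_events['session_3'][0])  # library cache latch
```

Session 3 never touches the shared pool latch, yet it spends its time waiting on the library cache latch: exactly the indirection described above.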
What this post sets out to show is how to verify that, during parsing, the Oracle process requests the shared pool latch while holding a library cache latch, rather than requesting the shared pool latch without holding one.
Because this verification is rather internal, it was not described in the book 《DBA手记III》. I am writing it up here for interested readers.
There are two ways to verify the point above. The test procedure below describes both in detail.
Test environment: Oracle 10.2.0.5.1 for Linux x86.
Method 1: using oradebug.
1. Set the database parameter "_kghdsidx_count" to 1 and restart the database, so that there is only one shared pool child latch.
2. Connect to the database with sqlplus. Call this session session-1, and query its SID:
SQL> select sid from v$mystat where rownum=1;

       SID
----------
       159
Also note that this connection's spid is 2415.
3. Open a new connection to the database, call it session-2, query the address of the shared pool latch, and use oradebug to poke the value at that address to 1, marking the latch as held:
SQL> select addr,latch#,level#,child#,name,gets from v$latch_children where name='shared pool';

ADDR         LATCH#     LEVEL#     CHILD# NAME                                                     GETS
-------- ---------- ---------- ---------- -------------------------------------------------- ----------
200999BC        216          7          1 shared pool                                             34949
20099A20        216          7          2 shared pool                                                 6
20099A84        216          7          3 shared pool                                                 6
20099AE8        216          7          4 shared pool                                                 6
20099B4C        216          7          5 shared pool                                                 6
20099BB0        216          7          6 shared pool                                                 6
20099C14        216          7          7 shared pool                                                 6

SQL> oradebug poke 0x200999BC 4 1
BEFORE: [200999BC, 200999C0) = 00000000
AFTER:  [200999BC, 200999C0) = 00000001
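All the poke does is write a 4-byte word at the latch address; a nonzero value makes the latch look held. A toy model of that in plain Python ctypes (nothing Oracle-specific; the variable names are invented):

```python
import ctypes

# Toy model: a latch is a 4-byte word in memory; 0 = free, nonzero = held.
latch = ctypes.c_uint32(0)           # BEFORE: ... = 00000000
addr = ctypes.addressof(latch)

# Equivalent in spirit to: oradebug poke <addr> 4 1
ctypes.memmove(addr, ctypes.byref(ctypes.c_uint32(1)), 4)

print('%08d' % latch.value)          # AFTER: ... = 00000001
```

Any process that then tests this word sees the latch as busy, which is why session-1 hangs in the next step.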
4. Run the following SQL in session-1:
SQL> select sysdate from dual;
As expected, this session hangs.
5. In session-2, take a process state dump of session-1's process. (Note that at this point you cannot query views such as v$session_wait or v$latchholder.)
SQL> oradebug setospid 2415
Oracle pid: 15, Unix process pid: 2415, image: oracle@xty (TNS V1-V3)
SQL> oradebug dump processstate 10
Statement processed.
SQL> oradebug tracefile_name
/oracle/app/oracle/admin/xty/udump/xty_ora_2415.trc
The trace file /oracle/app/oracle/admin/xty/udump/xty_ora_2415.trc then contains the following:
Process global information:
 process: 0x51a2dce8, call: 0x51b465e4, xact: (nil), curses: 0x51b2a4b8, usrses: 0x51b2a4b8
----------------------------------------
SO: 0x51a2dce8, type: 2, owner: (nil), flag: INIT/-/-/0x00
(process) Oracle pid=15, calls cur/top: 0x51b465e4/0x51b465e4, flag: (0) -
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 110 0 4
 last post received-location: kslpsr
 last process to post me: 51a2a904 1 6
 last post sent: 0 0 24
 last post sent-location: ksasnd
 last process posted by me: 51a2a904 1 6
(latch info) wait_event=0 bits=20
 Location from where call was made: kghupr1: Chunk Header
 Context saved from call: 1306572176
 waiting for 200999bc Child shared pool level=7 child#=1
 Location from where latch is held: kghupr1: Chunk Header
 Context saved from call: 1299065692
 state=busy, wlstate=free
 waiters [orapid (seconds since: put on list, posted, alive check)]:
  10 (54, 1304393092, 54)
  11 (54, 1304393092, 54)
  24 (15, 1304393092, 15)
  15 (6, 1304393092, 6)
 waiter count=4
 gotten 35138 times wait, failed first 502 sleeps 5
 gotten 0 times nowait, failed: 0
 on wait list for 200999bc
 holding (efd=3) 4f7c75f8 Child library cache level=5 child#=1
 Location from where latch is held: kgllkdl: child: no lock handle: latch
 Context saved from call: 2
 state=busy, wlstate=free
From "waiting for 200999bc Child shared pool level=7 child#=1" and "holding (efd=3) 4f7c75f8 Child library cache level=5 child#=1" we can see that the process is indeed requesting the shared pool latch while holding a library cache latch.
Finally, run "oradebug poke 0x200999BC 4 0" to set the shared pool latch back to the free state.
Method 2: using gdb.
1. Reset the "_kghdsidx_count" parameter to its previous value and restart the database.
2. Connect to the database with sqlplus and note the corresponding Oracle process's spid: 4868.
3. Attach gdb:
[oracle@xty ~]$ gdb -p 4868
GNU gdb (GDB) Red Hat Enterprise Linux (7.0.1-23.el5)
Copyright (C) 2009 Free Software Foundation, Inc.
...output omitted...
Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib/libnss_files.so.2
0x0059b402 in __kernel_vsyscall ()
4. In gdb, set breakpoints on the latch get and free functions:
Note: some of these functions are only guesses at latch-related code. The functions mainly called to get and free latches are kslgetl, kslgetsl, and kslfre.
(gdb) b kslgpl
Breakpoint 1 at 0x8322f53
(gdb) b kslgetsl
Breakpoint 2 at 0x8317351
(gdb) b kslgetl
Breakpoint 3 at 0x83170ba
(gdb) b kslfre
Breakpoint 4 at 0x8318d49
(gdb) c
Continuing.
5. Run the SQL in sqlplus:
SQL> select sysdate from dual;
As expected, the session hangs, because execution has hit the breakpoints set in gdb.
6. In gdb you can see:
Breakpoint 2, 0x08317351 in kslgetsl ()
(gdb) info f
Stack level 0, frame at 0xbf8386b0:
 eip = 0x8317351 in kslgetsl; saved eip 0x83f5e10
 called by frame at 0xbf8386ec
 Arglist at 0xbf8386a8, args:
 Locals at 0xbf8386a8, Previous frame's sp is 0xbf8386b0
 Saved registers:
  ebp at 0xbf8386a8, eip at 0xbf8386ac
(gdb) c
Continuing.

Breakpoint 4, 0x08318d49 in kslfre ()
(gdb) info f
Stack level 0, frame at 0xbf8386c0:
 eip = 0x8318d49 in kslfre; saved eip 0x83f5e49
 called by frame at 0xbf8386ec
 Arglist at 0xbf8386b8, args:
 Locals at 0xbf8386b8, Previous frame's sp is 0xbf8386c0
 Saved registers:
  ebp at 0xbf8386b8, eip at 0xbf8386bc
(gdb) x /10x 0xbf8386b8
0xbf8386b8:     0xbf8386e4      0x083f5e49      0x2fb2c108      0xbf8386e4
0xbf8386c8:     0x2fafb5a0      0x0cbd2dfc      0x0cbd2d00      0x00000002
0xbf8386d8:     0x00000000      0x2f7abf68
(gdb) c
Continuing.

---- the part below is important ----

Breakpoint 3, 0x083170ba in kslgetl ()
(gdb) info f
Stack level 0, frame at 0xbf837acc:
 eip = 0x83170ba in kslgetl; saved eip 0x8329508
 called by frame at 0xbf837af4
 Arglist at 0xbf837ac4, args:
 Locals at 0xbf837ac4, Previous frame's sp is 0xbf837acc
 Saved registers:
  ebp at 0xbf837ac4, eip at 0xbf837ac8
(gdb) x /10x 0xbf837ac4
0xbf837ac4:     0xbf837aec      0x08329508      0x2e3c3a74      0x00000001
0xbf837ad4:     0x00000000      0x000009f8      0x00000000      0x00000001
0xbf837ae4:     0x0cb84378      0x00dcf346
(gdb) c
Continuing.

Breakpoint 4, 0x08318d49 in kslfre ()
(gdb) info f
Stack level 0, frame at 0xbf837b74:
 eip = 0x8318d49 in kslfre; saved eip 0x8329627
 called by frame at 0xbf837b8c
 Arglist at 0xbf837b6c, args:
 Locals at 0xbf837b6c, Previous frame's sp is 0xbf837b74
 Saved registers:
  ebp at 0xbf837b6c, eip at 0xbf837b70
(gdb) x /10x 0xbf837b6c
0xbf837b6c:     0xbf837b84      0x08329627      0x2e3c3a74      0x0cb84378
0xbf837b7c:     0x0001ffff      0x0832954c      0xbf837d1c      0x0ae96833
0xbf837b8c:     0x0cbd2d00      0x2e3c3a74
(gdb) c
Continuing.
Breakpoint 3, 0x083170ba in kslgetl ()
(gdb) info f
Stack level 0, frame at 0xbf837acc:
 eip = 0x83170ba in kslgetl; saved eip 0x8329508
 called by frame at 0xbf837af4
 Arglist at 0xbf837ac4, args:
 Locals at 0xbf837ac4, Previous frame's sp is 0xbf837acc
 Saved registers:
  ebp at 0xbf837ac4, eip at 0xbf837ac8
(gdb) x /10x 0xbf837ac4
0xbf837ac4:     0xbf837aec      0x08329508      0x2e3c3a74      0x00000001
0xbf837ad4:     0x00000000      0x000009f8      0x00000000      0x00000001
0xbf837ae4:     0x0cb84378      0x00dcf346
(gdb) c
Continuing.

Breakpoint 3, 0x083170ba in kslgetl ()
(gdb) info f
Stack level 0, frame at 0xbf837968:
 eip = 0x83170ba in kslgetl; saved eip 0x8329508
 called by frame at 0xbf837990
 Arglist at 0xbf837960, args:
 Locals at 0xbf837960, Previous frame's sp is 0xbf837968
 Saved registers:
  ebp at 0xbf837960, eip at 0xbf837964
(gdb) x /10x 0xbf837960
0xbf837960:     0xbf837988      0x08329508      0x20095ad0      0x00000001
0xbf837970:     0x00000000      0x0000097f      0x0bfaf900      0x00000158
0xbf837980:     0x0cb84378      0xbf8379d4
(gdb) c
Continuing.

Breakpoint 4, 0x08318d49 in kslfre ()
(gdb) info f
Stack level 0, frame at 0xbf837984:
 eip = 0x8318d49 in kslfre; saved eip 0x8329627
 called by frame at 0xbf83799c
 Arglist at 0xbf83797c, args:
 Locals at 0xbf83797c, Previous frame's sp is 0xbf837984
 Saved registers:
  ebp at 0xbf83797c, eip at 0xbf837980
(gdb) x /10x 0xbf83797c
0xbf83797c:     0xbf837994      0x08329627      0x20095ad0      0x0cb84378
0xbf83798c:     0x00000000      0x2001af7c      0xbf837a20      0x0abfc335
0xbf83799c:     0x0cbd2d00      0x20095ad0
(gdb) c
Continuing.

Breakpoint 4, 0x08318d49 in kslfre ()
(gdb) info f
Stack level 0, frame at 0xbf837b74:
 eip = 0x8318d49 in kslfre; saved eip 0x8329627
 called by frame at 0xbf837b8c
 Arglist at 0xbf837b6c, args:
 Locals at 0xbf837b6c, Previous frame's sp is 0xbf837b74
 Saved registers:
  ebp at 0xbf837b6c, eip at 0xbf837b70
(gdb) x /10x 0xbf837b6c
0xbf837b6c:     0xbf837b84      0x08329627      0x2e3c3a74      0x0cb84378
0xbf837b7c:     0xbf837e5c      0x2bd8d494      0xbf837d1c      0x0ae96c1b
0xbf837b8c:     0x0cbd2d00      0x2e3c3a74
(gdb) c
Continuing.
Meanwhile, the database shows:
SQL> select addr,latch#,child#,level#,name,gets from v$latch_children
  2  where name in ('shared pool','library cache');

ADDR         LATCH#     CHILD#     LEVEL# NAME                                           GETS
-------- ---------- ---------- ---------- ---------------------------------------- ----------
20095AD0        213          1          7 shared pool                                  403174
20095B34        213          2          7 shared pool                                      18
20095B98        213          3          7 shared pool                                      18
20095BFC        213          4          7 shared pool                                      18
20095C60        213          5          7 shared pool                                      18
20095CC4        213          6          7 shared pool                                      18
20095D28        213          7          7 shared pool                                      18
2E3C3A74        214          1          5 library cache                                507555
The output marked as important above, summarized, is:
kslgetl 0x2e3c3a74    library cache
kslfre  0x2e3c3a74    library cache
kslgetl 0x2e3c3a74    library cache
kslgetl 0x20095ad0    shared pool
kslfre  0x20095ad0    shared pool
kslfre  0x2e3c3a74    library cache
This sequence can be read as: the Oracle process first gets the library cache latch and searches the library cache for the SQL statement to be executed (clearly not found here), then frees the library cache latch. Next comes the later part of the hard parse: it gets the library cache latch again, then gets the shared pool latch, allocates memory from the shared pool, frees the shared pool latch, and finally frees the library cache latch.
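As a sanity check, this get/free sequence can be replayed mechanically to confirm that the shared pool latch request happens while the library cache latch is still held. This is just a small sketch; the event list below is transcribed from the gdb output above.

```python
# Replay of the latch get/free sequence captured with gdb.
events = [
    ('kslgetl', 'library cache'),
    ('kslfre',  'library cache'),
    ('kslgetl', 'library cache'),
    ('kslgetl', 'shared pool'),
    ('kslfre',  'shared pool'),
    ('kslfre',  'library cache'),
]

held = set()                     # latches currently held
held_at_shared_pool_get = None   # snapshot taken when shared pool is requested

for func, latch in events:
    if func == 'kslgetl':
        if latch == 'shared pool':
            held_at_shared_pool_get = set(held)  # what was held at that moment
        held.add(latch)
    else:  # kslfre
        held.discard(latch)

print(held_at_shared_pool_get)  # {'library cache'}
```

At the moment kslgetl is called for the shared pool latch, the library cache latch is in the held set, and by the end of the sequence every latch has been freed.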
I did not continue tracing the rest of the parse, which involves a great many more latch gets and frees. But the output above already shows clearly that the process requests the shared pool latch while holding the library cache latch.