How to dump the system state with oradebug
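When the system hangs, oradebug offers a convenient and safe way to inspect its state at that moment. If sqlplus cannot connect to the system at all, operating-system commands can be used to dump the systemstate instead.
A later article will give some extended examples that use gdb as the debugger on Linux, for reference.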
If you can log in with sqlplus, use oradebug to dump the system state:
oradebug setmypid
oradebug dump systemstate 10
This produces a trace file.
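The two commands above can be wrapped in a small script generator so the dump is repeatable. This is a minimal sketch: the function name `systemstate_commands` is hypothetical, and `oradebug unlimit` (lift the trace file size limit) and `oradebug tracefile_name` (print the path of the trace file just written) are additions that do not appear in the original steps.

```shell
# Print the oradebug commands for a systemstate dump.
# $1 is the dump level; it defaults to 10, the level used in this article.
# The function name is illustrative only.
systemstate_commands() {
    level="${1:-10}"
    printf '%s\n' \
        'oradebug setmypid' \
        'oradebug unlimit' \
        "oradebug dump systemstate ${level}" \
        'oradebug tracefile_name' \
        'exit'
}
```

Assuming the oracle OS user can connect as SYSDBA, the output can be piped straight into sqlplus, for example `systemstate_commands 10 | sqlplus -S / as sysdba`.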
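Run the following command to list the waiting information. (The scenario simulated here is a wait on cache buffers lru chain; for how to simulate it, see the article "When a block is already cached in the buffer cache, an update still needs to hold cache buffers lru chain".)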
nl wilson_ora_32374.trc | grep 'waiting' | grep -v 'rdbms ipc' | grep -v 'timer' | grep -v 'message'

3823 waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
3898 waiting for 'latch: cache buffers lru chain' blocking sess=0x(nil) seq=197 wait_time=0 seconds since wait started=12
5648 waiting for 'row cache lock' blocking sess=0x(nil) seq=871 wait_time=0 seconds since wait started=6
9561 waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
9595 waiting for 'latch: cache buffers lru chain' blocking sess=0x(nil) seq=1 wait_time=0 seconds since wait started=66
10224 waiting for 'Streams AQ: qmn coordinator idle wait' blocking sess=0x(nil) seq=10 wait_time=0 seconds since wait started=4145
10367 waiting for 'ksdxexeotherwait' blocking sess=0x(nil) seq=26 wait_time=0 seconds since wait started=0
10759 waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
10804 waiting for 'latch: cache buffers lru chain' blocking sess=0x(nil) seq=280 wait_time=0 seconds since wait started=93
13209 waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
13245 waiting for 'latch: cache buffers lru chain' blocking sess=0x(nil) seq=1 wait_time=0 seconds since wait started=66
13654 waiting for 'library cache load lock' blocking sess=0x(nil) seq=1 wait_time=0 seconds since wait started=27
13803 waiting for 'Streams AQ: waiting for time management or cleanup tasks' blocking sess=0x(nil) seq=99 wait_time=0 seconds since wait started=4147
13806 for 'Streams AQ: waiting for time management or cleanup tasks' count=1 wait_time=3412791267
13808 for 'Streams AQ: waiting for time management or cleanup tasks' count=1 wait_time=0
15212 waiting for 'Streams AQ: qmn slave idle wait' blocking sess=0x(nil) seq=1 wait_time=0 seconds since wait started=4150
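The same filter can be written more compactly. The sketch below is one possible variant, not the article's own method: the function name `waiting_lines` is hypothetical, the three excluded patterns are the ones used above, and `grep -n` replaces `nl`. One difference to be aware of: `nl` by default numbers only non-empty lines, while `grep -n` reports the physical line number in the file, so the numbers from the two approaches can differ when the trace contains blank lines.

```shell
# List "waiting" lines from a systemstate trace, dropping the idle waits
# that match 'rdbms ipc', 'timer' or 'message'.
# $1 is the trace file; output lines have the form "<line number>:<text>".
waiting_lines() {
    grep -n 'waiting' "$1" | grep -v -E 'rdbms ipc|timer|message'
}
```

For the trace in this article the call would be `waiting_lines wilson_ora_32374.trc`.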
Now go to lines 3823, 9561, 10759 and 13209 of the trace file in turn and read the process dump around each "waiting for" entry:
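Jumping to a reported line can also be done from the shell. The helper below is a minimal sketch with a hypothetical name, `show_context`, and an arbitrary default of 20 lines of context. It uses `sed`, which counts physical lines, so the line number passed in should come from a tool that counts the same way (for example `grep -n` or `nl -ba`); the default `nl` numbering skips empty lines.

```shell
# Print the lines surrounding a given line number of a trace file.
# $1 = trace file, $2 = line number, $3 = lines of context (default 20).
show_context() {
    file="$1"
    line="$2"
    ctx="${3:-20}"
    start=$((line - ctx))
    if [ "$start" -lt 1 ]; then
        start=1    # clamp at the first line of the file
    fi
    end=$((line + ctx))
    sed -n "${start},${end}p" "$file"
}
```

For example, `show_context wilson_ora_32374.trc 3823 25` prints the 25 lines before and after line 3823.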
PROCESS 10:
  ----------------------------------------
  SO: 0x2fa195d0, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=10, calls cur/top: 0x2d7b5220/0x2fb27bbc, flag: (2) SYSTEM
  int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 48
  last post received-location: ksoreq_reply
  last process to post me: 2fa1dff4 4 0
  last post sent: 0 0 24
  last post sent-location: ksasnd
  last process posted by me: 2fa16de4 1 6
  (latch info) wait_event=0 bits=2
  Location from where call was made: kcbzgws_1:
  waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
  Location from where latch is held: kcbzib:
  Context saved from call: 0
  state=busy, wlstate=free
  waiters [orapid (seconds since: put on list, posted, alive check)]:
    21 (93, 1353877293, 93)
    18 (66, 1353877293, 66)
    22 (66, 1353877293, 66)
    10 (12, 1353877293, 12)
  waiter count=4
  gotten 38981 times wait, failed first 17 sleeps 21
  gotten 30522 times nowait, failed: 0
  on wait list for 2ed83244
  holding (efd=30) 2ecd0034 Child cache buffers chains level=1 child#=101
  Location from where latch is held: kcbgtcr: fast path:
  Context saved from call: 4197194
  state=busy(exclusive) (val=0x2000000a) holder orapid = 10
  Process Group: DEFAULT, pseudo proc: 0x2fa4cd24
  O/S info: user: oracle, term: UNKNOWN, ospid: 3364
  OSD pid info: Unix process pid: 3364, image: oracle@node01 (CJQ0)
PROCESS 18:
  ----------------------------------------
  SO: 0x2fa1c370, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=18, calls cur/top: 0x2d7b4c40/0x2fb2905c, flag: (2) SYSTEM
  int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
  last post received-location: No post
  last process to post me: none
  last post sent: 0 0 48
  last post sent-location: ksoreq_reply
  last process posted by me: 2fa19b84 1 2
  (latch info) wait_event=0 bits=2
  Location from where call was made: kcbzgws_1:
  waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
  Location from where latch is held: kcbzib:
  Context saved from call: 0
  state=busy, wlstate=free
  waiters [orapid (seconds since: put on list, posted, alive check)]:
    21 (93, 1353877293, 93)
    18 (66, 1353877293, 66)
    22 (66, 1353877293, 66)
    10 (12, 1353877293, 12)
  waiter count=4
  gotten 38981 times wait, failed first 17 sleeps 21
  gotten 30522 times nowait, failed: 0
  on wait list for 2ed83244
  holding (efd=25) 2eccb828 Child cache buffers chains level=1 child#=72
  Location from where latch is held: kcbgtcr: fast path:
  Context saved from call: 4221057
  state=busy(exclusive) (val=0x20000012) holder orapid = 18
  Process Group: DEFAULT, pseudo proc: 0x2fa4cd24
  O/S info: user: oracle, term: UNKNOWN, ospid: 32381
  OSD pid info: Unix process pid: 32381, image: oracle@node01 (m000)
PROCESS 21:
  ----------------------------------------
  SO: 0x2fa1d48c, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=21, calls cur/top: 0x2fb29f2c/0x2fb29f2c, flag: (0) -
  int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
  last post received-location: No post
  last process to post me: none
  last post sent: 0 0 0
  last post sent-location: No post
  last process posted by me: none
  (latch info) wait_event=0 bits=2
  Location from where call was made: kcbzgb: posted for free bufs:
  waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
  Location from where latch is held: kcbzib:
  Context saved from call: 0
  state=busy, wlstate=free
  waiters [orapid (seconds since: put on list, posted, alive check)]:
    21 (93, 1353877293, 93)
    18 (66, 1353877293, 66)
    22 (66, 1353877293, 66)
    10 (12, 1353877293, 12)
  waiter count=4
  gotten 38981 times wait, failed first 17 sleeps 21
  gotten 30522 times nowait, failed: 0
  on wait list for 2ed83244
  holding (efd=8) 2ece5988 Child cache buffers chains level=1 child#=240
  Location from where latch is held: kcbgcur: kslbegin:
  Context saved from call: 25166114
  state=busy(exclusive) (val=0x20000015) holder orapid = 21
  Process Group: DEFAULT, pseudo proc: 0x2fa4cd24
  O/S info: user: oracle, term: pts/3, ospid: 32374
  OSD pid info: Unix process pid: 32374, image: oracle@node01 (TNS V1-V3)
PROCESS 22:
  ----------------------------------------
  SO: 0x2fa1da40, type: 2, owner: (nil), flag: INIT/-/-/0x00
  (process) Oracle pid=22, calls cur/top: 0x2fb2a21c/0x2d7b4db8, flag: (0) -
  int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 0
  last post received-location: No post
  last process to post me: none
  last post sent: 0 0 48
  last post sent-location: ksoreq_reply
  last process posted by me: 2fa195d0 1 2
  (latch info) wait_event=0 bits=2
  Location from where call was made: kcbzgws_1:
  waiting for 2ed83244 Child cache buffers lru chain level=2 child#=3
  Location from where latch is held: kcbzib:
  Context saved from call: 0
  state=busy, wlstate=free
  waiters [orapid (seconds since: put on list, posted, alive check)]:
    21 (93, 1353877293, 93)
    18 (66, 1353877293, 66)
    22 (66, 1353877293, 66)
    10 (12, 1353877293, 12)
  waiter count=4
  gotten 38981 times wait, failed first 17 sleeps 21
  gotten 30522 times nowait, failed: 0
  on wait list for 2ed83244
  holding (efd=14) 2ecc3bf0 Child cache buffers chains level=1 child#=22
  Location from where latch is held: kcbgtcr: fast path:
  Context saved from call: 4205666
  state=busy(exclusive) (val=0x20000016) holder orapid = 22
  Process Group: DEFAULT, pseudo proc: 0x2fa4cd24
  O/S info: user: oracle, term: UNKNOWN, ospid: 32383
  OSD pid info: Unix process pid: 32383, image: oracle@node01 (J000)
  Dump of memory from 0x2FA09AC8 to 0x2FA09C4C
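The four process dumps above show that all of them are waiting for the same child latch of cache buffers lru chain, the one at address 2ed83244.
The image field at the end of each process dump tells you which one is the user process: it is the one reported as image: oracle@node01 (TNS V1-V3), that is, PROCESS 21.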
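Reading the dumps by eye works for four processes, but the same summary can be pulled out mechanically. The awk sketch below is an illustration under assumptions: the function name `latch_waiters` is hypothetical, and it relies only on the three line shapes visible in the dumps above (`PROCESS n:`, `waiting for <address> Child ...`, and `OSD pid info: ... image: ...`). Other Oracle versions may format these lines differently.

```shell
# For every process that waits on a child latch, print three tab-separated
# fields: Oracle pid, the latch being waited for, and the process image.
# $1 is the systemstate trace file.
latch_waiters() {
    awk '
        # Start of a new process dump: remember its Oracle pid.
        /^[ \t]*PROCESS [0-9]+:/ {
            pid = $2
            sub(/:$/, "", pid)
            latch = ""
        }
        # A latch wait names the latch by hex address, not by a quoted event.
        /waiting for [0-9a-f]+ Child / {
            latch = $0
            sub(/^.*waiting for /, "", latch)
        }
        # The image line closes the process header; report latch waiters only.
        /OSD pid info:/ && latch != "" {
            image = $0
            sub(/^.*image: /, "", image)
            printf "%s\t%s\t%s\n", pid, latch, image
            latch = ""
        }
    ' "$1"
}
```

Run as `latch_waiters wilson_ora_32374.trc`; any row whose image reads `(TNS V1-V3)` is a user (foreground) process rather than a background one.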