• Using oradebug short_stack and strace -p to check whether an Oracle process is dead or has failed


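    1. oradebug or strace -p can be used to trace a background or foreground process and check whether it is dead or hung.
    2. When a process runs into a problem, it usually writes its latest state to its trace (.trc) file, which gives you key information for further analysis and diagnosis.
       Trace files are written to background_dump_dest (in 11g this is the ADR trace directory).
    3. Use ls -lhrt *lgwr* | tail -10 to list the most recent trace files of a process (LGWR here); -rt sorts by modification time, oldest first, so the newest files appear last.
    4. A failure is also usually recorded in the alert log; checking the alert log is the first and most important step when troubleshooting.
    5. oradebug setospid ospid
       oradebug short_stack
       Run from SQL*Plus as SYSDBA, this prints the call stack of the process. Run it several times at intervals: if every run shows the same stack, the process is very likely stuck (dead or hung). An idle process blocked in the same wait also shows an unchanged stack, so confirm with the trace file and alert log before concluding.

    6. Alternatively, trace the process with strace -p ospid: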

    --- Typical output when the process is hung or faulty:
    semtimedop(9273344, 0x7fffe66199d0, 1, {1, 0}) = -1 EAGAIN (Resource temporarily unavailable)

    --- Typical output when the process is running normally:
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440015944
    semtimedop(9273344, 0x7fffe661b1f0, 1, {1, 800000000}) = -1 EAGAIN (Resource temporarily unavailable)
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016124
    semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016424
    semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440016725
    semtimedop(9273344, 0x7fffe661b1f0, 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    getrusage(RUSAGE_SELF, {ru_utime={0, 123981}, ru_stime={0, 132979}, ...}) = 0
    times({tms_utime=12, tms_stime=13, tms_cutime=0, tms_cstime=0}) = 440017025
    open("/proc/4385/stat", O_RDONLY)       = 35
    read(35, "4385 (oracle) S 1 4385 4385 0 -1"..., 999) = 225

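    Put simply: watch whether the output changes. If it keeps changing, the process is working normally; if it does not, the process is probably hung. In the normal trace above, new calls keep arriving and the return value of times() keeps increasing, while the hung trace shows a single semtimedop call and nothing else.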
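    Both checks (repeated short_stack, and watching strace for changes) can be scripted. The following is a minimal bash sketch, not an Oracle-supplied tool, assuming it runs as the Oracle software owner with ORACLE_SID set, sqlplus on the PATH, "/ as sysdba" OS authentication, and strace allowed to attach to the target process. The function names, the 5-sample/3-second defaults and the 10-second strace window are hypothetical choices. strace_changes counts how often a line differs from the line before it, so the hung sample above gives 0 and the normal sample a positive number; as noted above, "no change" only means the process did not move during the window.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not an Oracle tool): sample an Oracle process with
# oradebug short_stack and strace, then report whether anything changed.
# Assumptions: ORACLE_SID is set, sqlplus is on PATH, "/ as sysdba" OS
# authentication works, and strace is permitted to attach to the process.

# capture_short_stack OSPID: print the short stack of one OS process.
capture_short_stack() {
  sqlplus -s "/ as sysdba" <<EOF
oradebug setospid $1
oradebug short_stack
exit
EOF
}

# all_identical FILE...: exit 0 if every file has the same content as the first.
all_identical() {
  local first=$1 f
  shift
  for f in "$@"; do
    cmp -s "$first" "$f" || return 1
  done
  return 0
}

# strace_changes FILE: count lines in an strace -o capture that differ from
# the line before them; 0 means the output never changed.
strace_changes() {
  awk 'NR > 1 && $0 != prev { n++ } { prev = $0 } END { print n + 0 }' "$1"
}

# Only run the checks when executed directly with an OS pid argument.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -ge 1 ]]; then
  ospid=$1
  samples=${2:-5}    # number of short_stack samples (illustrative default)
  interval=${3:-3}   # seconds between samples (illustrative default)
  workdir=$(mktemp -d)
  files=()
  for ((i = 1; i <= samples; i++)); do
    capture_short_stack "$ospid" > "$workdir/stack_$i.txt"
    files+=("$workdir/stack_$i.txt")
    sleep "$interval"
  done
  if all_identical "${files[@]}"; then
    echo "short_stack identical in all $samples samples: process may be stuck"
  else
    echo "short_stack changed between samples: process is moving"
  fi
  # Watch system calls for 10 seconds; timeout sends SIGTERM and strace detaches.
  timeout 10 strace -p "$ospid" -o "$workdir/strace.txt"
  if [[ -f "$workdir/strace.txt" ]]; then
    echo "strace output changes in 10s: $(strace_changes "$workdir/strace.txt")"
  fi
  echo "raw samples kept in $workdir"
fi
```

    For example, saved under a name of your choice such as check_proc.sh, `bash check_proc.sh 4405` would sample LGWR, whose SPID is 4405 in the test that follows.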

    Test


    SQL> select * from v$version where rownum=1;

    BANNER
    --------------------------------------------------------------------------------
    Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production

    List the background processes:
    SQL> select pid,spid,pname,username from v$process order by 1;

           PID SPID       PNAME      USERNAME
    ---------- ---------- ---------- ------------------------------
             1
             2 4385       PMON       oracle
             3 4387       VKTM       oracle
             4 4391       GEN0       oracle
             5 4393       DIAG       oracle
             6 4395       DBRM       oracle
             7 4397       PSP0       oracle
             8 4399       DIA0       oracle
             9 4401       MMAN       oracle
            10 4403       DBW0       oracle
            11 4405       LGWR       oracle
            12 4407       CKPT       oracle
            13 4409       SMON       oracle
            14 4411       RECO       oracle
            15 4413       MMON       oracle
            16 4415       MMNL       oracle
            17 4417       D000       oracle
            18 4419       S000       oracle
            19 4652       SMCO       oracle
            20 5266       W000       oracle
            21 4936                  oracle
            27 4468       ARC0       oracle
            28 4481       ARC1       oracle
            29 4486       ARC2       oracle
            30 4489       ARC3       oracle
            31 4496       QMNC       oracle
            32 4549       Q000       oracle
            33 4551       Q001       oracle
            34 4568                  oracle

    29 rows selected.
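    The SPID column is the OS process id that oradebug setospid and strace -p expect. On Unix the same pid can also be read from ps, because background processes appear there under names of the form ora_<pname>_<SID>. A minimal bash sketch, assuming the SID is guowang (inferred from the trace file names, so hypothetical); bg_pid is an illustrative helper name:

```bash
#!/usr/bin/env bash
# bg_pid PNAME SID: read "pid args" lines (as printed by `ps -eo pid=,args=`)
# on stdin and print the pid whose command is exactly ora_<pname>_<sid>.
bg_pid() {
  local want="ora_${1,,}_$2"   # ${1,,} lowercases PNAME (bash 4+)
  awk -v want="$want" '$2 == want { print $1 }'
}

# Example (hypothetical SID "guowang"); should print the LGWR SPID,
# which is 4405 in the v$process listing above:
# ps -eo pid=,args= | bg_pid LGWR guowang
```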

    SQL> 
    --- List the LGWR trace files in the trace directory
    [oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
    -rw-r----- 1 oracle oinstall  213 Dec 14 19:05 guowang_lgwr_5297.trm
    -rw-r----- 1 oracle oinstall 2.4K Dec 14 19:05 guowang_lgwr_5297.trc
    -rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
    -rw-r----- 1 oracle oinstall  27K Dec 15 01:05 guowang_lgwr_22295.trc
    -rw-r----- 1 oracle oinstall   63 Dec 15 02:18 guowang_lgwr_31280.trm
    -rw-r----- 1 oracle oinstall  903 Dec 15 02:18 guowang_lgwr_31280.trc
    -rw-r----- 1 oracle oinstall   63 Dec 15 02:44 guowang_lgwr_32077.trm
    -rw-r----- 1 oracle oinstall  906 Dec 15 02:44 guowang_lgwr_32077.trc
    -rw-r----- 1 oracle oinstall   62 Dec 15 03:27 guowang_lgwr_1032.trm
    -rw-r----- 1 oracle oinstall  887 Dec 15 03:27 guowang_lgwr_1032.trc
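    Because ls -lhrt sorts oldest first, the newest trace files end up at the bottom of the listing. A small bash helper that prints only the newest trace file of a given background process; this is a sketch assuming the <instance>_<pname>_<spid>.trc naming seen above, and newest_trace is a hypothetical name:

```bash
#!/usr/bin/env bash
# newest_trace DIR PNAME: print the most recently modified *_<pname>_*.trc in DIR.
newest_trace() {
  local dir=$1 pname=${2,,}   # trace file names use the lowercase process name
  # ls -t sorts newest first; 2>/dev/null hides the error when nothing matches.
  ls -1t -- "$dir"/*_"$pname"_*.trc 2>/dev/null | head -n 1
}

# Example: open the latest LGWR trace in the current trace directory.
# less "$(newest_trace . LGWR)"
```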


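    --- Hang LGWR on purpose with oradebug suspend (test only: redo writing, and therefore every commit, stalls until oradebug resume is run)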
    SQL> oradebug setospid 4405
    Oracle pid: 11, Unix process pid: 4405, image: oracle@seconary (LGWR)
    SQL> oradebug suspend
    Statement processed.

    --- The alert log records the event at the same time
    Tue Dec 15 04:46:15 2015
    Unix process pid: 4405, image: oracle@seconary (LGWR) flash frozen [ command #1 ]
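    In the 11g text alert log each group of messages is preceded by a timestamp line, as in the entry above. A bash/awk sketch that prints every matching message together with the latest timestamp before it, assuming that timestamp layout; alert_grep is a hypothetical helper, and the alert file name alert_<SID>.log is an assumption about your instance:

```bash
#!/usr/bin/env bash
# alert_grep FILE PATTERN: print "timestamp | message" for every line of an
# 11g-style text alert log that matches the awk regex PATTERN.
alert_grep() {
  awk -v pat="$2" '
    # Timestamp lines look like "Tue Dec 15 04:46:15 2015".
    /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) [A-Z][a-z][a-z] +[0-9]+ [0-9][0-9]:[0-9][0-9]:[0-9][0-9] [0-9][0-9][0-9][0-9]$/ {
      ts = $0
      next
    }
    $0 ~ pat { print ts " | " $0 }
  ' "$1"
}

# Example, run in the trace directory (file name assumed to be alert_<SID>.log):
# alert_grep "alert_${ORACLE_SID}.log" 'flash frozen'
```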

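    --- At the same time a new LGWR trace file appears in the trace directory (guowang_lgwr_4405.trc, named after the LGWR SPID 4405)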
    [oracle@seconary trace]$ ll -lhrt *lgwr*|tail -10f
    -rw-r----- 1 oracle oinstall 2.3K Dec 15 01:05 guowang_lgwr_22295.trm
    -rw-r----- 1 oracle oinstall  27K Dec 15 01:05 guowang_lgwr_22295.trc
    -rw-r----- 1 oracle oinstall   63 Dec 15 02:18 guowang_lgwr_31280.trm
    -rw-r----- 1 oracle oinstall  903 Dec 15 02:18 guowang_lgwr_31280.trc
    -rw-r----- 1 oracle oinstall   63 Dec 15 02:44 guowang_lgwr_32077.trm
    -rw-r----- 1 oracle oinstall  906 Dec 15 02:44 guowang_lgwr_32077.trc
    -rw-r----- 1 oracle oinstall   62 Dec 15 03:27 guowang_lgwr_1032.trm
    -rw-r----- 1 oracle oinstall  887 Dec 15 03:27 guowang_lgwr_1032.trc
    -rw-r----- 1 oracle oinstall   63 Dec 15 04:46 guowang_lgwr_4405.trm
    -rw-r----- 1 oracle oinstall  896 Dec 15 04:46 guowang_lgwr_4405.trc
    [oracle@seconary trace]$ 

  • Original article: https://www.cnblogs.com/andy6/p/7502125.html