• Analyzing an OS crash dump to find why a Grid cluster node rebooted


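    By default Linux cannot analyze a kernel core (vmcore) file. You need to install the kernel debuginfo packages and the crash analysis tool.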

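    Download the kernel debuginfo RPMs that match the running kernel from the following site and install them, then install crash with yum: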
    https://oss.oracle.com/ol7/debuginfo/
    kernel-uek-debuginfo-4.14.35-1902.3.2.el7uek.x86_64.rpm
    kernel-uek-debuginfo-common-4.14.35-1902.3.2.el7uek.x86_64.rpm
    yum install crash
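    The two package names above are just the kernel release with a fixed prefix, so they can be derived from `uname -r`. This is a minimal sketch. It assumes the RPMs sit directly under the OL7 debuginfo directory listed above, and the helper name debuginfo_rpms is hypothetical.

```shell
# Hypothetical helper: print the download URLs of the two debuginfo RPMs
# for kernel release $1 (e.g. the output of `uname -r`).
# Assumption: the files are listed directly under this directory.
DEBUGINFO_BASE_URL="https://oss.oracle.com/ol7/debuginfo"

debuginfo_rpms() {
    for pkg in kernel-uek-debuginfo kernel-uek-debuginfo-common; do
        printf '%s/%s-%s.rpm\n' "$DEBUGINFO_BASE_URL" "$pkg" "$1"
    done
}

# Usage on the target node (download, then install together with crash):
#   for url in $(debuginfo_rpms "$(uname -r)"); do curl -O "$url"; done
#   yum install -y ./kernel-uek-debuginfo*.rpm crash
```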

    After installation, check that the debuginfo version matches the running kernel:

    [root@ht02 ~]# rpm -qa|grep kernel-uek-debuginfo
    kernel-uek-debuginfo-common-4.14.35-1902.3.2.el7uek.x86_64
    kernel-uek-debuginfo-4.14.35-1902.3.2.el7uek.x86_64
    [root@ht02 ~]# uname -r
    4.14.35-1902.3.2.el7uek.x86_64
    [root@ht02 ~]# rpm -qa|grep crash
    crash-7.2.3-10.el7.x86_64
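    The check above compares the package list with `uname -r` by eye. The same comparison can be scripted, as in the sketch below. has_debuginfo and vmlinux_path are hypothetical helper names, and the vmlinux location is the one used by the crash command later in this article.

```shell
# Succeed only if both debuginfo packages for kernel release $1 appear in
# the package list read from stdin, e.g.:
#   rpm -qa | has_debuginfo "$(uname -r)" && echo "debuginfo matches"
has_debuginfo() {
    pkgs=$(cat)
    for pkg in kernel-uek-debuginfo kernel-uek-debuginfo-common; do
        # -x: whole-line match, -F: dots in the release are literal
        printf '%s\n' "$pkgs" | grep -qxF "$pkg-$1" || return 1
    done
}

# Path of the debug vmlinux that crash needs for kernel release $1.
vmlinux_path() {
    printf '/lib/debug/lib/modules/%s/vmlinux\n' "$1"
}

# Usage:
#   [ -r "$(vmlinux_path "$(uname -r)")" ] || echo "vmlinux missing"
```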

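    In 19c, set the REBOOT_OPTS attribute on the cssd and cssdmonitor resources so that when Grid evicts the node or crashes, the OS generates a core file (vmcore).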

    Enable crash dump:

    /u01/app/grid/bin/crsctl modify type ora.cssd.type -attr "ATTRIBUTE=REBOOT_OPTS,TYPE=string, DEFAULT_VALUE=,FLAGS=CONFIG" -init
    /u01/app/grid/bin/crsctl modify type ora.cssdmonitor.type -attr "ATTRIBUTE=REBOOT_OPTS,TYPE=string, DEFAULT_VALUE=,FLAGS=CONFIG" -init
    /u01/app/grid/bin/crsctl modify res ora.cssd -attr "REBOOT_OPTS=CRASHDUMP" -init
    /u01/app/grid/bin/crsctl modify res ora.cssdmonitor -attr "REBOOT_OPTS=CRASHDUMP" -init


    Disable crash dump:

    /u01/app/grid/bin/crsctl modify res ora.cssd -attr "REBOOT_OPTS=" -init
    /u01/app/grid/bin/crsctl modify res ora.cssdmonitor -attr "REBOOT_OPTS=" -init
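    The enable and disable steps can be wrapped in two functions. This sketch assumes the Grid home /u01/app/grid used above. The function names are hypothetical, and setting CRSCTL=echo only prints the commands. The vmcore itself is written by the OS crash dump service (kdump on OL7), which is assumed to be configured already.

```shell
# Run as root on each node. Override CRSCTL for another Grid home,
# or set CRSCTL=echo for a dry run that only prints the commands.
CRSCTL=${CRSCTL:-/u01/app/grid/bin/crsctl}

cssd_crashdump_on() {
    # declare the REBOOT_OPTS attribute on both resource types ...
    for t in ora.cssd ora.cssdmonitor; do
        "$CRSCTL" modify type "$t.type" \
            -attr "ATTRIBUTE=REBOOT_OPTS,TYPE=string,DEFAULT_VALUE=,FLAGS=CONFIG" -init || return 1
    done
    # ... then request a crash dump when CSS reboots the node
    for t in ora.cssd ora.cssdmonitor; do
        "$CRSCTL" modify res "$t" -attr "REBOOT_OPTS=CRASHDUMP" -init || return 1
    done
}

cssd_crashdump_off() {
    for t in ora.cssd ora.cssdmonitor; do
        "$CRSCTL" modify res "$t" -attr "REBOOT_OPTS=" -init || return 1
    done
}

# Verify afterwards, e.g.:
#   crsctl stat res ora.cssd -init -p | grep REBOOT_OPTS
```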

    To enable crash dump on 11g, see MOS note Pre-11.2: Using Diagwait as a diagnostic to get more information for diagnosing Oracle Clusterware Node evictions (Doc ID 559365.1):
    [+ASM1]@ht01[/home/grid]$crsctl get css diagwait
    CRS-4678: Successful get diagwait 0 for Cluster Synchronization Services.
    [root@ht01 ~]# /u01/app/grid/bin/crsctl set css diagwait 13
    CRS-4684: Successful set of parameter diagwait to 13 for Cluster Synchronization Services.
    [+ASM1]@ht01[/home/grid]$crsctl get css diagwait
    CRS-4678: Successful get diagwait 13 for Cluster Synchronization Services
    To disable crash dump on 11g:
    crsctl unset css diagwait -force
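    To check the current setting from a script, the value can be cut out of the CRS-4678 message shown above. This is a minimal sketch. diagwait_value and diagwait_enabled are hypothetical names, and the sed pattern assumes the message wording stays exactly as printed here.

```shell
# Read `crsctl get css diagwait` output on stdin, print the number.
diagwait_value() {
    sed -n 's/.*Successful get diagwait \([0-9][0-9]*\) for.*/\1/p'
}

# Succeed if diagwait is set to a value greater than 0.
diagwait_enabled() {
    v=$(diagwait_value)
    [ -n "$v" ] && [ "$v" -gt 0 ]
}

# Usage:
#   crsctl get css diagwait | diagwait_enabled && echo "diagwait is on"
```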

    To test, kill the ocssd.bin process. cssdmonitor then reboots the OS automatically. After the node is back up, open the vmcore with crash:
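    The dump directory name contains a timestamp, so the newest vmcore can be picked by sorting. This sketch assumes the /var/crash/<address>-<date>-<time>/vmcore layout seen below and a single address prefix. latest_vmcore is a hypothetical name.

```shell
# Print the newest vmcore below a kdump directory (default /var/crash).
# Directory names look like 127.0.0.1-2022-06-23-05:27:58, so with the
# same address prefix a byte-wise sort puts the latest dump last.
latest_vmcore() {
    dir=${1:-/var/crash}
    for f in "$dir"/*/vmcore; do
        # skip the literal pattern left behind when nothing matches
        [ -e "$f" ] && printf '%s\n' "$f"
    done | LC_ALL=C sort | tail -n 1
}

# Usage (the debug vmlinux must belong to the kernel that made the dump):
#   crash "/lib/debug/lib/modules/$(uname -r)/vmlinux" "$(latest_vmcore)"
```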

    [root@ht02 ~]# crash /lib/debug/lib/modules/4.14.35-1902.3.2.el7uek.x86_64/vmlinux /var/crash/127.0.0.1-2022-06-23-05:27:58/vmcore

    crash 7.2.3-10.el7
    Copyright (C) 2002-2017 Red Hat, Inc.
    Copyright (C) 2004, 2005, 2006, 2010 IBM Corporation
    Copyright (C) 1999-2006 Hewlett-Packard Co
    Copyright (C) 2005, 2006, 2011, 2012 Fujitsu Limited
    Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
    Copyright (C) 2005, 2011 NEC Corporation
    Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
    Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
    This program is free software, covered by the GNU General Public License,
    and you are welcome to change it and/or distribute copies of it under
    certain conditions. Enter "help copying" to see the conditions.
    This program has absolutely no warranty. Enter "help warranty" for details.

    GNU gdb (GDB) 7.6
    Copyright (C) 2013 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law. Type "show copying"
    and "show warranty" for details.
    This GDB was configured as "x86_64-unknown-linux-gnu"...

    WARNING: kernel relocated [752MB]: patching 90846 gdb minimal_symbol values

    please wait... (patching 90846 gdb minimal_symbol values)
    KERNEL: /lib/debug/lib/modules/4.14.35-1902.3.2.el7uek.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2022-06-23-05:27:58/vmcore [PARTIAL DUMP]
    CPUS: 4
    DATE: Thu Jun 23 17:27:50 2022
    UPTIME: 00:10:25
    LOAD AVERAGE: 1.51, 1.43, 0.84
    TASKS: 769
    NODENAME: ht02
    RELEASE: 4.14.35-1902.3.2.el7uek.x86_64
    VERSION: #2 SMP Tue Jul 30 03:59:02 GMT 2019
    MACHINE: x86_64 (3194 Mhz)
    MEMORY: 14.6 GB
    PANIC: "sysrq: SysRq : Trigger a crash"
    PID: 3405
    COMMAND: "cssdmonitor"
    TASK: ffff96f176ddaf80 [THREAD_INFO: ffff96f176ddaf80]
    CPU: 1
    STATE: TASK_RUNNING (SYSRQ)
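    For triage, the banner fields can also be pulled out by a script. In the sketch below, crash_summary and is_css_reset are hypothetical names. is_css_reset encodes an assumption: a SysRq panic whose COMMAND is a CSS daemon (cssdmonitor, ocssd or cssdagent) means Clusterware reset the node.

```shell
# Print KEY=value for the crash banner fields that matter for triage.
# Input: the banner crash prints at startup (as above), or the output of
#   printf 'sys\nset\nq\n' | crash -s VMLINUX VMCORE
# (assumes crash reads commands from a pipe; `set` with no argument
# prints the current task context, including COMMAND).
crash_summary() {
    awk '
        { sub(/^[ \t]+/, "") }                       # drop leading blanks
        /^(PANIC|COMMAND|DATE|NODENAME|DUMPFILE):/ {
            i = index($0, ":")                       # split at first colon
            val = substr($0, i + 1)
            sub(/^[ \t]+/, "", val)
            print substr($0, 1, i - 1) "=" val
        }'
}

# Assumption: SysRq-triggered panic + CSS daemon as COMMAND = CSS reset.
is_css_reset() {
    summary=$(crash_summary)
    printf '%s\n' "$summary" | grep -q '^PANIC=.*SysRq : Trigger a crash' &&
        printf '%s\n' "$summary" | grep -Eq '^COMMAND="?(cssdmonitor|ocssd|cssdagent)'
}
```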

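    PANIC shows the crash was triggered through SysRq, and COMMAND shows cssdmonitor triggered it. The reboot was therefore initiated by Clusterware, not caused by a kernel fault. Next, check ohasd_orarootagent_root.trc: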

    2022-06-23 17:27:50.559 : CSSCLNT:3548346112: clsssRecvMsgA: got a disconnect from the server while waiting for message type 27
    2022-06-23 17:27:50.559 :GIPCXCPT:3548346112:  gipcInternalSend: connection not valid for send operation endp 0x7f78b40811b0 [00000000000006de] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=dd87820c-fc4df1b5-3382))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ht02_)(GIPCID=fc4df1b5-dd87820c-3433))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 3433, readyRef (nil), ready 0, wobj 0x7f78b406c760, sendp (nil) status 0flags 0x2003861e, flags-2 0x0, usrFlags 0x20010 }, ret gipcretConnectionLost (12)
    2022-06-23 17:27:50.559 :GIPCXCPT:3548346112:  gipcSendSyncF [clsssServerRPC_int : clsss.c : 8292]: EXCEPTION[ ret gipcretConnectionLost (12) ]  failed to send on endp 0x7f78b40811b0 [00000000000006de] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=dd87820c-fc4df1b5-3382))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_ht02_)(GIPCID=fc4df1b5-dd87820c-3433))', numPend 0, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 3433, readyRef (nil), ready 0, wobj 0x7f78b406c760, sendp (nil) status 0flags 0x2003861e, flags-2 0x0, usrFlags 0x20010 }, addr 0000000000000000, buf 0x7f78d37eb6f8, len 80, flags 0x8000000
    2022-06-23 17:27:50.559 : CSSCLNT:3548346112: clsssServerRPC: send failed with err 12, msg type 7
    
    2022-06-23 17:27:50.559 : CSSCLNT:3548346112: clsssCommonClientExit: RPC failure, rc 3
    
    2022-06-23 17:27:50.559 : USRTHRD:4038760192: [     INFO]  clsnpoll_BlockMsg: lost connection with CSS
    2022-06-23 17:27:50.559 : USRTHRD:4038760192: [     INFO]  clsnpoll_BlockMsg: calling sync
    Trace file /u01/app/11.2.0/grid/diag/crs/ht02/crs/trace/ohasd_cssdmonitor_root.trc
    Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production
    Version 19.3.0.0.0 Copyright 1996, 2019 Oracle. All rights reserved.
        CLSB:429202688: [     INFO] Argument count (argc) for this daemon is 1
        CLSB:429202688: [     INFO] Argument 0 is: /u01/app/grid/bin/cssdmonitor
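    In a long trace the relevant lines are easy to miss. The sketch below keeps only the connection-loss and sync messages quoted above. css_loss_events is a hypothetical name, and the patterns are taken from this one trace, so they may not cover other failure modes.

```shell
# Print only the trace lines showing cssdmonitor losing its link to
# ocssd and syncing before the reboot. Pass trace files as arguments,
# or pipe a trace on stdin.
css_loss_events() {
    grep -E 'gipcretConnectionLost|lost connection with CSS|calling sync' "$@"
}

# Usage:
#   css_loss_events /u01/app/11.2.0/grid/diag/crs/ht02/crs/trace/ohasd_cssdmonitor_root.trc
```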
    

      

  • Original: https://www.cnblogs.com/omsql/p/16415216.html