• Production database outage caused by an Oracle RAC heartbeat failure


    I. Problem description

    Environment:

    Node          sid     db_name   software_version   Notes
    172.16.2.22   hdls1   HDLS      11.2.0.4           RAC node
    172.16.2.23   hdls2   HDLS      11.2.0.4           RAC node

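    Cause of the incident:

    The heartbeat network between the two nodes became unstable and caused a RAC split-brain. The Oracle instance processes running on the nodes were terminated and the database service went down.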

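    II. Timeline

    2.1 Time: 16:45, fault reported and handling started

    A check showed that the Oracle instance processes on both nodes had stopped and the database could not be reached.

    2.2 Time: 17:25, node 23 restored

    Node 23 was restored first so that business jobs could keep running, while the fault on node 22 was investigated. Further handling waited until the jobs had finished.

    • After node 22 was rebooted, the database service on node 23 returned to normal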

    reboot -f
    • Check the database service status on node 23

    crs_stat -t 
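
`crs_stat` is deprecated in Oracle Clusterware 11.2 in favor of `crsctl`. The same check with the newer commands might look like the sketch below. It assumes the Grid Infrastructure `bin` directory is on `PATH`; the function name `check_cluster` is purely illustrative and does not come from the original write-up.

```shell
# check_cluster: hypothetical helper, not part of the original write-up.
# Prints the clusterware daemon status on all nodes and a resource table.
check_cluster() {
    # Bail out early when the Grid Infrastructure tools are not on PATH.
    if ! command -v crsctl >/dev/null 2>&1; then
        echo "crsctl not found; set PATH to the Grid home bin directory" >&2
        return 127
    fi
    crsctl check cluster -all   # CRS, CSS and EVM status on every node
    crsctl stat res -t          # tabular resource status (replaces crs_stat -t)
}
```

Run as the grid user, `check_cluster` gives a quick view of whether the surviving node still sees its clusterware stack and resources as online.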

     

    2.3 Analysis of node 22

    1. EVMD log

    2022-09-06 22:37:17.970: [GIPCHTHR][3844073216]gipchaWorkerCreateInterface: created remote interface for node 'hdls02', haName 'fe0a-b4a2-f838-ac00', inf 'udp://11.0.0.23:19879'
    2022-09-06 22:37:17.970: [GIPCHGEN][3844073216]gipchaWorkerAttachInterface: Interface attached inf 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 0, flags 0x6 }
    2022-09-06 22:37:17.970: [GIPCXCPT][3844073216]gipchaLowerRecv: message from unrecognized node 'udp://11.0.0.23:19879', hdr 0x7f21c002bf68 { len 80, seq 0, type gipchaHdrTypeAck (3), lastSeq 1, lastAck 0, minAck 2, flags 0x1, srcLuid 24d64699-7050de6f, dstLuid 6678805d-500d8712, msgId 1 }, ret gipcretFail (1)
    2022-09-06 22:37:17.970: [GIPCHALO][3844073216]gipchaLowerCallback: EXCEPTION[ ret gipcretFail (1) ] error while processing req 0x7f21e51fbe60 { type gipcreqtypeRecv, endp 0000000000001950, ret gipcretSuccess, local 'udp://11.0.0.22:18417', peer 'udp://11.0.0.23:19879', buf 0x7f21c002bf68, len 10240, olen 80 }, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }
    2022-09-06 22:37:17.971: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:17.971: [GIPCHALO][3844073216]gipchaLowerSend: deffering startup of hdr 0x7f21c001f2d8 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 00000000-00000000 numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [1 : 1], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x4 }
    2022-09-06 22:37:17.981: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:17.991: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.001: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.011: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.021: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.031: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.035: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.045: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.055: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.060: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.070: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.075: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.079: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.087: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.097: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.107: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.117: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.127: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9

    2022-09-06 22:37:18.972: [GIPCHALO][3844073216]gipchaLowerProcessAcks: ESTABLISH finished for node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 2, lastAck 2, lastValidAck 0, sendSeq [1 : 1], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x20c }
    2022-09-06 22:37:18.972: [GIPCHALO][3844073216]gipchaLowerProcessWaitQ: triggering deffered startup of msg 0x7f21c001f2d8 { len 232, seq 0, type gipchaHdrTypeSend (1), lastSeq 0, lastAck 0, minAck 0, flags 0x0, srcLuid 00000000-00000000, dstLuid 00000000-00000000, msgId 0 }, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 2, lastAck 2, lastValidAck 0, sendSeq [2 : 2], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x208 }
    2022-09-06 22:37:18.973: [GIPCXCPT][3844073216]gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'hdls01', port '13b5-9956-d0b9-0552', hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
    2022-09-06 22:37:18.973: [GIPCHGEN][3844073216]gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 815]: EXCEPTION[ ret gipcretKeyNotFound (36) ] failed to resolve ctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, host 'hdls01', port '13b5-9956-d0b9-0552', flags 0x0
    2022-09-06 22:37:18.973: [ CRSCCL][3835287296]No connection to peer(2, 18699304) will retry send msgid=0. rc=9
    2022-09-06 22:37:18.973: [GIPCXCPT][3844073216]gipchaInternalResolve: failed to resolve ret gipcretKeyNotFound (36), host 'hdls01', port '0277-7bff-f7af-8073', hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, ret gipcretKeyNotFound (36)
    2022-09-06 22:37:18.973: [GIPCHGEN][3844073216]gipchaResolveF [gipcmodGipcResolve : gipcmodGipc.c : 815]: EXCEPTION[ ret gipcretKeyNotFound (36) ] failed to resolve ctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }, host 'hdls01', port '0277-7bff-f7af-8073', flags 0x0
    2022-09-06 22:37:18.973: [ CRSCCL][3837388544]clsCclNewConn: added new conn to tempConList: newPeerCon = bc007ba0
    2022-09-06 22:37:18.973: [ CRSCCL][3837388544]PNC: Disconnecting conn from node (2,18699304).
    2022-09-06 22:37:18.973: [ CRSCCL][3837388544]PNC: Keeping our connection to node (2,18699304).
    2022-09-06 22:37:18.973: [GIPCHAUP][3844073216]gipchaUpperDisconnect: initiated discconnect umsg 0x7f21c0010c20 { msg 0x7f21c002dc88, ret gipcretRequestPending (15), flags 0x2 }, msg 0x7f21c002dc88 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-00001a62, dstCid 00000000-000005da }, endp 0x7f21c0016c00 [0000000000001a62] { gipchaEndpoint : port 'EVMDMAIN2_1/2b61-5a3a-b3b0-633e', peer 'hdls02:d60b-3fd7-4897-3466', srcCid 00000000-00001a62, dstCid 00000000-000005da, numSend 0, maxSend 100, groupListType 2, hagroup 0x21d35a0, usrFlags 0x4000, flags 0x21c }
    2022-09-06 22:37:18.973: [ CRSCCL][3837388544]ConnAccepted from Peer:msgTag= 0xcccccccc version= 0 msgType= 4 msgId= 0 msglen = 0 clschdr.size_clscmsgh= 88 src= (2, 18699304) dest= (1, 4294793640)
    2022-09-06 22:37:18.973: [GIPCXCPT][3844073216]gipchaUpperProcessDisconnect: dropping Disconnect to unknown msg 0x7f21c0036a68 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-000005da, dstCid 00000000-00001a62 }, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 7, lastAck 5, lastValidAck 6, sendSeq [6 : 6], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x208 }, ret gipcretFail (1)
    2022-09-06 22:37:18.973: [GIPCHAUP][3844073216]gipchaUpperProcessDisconnect: EXCEPTION[ ret gipcretFail (1) ] error during DISCONNECT processing for node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 7, lastAck 5, lastValidAck 6, sendSeq [6 : 6], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x208 }
    2022-09-06 22:37:18.973: [GIPCHAUP][3844073216]gipchaUpperCallbackDisconnect: completed DISCONNECT ret gipcretSuccess (0), umsg 0x7f21c0010c20 { msg 0x7f21c002dc88, ret gipcretSuccess (0), flags 0x2 }, msg 0x7f21c002dc88 { type gipchaMsgTypeDisconnect (5), srcCid 00000000-00001a62, dstCid 00000000-000005da }, hendp 0x7f21c0016c00 [0000000000001a62] { gipchaEndpoint : port 'EVMDMAIN2_1/2b61-5a3a-b3b0-633e', peer 'hdls02:d60b-3fd7-4897-3466', srcCid 00000000-00001a62, dstCid 00000000-000005da, numSend 0, maxSend 100, groupListType 2, hagroup 0x21d35a0, usrFlags 0x4000, flags 0x21c }
    2022-09-06 22:37:18.984: [   EVMD][3964278592] Authorization database built successfully.
    2022-09-06 22:37:19.042: [   CLSE][3964278592]clse_get_auth_loc: Returning default authloc: /oracle/grid/crs_1/auth/evm/hdls01
    2022-09-06 22:37:23.254: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: [network] failed send attempt endp 0x7f21c001ecd0 [0000000000001950] { gipcEndpoint : localAddr 'udp://11.0.0.22:18417', remoteAddr '', numPend 5, numReady 1, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7f21c001eae0, sendp 0x7f21c0011550flags 0x3, usrFlags 0x4000 }, req 0x7f21c0011250 [0000000000001c19] { gipcSendRequest : addr 'udp://11.0.0.23:19879', data 0x7f21b0013448, len 1384, olen 0, parentEndp 0x7f21c001ecd0, ret gipcretEndpointNotAvailable (40), objFlags 0x0, reqFlags 0x2 }
    2022-09-06 22:37:23.254: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos op  : sgipcnValidateSocket
    2022-09-06 22:37:23.255: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos dep : Invalid argument (22)
    2022-09-06 22:37:23.255: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos loc : address not
    2022-09-06 22:37:23.255: [ GIPCNET][3844073216]gipcmodNetworkProcessSend: slos info: addr '11.0.0.22:18417', len 1384, buf 0x7f21b0013448, cookie 0x7f21c0011250
    2022-09-06 22:37:23.255: [GIPCXCPT][3844073216]gipcInternalSendSync: failed sync request, ret gipcretEndpointNotAvailable (40)
    2022-09-06 22:37:23.255: [GIPCXCPT][3844073216]gipcSendSyncF [gipchaLowerInternalSend : gipchaLower.c : 846]: EXCEPTION[ ret gipcretEndpointNotAvailable (40) ] failed to send on endp 0x7f21c001ecd0 [0000000000001950] { gipcEndpoint : localAddr 'udp://11.0.0.22:18417', remoteAddr '', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7f21c001eae0, sendp 0x7f21c0011550flags 0x3, usrFlags 0x4000 }, addr 0x7f21c00183a0 [0000000000001a10] { gipcAddress : name 'udp://11.0.0.23:19879', objFlags 0x0, addrFlags 0x1 }, buf 0x7f21b0013448, len 1384, flags 0x0
    2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceFail: marking interface failing 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
    2022-09-06 22:37:23.255: [GIPCHALO][3844073216]gipchaLowerInternalSend: failed to initiate send on interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x86 }, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 1, usrFlags 0x0, flags 0x5 }
    2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c402b1b0 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:18417', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 1, idxBoot 0, flags 0x10d }
    2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x86 }
    2022-09-06 22:37:23.255: [GIPCHALO][3844073216]gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
    2022-09-06 22:37:23.255: [GIPCHGEN][3844073216]gipchaInterfaceReset: resetting interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c402b1b0, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
    2022-09-06 22:37:23.372: [GIPCHDEM][3844073216]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7f21c402b1b0 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:18417', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 0, idxBoot 0, flags 0x12d }
    2022-09-06 22:37:23.372: [GIPCHTHR][3844073216]gipchaWorkerCreateInterface: created remote interface for node 'hdls02', haName 'fe0a-b4a2-f838-ac00', inf 'udp://11.0.0.23:19879'
    2022-09-06 22:37:23.373: [GIPCXCPT][3841971968]gipchaDaemonProcessRecv: dropping unrecognized daemon request 17, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x5 }, ret gipcretFail (1)
    2022-09-06 22:37:23.373: [GIPCHDEM][3841971968]gipchaDaemonProcessRecv: EXCEPTION[ ret gipcretFail (1) ] exception processing requset type 17, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x5 }
    2022-09-06 22:37:27.377: [GIPCHDEM][3841971968]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x1 } to gipcd
    2022-09-06 22:37:28.916: [GIPCHALO][3844073216]gipchaLowerProcessNode: no valid interfaces found to node for 5660 ms, node 0x7f21c4013650 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', srcLuid 6678805d-500d8712, dstLuid 24d64699-7050de6f numInf 1, contigSeq 9, lastAck 20, lastValidAck 9, sendSeq [21 : 27], createTime 7114604, sentRegister 1, localMonitor 0, flags 0x8 }
    2022-09-06 22:37:32.598: [GIPCHDEM][3841971968]gipchaDaemonInfRequest: sent local interfaceRequest, hctx 0x21a9430 [0000000000000010] { gipchaContext : host 'hdls01', name '5e52-0b6f-5d73-b878', luid '6678805d-00000000', numNode 1, numInf 0, usrFlags 0x0, flags 0x1 } to gipcd
    2022-09-06 22:37:32.612: [GIPCHGEN][3841971968]gipchaNodeAddInterface: adding interface information for inf 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 0, idxBoot 0, flags 0x1 }
    2022-09-06 22:37:33.409: [GIPCHTHR][3844073216]gipchaWorkerCreateInterface: created local interface for node 'hdls01', haName '5e52-0b6f-5d73-b878', inf 'udp://11.0.0.22:23405'
    2022-09-06 22:37:33.409: [GIPCHGEN][3844073216]gipchaWorkerAttachInterface: Interface attached inf 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
    2022-09-07 00:25:13.978: [GIPCHGEN][3841971968]gipchaInterfaceFail: marking interface failing 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:23405', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 1, numFail 0, idxBoot 0, flags 0xd }
    2022-09-07 00:25:14.397: [GIPCHGEN][3844073216]gipchaInterfaceFail: marking interface failing 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x6 }
    2022-09-07 00:25:15.398: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:23405', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 1, idxBoot 0, flags 0x18d }
    2022-09-07 00:25:15.398: [GIPCHGEN][3844073216]gipchaInterfaceDisable: disabling interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0x86 }
    2022-09-07 00:25:15.398: [GIPCHALO][3844073216]gipchaLowerCleanInterfaces: performing cleanup of disabled interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
    2022-09-07 00:25:15.398: [GIPCHGEN][3844073216]gipchaInterfaceReset: resetting interface 0x7f21c4021fa0 { host 'hdls02', haName 'fe0a-b4a2-f838-ac00', local 0x7f21c4024d60, ip '11.0.0.23:19879', subnet '11.0.0.0', mask '255.255.255.0', mac '', ifname '', numRef 0, numFail 0, idxBoot 4, flags 0xa6 }
    2022-09-07 00:25:16.399: [GIPCHDEM][3844073216]gipchaWorkerCleanInterface: performing cleanup of disabled interface 0x7f21c4024d60 { host '', haName '5e52-0b6f-5d73-b878', local (nil), ip '11.0.0.22:23405', subnet '11.0.0.0', mask '255.255.255.0', mac 'ec-c0-1b-08-5e-b6', ifname 'eno3', numRef 0, numFail 0, idxBoot 0, flags 0x1ad }

     

    [Image: image-20220907110330352]

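    The log above shows that heartbeat communication between the two nodes was failing: neither node could obtain information about its peer, which led to the Oracle instance processes being terminated.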
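
To make this kind of reading repeatable, the log can be reduced to two things: how many `No connection to peer` retries occurred per second, and when an interface was marked as failing. The awk sketch below is one rough way to do that. The function names are illustrative, and the patterns are taken only from the excerpt above, so other Grid Infrastructure versions may log these events differently.

```shell
# peer_retry_counts: count CRSCCL "No connection to peer" retries per second.
# Reads an EVMD log on stdin; prints "DATE HH:MM:SS COUNT", sorted by time.
peer_retry_counts() {
    awk '/No connection to peer/ {
             n[$1 " " substr($2, 1, 8)]++      # key: date + HH:MM:SS
         }
         END { for (k in n) print k, n[k] }' | sort
}

# interface_failures: list gipchaInterfaceFail events with host, ip and NIC.
# Empty values (for example host on the local interface) are printed as "-".
interface_failures() {
    awk '
    # Return the single-quoted value that follows "key" (e.g. ip, ifname).
    function quoted(line, key,    start, rest) {
        start = index(line, key " \047")
        if (start == 0) return "-"
        rest = substr(line, start + length(key) + 2)
        rest = substr(rest, 1, index(rest, "\047") - 1)
        return (rest == "") ? "-" : rest
    }
    /gipchaInterfaceFail: marking interface failing/ {
        print $1, substr($2, 1, 12), "host=" quoted($0, "host"), "ip=" quoted($0, "ip"), "ifname=" quoted($0, "ifname")
    }'
}
```

Both functions read from standard input, for example `peer_retry_counts < evmd.log`, where the path to the EVMD log depends on the Grid home and is not given in the original.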

     

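    2. System log

    [Image: system log, image-20220907111743474]

    The system log shows that the eno3 heartbeat interface kept cycling between DOWN and UP, so the link was unstable.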
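
Link flapping of this kind can be counted directly from the kernel messages. The sketch below is only an approximation: the system log is available here only as an image, so the message wording is an assumption (many Linux NIC drivers log `Link is Up` and `Link is Down`, but the exact text depends on the driver). The function name `link_flaps` is illustrative.

```shell
# link_flaps IFACE: count link up/down messages for one NIC in a kernel log.
# Assumption: the driver logs lines containing the interface name and the
# phrase "Link is Up" or "Link is Down"; adjust the patterns for other drivers.
link_flaps() {
    awk -v iface="$1" '
    index($0, iface) == 0 { next }          # ignore other interfaces
    { line = tolower($0) }
    line ~ /link is down/ { down++ }
    line ~ /link is up/   { up++ }
    END { printf "%s down=%d up=%d\n", iface, down, up }'
}
```

A possible invocation is `link_flaps eno3 < /var/log/messages` (the log path is an assumption and varies by distribution). For the live state of the port, `ip -s link show eno3` and `ethtool eno3` report the current link status and error counters.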

     

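    2.4 Time: 22:30, listener hang on node 23 while it was running alone

    Because of the heartbeat network fault, the two nodes could not communicate normally. At 22:30 the instance on node 23 was interrupted; at 23:38 the database service on node 23 recovered.

    2.5 Time: 00:30, after the jobs finished, the Cat 6 heartbeat cable was replaced

    • Once the business jobs had finished, the heartbeat cable was replaced with a new Cat 6 cable.

    • The database service on node 22 was then started, and the start succeeded.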

      srvctl start instance -d HDLS -i hdls1

    2.6 Time: 00:30, database back to normal

    • Listener status

      [grid@hdls01 ~]$ srvctl status listener 
      Listener LISTENER is enabled
      Listener LISTENER is running on node(s): hdls01,hdls02
    • Database instance status

    [grid@hdls01 ~]$ ps -ef |grep -Ei "ora_"
    oracle   13648     1  0 15:05 ?        00:00:00 ora_w002_hdls1
    oracle   20745     1  0 00:48 ?        00:00:08 ora_pmon_hdls1
    oracle   20747     1  0 00:48 ?        00:00:02 ora_psp0_hdls1
    oracle   20749     1  0 00:48 ?        00:01:46 ora_vktm_hdls1
    oracle   20754     1  0 00:48 ?        00:00:00 ora_gen0_hdls1
    oracle   20756     1  0 00:48 ?        00:00:08 ora_diag_hdls1
    oracle   20758     1  0 00:48 ?        00:00:03 ora_dbrm_hdls1
    oracle   20760     1  0 00:48 ?        00:00:01 ora_ping_hdls1
    oracle   20762     1  0 00:48 ?        00:00:00 ora_acms_hdls1
    oracle   20764     1  0 00:48 ?        00:02:48 ora_dia0_hdls1
    oracle   20766     1  0 00:48 ?        00:01:36 ora_lmon_hdls1
    oracle   20768     1  0 00:48 ?        00:00:17 ora_lmd0_hdls1
    oracle   20770     1  0 00:48 ?        00:01:17 ora_lms0_hdls1
    oracle   20774     1  0 00:48 ?        00:01:18 ora_lms1_hdls1
    oracle   20778     1  0 00:48 ?        00:01:14 ora_lms2_hdls1
    oracle   20782     1  0 00:48 ?        00:01:14 ora_lms3_hdls1
    oracle   20786     1  0 00:48 ?        00:01:14 ora_lms4_hdls1
    oracle   20790     1  0 00:48 ?        00:00:00 ora_rms0_hdls1
    oracle   20792     1  0 00:48 ?        00:00:01 ora_lmhb_hdls1
    oracle   20794     1  0 00:48 ?        00:00:14 ora_mman_hdls1
    oracle   20796     1  0 00:48 ?        00:00:03 ora_dbw0_hdls1
    oracle   20798     1  0 00:48 ?        00:00:03 ora_dbw1_hdls1
    oracle   20800     1  0 00:48 ?        00:00:03 ora_dbw2_hdls1
    oracle   20802     1  0 00:48 ?        00:00:03 ora_dbw3_hdls1
    oracle   20804     1  0 00:48 ?        00:00:03 ora_dbw4_hdls1
    oracle   20806     1  0 00:48 ?        00:00:03 ora_dbw5_hdls1
    oracle   20808     1  0 00:48 ?        00:00:03 ora_dbw6_hdls1
    oracle   20810     1  0 00:48 ?        00:00:03 ora_dbw7_hdls1
    oracle   20812     1  0 00:48 ?        00:00:03 ora_dbw8_hdls1
    oracle   20814     1  0 00:48 ?        00:00:03 ora_dbw9_hdls1
    oracle   20816     1  0 00:48 ?        00:00:03 ora_dbwa_hdls1
    oracle   20818     1  0 00:48 ?        00:00:03 ora_dbwb_hdls1
    oracle   20820     1  0 00:48 ?        00:01:30 ora_lgwr_hdls1
    oracle   20822     1  0 00:48 ?        00:00:34 ora_ckpt_hdls1
    oracle   20824     1  0 00:48 ?        00:00:15 ora_smon_hdls1
    oracle   20826     1  0 00:48 ?        00:00:00 ora_reco_hdls1
    oracle   20828     1  0 00:48 ?        00:00:00 ora_rbal_hdls1
    oracle   20830     1  0 00:48 ?        00:00:00 ora_asmb_hdls1
    oracle   20832     1  0 00:48 ?        00:00:45 ora_mmon_hdls1
    oracle   20834     1  0 00:48 ?        00:01:01 ora_mmnl_hdls1
    oracle   20838     1  0 00:48 ?        00:00:00 ora_d000_hdls1
    oracle   20840     1  0 00:48 ?        00:00:00 ora_mark_hdls1
    oracle   20842     1  0 00:48 ?        00:00:00 ora_s000_hdls1
    oracle   20899     1  0 00:48 ?        00:00:27 ora_lck0_hdls1
    oracle   20901     1  0 00:48 ?        00:00:01 ora_rsmn_hdls1
    oracle   20916     1  0 00:48 ?        00:00:06 ora_o000_hdls1
    oracle   21003     1  0 00:48 ?        00:00:00 ora_arc0_hdls1
    oracle   21005     1  0 00:48 ?        00:00:00 ora_arc1_hdls1
    oracle   21007     1  0 00:48 ?        00:00:01 ora_arc2_hdls1
    oracle   21009     1  0 00:48 ?        00:00:00 ora_arc3_hdls1
    oracle   21053     1  0 00:48 ?        00:00:06 ora_o001_hdls1
    oracle   21251     1  0 00:48 ?        00:00:25 ora_nsa2_hdls1
    oracle   21253     1  0 00:48 ?        00:00:06 ora_o002_hdls1
    oracle   21264     1  0 00:48 ?        00:00:00 ora_gtx0_hdls1
    oracle   21268     1  0 00:48 ?        00:00:01 ora_rcbg_hdls1
    oracle   21274     1  0 00:48 ?        00:00:00 ora_qmnc_hdls1
    oracle   21296     1  0 00:48 ?        00:00:00 ora_q000_hdls1
    oracle   21349     1  0 00:48 ?        00:00:04 ora_cjq0_hdls1
    oracle   22074     1  0 00:49 ?        00:00:00 ora_q002_hdls1
    oracle   26225     1  0 00:53 ?        00:00:00 ora_smco_hdls1
    oracle   54283     1  0 15:56 ?        00:00:00 ora_j000_hdls1
    oracle   71251     1  0 16:17 ?        00:00:00 ora_w001_hdls1
    oracle   72429     1  0 16:18 ?        00:00:00 ora_pz99_hdls1
    oracle   72500     1  0 16:18 ?        00:00:00 ora_j001_hdls1
    grid     74000 71958  0 16:19 pts/1    00:00:00 grep --color=auto -Ei ora_
    [grid@hdls01 ~]$
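
The full process list is long. When the question is only whether an instance is up, the PMON background process (named `ora_pmon_<SID>`) is a common quick indicator. A small sketch, with an illustrative function name:

```shell
# running_instances: print the SID of every instance whose PMON is alive.
# Reads `ps -ef` output on stdin; PMON is named ora_pmon_<SID>.
running_instances() {
    # "ora_pmon_" is 9 characters long, so the SID starts at position 10.
    awk '$NF ~ /^ora_pmon_/ { print substr($NF, 10) }'
}
```

Typical use would be `ps -ef | running_instances`. It prints one SID per line and nothing when no instance is running on the node.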

     

     

    2.7 RAC status check after the fix

    • Check the RAC cluster services

    [grid@hdls01 ~]$ crs_stat -t
    Name           Type           Target   State     Host        
    ------------------------------------------------------------
    ora.ARCHLOG.dg ora....up.type ONLINE   ONLINE   hdls01      
    ora.DATA.dg   ora....up.type ONLINE   ONLINE   hdls01      
    ora....ER.lsnr ora....er.type ONLINE   ONLINE   hdls01      
    ora....N1.lsnr ora....er.type ONLINE   ONLINE   hdls02      
    ora.OCRVT.dg   ora....up.type ONLINE   ONLINE   hdls01      
    ora.asm       ora.asm.type   ONLINE   ONLINE   hdls01      
    ora.cvu       ora.cvu.type   ONLINE   ONLINE   hdls02      
    ora.gsd       ora.gsd.type   OFFLINE   OFFLINE              
    ora.hdls.db   ora....se.type ONLINE   ONLINE   hdls01      
    ora....SM1.asm application   ONLINE   ONLINE   hdls01      
    ora....01.lsnr application   ONLINE   ONLINE   hdls01      
    ora.hdls01.gsd application   OFFLINE   OFFLINE              
    ora.hdls01.ons application   ONLINE   ONLINE   hdls01      
    ora.hdls01.vip ora....t1.type ONLINE   ONLINE   hdls01      
    ora....SM2.asm application   ONLINE   ONLINE   hdls02      
    ora....02.lsnr application   ONLINE   ONLINE   hdls02      
    ora.hdls02.gsd application   OFFLINE   OFFLINE              
    ora.hdls02.ons application   ONLINE   ONLINE   hdls02      
    ora.hdls02.vip ora....t1.type ONLINE   ONLINE   hdls02      
    ora....network ora....rk.type ONLINE   ONLINE   hdls01      
    ora.oc4j       ora.oc4j.type ONLINE   ONLINE   hdls01      
    ora.ons       ora.ons.type   ONLINE   ONLINE   hdls01      
    ora.scan1.vip ora....ip.type ONLINE   ONLINE   hdls02    
    • Check the database

    [grid@hdls01 ~]$ srvctl status listener 
    Listener LISTENER is enabled
    Listener LISTENER is running on node(s): hdls01,hdls02

     

    SQL> select name,status from v$datafile;

    NAME                                                                             STATUS
    -------------------------------------------------------------------------------- -------
    +DATA/hdls/datafile/system.306.1100288753                                        SYSTEM
    +DATA/hdls/datafile/sysaux.264.1100288753                                        ONLINE
    +DATA/hdls/datafile/undotbs1.263.1100288753                                      ONLINE
    +DATA/hdls/datafile/users.260.1100288753                                         ONLINE
    +DATA/hdls/datafile/undotbs2.277.1100288833                                      ONLINE
    +DATA/hdls/oauser01.dbf                                                          ONLINE
    +DATA/hdls/hdls2001.dbf                                                          ONLINE
    +DATA/hdls/oa01.dbf                                                              ONLINE
    +DATA/hdls/hdls01.dbf                                                            ONLINE
    +DATA/hdls/hdls02.dbf                                                            ONLINE
    +DATA/hdls/hdls03.dbf                                                            ONLINE
    +DATA/hdls/hdls04.dbf                                                            ONLINE
    +DATA/hdls/hdls05.dbf                                                            ONLINE
    +DATA/hdls/hdls06.dbf                                                            ONLINE
    +DATA/hdls/hdls07.dbf                                                            ONLINE
    +DATA/hdls/hdls08.dbf                                                            ONLINE
    +DATA/hdls/hdls09.dbf                                                            ONLINE
    +DATA/hdls/hdls10.dbf                                                            ONLINE
    +DATA/hdls/others01.dbf                                                          ONLINE
    +DATA/hdls/others02.dbf                                                          ONLINE
    +DATA/hdls/others03.dbf                                                          ONLINE
    +DATA/hdls/indx01.dbf                                                            ONLINE
    +DATA/hdls/indx02.dbf                                                            ONLINE
    +DATA/hdls/indx03.dbf                                                            ONLINE
    +DATA/hdls/indx04.dbf                                                            ONLINE
    +DATA/hdls/indx05.dbf                                                            ONLINE
    +DATA/hdls/indx06.dbf                                                            ONLINE
    +DATA/hdls/indx07.dbf                                                            ONLINE
    +DATA/hdls/hdls131701.dbf                                                        ONLINE
    +DATA/hdls/hdls131702.dbf                                                        ONLINE
    +DATA/hdls/hdls131703.dbf                                                        ONLINE
    +DATA/hdls/hdls131704.dbf                                                        ONLINE
    +DATA/hdls/hdls131705.dbf                                                        ONLINE
    +DATA/hdls/hdls131706.dbf                                                        ONLINE
    +DATA/hdls/cdc01.dbf                                                             ONLINE

    35 rows selected.

    SQL>
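
Both listings in this section are long, and what matters is whether anything deviates. The sketch below prints only the exceptions. It assumes the column layouts shown above (`crs_stat -t` with Name, Type, Target, State, Host; the SQL*Plus listing with NAME and STATUS); the function names are illustrative. The `gsd` resources do not count as exceptions because their Target is OFFLINE as well.

```shell
# not_online: list resources whose Target is ONLINE but whose State is not.
# Reads `crs_stat -t` output on stdin (columns: Name Type Target State Host).
not_online() {
    awk '$3 == "ONLINE" && $4 != "ONLINE" { print $1, $4 }'
}

# bad_datafiles: list datafiles whose STATUS is neither ONLINE nor SYSTEM.
# Reads the SQL*Plus listing of the v$datafile query on stdin; only lines
# whose first field starts with "+" (ASM paths) are considered.
bad_datafiles() {
    awk '$1 ~ /^[+]/ && $NF != "ONLINE" && $NF != "SYSTEM" { print $1, $NF }'
}
```

For example, `crs_stat -t | not_online` should print nothing for the output shown above, and any line it does print names a resource that should be running but is not.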

     

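    III. Summary

    1. The network cable carrying the heartbeat network between the nodes was faulty. The heartbeat network became unstable, the RAC nodes could no longer communicate normally, and a split-brain condition arose. To protect data consistency and integrity, RAC forcibly terminates Oracle instances when the heartbeat network fails, which is why the Oracle service went down.

    2. After the Cat 6 heartbeat cable was replaced, the database returned to normal.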
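
When reviewing an incident like this one, it can help to confirm which network Clusterware treats as the heartbeat and how long it tolerates missed heartbeats before evicting a node. The sketch below uses `oifcfg getif` and `crsctl get css misscount` for that. It assumes the Grid Infrastructure tools are on `PATH`, and the function name is illustrative; the original write-up does not show these settings.

```shell
# show_heartbeat_config: hypothetical helper that prints which network is
# registered as the cluster interconnect and the CSS heartbeat timeout.
show_heartbeat_config() {
    for tool in oifcfg crsctl; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool not found; set PATH to the Grid home bin directory" >&2
            return 127
        fi
    done
    oifcfg getif              # interface, subnet and role (public / cluster_interconnect)
    crsctl get css misscount  # network heartbeat timeout in seconds
}
```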

  • Original article: https://www.cnblogs.com/lkj371/p/16826784.html