I/O Errors in Alert log with ORA-29701, with "gipcWait failed with 16" in trace (Doc ID 1496329.1)
1. Database alert log
Fri May 04 10:56:59 2018
Errors in file /oracle/app/oracle/diag/rdbms/orcl/rocl1/trace/rocl1_ora_65536796.trc:
ORA-01114: IO error writing block to file (block # )
Fri May 04 10:57:00 2018
2. Trace file
Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0/db_1
System name: AIX
Node name: rac1
Release: 1
Version: 7
Machine: 00F6E7C84C00
Instance name: rocl1
Redo thread mounted by this instance: 1
Oracle process number: 1540
Unix process pid: 13962128, image: oracle@rac1

*** 2018-05-04 10:56:58.840
*** SESSION ID:(292.52991) 2018-05-04 10:56:58.840
*** CLIENT ID:() 2018-05-04 10:56:58.840
*** SERVICE NAME:(orcl) 2018-05-04 10:56:58.840
*** MODULE NAME:(JDBC Thin Client) 2018-05-04 10:56:58.840
*** ACTION NAME:() 2018-05-04 10:56:58.840

2018-05-04 10:56:58.828: [ CSSCLNT]clssscConnect: gipcWait failed with 16 (12)
2018-05-04 10:56:58.840: [ CSSCLNT]clsssInitNative: connect to (ADDRESS=(PROTOCOL=ipc)(KEY=OCSSD_LL_scdb02_)) failed, rc 16
kgxgncin: CLSS init failed with status 3
kgxgncin: return status 3 (1311719766 SKGXN not av) from CLSS
kjfmsgr: unable to connect to NM for reg in shared group
ORA-01114: IO error writing block to file (block # )
Dump of memory from 0x070001209CBA0328 to 0x070001209CBA0D3B
70001209CBA0320 57495448 20544F44 [WITH TOD]
3. ocssd.log
-- Check the file /oracle/app/11.2.0/grid/log/rac1/cssd/ocssd.log
2018-05-04 10:56:59.495: [ CSSD][1029]clssgmQueueShare: (11ba99f10) target global grock DBORCL member 1 type 1 queued from client (1176496b0), global grock DBORCL, refcount 757
2018-05-04 10:56:59.495: [ CSSD][1029]clssgmRegisterShared: global grock DBORCL member 1 share type 1, refcount 757
2018-05-04 10:56:59.743: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11ba653d0, ret gipcretAuthFail (22)
2018-05-04 10:56:59.743: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ] error during accept on endp 1112a2970
2018-05-04 10:56:59.744: [GIPCXCPT][1029] gipcmodClscCallback: async request failed req 1172b0bf0 [00000000e3b63bc0] { gipcSendRequest : addr '', data 11727c490, len 48, olen 0, parentEndp 11abbcef0, ret gipcretConnectionLost (12), objFlags 0x0, reqFlags 0x224 }, ret gipcretConnectionLost (12)
2018-05-04 10:56:59.745: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11abbcef0, ret gipcretConnectionInvalid (13)
2018-05-04 10:56:59.745: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretConnectionInvalid (13) ] error during accept on endp 1112a2970
2018-05-04 10:56:59.804: [ CSSD][1029]clssscSelect: cookie accept request 11ad57f10
2018-05-04 10:56:59.804: [ CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ad57f10
2018-05-04 10:56:59.804: [ CSSD][1029]clssgmRegisterClient: proc(7589/11ad57f10), client(2/1174aaa90)
2018-05-04 10:56:59.804: [ CSSD][1029]clssscSelect: cookie accept request 11ba74630
2018-05-04 10:56:59.804: [ CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ba74630
2018-05-04 10:56:59.804: [ CSSD][1029]clssgmRegisterClient: proc(7591/11ba74630), client(1/117497510)
2018-05-04 10:56:59.931: [ CSSD][1029]clssgmRegisterShared: grp DG_LOCAL_DATA, mbr 0, type 1
2018-05-04 10:56:59.931: [ CSSD][1029]clssgmQueueShare: (11a93a690) target local grock DG_LOCAL_DATA member 0 type 1 queued from client (1174aaa90), local grock DG_LOCAL_DATA, refcount 721
2018-05-04 10:56:59.931: [ CSSD][1029]clssgmRegisterShared: local grock DG_LOCAL_DATA member 0 share type 1, refcount 721
2018-05-04 10:56:59.932: [ CSSD][1029]clssgmRegisterShared: grp DBORCL, mbr 1, type 1
2018-05-04 10:56:59.932: [ CSSD][1029]clssgmQueueShare: (11a93ab70) target global grock DBORCL member 1 type 1 queued from client (117497510), global grock DBORCL, refcount 758
2018-05-04 10:56:59.932: [ CSSD][1029]clssgmRegisterShared: global grock DBORCL member 1 share type 1, refcount 758
2018-05-04 10:57:00.194: [GIPCXCPT][1029] gipcmodClscCallback: async request failed req 11730eff0 [00000000e3b63c64] { gipcSendRequest : addr '', data 1172fce90, len 48, olen 0, parentEndp 11abbcef0, ret gipcretConnectionLost (12), objFlags 0x0, reqFlags 0x224 }, ret gipcretConnectionLost (12)
2018-05-04 10:57:00.195: [GIPCXCPT][1029] gipcmodMuxTransferAccept: internal accept request failed endp 1112a2970, child 11abbcef0, ret gipcretConnectionInvalid (13)
2018-05-04 10:57:00.195: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretConnectionInvalid (13) ] error during accept on endp 1112a2970
2018-05-04 10:57:00.254: [ CSSD][1029]clssscSelect: cookie accept request 11ba4a590
2018-05-04 10:57:00.254: [ CSSD][1029]clssscevtypSHRCON: getting client with cmproc 11ba4a590
2018-05-04 10:57:00.254: [ CSSD][1029]clssgmRegisterClient: proc(7590/11ba4a590), client(2/11764d8f0)
2018-05-04 10:57:00.254: [ CSSD][1029]clssscSelect: cookie accept request 1109c2e00
2018-05-04 10:57:00.254: [ CSSD][1029]clssgmAllocProc: (11bac8dd0) allocated
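To see at a glance which GIPC return codes dominate an ocssd.log excerpt like the one above, the exception lines can be tallied. The following is a minimal sketch, not an Oracle tool: the function name tally_gipc_errors and the regular expression are assumptions made for illustration, and the sample lines are copied from the log above.

```python
import re
from collections import Counter

# Matches return codes such as "ret gipcretAuthFail (22)".
GIPC_RET = re.compile(r"ret (gipcret\w+) \((\d+)\)")


def tally_gipc_errors(lines):
    """Count [GIPCXCPT] lines per (return-code name, number).

    Hypothetical helper: a line that repeats the same code twice
    is counted only once for that code.
    """
    counts = Counter()
    for line in lines:
        if "[GIPCXCPT]" not in line:
            continue
        for name, code in set(GIPC_RET.findall(line)):
            counts[(name, int(code))] += 1
    return counts


sample = [
    "2018-05-04 10:56:59.743: [GIPCXCPT][1029] gipcmodMuxTransferAccept: "
    "internal accept request failed endp 1112a2970, child 11ba653d0, "
    "ret gipcretAuthFail (22)",
    "2018-05-04 10:56:59.745: [GIPCXCPT][1029] gipcmodMuxTransferAccept: "
    "internal accept request failed endp 1112a2970, child 11abbcef0, "
    "ret gipcretConnectionInvalid (13)",
    "2018-05-04 10:56:59.804: [ CSSD][1029]clssscSelect: "
    "cookie accept request 11ad57f10",
]

counts = tally_gipc_errors(sample)
for (name, code), n in sorted(counts.items()):
    print(f"{name} ({code}): {n}")
```

Run against the full log file, the same function would show whether gipcretAuthFail (22) is an isolated event or recurs throughout the incident window.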
4. Check CRS_HOME space and files
The directories have sufficient space.
ls -ld /var/tmp/.oracle
drwxrwxrwt 2 root oinstall 256 Nov 23 2014 /var/tmp/.oracle
ls -ld /tmp/.oracle
drwxrwxrwt 2 root oinstall 4096 Jan 23 01:43 /tmp/.oracle
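5. At this time the number of active sessions in the database rose sharply. SQL statement 459f3z9u4fb3u, which queries dictionary views, was waiting on the "cursor: pin S wait on X" event, and the SGA was shrinking and growing frequently.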
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/03 16:44 |   1
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/03 16:44 |   2
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/03 16:44 |   2
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/03 16:44 |   3
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/03 16:44 |   3
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/03 16:44 |   2
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/03 16:44 |   2
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/03 16:44 |   3
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/03 16:44 |   3
SHRINK |IMMEDIATE |db_cache_size    | 92672| 92160| 92160|COMPLETE |05/03 16:45 |   3
GROW   |IMMEDIATE |shared_pool_size | 33280| 33792| 33792|COMPLETE |05/03 16:45 |   3
GROW   |DEFERRED  |db_cache_size    | 92160| 92672| 92672|COMPLETE |05/03 16:55 |   1
SHRINK |DEFERRED  |shared_pool_size | 33792| 33280| 33280|COMPLETE |05/03 16:55 |   1
SHRINK |DEFERRED  |shared_pool_size | 33280| 32768| 32768|COMPLETE |05/04 09:53 |   0
GROW   |DEFERRED  |db_cache_size    | 92672| 93184| 93184|COMPLETE |05/04 09:53 |   0
GROW   |DEFERRED  |db_cache_size    | 93184| 93696| 93696|COMPLETE |05/04 10:02 |  88
SHRINK |DEFERRED  |shared_pool_size | 32768| 32256| 32256|COMPLETE |05/04 10:02 |  88
GROW   |DEFERRED  |db_cache_size    | 93696| 94208| 94208|COMPLETE |05/04 10:53 | 104
SHRINK |DEFERRED  |shared_pool_size | 32256| 31744| 31744|COMPLETE |05/04 10:53 | 104
SHRINK |IMMEDIATE |db_cache_size    | 94208| 93696| 93696|COMPLETE |05/04 10:54 |   1
GROW   |IMMEDIATE |shared_pool_size | 31744| 32256| 32256|COMPLETE |05/04 10:54 |   1
GROW   |IMMEDIATE |shared_pool_size | 32256| 32768| 32768|COMPLETE |05/04 10:54 |   7
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/04 10:54 |   6
GROW   |IMMEDIATE |shared_pool_size | 32256| 32768| 32768|COMPLETE |05/04 10:54 |   6
SHRINK |IMMEDIATE |db_cache_size    | 93696| 93184| 93184|COMPLETE |05/04 10:54 |   7
GROW   |IMMEDIATE |shared_pool_size | 32768| 33280| 33280|COMPLETE |05/04 10:55 |   1
SHRINK |IMMEDIATE |db_cache_size    | 93184| 92672| 92672|COMPLETE |05/04 10:55 |   1
SHRINK |IMMEDIATE |db_cache_size    | 92672| 92160| 92160|COMPLETE |05/04 10:55 |   4
SHRINK |IMMEDIATE |db_cache_size    | 92672| 92160| 92160|COMPLETE |05/04 10:55 |   1
GROW   |IMMEDIATE |shared_pool_size | 33280| 33792| 33792|COMPLETE |05/04 10:55 |   4
GROW   |IMMEDIATE |shared_pool_size | 33280| 33792| 33792|COMPLETE |05/04 10:55 |   1
SHRINK |DEFERRED  |shared_pool_size | 33792| 33280| 33280|COMPLETE |05/04 11:09 |  85
GROW   |DEFERRED  |db_cache_size    | 92160| 92672| 92672|COMPLETE |05/04 11:09 |  85
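The back-and-forth between db_cache_size and shared_pool_size is easier to judge once the rows are counted per component, direction and mode. The sketch below assumes the pipe-delimited layout shown above. The field names (oper_type, oper_mode, component and so on) are guesses at what the columns mean, since the listing carries no header, and the final column is ignored because its meaning is not stated. The sample rows are copied from the listing.

```python
from collections import Counter


def parse_resize_row(row):
    """Split one pipe-delimited row into a dict (field names are assumed)."""
    f = [c.strip() for c in row.split("|")]
    return {
        "oper_type": f[0],   # GROW or SHRINK
        "oper_mode": f[1],   # IMMEDIATE or DEFERRED
        "component": f[2],   # e.g. db_cache_size
        "initial": int(f[3]),
        "target": int(f[4]),
        "final": int(f[5]),
        "status": f[6],
        "end_time": f[7],
    }


def summarize(rows):
    """Count operations per (component, oper_type, oper_mode)."""
    ops = [parse_resize_row(r) for r in rows if r.strip()]
    return Counter(
        (o["component"], o["oper_type"], o["oper_mode"]) for o in ops
    )


rows = [
    "SHRINK |IMMEDIATE |db_cache_size | 94208| 93696| 93696|COMPLETE |05/04 10:54 | 1",
    "GROW |IMMEDIATE |shared_pool_size | 31744| 32256| 32256|COMPLETE |05/04 10:54 | 1",
    "GROW |IMMEDIATE |shared_pool_size | 32256| 32768| 32768|COMPLETE |05/04 10:54 | 7",
    "SHRINK |DEFERRED |shared_pool_size | 33792| 33280| 33280|COMPLETE |05/04 11:09 | 85",
]

summary = summarize(rows)
# Total number of IMMEDIATE operations across all components.
immediate = sum(n for (_, _, mode), n in summary.items() if mode == "IMMEDIATE")

for key, n in sorted(summary.items()):
    print(key, n)
print("IMMEDIATE operations:", immediate)
```

A cluster of IMMEDIATE shared pool grows paired with buffer cache shrinks inside one or two minutes, as seen around 10:54 to 10:55, is the pattern the note describes as SGA thrashing.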
Cause 3. ocssd log has "gipcretAuthFail (22)" (Doc ID 1496329.1)
Example:
2012-09-08 05:26:31.168: [ GIPCMUX][1029] gipcmodMuxTransferAccept: EXCEPTION[ ret gipcretAuthFail (22) ] error during accept on endp 111249b70

gipcretAuthFail (22) indicates "general security authorization failure". This could occur for multiple reasons:
* If the filesystem is full and there is no space to create a file under the auth directory. Please check that there is sufficient space in CRS_HOME.
* This issue could also occur if the /var/tmp/.oracle socket directory is deleted (/tmp/.oracle on some platforms). Please check this too.
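Both checks named above (free space and the presence of the socket directory) can be scripted. This is a rough sketch using only the Python standard library. The helper name check_path and the 1 GiB threshold are assumptions, not values from the Oracle note, and the Grid home path is inferred from the ocssd.log location quoted earlier.

```python
import os
import shutil
import stat

# Hypothetical threshold; the Oracle note does not state a minimum.
MIN_FREE_BYTES = 1024 ** 3


def check_path(path, min_free_bytes=MIN_FREE_BYTES):
    """Report whether a directory exists, its mode, and its free space."""
    result = {"path": path, "exists": os.path.isdir(path)}
    if result["exists"]:
        st = os.stat(path)
        # Renders the mode like ls does, e.g. "drwxrwxrwt".
        result["mode"] = stat.filemode(st.st_mode)
        result["free_bytes"] = shutil.disk_usage(path).free
        result["enough_space"] = result["free_bytes"] >= min_free_bytes
    return result


# Paths taken from this note; adjust them for the platform in question.
for p in ("/oracle/app/11.2.0/grid", "/var/tmp/.oracle", "/tmp/.oracle"):
    print(check_path(p))
```

A missing directory shows up as exists: False, which corresponds to the deleted-socket case; a low free_bytes value corresponds to the full-filesystem case.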
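The findings match "Cause 3. ocssd log has "gipcretAuthFail (22)"" (Doc ID 1496329.1), but in our case the database software directory has enough space and the .oracle directory exists.
Analysis summary: the ORA-01114 alert was caused by SGA thrashing, which led to performance problems in the database.
Recommendation: increase the SGA size from 132G to 180G (the value suggested by v$sga_target_advice).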