I. Environment
Windows
Oracle 11.2.0.4 RAC
II. Symptoms
1. After connecting to the database, queries could not be run.
2. Error message: ORA-00600: internal error code, arguments: [600], [ORA-00600: internal error code, arguments: [kgl-no-mutex-held], [0x1243958F20], [kglobf0], [0xF12D8D0B0], [], [], [], [], [], [], [], []
III. Diagnosis
I used the alert log to find the point in time when the problem first appeared. The relevant entries:
ORA-1688: unable to extend table SYS.WRH$_ACTIVE_SESSION_HISTORY partition WRH$_ACTIVE_2596570560_0 by 128 in tablespace SYSAUX
ORA-1688: unable to extend table SYS.WRH$_ACTIVE_SESSION_HISTORY partition WRH$_ACTIVE_2596570560_0 by 8192 in tablespace SYSAUX
Fri Nov 17 15:17:56 2017
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\trace\oradb1_ora_23236.trc (incident=1391697):
ORA-00600: internal error code, arguments: [kghfrmrg:nxt], [0xF12C5F000], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\incident\incdir_1391697\oradb1_ora_23236_i1391697.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Nov 17 15:18:39 2017
Dumping diagnostic data in directory=[cdmp_20171117151839], requested by (instance=1, osid=23236), summary=[incident=1391697].
Fri Nov 17 15:18:40 2017
Sweep [inc][1391697]: completed
Sweep [inc2][1391697]: completed
Fri Nov 17 15:19:55 2017
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x0] [PC:0x140D39A89, kxsGetRuntimeLock()+259]
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\trace\oradb1_ora_20684.trc (incident=1391761):
ORA-07445: exception encountered: core dump [kxsGetRuntimeLock()+259] [ACCESS_VIOLATION] [ADDR:0x0] [PC:0x140D39A89] [UNABLE_TO_READ] []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\incident\incdir_1391761\oradb1_ora_20684_i1391761.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Nov 17 15:19:57 2017
Dumping diagnostic data in directory=[cdmp_20171117151957], requested by (instance=1, osid=20684), summary=[incident=1391761].
Fri Nov 17 15:19:59 2017
Sweep [inc][1391761]: completed
Sweep [inc2][1391761]: completed
Fri Nov 17 15:20:58 2017
SMON: Parallel transaction recovery tried
Fri Nov 17 15:28:39 2017
Exception [type: ACCESS_VIOLATION, UNABLE_TO_READ] [ADDR:0x68] [PC:0xD698A6, kghalo()+40]
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\trace\oradb1_ora_14992.trc (incident=1391769):
ORA-07445: exception encountered: core dump [kghalo()+40] [ACCESS_VIOLATION] [ADDR:0x68] [PC:0xD698A6] [UNABLE_TO_READ] []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\incident\incdir_1391769\oradb1_ora_14992_i1391769.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Nov 17 15:28:41 2017
Dumping diagnostic data in directory=[cdmp_20171117152841], requested by (instance=1, osid=14992), summary=[incident=1391769].
Fri Nov 17 15:28:42 2017
Sweep [inc][1391769]: completed
Sweep [inc2][1391769]: completed
Fri Nov 17 15:29:21 2017
SMON: Parallel transaction recovery tried
Fri Nov 17 15:29:43 2017
ORA-1652: unable to extend temp segment by 128 in tablespace TEMP
Fri Nov 17 15:37:00 2017
ORA-1652: unable to extend temp segment by 128 in tablespace TEMP
Fri Nov 17 15:44:02 2017
ORA-1652: unable to extend temp segment by 128 in tablespace TEMP
Fri Nov 17 16:16:41 2017
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\trace\oradb1_ora_23712.trc (incident=1391841):
ORA-00600: internal error code, arguments: [17112], [0xF12DA8F80], [], [], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\incident\incdir_1391841\oradb1_ora_23712_i1391841.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Fri Nov 17 16:16:44 2017
Dumping diagnostic data in directory=[cdmp_20171117161644], requested by (instance=1, osid=23712), summary=[incident=1391841].
Fri Nov 17 16:16:46 2017
Sweep [inc][1391841]: completed
Sweep [inc2][1391841]: completed
Fri Nov 17 16:26:45 2017
Errors in file D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\trace\oradb1_ora_24872.trc (incident=1391785):
ORA-00600: internal error code, arguments: [kgl-no-mutex-held], [0x1243958F20], [kglobf0], [0xF12D8D0B0], [], [], [], [], [], [], [], []
Incident details in: D:\APP\ADMINISTRATOR\diag\rdbms\oradb\oradb1\incident\incdir_1391785\oradb1_ora_24872_i1391785.trc
Fri Nov 17 16:26:46 2017
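The search in section III, walking the alert log to the first error, can be sketched in Python. This is a minimal sketch that assumes the 11g alert-log layout shown above, where a bare timestamp line (for example "Fri Nov 17 15:17:56 2017") applies to the lines that follow it; the function names are hypothetical.

```python
import re
from datetime import datetime
from typing import Iterable, Optional, Tuple

# 11g alert logs write bare timestamp lines such as "Fri Nov 17 15:17:56 2017".
TS_FORMAT = "%a %b %d %H:%M:%S %Y"
# Matches ORA-1688, ORA-00600, ORA-07445 and similar error codes.
ORA_RE = re.compile(r"\bORA-\d{4,5}\b")


def parse_timestamp(line: str) -> Optional[datetime]:
    """Return a datetime if the line is a bare alert-log timestamp, else None."""
    try:
        return datetime.strptime(line.strip(), TS_FORMAT)
    except ValueError:
        return None


def first_error(lines: Iterable[str]) -> Optional[Tuple[Optional[datetime], str]]:
    """Return (timestamp, line) for the first ORA- error, or None if there is none.

    The timestamp is the most recent timestamp line seen before the error,
    or None if the error appears before any timestamp line.
    """
    current: Optional[datetime] = None
    for line in lines:
        ts = parse_timestamp(line)
        if ts is not None:
            current = ts
            continue
        if ORA_RE.search(line):
            return current, line.strip()
    return None
```

Fed the excerpt above, it returns the first ORA-1688 line with a timestamp of None, because no timestamp line precedes it in the excerpt; the governing timestamp would be on an earlier line of the full alert log.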
My guess: the SYSAUX system tablespace had run out of space, and that triggered a bug.
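To put the space errors in perspective: ORA-1688 and ORA-1652 report the size of the extent that could not be allocated in database blocks, not bytes. The sketch below converts those figures to MB, assuming an 8 KB block size (the usual default; the real value is the database's DB_BLOCK_SIZE). The function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable

# Assumption: 8 KB database blocks; pass the real DB_BLOCK_SIZE if it differs.
BLOCK_SIZE_BYTES = 8192

# ORA-1688 (table/partition) and ORA-1652 (temp segment) give the failed
# extension size in blocks, followed by the tablespace name.
SPACE_ERR_RE = re.compile(
    r"ORA-0?(1688|1652): unable to extend .* by (\d+) in tablespace (\S+)"
)


def failed_extensions_mb(lines: Iterable[str], block_size: int = BLOCK_SIZE_BYTES) -> Dict[str, float]:
    """Return the largest failed extension request per tablespace, in MB."""
    largest: Dict[str, float] = defaultdict(float)
    for line in lines:
        m = SPACE_ERR_RE.search(line)
        if m:
            blocks = int(m.group(2))
            tablespace = m.group(3)
            mb = blocks * block_size / (1024 * 1024)
            largest[tablespace] = max(largest[tablespace], mb)
    return dict(largest)
```

Under the 8 KB assumption, the ORA-1688 and ORA-1652 lines in the log above give {"SYSAUX": 64.0, "TEMP": 1.0}: SYSAUX could not supply even a 64 MB extent for the ASH partition.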
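IV. Resolution
1. Purge SYS.WRH$_ACTIVE_SESSION_HISTORY.
2. Add a datafile to the SYSAUX tablespace.
Neither option could be carried out in the database's current state, since queries were failing.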
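Had the database been accepting commands, the two options above would have looked roughly like the statements this sketch generates. DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE and ALTER TABLESPACE ... ADD DATAFILE are standard Oracle interfaces, but the helper functions, the snapshot IDs, the '+DATA' disk group and the 10 GB size are hypothetical. Dropping a range of AWR snapshots is intended to remove the AWR data for that range, ASH samples included, without deleting from the SYS table directly.

```python
import re

# Plain Oracle identifier (11g limit of 30 chars) so the DDL cannot carry injected SQL.
IDENT_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]{0,29}$")


def drop_awr_snapshots_sql(low_snap_id: int, high_snap_id: int) -> str:
    """PL/SQL block that drops AWR snapshots low_snap_id..high_snap_id."""
    if not 0 < low_snap_id <= high_snap_id:
        raise ValueError("expected 0 < low_snap_id <= high_snap_id")
    return (
        "BEGIN DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE("
        f"low_snap_id => {low_snap_id}, high_snap_id => {high_snap_id}); END;"
    )


def add_datafile_sql(tablespace: str, location: str, size_gb: int) -> str:
    """DDL adding one autoextending datafile; location is an ASM disk group or file path."""
    if not IDENT_RE.match(tablespace):
        raise ValueError(f"not a plain identifier: {tablespace!r}")
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    quoted = location.replace("'", "''")  # escape single quotes in the file spec
    return (
        f"ALTER TABLESPACE {tablespace} ADD DATAFILE '{quoted}' "
        f"SIZE {size_gb}G AUTOEXTEND ON NEXT 1G MAXSIZE UNLIMITED"
    )
```

MAXSIZE UNLIMITED is used rather than a fixed 32G because, with 8 KB blocks, a smallfile datafile tops out just under 32 GB.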
3. Tried restarting only the database instance, leaving the cluster services running. During startup Oracle did a large amount of cleanup:
Sweep [inc][1401010]: completed
Sweep [inc][1401010]: completed
Sweep [inc][1401009]: completed
Sweep [inc][1401003]: completed
Sweep [inc][1401001]: completed
Sweep [inc][1400987]: completed
Sweep [inc][1400977]: completed
Sweep [inc][1400974]: completed
Sweep [inc][1400973]: completed
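My guess is that it was cleaning up the tablespace.
4. Reconnected to test, and the database was back to normal. After the restart, nearly 20 GB of space had been freed in the SYSAUX tablespace.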
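Step 3 restarted only the affected instance. On 11.2 RAC one common way to do that is srvctl, which stops and starts a single instance while Clusterware and the other instances keep running. The original does not say which tool was used, so this is a sketch under that assumption: the database name oradb and instance oradb1 are inferred from the trace paths above, the helper names are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import subprocess
from typing import List


def instance_restart_commands(db: str, instance: str, srvctl: str = "srvctl") -> List[List[str]]:
    """Stop then start one RAC instance; Clusterware and other instances stay up."""
    return [
        [srvctl, "stop", "instance", "-d", db, "-i", instance, "-o", "immediate"],
        [srvctl, "start", "instance", "-d", db, "-i", instance],
    ]


def restart_instance(db: str, instance: str, dry_run: bool = True, srvctl: str = "srvctl") -> List[str]:
    """Print (dry run) or execute each command in order; return the command lines."""
    lines = []
    for cmd in instance_restart_commands(db, instance, srvctl):
        line = " ".join(cmd)
        lines.append(line)
        if dry_run:
            print(line)
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return lines
```

If a hung instance does not stop with -o immediate, srvctl also accepts -o abort. On Windows the executable may need to be passed as srvctl.bat.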